JHistint.jl - Julia Histopathology Interface

Julia interface for implementing the REST APIs available on the Cancer Slide Digital Archive (CDSA) portal for downloading histological images available in The Cancer Genome Atlas (TCGA). The Cancer Slide Digital Archive (CDSA) is a web platform for support, sharing, and analysis of digital pathological data. Currently, it hosts over 23,000 images associated with the data available on "The Cancer Genome Atlas" Data Portal. The library includes functions for managing image-processing algorithms for cellular and nuclei segmentation, constructing graph and the corresponding adjacency matrix, building tessellation and interfacing with J-Space.jl package to simulate the spatial growth and the genomic evolution of a cell population and the experiment of sequencing the genome of the sampled cells.

Link GitHub repository: JHistint.jl

Link GitHub repository, avaiable on spatial-input branch: J-Space.jl

CDSA Portal: Click Here

Repository containing the data mapped in the portal: Click Here

Guide to using the APIs: Click Here

Package Structure

  • The case and collection folders store metadata in .json format for individual cases and collections available on the TCGA Data Portal. The collection folder is structured as follows:
    • collectionlist.json = Stores access data (metadata) for collections (Projects in TCGA).
    • collection_name.json = Stores access data (metadata) for a single collection. The .json file is generated based on the collection chosen by the user.
  • The case folder is structured as follows:
    • collection_name.json = Stores all metadata related to cases associated with the collection selected by the user.
  • The slides folder stores histological images related to individual cases. The images are organized based on collection (TCGA-chol, TCGA-esca, etc.), and the individual case being analyzed (TCGA-2H-A9GF, TCGA-2H-A9GG, etc.). Within each folder related to the case, the slides are stored in compressed .zip files. The format of each individual slide is .tif. The folder names related to the cases correspond to the values of the Case ID field listed in the TCGA Data Portal. The names of the .zip files located in each folder refer to the Sample ID attribute associated with the patient. The slide name is given by concatenating the Slide ID and Slide UUID attributes that can be found in the lower section of the web page dedicated to the generic case TCGA-XX-YYYY.
Example: TCGA-02-0001-01C-01-TS1.zip  
  - 02 = refers to the TSS (Tissue Source Site).  
  - 0001 = refers to the code associated with the Participant, an alphanumeric string.  
  - 01 = refers to the Sample Type. The values associated with tumor samples are in the range 01-09. 10-19 indicates the range for non-diseased normal samples. 20-29 indicates samples currently under control.  
  - C = refers to the Vial field related to the ordering of the sample in the sample sequence. Values range from A-Z.  
  - 01 = refers to the Portion field related to the ordering of the analyzed portions associated with a sample. It takes values in the range 01-99.  
  - TS1 = refers to the Slide field related to the type of image. The values that can be assumed are TS (Top Slide), BS (Bottom Slide), and MS (Middle Slide). The alphanumeric value indicates the slide ordering.

JHistint Collections

The available collections are:

  • TCGA-BRCA = Breast Invasive Carcinoma (Breast)
  • TCGA-OV = Ovarian Serous Cystadenocarcinoma (Ovary)
  • TCGA-LUAD = Lung Adenocarcinoma (Bronchus and Lung)
  • TCGA-UCEC = Uterine Corpus Endometrial Carcinoma (Corpus uteri)
  • TCGA-GBM = Glioblastoma Multiforme (Brain)
  • TCGA-HSNC = Head and Neck Squamous Cell Carcinoma (Larynx, Lip, Tonsil, Gum, Other and unspecified parths of mouth)
  • TCGA-KIRC = Kidney Renal Clear Cell Carcinoma (Kidney)
  • TCGA-LGG = Brain Lower Grade Glioma (Brain)
  • TCGA-LUSC = Lung Squamous Cell Carcinoma (Bronchus and lung)
  • TCGA-TCHA = Thyroid Carcinoma (Thyroid gland)
  • TCGA-PRAD = Prostate Adenocarcinoma (Prostate gland)
  • TCGA-SKCM = Skin Cutaneous Melanoma (Skin)
  • TCGA-COAD = Colon Adenocarcinoma (Colon)
  • TCGA-STAD = Stomach Adenocarcinoma (Stomach)
  • TCGA-BLCA = Bladder Urothelial Carcinoma (Bladder)
  • TCGA-LIHC = Liver Hepatocellular Carcinoma (Liver and intrahepatic bile ducts)
  • TCGA-CESC = Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (Cervix uteri)
  • TCGA-KIRP = Kidney Renal Papillary Cell Carcinoma (Kidney)
  • TCGA-SARC = Sarcoma (Various)
  • TCGA-ESCA = Esophageal Carcinoma (Esophagus)
  • TCGA-PAAD = Pancreatic Adenocarcinoma (Pancreas)
  • TCGA-READ = Rectum Adenocarcinoma (Rectum)
  • TCGA-PCPG = Pheochromocytoma and Paraganglioma (Adrenal gland)
  • TCGA-TGCT = Testicular Germ Cell Tumors (Testis)
  • TCGA-THYM = Thymoma (Thymus)
  • TCGA-ACC = Adrenocortical Carcinoma -Adenomas and Adenocarcinomas (Adrenal gland)
  • TCGA-MESO = Mesothelioma (Heart, mediastinum and pleura)
  • TCGA-UVM = Uveal Melanoma (Eye and adnexa)
  • TCGA-KICH = Kidney Chromophobe (Kidney)
  • TCGA-UCS = Uterine Carcinosarcoma (Uterus, NOS)
  • TCGA-CHOL = Cholangiocarcinoma (Liver and intrahepatic bile ducts, Other and unspecified part of biliary track)
  • TCGA-DLBC = Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (Various)

To download a specific collection, just indicate the name of the collection: BRCA, OV, LUAD.

Package Installation

  • Step 1 - Install J-Space from spatial-input branch:
(@v1.8) pkg > add https://github.com/BIMIB-DISCo/J-Space.jl.git#spatial-input
  • Step 2 - Install JHistint from GitHub Repository:
(@v1.8) pkg > add https://github.com/niccolo99mandelli/JHistint.jl.git
julia > using JHistint

Package Installation Julia Registries (In Progress)

The JHistint package is available in the Julia Registries and can be installed as follows:

julia > using Pkg
julia > Pkg.add("JHistint")
julia > using JHistint

Otherwise, type ] in the Julia REPL and execute:

(@v1.8) pkg > add JHistint
julia > using JHistint

Download Slides Main Functions (JHistint.jl)

JHistint.download_single_collectionMethod
download_single_collection(collection_name::AbstractString)

Function for downloading histological slides associated with a collection available in TCGA.

Arguments

  • collection_name::AbstractString = Collection of TCGA data to download

the histological slides.

Notes

The function evaluates the collection_name argument, and in case of an invalid collection, considers the configuration in the Config.toml file. The value set in the package is default.

# Examples with valid input
julia> JHistint.download_single_collection("acc")
julia> JHistint.download_single_collection("bLca")
# Examples with invalid input
julia> JHistint.download_single_collection("ac")
julia> JHistint.download_single_collection("")
source
JHistint.download_all_collectionMethod
download_all_collection()

Function for downloading histological slides associated with all collections available in TCGA.

# Examples with valid input
julia> JHistint.download_all_collection()
source

Download Slides SOPHYSM Functions (JHistint.jl)

JHistint.download_single_collection_SOPHYSMMethod
download_single_collection_SOPHYSM(collection_name::AbstractString, path_to_save::AbstractString)

Function for downloading histological slides in SOPYHSM_app associated with a collection available in TCGA.

Arguments

  • collection_name::AbstractString = Collection of TCGA data to download the

histological slides.

  • path_to_save::AbstractString = Local folder path for saving

histological slides.

Notes

The function evaluates the collection_name argument, and in case of an invalid collection, considers the configuration in the Config.toml file. The value set in the package is default.

# Examples with valid input
julia> JHistint.download_single_collection_SOPHYSM("acc", "C:\...")
julia> JHistint.download_single_collection_SOPHYSM("bLca", "C:\...")
# Examples with invalid input
julia> JHistint.download_single_collection_SOPHYSM("ac", "C:\...")
julia> JHistint.download_single_collection_SOPHYSM("", "C:\...")
source
JHistint.download_all_collection_SOPHYSMMethod
download_all_collection_SOPHYSM(path_to_save::AbstractString)

Function for downloading histological slides associated with all collections available in TCGA.

Arguments

  • path_to_save::AbstractString = Local folder path for saving

histological slides.

# Examples with valid input
julia> JHistint.download_all_collection_SOPHYSM("C:\...")
source

Cell Segmentation Slides Main Functions (JHistint.jl)

JHistint.slide_cell_segmentation_without_downloadMethod
slide_cell_segmentation_without_download(collection_name::AbstractString)

Function for performing cell segmentation on histopathological slides present in the JHistint_DB database associated with the collection name provided as an argument. After generating the segmented slide, the function proceeds with constructing and saving the corresponding graph and adjacency matrix.

Arguments

  • collection_name::AbstractString = Collection of TCGA data to download

the histological slides.

Notes

The function utilizes the JHistint_DB database for performing cell segmentation on the histopathological slides associated with the provided collection name. It generates a segmented slide and constructs a corresponding graph and adjacency matrix. The output files are saved in a user-defined directory. The function may take a considerable amount of time to complete, depending on the size of the slides and the complexity of the segmentation algorithm. For each slide in the database, cell segmentation is performed using the apply_segmentation_without_download function, and the path where the result is saved is stored in the database using the load_seg_slide function. The segmentation process is defined in 4 steps:

  • LOAD SLIDE ... (slide_id)
  • APPLY SEGMENTATION ... (slide_id)
  • BUILD GRAPH ... (slide_id)
  • BUILD & SAVE ADJACENCY MATRIX ... (slide_id)
  • J-SPACE features ... (slide_id)

The adjacency matrix is saved in the same directory as the original image in text format. Finally, a confirmation message is printed for each segmented slide. Unlike the slide_cell_segmentation_with_download function, this function does not involve the creation and download of the segmented image.

# Examples with valid input
julia> JHistint.slide_cell_segmentation_without_download("acc")
julia> JHistint.slide_cell_segmentation_without_download("bLca")
# Examples with invalid input
julia> JHistint.slide_cell_segmentation_without_download("ac")
julia> JHistint.slide_cell_segmentation_without_download("")
source
JHistint.slide_cell_segmentation_with_downloadMethod
slide_cell_segmentation_with_download(collection_name::AbstractString)

Function for performing cell segmentation on histopathological slides present in the JHistint_DB database associated with the collection name provided as an argument. The function downloads the segmented slide, which is placed in the same directory as the original slide. After generating the segmented slide, the function proceeds with constructing and saving the corresponding graph and adjacency matrix.

Arguments

  • collection_name::AbstractString = TCGA data collection for which to

perform cell segmentation.

Notes

The function utilizes the JHistint_DB database for performing cell segmentation on the histopathological slides associated with the provided collection name. It generates a segmented slide and constructs a corresponding graph and adjacency matrix. The output files are saved in a user-defined directory. The function may take a considerable amount of time to complete, depending on the size of the slides and the complexity of the segmentation algorithm. For each slide in the database, cell segmentation is performed using the apply_segmentation_with_download function, and the path where the result is saved is stored in the database using the load_seg_slide function. The segmentation process is similar to that described in the slide_cell_segmentation_without_download function, with the added step of downloading the segmented image and placing it in the same directory as the original slide. The segmentation process is defined in 6 steps:

  • LOAD SLIDE ... (slide_id)
  • APPLY SEGMENTATION ... (slide_id)
  • BUILD SEGMENTED SLIDE ... (slide_id)
  • BUILD GRAPH ... (slide_id)
  • BUILD & SAVE ADJACENCY MATRIX ... (slide_id)
  • SAVE SEGMENTED SLIDE ... (slide_id)
  • J-SPACE features ... (slide_id)

The adjacency matrix is saved in text format in the same directory as both the original and segmented images. Finally, a confirmation message is printed for each segmented slide.

# Examples with valid input
julia> JHistint.slide_cell_segmentation_with_download("acc")
julia> JHistint.slide_cell_segmentation_with_download("bLca")
# Examples with invalid input
julia> JHistint.slide_cell_segmentation_with_download("ac")
julia> JHistint.slide_cell_segmentation_with_download("")
source

SOPHYSM Main Function (JHistint.jl)

JHistint.start_segmentation_SOPHYSM_tessellationMethod
start_segmentation_SOPHYSM_tessellation(filepath_input::AbstractString,
                                filepath_output::AbstractString,
                                thresholdGray::Float64,
                                thresholdMarker::Float64,
                                min_threshold::Float32,
                                max_threshold::Float32)

Initiates the SOPHYSM segmentation process for histological image using tessellation process.

Arguments

  • filepath_input::AbstractString: The file path to the input histological

image to be segmented.

  • filepath_output::AbstractString: The file path where the segmented results

and related data will be saved.

  • thresholdGray::Float64: The grayscale threshold used for initial

image processing.

  • thresholdMarker::Float64: The marker threshold for identifying

cellular structures.

  • min_threshold: Minimal threshold for considering segments area.
  • max_threshold: Maximal threshold for considering segments area.
source
JHistint.start_segmentation_SOPHYSM_graphMethod
start_segmentation_SOPHYSM_graph(filepath_input::AbstractString,
                                filepath_output::AbstractString,
                                thresholdGray::Float64,
                                thresholdMarker::Float64,
                                min_threshold::Float32,
                                max_threshold::Float32)

Initiates the SOPHYSM segmentation process for histological image using graph construction.

Arguments

  • filepath_input::AbstractString: The file path to the input histological

image to be segmented.

  • filepath_output::AbstractString: The file path where the segmented results

and related data will be saved.

  • thresholdGray::Float64: The grayscale threshold used for initial

image processing.

  • thresholdMarker::Float64: The marker threshold for identifying

cellular structures.

  • min_threshold: Minimal threshold for considering segments area.
  • max_threshold: Maximal threshold for considering segments area.
source

Support Functions for Cell Segmentation (segmentationManager.jl)

JHistint.apply_segmentation_without_downloadMethod
apply_segmentation_without_download(slide_info::Tuple{String, Vector{UInt8}, String})

The function performs the segmentation of a histological image, generates its corresponding graph, and translates it into a symmetric adjacency matrix with only 0s and 1s. Define, also, the dataframe associated with labels and edges.

Arguments

  • slide_info::Tuple{String, Vector{UInt8}, String}: A tuple containing the

slide ID, the image obtained from the DB, and the path of the original image file.

Return values

  • filepath_matrix: The path where the adjacency matrix is stored in .txt format.
  • matrix: The adjacency matrix constructed from the segmentation.

Notes

The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using a feature transformation of the image (feature_transform) and labeling of connected components. The distance between the different regions is then calculated, and an adjacency graph of the regions is constructed using the region_adjacency_graph function. The resulting graph is then transformed into an Int adjacency matrix using the weighted_graph_to_adjacency_matrix function and saved to the path of the original image.

source
JHistint.apply_segmentation_with_downloadMethod
apply_segmentation_with_download(slide_info::Tuple{String, Vector{UInt8}, String})

The function performs segmentation of a histological image, saves the segmented image in .png format, generates the corresponding graph, and translates it into an adjacency matrix. Define, also, the dataframe associated with labels and edges.

Arguments

  • slide_info::Tuple{String, Vector{UInt8}, String}: Tuple containing the

slide ID, the image itself obtained from the DB, and the path of the original image file.

Return values

  • filepath_seg: The path where the segmented image is stored in .tif format.
  • filepath_matrix: The path where the graph is stored in .txt format.
  • matrix: The adjacency matrix constructed from the segmentation.

Notes

The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using an image feature transformation (feature_transform) and connected component labeling. The distance between different regions is then calculated, and an adjacency graph of the regions is constructed using the region_adjacency_graph function. The obtained graph is transformed into an adjacency matrix using the weighted_graph_to_adjacency_matrix function, which is saved in the path of the original image. Finally, a segmented .png image is saved, and the path of the segmented slide file is returned. It performs also the visualizations of the graph with vertices and edges saved as "graphvertex.png" and "graphedges.png" in the output directory.

source
JHistint.get_random_colorMethod
get_random_color(seed)

Function to return a random 8-bit RGB format color, using a specified seed.

Arguments

  • seed: An integer used to initialize the random number generator.

If two calls to the function use the same seed, the same color will be generated.

Return value

The function returns a random 8-bit RGB format color.

source

SOPHYSM Support Function for Cell Segmentation (segmentationManager.jl)

JHistint.apply_segmentation_SOPHYSM_tessellationMethod
apply_segmentation_SOPHYSM_tessellation(filepath_input::AbstractString,
                                filepath_output::AbstractString,
                                thresholdGray::Float64,
                                thresholdMarker::Float64,
                                min_threshold::Float32,
                                max_threshold::Float32)

The function performs segmentation of a histological slide, saves the segmented image in .png format, generates the corresponding tessellation, and translates it into an adjacency matrix and build the corresponding dataframe.

Arguments

  • filepath_input::AbstractString: The file path to the input image.
  • filepath_output::AbstractString: The file path where the output files will be saved.
  • thresholdGray::Float64: The grayscale threshold for image binarization.
  • thresholdMarker::Float64: The threshold for marker-based segmentation.
  • min_threshold: Minimal threshold for considering segments area.
  • max_threshold: Maximal threshold for considering segments area.

Notes

The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using an image feature transformation (feature_transform) and connected component labeling. The apply_segmentation_SOPHYSM function reads an .tif image, applies the SOPHYSM segmentation algorithm, and generates the following outputs:

  1. Segmented image saved as "_seg.png" in the output directory.
  2. Dataframe containing label information saved as "dataframelabels.csv"

in the output directory. Also, the dataframe contaning extra information about the segment and label computed by the segmentation algorithm.

  1. Adjacency matrix saved as ".txt" in the output directory.
  2. Dataframe containing edge information saved as "dataframeedges.csv"

in the output directory.

  1. Visualizations of the graph with vertices and edges saved as

"graphvertex.png" and "graphedges.png" in the output directory.

source
JHistint.apply_segmentation_SOPHYSM_graphMethod
apply_segmentation_SOPHYSM_graph(filepath_input::AbstractString,
                                filepath_output::AbstractString,
                                thresholdGray::Float64,
                                thresholdMarker::Float64,
                                min_threshold::Float32,
                                max_threshold::Float32)

The function performs segmentation of a histological slide, saves the segmented image in .png format, generates the corresponding graph, and translates it into an adjacency matrix and build the corresponding dataframe.

Arguments

  • filepath_input::AbstractString: The file path to the input image.
  • filepath_output::AbstractString: The file path where the output files will be saved.
  • thresholdGray::Float64: The grayscale threshold for image binarization.
  • thresholdMarker::Float64: The threshold for marker-based segmentation.
  • min_threshold: Minimal threshold for considering segments area.
  • max_threshold: Maximal threshold for considering segments area.

Notes

The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using an image feature transformation (feature_transform) and connected component labeling. The apply_segmentation_SOPHYSM function reads an .tif image, applies the SOPHYSM segmentation algorithm, and generates the following outputs:

  1. Segmented image saved as "_seg.png" in the output directory.
  2. Dataframe containing label information saved as "dataframelabels.csv"

in the output directory. Also, the dataframe contaning extra information about the segment and label computed by the segmentation algorithm.

  1. Adjacency matrix saved as ".txt" in the output directory.
  2. Dataframe containing edge information saved as "dataframeedges.csv"

in the output directory.

  1. Visualizations of the graph with vertices and edges saved as

"graphvertex.png" and "graphedges.png" in the output directory.

source

Support Functions for Graph (graphManager.jl)

JHistint.region_adjacency_graph_JHistintMethod
region_adjacency_graph_JHistint(s::SegmentedImage, weight_fn::Function,
                            min_threshold::Float32, max_threshold::Float32)

Constructs a region adjacency graph (RAG) from the SegmentedImage. It returns the RAGalong with a Dict(label=>vertex) map and a dataframe containing the information about the label. weight_fn is used to assign weights to the edges.

Arguments :

  • s::SegmentedImage: The input segmented image containing regions.
  • weight_fn::Function: A function that calculates the weight between

two adjacent regions. The function should accept two region labels as arguments and return a numeric value representing the weight.

Return value :

  • G::SimpleWeightedGraph: The adjacency graph between regions with weights

on the edges.

  • vert_map::Dict{Int, Int}: A dictionary that maps region labels to nodes

in the graph.

  • df_label::DataFrame: A DataFrame containing information about regions,

including their identifiers, positions, colors, and areas.

Notes:

weight_fn(label1, label2): Returns a real number corresponding to the weight of the edge between label1 and label2.

source
JHistint.weighted_graph_to_adjacency_matrixMethod
weighted_graph_to_adjacency_matrix(G::SimpleWeightedGraph{Int64, Float64}, n::Int64)

Converts a weighted graph represented as a SimpleWeightedGraph into an unweighted boolean adjacency matrix.

Arguments:

  • G::SimpleWeightedGraph{Int64, Float64}: Weighted graph represented as a

SimpleWeightedGraph with integer vertex labels and floating-point edge weights.

  • n::Int64: Number of nodes in the adjacency matrix.

Return value:

  • adjacency_matrix: Matrix{Int64} boolean adjacency matrix.

Notes:

The function returns an n x n adjacency matrix representing the unweighted graph. If nodes i and j are adjacent,the adjacency matrix will contain a value of 1 at position (i,j) and (j,i). Otherwise, the adjacency matrix will contain a value of 0.

source
JHistint.weighted_graph_to_adjacency_matrix_weightMethod
weighted_graph_to_adjacency_matrix_weight(G::SimpleWeightedGraph{Int64, Float64}, n::Int64)

Converts a weighted graph represented as a SimpleWeightedGraph into an weighted adjacency matrix.

Arguments:

  • G::SimpleWeightedGraph{Int64, Float64}: Weighted graph represented as a

SimpleWeightedGraph with integer vertex labels and floating-point edge weights.

  • n::Int64: Number of nodes in the adjacency matrix.

Return value:

  • adjacency_matrix: Matrix{Float32} boolean adjacency matrix.

Notes:

The function returns an n x n adjacency matrix representing the weighted graph. If nodes i and j are adjacent,the adjacency matrix will contain a value associated with the edge weight at position (i,j) and (j,i). Otherwise, the adjacency matrix will contain a value of -1.

source
JHistint.build_dataframe_as_edgelistMethod
build_dataframe_as_edgelist(mat::Matrix{Int64}, label_list::Vector{Int64})

Builds a DataFrame representing an edge list from an input adjacency matrix mat and a list of labels label_list.

Arguments:

  • mat::Matrix{Int64}: The input adjacency matrix where elements represent

connections between nodes.

  • label_list::Vector{Int64}: A list of node labels to consider when

constructing the edge list.

Return value:

  • df::DataFrame: A DataFrame representing the edges in the graph,

with columns 'origin', 'destination', and 'weight' indicating the source node, target node, and edge weight, respectively.

Notes:

The build_dataframe_as_edgelist function constructs a DataFrame that represents the edges in a graph based on the adjacency matrix mat. It iterates through the upper triangular part of the matrix and adds edges to the DataFrame for non-zero values while considering only nodes with labels present in label_list.

source
JHistint.build_dataframe_as_edgelistMethod
build_dataframe_as_edgelist(mat::Matrix{Float32}, label_list::Vector{Int64})

Builds a DataFrame representing an edge list from an input adjacency matrix mat and a list of labels label_list.

Arguments:

  • mat::Matrix{Float32}: The input adjacency matrix where elements represent

connections between nodes. In this case, the matrix has Float value.

  • label_list::Vector{Int64}: A list of node labels to consider when

constructing the edge list.

Return value:

  • df::DataFrame: A DataFrame representing the edges in the graph,

with columns 'origin', 'destination', and 'weight' indicating the source node, target node, and edge weight, respectively.

Notes:

The build_dataframe_as_edgelist function constructs a DataFrame that represents the edges in a graph based on the adjacency matrix mat. It iterates through the upper triangular part of the matrix and adds edges to the DataFrame for values different from -1 while considering only nodes with labels present in label_list.

source
JHistint.save_adjacency_matrixMethod
save_adjacency_matrix(matrix::Matrix{Int64}, filepath_matrix::AbstractString)

Function to save an adjacency matrix represented as an integer matrix to a text file.

Arguments:

  • matrix::Matrix{Int64}: The integer matrix representing the adjacency matrix.
  • filepath_matrix::AbstractString: The file path represented as a string

indicating where to save the matrix.

Notes

The function opens the file specified by the filepath_matrix path in write mode and writes the matrix in adjacency matrix format, where each row represents the adjacent nodes of a node. The numbers in the matrix are separated by spaces.

source
JHistint.save_adjacency_matrixMethod
save_adjacency_matrix(matrix::Matrix{Float32}, filepath_matrix::AbstractString)

Function to save an adjacency matrix represented as a float matrix to a text file.

Arguments:

  • matrix::Matrix{Float32}: The float matrix representing the adjacency matrix.
  • filepath_matrix::AbstractString: The file path represented as a string

indicating where to save the matrix.

Notes

The function opens the file specified by the filepath_matrix path in write mode and writes the matrix in adjacency matrix format, where each row represents the adjacent nodes of a node. The numbers in the matrix are separated by spaces.

source
JHistint.extract_vertex_positionMethod
extract_vertex_position(G::MetaGraph)

Extracts the vertex positions from a MetaGraph G.

Arguments:

  • G::MetaGraph: The input MetaGraph containing vertices with

associated positions.

Return value:

  • position_array::Vector{Luxor.Point}: An array containing Luxor.Points

representing the positions of the vertices in G.

source
JHistint.extract_vertex_colorMethod
extract_vertex_color(G::MetaGraph)

Extracts the vertex colors from a MetaGraph G.

Arguments:

  • G::MetaGraph: The input MetaGraph containing vertices with associated colors.

Return value:

  • color_array::Vector{Any}: An array containing the extracted color

information associated with the vertices in G.

source

Support Functions for Tessellation (tessellationManager.jl)

JHistint.build_df_labelMethod
build_df_label(s::SegmentedImage, min_threshold::Float32, max_threshold::Float32)

Builds DataFrames containing information about regions in a segmented image, including labels, positions, colors, and areas, and separates them into noisy, filtered, and total regions based on pixel count.

Arguments :

  • s::SegmentedImage: The segmented image containing regions.
  • min_threshold: Minimal threshold for considering segments area.
  • max_threshold: Maximal threshold for considering segments area.

Return value :

  • df_label::DataFrame: A DataFrame containing information about filtered regions

, regions associated with cell or nuclei.

  • df_noisy_label::DataFrame: A DataFrame containing information about noisy regions.
  • df_total_label::DataFrame: A DataFrame containing information about all regions.
source
JHistint.add_column_is_cellMethod
add_column_is_cell(df_labels::DataFrame, df_noisy_labels::DataFrame, df_total_labels::DataFrame)

Adds a boolean column 'is_cell' to the total labels DataFrame, indicating whether each region is a cell or not.

Arguments:

  • df_labels::DataFrame: A DataFrame containing information about filtered

regions (default : areas > 3000 pixels).

  • df_noisy_labels::DataFrame: A DataFrame containing information about

noisy regions (default : areas between 300 and 3000 pixels).

  • df_total_labels::DataFrame: A DataFrame containing information about

all regions (default : areas > 300 pixels).

Return value:

  • df_total_labels::DataFrame: The df_total_labels DataFrame with

the additional is_cell column.

Notes:

source
JHistint.build_dataframe_edges_from_gridMethod
build_dataframe_edges_from_grid(edge_list::Vector{Any}, df_total_labels::DataFrame)

Builds a DataFrame containing edge information from a given list of grid-based edges and a DataFrame of total region labels.

Arguments:

  • edge_list::Vector{Any}: A list of grid-based edges represented as pairs

of indices.

  • df_total_labels::DataFrame: A DataFrame containing information about all

regions, including labels and areas.

Return value:

  • df::DataFrame: A DataFrame containing information about edges,

including origin, destination, and edge weight.

source
JHistint.build_graph_from_tessellationMethod
build_graph_from_tessellation(df_labels::DataFrame,
                           df_noisy_labels::DataFrame,
                           df_total_labels::DataFrame,
                           w::Int64, h::Int64,
                           filepath_total_tess::AbstractString,
                           filepath_cell_tess::AbstractString)

Builds a graph rapresentation of the segmented image based on Voronoi tessellations and saves visualizations.

Arguments:

  • df_labels::DataFrame: A DataFrame containing information about

nuclei and centroids.

  • df_noisy_labels::DataFrame: A DataFrame containing information

about noisy nuclei centroids.

  • df_total_labels::DataFrame: A DataFrame containing information

about all centroids.

  • w::Int64: Width of the tessellation region.
  • h::Int64: Height of the tessellation region.
  • filepath_total_tess::AbstractString: The file path to save the

visualization of the total tessellation.

  • filepath_cell_tess::AbstractString: The file path to save the

visualization of the cell-based tessellation.

Return value:

  • df_edges::DataFrame: A DataFrame containing edge information between regions.
  • edges::Vector{Any}: A vector of pairs representing the connected

regions based on Voronoi edges.

Notes:

The build_graph_from_tessellation function performs the following steps:

  1. Extracts centroid positions from the provided DataFrames.
  2. Performs Voronoi tessellations for both the total and cell-based centroids.
  3. Plots the tessellations, including centroids and labels.
  4. Saves the visualizations to the specified file paths.
  5. Constructs an edge DataFrame based on Voronoi edges.
source
JHistint.tess_dataframe_to_adjacency_matrix_weightMethod
tess_dataframe_to_adjacency_matrix_weight(df_total_labels::DataFrame, df_edges::DataFrame, edges::Vector{Any})

Converts a DataFrame representation of edges and region labels into an adjacency matrix with weighted edges.

Arguments:

  • df_total_labels::DataFrame: A DataFrame containing information

about all regions, including labels.

  • df_edges::DataFrame: A DataFrame containing edge information,

including origin, destination, and edge weight.

  • edges::Vector{Any}: A vector of pairs representing the connected regions.

Return value:

  • adjacency_matrix::Matrix{Int}: An adjacency matrix representing the

connectivity of regions with weighted edges.

source

Support Functions for Noise Reduction (noiseManager.jl)

JHistint.compute_centroid_total_cellsMethod
compute_centroid_total_cells(s::SegmentedImage,
                                    df_label::DataFrame,
                                    min_threshold::Float32)

Computes the centroids of total cells within regions in a segmented image s and associates them with labels in the provided DataFrame df_label.

Arguments:

  • s::SegmentedImage: The segmented image containing regions.
  • df_label::DataFrame: A DataFrame with information about the regions,

including labels and other attributes.

  • min_threshold: Minimal threshold for considering segments area.

Return value:

  • df_label::DataFrame: The input DataFrame df_label with an additional

centroid column containing the computed centroids of total cells.

Notes:

The compute_centroid_total_cells function iterates through the pixels in the segmented image s to identify total cells within regions. It calculates the centroids of these total cells and associates them with their corresponding labels in the df_label DataFrame. The function marks pixels as visited to avoid redundant calculations and applies a manual threshold to exclude noise by considering only regions with pixel counts greater than the specified threshold (default = 300 pixels).

source
JHistint.compute_centroid_cellsMethod
compute_centroid_cells(s::SegmentedImage,
                            df_label::DataFrame,
                            max_threshold::Float32)

Computes the centroids of only cells associated to nuclei within regions in a segmented image s and associates them with labels in the provided DataFrame df_label.

Arguments:

  • s::SegmentedImage: The segmented image containing regions.
  • df_label::DataFrame: A DataFrame with information about the regions,

including labels and other attributes.

  • max_threshold: Maximal threshold for considering segments area.

Return value:

  • df_label::DataFrame: The input DataFrame df_label with an additional

centroid column containing the computed centroids of cells.

source
JHistint.compute_centroid_noisy_cellsMethod
compute_centroid_noisy_cells(s::SegmentedImage,
                                    df_label::DataFrame,
                                    min_threshold::Float32,
                                    max_threshold::Float32)

Computes the centroids of extra or noisy within regions in a segmented image s and associates them with labels in the provided DataFrame df_label.

Arguments:

  • s::SegmentedImage: The segmented image containing regions.
  • df_label::DataFrame: A DataFrame with information about the regions,

including labels and other attributes.

  • min_threshold: Minimal threshold for considering segments area.
  • max_threshold: Maximal threshold for considering segments area.

Return value:

  • df_label::DataFrame: The input DataFrame df_label with an additional

centroid column containing the computed centroids of extra cells.

source
JHistint.filter_dataframe_cellsMethod
filter_dataframe_cells(df_label::DataFrame, max_threshold::Float32)

Filters a DataFrame containing information about regions to retain only cells with areas greater than a specified threshold.

Arguments:

  • df_label::DataFrame: The input DataFrame containing region information.
  • max_threshold: Maximal threshold for considering segments area.

Return value:

  • df_filtered::DataFrame: A filtered DataFrame containing information

about the retained cells, including labels, positions, colors, areas, and centroids.

Notes:

The filter_dataframe_cells function takes a DataFrame df_label as input, which should contain information about regions, including labels, positions, colors, areas, and centroids. It filters this DataFrame to retain only those regions (cells) with areas greater a specified threshold (default = 3000 pixels).

source
JHistint.filter_dataframe_extrasMethod
filter_dataframe_extras(df_label::DataFrame,
                                min_threshold::Float32,
                                max_threshold::Float32)

Filters a DataFrame containing information about regions to retain only extra elements (not cells) with areas between two specified thresholds. In the default case these are 300 and 3000 pixels.

Arguments:

  • df_label::DataFrame: The input DataFrame containing region information.
  • min_threshold: Minimal threshold for considering segments area.
  • max_threshold: Maximal threshold for considering segments area.

Return value:

  • df_filtered::DataFrame: A filtered DataFrame containing information

about the retained extra elements, including labels, positions, colors, areas, and centroids.

Notes:

The filter_dataframe_extras function takes a DataFrame df_label as input, which should contain information about regions, including labels, positions, colors, areas, and centroids. It filters this DataFrame to retain only those regions that are considered "extras" (not cells) and have areas between two specified thresholds.

source

Support Functions for DataBase (dbManager.jl)

JHistint.insert_record_DBMethod
insert_record_DB(col_name::AbstractString,
                      cas_name::AbstractString,
                      tcga_case_id::AbstractString,
                      sin_cas_name::AbstractString,
                      tcga_slide_id::AbstractString,
                      link_slide::AbstractString,
                      filepath_zip::AbstractString,
                      filepath_svs::AbstractString)

Function for storing in the JHistint_DB database the information associated with each slide downloaded from the Cancer Digital Slide Archive (CDSA).

Argomenti

  • col_name::AbstractString = Collection name.
  • cas_name::AbstractString = Case name, which corresponds to the CASE-NAME

displayed by the package.

  • tcga_case_id::AbstractString = ID used by TCGA to identify the case.
  • sin_cas_name::AbstractString = Name of the individual slide, which

corresponds to the SLIDE-ID displayed by the package.

  • tcga_slide_id::AbstractString = ID used by TCGA to identify the slide.
  • link_slide::AbstractString = Link to the APIs for downloading the slide.
  • filepath_zip::AbstractString = Path where the .zip file is stored.
  • filepath_svs::AbstractString = Path where the .tif file is stored.

Notes

The JHistint_DB database is used for storing the information associated with each slide downloaded from the CDSA. The function takes a dictionary containing the information associated with the slide and stores it in the database. Data available in the JHistint_DB database for each slide:

  • collection_name TEXT = Name of the collection.
  • case_name TEXT = Name of the case.
  • TCGA_caseID TEXT = ID used by TCGA to identify the case.
  • slide_ID TEXT = Name of the individual slide case.
  • TCGA_slideID TEXT UNIQUE = ID used by TCGA to identify the slide, UNIQUE

prevents duplicates from being generated.

  • slide_path_folder_zip TEXT = Path where the .zip file is stored.
  • slide_path_folder_svs TEXT = Path where the .tif file is stored.
  • slide_path_api TEXT = Link to the API for downloading the slide.
  • slide_path_folder_seg TEXT = Path where the segmented .tif file is stored.
  • slide_svs BLOB = Histopathological slide (image).
  • slide_info_TSS TEXT = Slide information - Tissue Source Site.
  • slide_info_participant_code TEXT = Slide information - Participant Code,

alphanumeric string.

  • slide_info_sample_type TEXT = Slide information - Sample Type. The values

associated with tumor samples are in the range 01-09. 10-19 indicates the range for non-diseased normal samples. 20-29 indicates samples currently under control.

  • slide_info_vial TEXT = Slide information - Vial. Related to the ordering of

the sample in the sequence of samples. The values range from A-Z.

  • slide_info_portion TEXT = Slide information - Portion. Related to the

ordering of the analyzed portions associated with a sample. Takes values in the range 01-99.

  • slide_info_type TEXT = Slide information - Image Type. The possible

values are TS (Top Slide), BS (Bottom Slide), and MS (Middle Slide). The alphanumeric value indicates the ordering of the slide.

  • slide_path_folder_matrix TEXT = Path where the adjacency matrix .txt

file is stored.

  • matrix_data BLOB = Adjacency matrix.
source
JHistint.query_extract_slide_svsMethod
query_extract_slide_svs(collection_name::AbstractString)

The function queries the JHistint_DB and extracts the list of slides associated with the collection name provided as an argument.

Arguments

  • collection_name::AbstractString: Name of the slide collection to search

for in the JHistint_DB.

Return value

  • slide_list: List of tuples, each of which contains the ID of the slide,

the .svs file of the slide, and the path of the folder containing the .svs file.

source
JHistint.load_seg_slideMethod
load_seg_slide(filepath_seg::AbstractString, filepath_matrix::AbstractString, matrix::Matrix{Int64}, slide_id::AbstractString)

The function updates the JHistint_DB with the path of the segmented image file, the path of the adjacency matrix file in text format, and the matrix itself.

Arguments

  • filepath_seg::AbstractString: Path of the segmented

image file to add to the DB.

  • filepath_matrix::AbstractString: Path of the adjacency matrix file.
  • matrix::Matrix{Int64}: Adjacency matrix.
  • slide_id::AbstractString: ID of the slide to update with

the segmented image information.

source

SOPHYSM Support Functions for DataBase (dbManager.jl)

JHistint.insert_record_DB_SOPHYSMMethod
insert_record_DB_SOPHYSM(col_name::AbstractString,
                      cas_name::AbstractString,
                      tcga_case_id::AbstractString,
                      sin_cas_name::AbstractString,
                      tcga_slide_id::AbstractString,
                      link_slide::AbstractString,
                      filepath_zip::AbstractString,
                      filepath_svs::AbstractString,
                      path_download_db::AbstractString)

Function for storing in the JHistint_DB database the information associated with each slide downloaded from the Cancer Digital Slide Archive (CDSA). The function is different from the standard model. It stores only informations of the downloaded Slides.

Argomenti

  • col_name::AbstractString = Collection name.
  • cas_name::AbstractString = Case name, which corresponds to the CASE-NAME

displayed by the package.

  • tcga_case_id::AbstractString = ID used by TCGA to identify the case.
  • sin_cas_name::AbstractString = Name of the individual slide, which

corresponds to the SLIDE-ID displayed by the package.

  • tcga_slide_id::AbstractString = ID used by TCGA to identify the slide.
  • link_slide::AbstractString = Link to the APIs for downloading the slide.
  • filepath_zip::AbstractString = Path where the .zip file is stored.
  • filepath_svs::AbstractString = Path where the .tif file is stored.
  • path_download_db::AbstractString = Path where the DB file is stored.

Notes

The JHistint_DB database is used for storing the information associated with each slide downloaded from the CDSA. The function takes a dictionary containing the information associated with the slide and stores it in the database. Data available in the JHistint_DB database for each slide:

  • collection_name TEXT = Name of the collection.
  • case_name TEXT = Name of the case.
  • TCGA_caseID TEXT = ID used by TCGA to identify the case.
  • slide_ID TEXT = Name of the individual slide case.
  • TCGA_slideID TEXT UNIQUE = ID used by TCGA to identify the slide,

UNIQUE prevents duplicates from being generated.

  • slide_path_folder_zip TEXT = Path where the .zip file is stored.
  • slide_path_folder_svs TEXT = Path where the .tif file is stored.
  • slide_path_api TEXT = Link to the API for downloading the slide.
  • slide_path_folder_seg TEXT = Path where the segmented .tif file is stored.
  • slide_svs BLOB = Histopathological slide (image).
  • slide_info_TSS TEXT = Slide information - Tissue Source Site.
  • slide_info_participant_code TEXT = Slide information - Participant Code,

alphanumeric string.

  • slide_info_sample_type TEXT = Slide information - Sample Type.

The values associated with tumor samples are in the range 01-09. 10-19 indicates the range for non-diseased normal samples. 20-29 indicates samples currently under control.

  • slide_info_vial TEXT = Slide information - Vial. Related to the ordering

of the sample in the sequence of samples. The values range from A-Z.

  • slide_info_portion TEXT = Slide information - Portion. Related to the

ordering of the analyzed portions associated with a sample. Takes values in the range 01-99.

  • slide_info_type TEXT = Slide information - Image Type. The possible

values are TS (Top Slide), BS (Bottom Slide), and MS (Middle Slide). The alphanumeric value indicates the ordering of the slide.

  • slide_path_folder_matrix TEXT = Path where the adjacency matrix

.txt file is stored.

  • matrix_data BLOB = Adjacency matrix.
source

API Support Functions (apiManager.jl)

JHistint.download_collection_valuesMethod
download_collection_values(filepath::AbstractString)

Function for downloading data from collections available in TCGA.

Arguments

  • filepath::AbstractString = Path where to save the obtained .json file

from the API available in CDSA.

Notes

The API requires the definition of parentType and parentId. parentId specifies the identifier of the collection. The collection of images associated with TCGA is identified by the code: 5b9ef8e3e62914002e454c39. The use of limit=0 sets the absence of limits in the queried file, ensuring the complete download of the file. The API belongs to the category for managing the folders stored in the repository. The downloaded file is .json.

source
JHistint.extract_collection_valuesMethod
extract_collection_values(filepath::AbstractString)

Function to extract the values of data collections from the .json file downloaded by the download_collection_values function.

Arguments

  • filepath::AbstractString = Path where the collectionlist.json

file is stored.

Return value

  • collection_values::Array{String} = List of data collections

available in TCGA.

source
JHistint.download_project_infosMethod
download_project_infos(filepath::AbstractString, collection_name::AbstractString)

Function to download metadata associated with the selected collection at startup.

Arguments

  • filepath::AbstractString = Path to save the .json file associated with

the collection. The file is indicated with the wording collection_name.json.

  • collection_name::AbstractString = Name of the collection from which to

download the slides.

Notes

The API requires the definition of parentType, parentId, and name. The name attribute identifies the name of the collection from which you want to retrieve data (e.g., chol, esca, etc.). The API belongs to the category for managing the folders stored in the repository. The downloaded file is .json.

source
JHistint.extract_project_idMethod
extract_project_id(filepath::AbstractString)

Function to extract the id value from the metadata of the collection selected at startup.

Arguments

  • filepath::AbstractString = Path where the collection_name.json file

is stored.

Return value

  • project_id = id of the collection.
source
JHistint.getCasesForProjectMethod
getCasesForProject(filepath_case::AbstractString, project_id::AbstractString)

Function to download metadata associated with the cases of the selected collection at startup.

Arguments

  • filepath::AbstractString = Path where to save the .json file

associated with the cases of the collection. The file is indicated with the term collection_name.json.

  • project_id::AbstractString = id of the collection.

Return values

  • casesID_values = List of id of all the cases in the collection.
  • casesNAME_values = List of name of all the cases in the collection.

Notes

The API requires the definition of parentType and parentId. The parentType attribute is set to folder given the structure of the repository. The parentId is set by defining the identifier of the chosen collection. The downloaded file is .json.

source
JHistint.download_zipMethod
download_zip(link::AbstractString, filepath::AbstractString)

Function for downloading histological slides in .zip format associated with the cases of the selected collection at startup.

Arguments

  • link::AbstractString = URL to access the API for slide download.
  • filepath::AbstractString = Path to save the .zip file.
source

ZIP Support Functions (zipManager.jl)

JHistint.extract_slideMethod
extract_slide(filepath_zip::AbstractString)

Function to extract the contents of .zip files downloaded from CDSA.

Arguments

  • filepath_zip::AbstractString = Path where the .zip file for

the individual case is saved.

source