JHistint.jl - Julia Histopathology Interface
Julia interface to the REST APIs of the Cancer Digital Slide Archive (CDSA) portal, used to download the histological images available in The Cancer Genome Atlas (TCGA). The CDSA is a web platform for supporting, sharing, and analyzing digital pathology data; it currently hosts over 23,000 images associated with the data available on The Cancer Genome Atlas Data Portal. The library also includes functions for applying image-processing algorithms for cell and nuclei segmentation, constructing graphs and the corresponding adjacency matrices, building tessellations, and interfacing with the J-Space.jl package to simulate the spatial growth and genomic evolution of a cell population, as well as the sequencing of the genomes of the sampled cells.
GitHub repository: JHistint.jl
GitHub repository (available on the spatial-input branch): J-Space.jl
CDSA Portal: Click Here
Repository containing the data mapped in the portal: Click Here
Guide to using the APIs: Click Here
Package Structure
- The case and collection folders store metadata in .json format for the individual cases and collections available on the TCGA Data Portal. The collection folder is structured as follows:
  - collectionlist.json = Stores access data (metadata) for all collections (Projects in TCGA).
  - collection_name.json = Stores access data (metadata) for a single collection. The .json file is generated based on the collection chosen by the user.
- The case folder is structured as follows:
  - collection_name.json = Stores all metadata related to the cases associated with the collection selected by the user.
- The slides folder stores the histological images related to individual cases. The images are organized by collection (TCGA-chol, TCGA-esca, etc.) and by the individual case being analyzed (TCGA-2H-A9GF, TCGA-2H-A9GG, etc.). Within each case folder, the slides are stored in compressed .zip files, and each individual slide is in .tif format. The case folder names correspond to the values of the Case ID field listed in the TCGA Data Portal. The names of the .zip files in each folder refer to the Sample ID attribute associated with the patient. The slide name is given by concatenating the Slide ID and Slide UUID attributes, which can be found in the lower section of the web page dedicated to the generic case TCGA-XX-YYYY.
Example: TCGA-02-0001-01C-01-TS1.zip
- 02 = refers to the TSS (Tissue Source Site).
- 0001 = refers to the code associated with the Participant, an alphanumeric string.
- 01 = refers to the Sample Type. Tumor samples take values in the range 01-09, values 10-19 indicate normal (non-diseased) samples, and values 20-29 indicate control samples.
- C = refers to the Vial field related to the ordering of the sample in the sample sequence. Values range from A-Z.
- 01 = refers to the Portion field related to the ordering of the analyzed portions associated with a sample. It takes values in the range 01-99.
- TS1 = refers to the Slide field, which indicates the type of image. Possible values are TS (Top Slide), BS (Bottom Slide), and MS (Middle Slide); the trailing number indicates the slide ordering.
JHistint Collections
The available collections are:
- TCGA-BRCA = Breast Invasive Carcinoma (Breast)
- TCGA-OV = Ovarian Serous Cystadenocarcinoma (Ovary)
- TCGA-LUAD = Lung Adenocarcinoma (Bronchus and Lung)
- TCGA-UCEC = Uterine Corpus Endometrial Carcinoma (Corpus uteri)
- TCGA-GBM = Glioblastoma Multiforme (Brain)
- TCGA-HNSC = Head and Neck Squamous Cell Carcinoma (Larynx, Lip, Tonsil, Gum, Other and unspecified parts of mouth)
- TCGA-KIRC = Kidney Renal Clear Cell Carcinoma (Kidney)
- TCGA-LGG = Brain Lower Grade Glioma (Brain)
- TCGA-LUSC = Lung Squamous Cell Carcinoma (Bronchus and lung)
- TCGA-THCA = Thyroid Carcinoma (Thyroid gland)
- TCGA-PRAD = Prostate Adenocarcinoma (Prostate gland)
- TCGA-SKCM = Skin Cutaneous Melanoma (Skin)
- TCGA-COAD = Colon Adenocarcinoma (Colon)
- TCGA-STAD = Stomach Adenocarcinoma (Stomach)
- TCGA-BLCA = Bladder Urothelial Carcinoma (Bladder)
- TCGA-LIHC = Liver Hepatocellular Carcinoma (Liver and intrahepatic bile ducts)
- TCGA-CESC = Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (Cervix uteri)
- TCGA-KIRP = Kidney Renal Papillary Cell Carcinoma (Kidney)
- TCGA-SARC = Sarcoma (Various)
- TCGA-ESCA = Esophageal Carcinoma (Esophagus)
- TCGA-PAAD = Pancreatic Adenocarcinoma (Pancreas)
- TCGA-READ = Rectum Adenocarcinoma (Rectum)
- TCGA-PCPG = Pheochromocytoma and Paraganglioma (Adrenal gland)
- TCGA-TGCT = Testicular Germ Cell Tumors (Testis)
- TCGA-THYM = Thymoma (Thymus)
- TCGA-ACC = Adrenocortical Carcinoma - Adenomas and Adenocarcinomas (Adrenal gland)
- TCGA-MESO = Mesothelioma (Heart, mediastinum and pleura)
- TCGA-UVM = Uveal Melanoma (Eye and adnexa)
- TCGA-KICH = Kidney Chromophobe (Kidney)
- TCGA-UCS = Uterine Carcinosarcoma (Uterus, NOS)
- TCGA-CHOL = Cholangiocarcinoma (Liver and intrahepatic bile ducts, Other and unspecified parts of biliary tract)
- TCGA-DLBC = Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (Various)
To download a specific collection, pass its name without the TCGA- prefix, for example BRCA, OV, or LUAD. As the examples in the following sections show, the name is not case-sensitive.
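Since the examples in the following sections accept names such as "acc" and "bLca", matching appears to be case-insensitive. The sketch below shows one way to normalize such input against the project codes listed above; normalize_collection and KNOWN_COLLECTIONS are hypothetical names, and the package's own validation may differ.

```julia
# Hypothetical helper: map user input such as "bLca" or "TCGA-BRCA" to a
# lowercase project code, or return `nothing` if the code is unknown.
const KNOWN_COLLECTIONS = Set([
    "brca", "ov", "luad", "ucec", "gbm", "hnsc", "kirc", "lgg", "lusc", "thca",
    "prad", "skcm", "coad", "stad", "blca", "lihc", "cesc", "kirp", "sarc", "esca",
    "paad", "read", "pcpg", "tgct", "thym", "acc", "meso", "uvm", "kich", "ucs",
    "chol", "dlbc"])

function normalize_collection(name::AbstractString)
    # accept both "BRCA" and "TCGA-BRCA", ignoring case and surrounding spaces
    code = replace(lowercase(strip(name)), r"^tcga-" => "")
    return code in KNOWN_COLLECTIONS ? code : nothing
end

normalize_collection("bLca")   # returns "blca"
normalize_collection("ac")     # returns nothing
```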
Package Installation
- Step 1 - Install J-Space from the spatial-input branch:
(@v1.8) pkg> add https://github.com/BIMIB-DISCo/J-Space.jl.git#spatial-input
- Step 2 - Install JHistint from the GitHub repository:
(@v1.8) pkg> add https://github.com/niccolo99mandelli/JHistint.jl.git
julia> using JHistint
Package Installation from the Julia Registries (In Progress)
Once JHistint is available in the Julia General registry, it can be installed as follows:
julia> using Pkg
julia> Pkg.add("JHistint")
julia> using JHistint
Alternatively, type ] in the Julia REPL and execute:
(@v1.8) pkg> add JHistint
julia> using JHistint
Download Slides Main Functions (JHistint.jl)
JHistint.download_single_collection
download_single_collection(collection_name::AbstractString)
Function for downloading the histological slides associated with a collection available in TCGA.
Arguments
collection_name::AbstractString = TCGA collection whose histological slides are downloaded.
Notes
The function validates the collection_name argument; if the collection is invalid, it falls back to the collection set in the Config.toml file (the value shipped with the package is default).
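The fallback described above can be sketched with Julia's TOML standard library. The [collection] table and name key below are assumptions made for illustration; the actual layout of Config.toml is not documented here, and resolve_collection is a hypothetical name.

```julia
import TOML

# Hypothetical Config.toml content; the real file may use different keys.
const CONFIG_TEXT = """
[collection]
name = "default"
"""

# Use the argument when it names a valid collection; otherwise fall back to the
# value stored in the configuration.
function resolve_collection(name::AbstractString, valid::AbstractSet{<:AbstractString};
                            config::AbstractDict = TOML.parse(CONFIG_TEXT))
    candidate = lowercase(strip(name))
    candidate in valid && return candidate
    return config["collection"]["name"]
end

valid = Set(["acc", "blca"])
resolve_collection("bLca", valid)   # returns "blca"
resolve_collection("ac", valid)     # returns "default"
```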
# Examples with valid input
julia> JHistint.download_single_collection("acc")
julia> JHistint.download_single_collection("bLca")
# Examples with invalid input
julia> JHistint.download_single_collection("ac")
julia> JHistint.download_single_collection("")
JHistint.download_all_collection
download_all_collection()
Function for downloading the histological slides associated with all collections available in TCGA.
# Examples with valid input
julia> JHistint.download_all_collection()
Download Slides SOPHYSM Functions (JHistint.jl)
JHistint.download_single_collection_SOPHYSM
download_single_collection_SOPHYSM(collection_name::AbstractString, path_to_save::AbstractString)
Function used by the SOPHYSM app to download the histological slides associated with a collection available in TCGA into a local folder.
Arguments
collection_name::AbstractString = TCGA collection whose histological slides are downloaded.
path_to_save::AbstractString = Local folder path where the histological slides are saved.
Notes
The function validates the collection_name argument; if the collection is invalid, it falls back to the collection set in the Config.toml file (the value shipped with the package is default).
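When choosing path_to_save, building paths with joinpath avoids hard-coding Windows or Unix separators; on Windows, a raw string literal such as raw"C:\Users\name\slides" also avoids escaping backslashes. The sketch below assumes, for illustration only, that archives are placed in collection and case subfolders as described in Package Structure; slide_archive_path is a hypothetical helper and the sample ID is made up.

```julia
# Hypothetical helper: destination of a slide archive under a user-chosen root,
# mirroring the collection/case layout described in Package Structure.
function slide_archive_path(root::AbstractString, collection::AbstractString,
                            case_id::AbstractString, sample_id::AbstractString)
    return joinpath(root, collection, case_id, sample_id * ".zip")
end

root = mktempdir()   # stand-in for the path_to_save chosen by the user
dest = slide_archive_path(root, "TCGA-chol", "TCGA-2H-A9GF", "TCGA-2H-A9GF-01A")
mkpath(dirname(dest))   # creates the collection and case folders if needed
```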
# Examples with valid input
julia> JHistint.download_single_collection_SOPHYSM("acc", "C:\\...")
julia> JHistint.download_single_collection_SOPHYSM("bLca", "C:\\...")
# Examples with invalid input
julia> JHistint.download_single_collection_SOPHYSM("ac", "C:\\...")
julia> JHistint.download_single_collection_SOPHYSM("", "C:\\...")
JHistint.download_all_collection_SOPHYSM
download_all_collection_SOPHYSM(path_to_save::AbstractString)
Function for downloading, into a local folder, the histological slides associated with all collections available in TCGA.
Arguments
path_to_save::AbstractString = Local folder path where the histological slides are saved.
# Examples with valid input
julia> JHistint.download_all_collection_SOPHYSM("C:\\...")
Cell Segmentation Slides Main Functions (JHistint.jl)
JHistint.slide_cell_segmentation_without_download
slide_cell_segmentation_without_download(collection_name::AbstractString)
Function for performing cell segmentation on the histopathological slides in the JHistint_DB database associated with the given collection name. After generating the segmented slide, the function builds and saves the corresponding graph and adjacency matrix.
Arguments
collection_name::AbstractString = TCGA data collection for which to
perform cell segmentation.
Notes
The function uses the JHistint_DB database to perform cell segmentation on the histopathological slides associated with the provided collection name. It generates a segmented slide and builds the corresponding graph and adjacency matrix. The output files are saved in a user-defined directory. Depending on the size of the slides and the complexity of the segmentation algorithm, the function may take considerable time to complete. For each slide in the database, cell segmentation is performed with the apply_segmentation_without_download function, and the path of the saved result is stored in the database with the load_seg_slide function. The segmentation process consists of 5 steps:
- LOAD SLIDE ... (slide_id)
- APPLY SEGMENTATION ... (slide_id)
- BUILD GRAPH ... (slide_id)
- BUILD & SAVE ADJACENCY MATRIX ... (slide_id)
- J-SPACE features ... (slide_id)
The adjacency matrix is saved in text format in the same directory as the original image. Finally, a confirmation message is printed for each segmented slide. Unlike the slide_cell_segmentation_with_download function, this function does not create or save the segmented image.
# Examples with valid input
julia> JHistint.slide_cell_segmentation_without_download("acc")
julia> JHistint.slide_cell_segmentation_without_download("bLca")
# Examples with invalid input
julia> JHistint.slide_cell_segmentation_without_download("ac")
julia> JHistint.slide_cell_segmentation_without_download("")
JHistint.slide_cell_segmentation_with_download
slide_cell_segmentation_with_download(collection_name::AbstractString)
Function for performing cell segmentation on the histopathological slides in the JHistint_DB database associated with the given collection name. The function saves the segmented slide in the same directory as the original slide. After generating the segmented slide, the function builds and saves the corresponding graph and adjacency matrix.
Arguments
collection_name::AbstractString = TCGA data collection for which to
perform cell segmentation.
Notes
The function uses the JHistint_DB database to perform cell segmentation on the histopathological slides associated with the provided collection name. It generates a segmented slide and builds the corresponding graph and adjacency matrix. The output files are saved in a user-defined directory. Depending on the size of the slides and the complexity of the segmentation algorithm, the function may take considerable time to complete. For each slide in the database, cell segmentation is performed with the apply_segmentation_with_download function, and the path of the saved result is stored in the database with the load_seg_slide function. The segmentation process is similar to the one described for the slide_cell_segmentation_without_download function, with the additional step of saving the segmented image in the same directory as the original slide. The segmentation process consists of 7 steps:
- LOAD SLIDE ... (slide_id)
- APPLY SEGMENTATION ... (slide_id)
- BUILD SEGMENTED SLIDE ... (slide_id)
- BUILD GRAPH ... (slide_id)
- BUILD & SAVE ADJACENCY MATRIX ... (slide_id)
- SAVE SEGMENTED SLIDE ... (slide_id)
- J-SPACE features ... (slide_id)
The adjacency matrix is saved in text format in the same directory as both the original and segmented images. Finally, a confirmation message is printed for each segmented slide.
# Examples with valid input
julia> JHistint.slide_cell_segmentation_with_download("acc")
julia> JHistint.slide_cell_segmentation_with_download("bLca")
# Examples with invalid input
julia> JHistint.slide_cell_segmentation_with_download("ac")
julia> JHistint.slide_cell_segmentation_with_download("")
SOPHYSM Main Function (JHistint.jl)
JHistint.start_segmentation_SOPHYSM_tessellation
start_segmentation_SOPHYSM_tessellation(filepath_input::AbstractString,
filepath_output::AbstractString,
thresholdGray::Float64,
thresholdMarker::Float64,
min_threshold::Float32,
max_threshold::Float32)
Initiates the SOPHYSM segmentation process for a histological image using tessellation.
Arguments
filepath_input::AbstractString: The file path to the input histological
image to be segmented.
filepath_output::AbstractString: The file path where the segmented results
and related data will be saved.
thresholdGray::Float64: The grayscale threshold used for initial
image processing.
thresholdMarker::Float64: The marker threshold for identifying
cellular structures.
min_threshold::Float32: Minimal area threshold used when considering segments.
max_threshold::Float32: Maximal area threshold used when considering segments.
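Because min_threshold and max_threshold are declared as Float32, passing integer or Float64 literals does not match this method. The sketch below uses a hypothetical stand-in function with the same argument types to show the difference; it does not call JHistint, and the threshold values are illustrative only.

```julia
# Stand-in with the same argument types as the documented signature; it does
# not segment anything and only returns the two area thresholds.
function segmentation_stub(filepath_input::AbstractString,
                           filepath_output::AbstractString,
                           thresholdGray::Float64,
                           thresholdMarker::Float64,
                           min_threshold::Float32,
                           max_threshold::Float32)
    return (min_threshold, max_threshold)
end

# Int literals do not match the Float32 parameters...
applicable(segmentation_stub, "in.tif", "out", 0.15, 0.3, 300, 3000)       # false
# ...while Float32 literals (300f0) or explicit Float32(...) conversions do.
applicable(segmentation_stub, "in.tif", "out", 0.15, 0.3, 300f0, 3000f0)   # true
```

With the real function, a call would therefore pass the area thresholds as Float32 values, for example 300f0 and 3000f0, together with input and output paths of your choosing.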
JHistint.start_segmentation_SOPHYSM_graph
start_segmentation_SOPHYSM_graph(filepath_input::AbstractString,
filepath_output::AbstractString,
thresholdGray::Float64,
thresholdMarker::Float64,
min_threshold::Float32,
max_threshold::Float32)
Initiates the SOPHYSM segmentation process for a histological image using graph construction.
Arguments
filepath_input::AbstractString: The file path to the input histological
image to be segmented.
filepath_output::AbstractString: The file path where the segmented results
and related data will be saved.
thresholdGray::Float64: The grayscale threshold used for initial
image processing.
thresholdMarker::Float64: The marker threshold for identifying
cellular structures.
min_threshold::Float32: Minimal area threshold used when considering segments.
max_threshold::Float32: Maximal area threshold used when considering segments.
Support Functions for Cell Segmentation (segmentationManager.jl)
JHistint.apply_segmentation_without_download
apply_segmentation_without_download(slide_info::Tuple{String, Vector{UInt8}, String})
The function segments a histological image, generates the corresponding graph, and translates it into a symmetric adjacency matrix containing only 0s and 1s. It also builds the DataFrames associated with labels and edges.
Arguments
slide_info::Tuple{String, Vector{UInt8}, String}: A tuple containing the
slide ID, the image obtained from the DB, and the path of the original image file.
Return values
filepath_matrix: The path where the adjacency matrix is stored in .txt format.
matrix: The adjacency matrix constructed from the segmentation.
Notes
The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using a feature transformation of the image (feature_transform) and labeling of connected components. The distance between the different regions is then calculated, and an adjacency graph of the regions is constructed using the region_adjacency_graph function. The resulting graph is then transformed into an Int adjacency matrix using the weighted_graph_to_adjacency_matrix function and saved to the path of the original image.
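The idea of region adjacency can be illustrated without ImageSegmentation. This sketch is not JHistint's implementation: it assumes a plain integer label matrix, treats label 0 as background, and marks two labels as adjacent when they share a 4-connected pixel border; label_adjacency is a hypothetical name.

```julia
# Minimal sketch: derive a symmetric 0/1 region adjacency matrix from a label
# matrix (such as one produced by watershed), using 4-connectivity.
function label_adjacency(labels::AbstractMatrix{<:Integer})
    n = maximum(labels)
    A = zeros(Int, n, n)
    rows, cols = size(labels)
    for c in 1:cols, r in 1:rows
        a = labels[r, c]
        # compare with the right and lower neighbours; each pixel pair is seen once
        for (dr, dc) in ((0, 1), (1, 0))
            rr, cc = r + dr, c + dc
            (rr > rows || cc > cols) && continue
            b = labels[rr, cc]
            if a != b && a > 0 && b > 0   # label 0 treated as background here
                A[a, b] = 1
                A[b, a] = 1
            end
        end
    end
    return A
end

labels = [1 1 2 2;
          1 1 2 2;
          3 3 3 0]
A = label_adjacency(labels)   # [0 1 1; 1 0 1; 1 1 0]
```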
JHistint.apply_segmentation_with_download
apply_segmentation_with_download(slide_info::Tuple{String, Vector{UInt8}, String})
The function segments a histological image, saves the segmented image in .png format, generates the corresponding graph, and translates it into an adjacency matrix. It also builds the DataFrames associated with labels and edges.
Arguments
slide_info::Tuple{String, Vector{UInt8}, String}: Tuple containing the
slide ID, the image itself obtained from the DB, and the path of the original image file.
Return values
filepath_seg: The path where the segmented image is stored in .png format.
filepath_matrix: The path where the adjacency matrix is stored in .txt format.
matrix: The adjacency matrix constructed from the segmentation.
Notes
The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using an image feature transformation (feature_transform) and connected component labeling. The distance between different regions is then calculated, and an adjacency graph of the regions is constructed using the region_adjacency_graph function. The resulting graph is transformed into an adjacency matrix using the weighted_graph_to_adjacency_matrix function, and the matrix is saved in the directory of the original image. Finally, the segmented image is saved in .png format and the path of the segmented slide file is returned. The function also saves visualizations of the graph, with vertices and edges, as "graphvertex.png" and "graphedges.png" in the output directory.
JHistint.get_random_color
get_random_color(seed)
Function that returns a random 8-bit RGB color generated from the specified seed.
Arguments
seed: An integer used to initialize the random number generator.
If two calls to the function use the same seed, the same color will be generated.
Return value
The function returns a random 8-bit RGB format color.
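A minimal sketch of seeded color generation with the Random standard library. It returns a plain tuple of three UInt8 values; JHistint's get_random_color may return a different color type, so random_rgb8 is a hypothetical stand-in.

```julia
using Random

# Hypothetical stand-in for get_random_color: the same seed always yields the
# same color. Returns a tuple of three UInt8 channels (0-255).
function random_rgb8(seed::Integer)
    rng = MersenneTwister(seed)   # local generator, so the global RNG is untouched
    return (rand(rng, UInt8), rand(rng, UInt8), rand(rng, UInt8))
end

random_rgb8(7) == random_rgb8(7)   # true
```

Using a local generator per call keeps label colors reproducible across runs without affecting other random draws in the session.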
SOPHYSM Support Functions for Cell Segmentation (segmentationManager.jl)
JHistint.apply_segmentation_SOPHYSM_tessellation
apply_segmentation_SOPHYSM_tessellation(filepath_input::AbstractString,
filepath_output::AbstractString,
thresholdGray::Float64,
thresholdMarker::Float64,
min_threshold::Float32,
max_threshold::Float32)
The function segments a histological slide, saves the segmented image in .png format, generates the corresponding tessellation, translates it into an adjacency matrix, and builds the corresponding DataFrame.
Arguments
filepath_input::AbstractString: The file path to the input image.
filepath_output::AbstractString: The file path where the output files will be saved.
thresholdGray::Float64: The grayscale threshold for image binarization.
thresholdMarker::Float64: The threshold for marker-based segmentation.
min_threshold::Float32: Minimal area threshold used when considering segments.
max_threshold::Float32: Maximal area threshold used when considering segments.
Notes
The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using an image feature transformation (feature_transform) and connected component labeling. The apply_segmentation_SOPHYSM_tessellation function reads a .tif image, applies the SOPHYSM segmentation algorithm, and generates the following outputs:
- Segmented image saved as "_seg.png" in the output directory.
- DataFrame containing label information, saved as "dataframelabels.csv"
in the output directory, together with a DataFrame containing additional information about the segments and labels computed by the segmentation algorithm.
- Adjacency matrix saved as ".txt" in the output directory.
- Dataframe containing edge information saved as "dataframeedges.csv"
in the output directory.
- Visualizations of the graph with vertices and edges saved as
"graphvertex.png" and "graphedges.png" in the output directory.
JHistint.apply_segmentation_SOPHYSM_graph
apply_segmentation_SOPHYSM_graph(filepath_input::AbstractString,
filepath_output::AbstractString,
thresholdGray::Float64,
thresholdMarker::Float64,
min_threshold::Float32,
max_threshold::Float32)
The function segments a histological slide, saves the segmented image in .png format, generates the corresponding graph, translates it into an adjacency matrix, and builds the corresponding DataFrame.
Arguments
filepath_input::AbstractString: The file path to the input image.
filepath_output::AbstractString: The file path where the output files will be saved.
thresholdGray::Float64: The grayscale threshold for image binarization.
thresholdMarker::Float64: The threshold for marker-based segmentation.
min_threshold::Float32: Minimal area threshold used when considering segments.
max_threshold::Float32: Maximal area threshold used when considering segments.
Notes
The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using an image feature transformation (feature_transform) and connected component labeling. The apply_segmentation_SOPHYSM_graph function reads a .tif image, applies the SOPHYSM segmentation algorithm, and generates the following outputs:
- Segmented image saved as "_seg.png" in the output directory.
- DataFrame containing label information, saved as "dataframelabels.csv"
in the output directory, together with a DataFrame containing additional information about the segments and labels computed by the segmentation algorithm.
- Adjacency matrix saved as ".txt" in the output directory.
- Dataframe containing edge information saved as "dataframeedges.csv"
in the output directory.
- Visualizations of the graph with vertices and edges saved as
"graphvertex.png" and "graphedges.png" in the output directory.
Support Functions for Graph (graphManager.jl)
JHistint.region_adjacency_graph_JHistint
region_adjacency_graph_JHistint(s::SegmentedImage, weight_fn::Function,
min_threshold::Float32, max_threshold::Float32)
Constructs a region adjacency graph (RAG) from the SegmentedImage. It returns the RAG along with a Dict(label=>vertex) map and a DataFrame containing information about the labels. weight_fn is used to assign weights to the edges.
Arguments:
s::SegmentedImage: The input segmented image containing regions.
weight_fn::Function: A function that calculates the weight between
two adjacent regions. The function should accept two region labels as arguments and return a numeric value representing the weight.
Return value:
G::SimpleWeightedGraph: The adjacency graph between regions with weights
on the edges.
vert_map::Dict{Int, Int}: A dictionary that maps region labels to nodes
in the graph.
df_label::DataFrame: A DataFrame containing information about regions,
including their identifiers, positions, colors, and areas.
Notes:
weight_fn(label1, label2): Returns a real number corresponding to the weight of the edge between label1 and label2.
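As an illustration of the weight_fn contract, the sketch below uses the Euclidean distance between hypothetical region centroids as the edge weight. The centroids and the choice of distance are assumptions for the example; the weight function actually used by JHistint is not stated here.

```julia
# Hypothetical region centroids (label => (row, column)).
centroids = Dict(1 => (10.0, 10.0), 2 => (10.0, 40.0), 3 => (50.0, 10.0))

# A possible weight_fn: Euclidean distance between the centroids of two labels.
function weight_fn(label1, label2)
    (r1, c1), (r2, c2) = centroids[label1], centroids[label2]
    return sqrt((r1 - r2)^2 + (c1 - c2)^2)
end

# Weighted edges for a set of adjacent label pairs.
adjacent_pairs = [(1, 2), (1, 3), (2, 3)]
weighted_edges = [(a, b, weight_fn(a, b)) for (a, b) in adjacent_pairs]
# [(1, 2, 30.0), (1, 3, 40.0), (2, 3, 50.0)]
```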
JHistint.weighted_graph_to_adjacency_matrix
weighted_graph_to_adjacency_matrix(G::SimpleWeightedGraph{Int64, Float64}, n::Int64)
Converts a weighted graph represented as a SimpleWeightedGraph into an unweighted 0/1 adjacency matrix.
Arguments:
G::SimpleWeightedGraph{Int64, Float64}: Weighted graph represented as a
SimpleWeightedGraph with integer vertex labels and floating-point edge weights.
n::Int64: Number of nodes in the adjacency matrix.
Return value:
adjacency_matrix::Matrix{Int64}: 0/1 (boolean) adjacency matrix.
Notes:
The function returns an n x n adjacency matrix representing the unweighted graph. If nodes i and j are adjacent, the adjacency matrix contains a value of 1 at positions (i,j) and (j,i); otherwise, it contains a value of 0.
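The conversion can be sketched without SimpleWeightedGraphs by starting from a plain list of (i, j, weight) edges; edges_to_binary_adjacency is a hypothetical name, and vertices are assumed to be numbered 1 to n.

```julia
# Sketch: build an n x n 0/1 adjacency matrix from (i, j, weight) triples.
# Weights are ignored; only the presence of an edge is recorded.
function edges_to_binary_adjacency(edges, n::Int)
    A = zeros(Int64, n, n)
    for (i, j, _) in edges
        A[i, j] = 1
        A[j, i] = 1   # undirected graph: keep the matrix symmetric
    end
    return A
end

edges = [(1, 2, 30.0), (2, 3, 50.0)]
A = edges_to_binary_adjacency(edges, 4)   # vertex 4 is isolated
```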
JHistint.weighted_graph_to_adjacency_matrix_weight
weighted_graph_to_adjacency_matrix_weight(G::SimpleWeightedGraph{Int64, Float64}, n::Int64)
Converts a weighted graph represented as a SimpleWeightedGraph into a weighted adjacency matrix.
Arguments:
G::SimpleWeightedGraph{Int64, Float64}: Weighted graph represented as a
SimpleWeightedGraph with integer vertex labels and floating-point edge weights.
n::Int64: Number of nodes in the adjacency matrix.
Return value:
adjacency_matrix::Matrix{Float32}: weighted adjacency matrix.
Notes:
The function returns an n x n adjacency matrix representing the weighted graph. If nodes i and j are adjacent, the adjacency matrix contains the edge weight at positions (i,j) and (j,i); otherwise, it contains a value of -1.
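A corresponding sketch for the weighted variant, again starting from (i, j, weight) triples and using -1 as the no-edge sentinel described above. The helper edges_to_weighted_adjacency is hypothetical.

```julia
# Sketch: weighted n x n adjacency matrix where non-adjacent pairs (and the
# diagonal) hold -1 and adjacent pairs hold the edge weight as Float32.
function edges_to_weighted_adjacency(edges, n::Int)
    W = fill(-1.0f0, n, n)
    for (i, j, w) in edges
        W[i, j] = Float32(w)
        W[j, i] = Float32(w)   # undirected graph: symmetric weights
    end
    return W
end

W = edges_to_weighted_adjacency([(1, 2, 30.0), (2, 3, 50.0)], 3)
```

Because -1 marks missing edges, this convention assumes that real edge weights (for example distances) are never -1.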
JHistint.build_dataframe_as_edgelist
build_dataframe_as_edgelist(mat::Matrix{Int64}, label_list::Vector{Int64})
Builds a DataFrame representing an edge list from an input adjacency matrix mat and a list of labels label_list.
Arguments:
mat::Matrix{Int64}: The input adjacency matrix where elements represent
connections between nodes.
label_list::Vector{Int64}: A list of node labels to consider when
constructing the edge list.
Return value:
df::DataFrame: A DataFrame representing the edges in the graph,
with columns 'origin', 'destination', and 'weight' indicating the source node, target node, and edge weight, respectively.
Notes:
The build_dataframe_as_edgelist function constructs a DataFrame that represents the edges in a graph based on the adjacency matrix mat. It iterates through the upper triangular part of the matrix and adds edges to the DataFrame for non-zero values while considering only nodes with labels present in label_list.
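The upper-triangular scan can be sketched as follows. To stay dependency-free, the sketch returns a vector of NamedTuples rather than a DataFrame, and it assumes that matrix index i corresponds to label i and that the diagonal is skipped. The is_edge keyword covers both methods: nonzero entries for the Int64 matrix and entries different from -1 for the Float32 matrix. matrix_to_edgelist is a hypothetical name.

```julia
# Hypothetical sketch: scan the upper triangle (diagonal excluded) and collect
# edges between nodes whose labels are in label_list.
function matrix_to_edgelist(mat::AbstractMatrix, label_list::AbstractVector{<:Integer};
                            is_edge = x -> x != 0)
    keep = Set(label_list)
    T = eltype(mat)
    rows = NamedTuple{(:origin, :destination, :weight), Tuple{Int, Int, T}}[]
    n = size(mat, 1)
    for i in 1:n, j in (i + 1):n
        if is_edge(mat[i, j]) && i in keep && j in keep
            push!(rows, (origin = i, destination = j, weight = mat[i, j]))
        end
    end
    return rows
end

M = [0 1 1; 1 0 0; 1 0 0]
matrix_to_edgelist(M, [1, 2, 3])   # edges (1, 2) and (1, 3)
matrix_to_edgelist(M, [1, 2])      # only (1, 2): label 3 is not in label_list

# Float32 variant: -1 marks a missing edge.
Wf = Float32[-1 0.5; 0.5 -1]
matrix_to_edgelist(Wf, [1, 2]; is_edge = x -> x != -1)   # one edge, weight 0.5f0
```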
JHistint.build_dataframe_as_edgelist
build_dataframe_as_edgelist(mat::Matrix{Float32}, label_list::Vector{Int64})
Builds a DataFrame representing an edge list from an input adjacency matrix mat and a list of labels label_list.
Arguments:
mat::Matrix{Float32}: The input adjacency matrix where elements represent
connections between nodes. In this case, the matrix has Float value.
label_list::Vector{Int64}: A list of node labels to consider when
constructing the edge list.
Return value:
df::DataFrame: A DataFrame representing the edges in the graph,
with columns 'origin', 'destination', and 'weight' indicating the source node, target node, and edge weight, respectively.
Notes:
The build_dataframe_as_edgelist function constructs a DataFrame that represents the edges in a graph based on the adjacency matrix mat. It iterates through the upper triangular part of the matrix and adds edges to the DataFrame for values different from -1 while considering only nodes with labels present in label_list.
JHistint.save_adjacency_matrix
save_adjacency_matrix(matrix::Matrix{Int64}, filepath_matrix::AbstractString)
Function to save an adjacency matrix represented as an integer matrix to a text file.
Arguments:
matrix::Matrix{Int64}: The integer matrix representing the adjacency matrix.
filepath_matrix::AbstractString: The file path, as a string, indicating where to save the matrix.
Notes
The function opens the file at filepath_matrix in write mode and writes the matrix row by row, so that line i of the file holds the adjacency entries of node i. The numbers on each line are separated by spaces.
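A minimal sketch of this space-separated format, together with a reader that parses it back. write_matrix_txt and read_matrix_txt are hypothetical names, and details such as trailing spaces in JHistint's actual output may differ.

```julia
# Write one matrix row per line, with entries separated by single spaces.
function write_matrix_txt(path::AbstractString, M::AbstractMatrix)
    open(path, "w") do io
        for i in axes(M, 1)
            println(io, join(M[i, :], " "))
        end
    end
    return path
end

# Read the format back; T is the element type to parse (Int by default).
function read_matrix_txt(path::AbstractString, ::Type{T} = Int) where {T}
    rows = [parse.(T, split(line)) for line in eachline(path) if !isempty(strip(line))]
    return permutedims(reduce(hcat, rows))   # each parsed line becomes a matrix row
end

path = write_matrix_txt(joinpath(mktempdir(), "adjacency.txt"), [0 1; 1 0])
read_matrix_txt(path)   # [0 1; 1 0]
```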
JHistint.save_adjacency_matrix
save_adjacency_matrix(matrix::Matrix{Float32}, filepath_matrix::AbstractString)
Function to save an adjacency matrix represented as a float matrix to a text file.
Arguments:
matrix::Matrix{Float32}: The float matrix representing the adjacency matrix.
filepath_matrix::AbstractString: The file path, as a string, indicating where to save the matrix.
Notes
The function opens the file at filepath_matrix in write mode and writes the matrix row by row, so that line i of the file holds the adjacency entries of node i. The numbers on each line are separated by spaces.
JHistint.extract_vertex_position
extract_vertex_position(G::MetaGraph)
Extracts the vertex positions from a MetaGraph G.
Arguments:
G::MetaGraph: The input MetaGraph containing vertices with
associated positions.
Return value:
position_array::Vector{Luxor.Point}: An array of Luxor.Point objects
representing the positions of the vertices in G.
JHistint.extract_vertex_color
extract_vertex_color(G::MetaGraph)
Extracts the vertex colors from a MetaGraph G.
Arguments:
G::MetaGraph: The input MetaGraph containing vertices with associated colors.
Return value:
color_array::Vector{Any}: An array containing the extracted color
information associated with the vertices in G.
Support Functions for Tessellation (tessellationManager.jl)
JHistint.build_df_label
build_df_label(s::SegmentedImage, min_threshold::Float32, max_threshold::Float32)
Builds DataFrames containing information about the regions of a segmented image, including labels, positions, colors, and areas, and separates them into noisy, filtered, and total regions based on pixel count.
Arguments:
s::SegmentedImage: The segmented image containing regions.
min_threshold::Float32: Minimal area threshold used when considering segments.
max_threshold::Float32: Maximal area threshold used when considering segments.
Return value:
df_label::DataFrame: A DataFrame containing information about the filtered regions, i.e. regions associated with cells or nuclei.
df_noisy_label::DataFrame: A DataFrame containing information about noisy regions.
df_total_label::DataFrame: A DataFrame containing information about all regions.
JHistint.add_column_is_cell
add_column_is_cell(df_labels::DataFrame, df_noisy_labels::DataFrame, df_total_labels::DataFrame)
Adds a boolean column 'is_cell' to the total labels DataFrame, indicating whether each region is a cell.
Arguments:
df_labels::DataFrame: A DataFrame containing information about filtered
regions (default: areas > 3000 pixels).
df_noisy_labels::DataFrame: A DataFrame containing information about
noisy regions (default: areas between 300 and 3000 pixels).
df_total_labels::DataFrame: A DataFrame containing information about
all regions (default: areas > 300 pixels).
Return value:
df_total_labels::DataFrame: The df_total_labels DataFrame with
the additional is_cell column.
Notes:
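The area-based split can be sketched on a simple label => pixel-count dictionary, using the defaults quoted above (300 and 3000 pixels). Whether JHistint uses strict or inclusive comparisons at the boundaries is an assumption here, as is the dictionary input in place of DataFrames; classify_regions is a hypothetical name.

```julia
# Sketch: split labelled regions by area into total (> min_threshold),
# cells (> max_threshold) and noisy (the rest of total), plus an is_cell map.
function classify_regions(areas::AbstractDict{<:Integer, <:Integer};
                          min_threshold::Real = 300f0, max_threshold::Real = 3000f0)
    total = sort([l for (l, a) in areas if a > min_threshold])
    cells = sort([l for (l, a) in areas if a > max_threshold])
    noisy = setdiff(total, cells)
    is_cell = Dict(l => (l in cells) for l in total)   # the is_cell column
    return (; cells, noisy, total, is_cell)
end

areas = Dict(1 => 120, 2 => 450, 3 => 5200, 4 => 3100)
r = classify_regions(areas)   # total [2, 3, 4], cells [3, 4], noisy [2]
```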
JHistint.build_dataframe_edges_from_grid
build_dataframe_edges_from_grid(edge_list::Vector{Any}, df_total_labels::DataFrame)
Builds a DataFrame containing edge information from a given list of grid-based edges and a DataFrame of total region labels.
Arguments:
edge_list::Vector{Any}: A list of grid-based edges represented as pairs
of indices.
df_total_labels::DataFrame: A DataFrame containing information about all
regions, including labels and areas.
Return value:
df::DataFrame: A DataFrame containing information about edges,
including origin, destination, and edge weight.
JHistint.build_graph_from_tessellation
build_graph_from_tessellation(df_labels::DataFrame,
df_noisy_labels::DataFrame,
df_total_labels::DataFrame,
w::Int64, h::Int64,
filepath_total_tess::AbstractString,
filepath_cell_tess::AbstractString)
Builds a graph representation of the segmented image based on Voronoi tessellations and saves visualizations.
Arguments:
df_labels::DataFrame: A DataFrame containing information about
nuclei and centroids.
df_noisy_labels::DataFrame: A DataFrame containing information
about noisy nuclei centroids.
df_total_labels::DataFrame: A DataFrame containing information
about all centroids.
w::Int64: Width of the tessellation region.
h::Int64: Height of the tessellation region.
filepath_total_tess::AbstractString: The file path to save the
visualization of the total tessellation.
filepath_cell_tess::AbstractString: The file path to save the
visualization of the cell-based tessellation.
Return value:
df_edges::DataFrame: A DataFrame containing edge information between regions.
edges::Vector{Any}: A vector of pairs representing the connected
regions based on Voronoi edges.
Notes:
The build_graph_from_tessellation function performs the following steps:
- Extracts centroid positions from the provided DataFrames.
- Performs Voronoi tessellations for both the total and cell-based centroids.
- Plots the tessellations, including centroids and labels.
- Saves the visualizations to the specified file paths.
- Constructs an edge DataFrame based on Voronoi edges.
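The connection between a tessellation and the graph can be illustrated with a swapped-in, grid-based technique: each pixel of the w x h region is assigned to its nearest centroid (a discrete Voronoi diagram), and two centroids are linked when their cells share a pixel border. This is an approximation for illustration, not the geometric Voronoi construction JHistint performs; grid_voronoi_edges is a hypothetical name and centroids are given as (x, y) pairs.

```julia
# Sketch of a discrete Voronoi tessellation on a w x h pixel grid and of the
# neighbour graph it induces.
function grid_voronoi_edges(centroids::Vector{Tuple{Float64, Float64}}, w::Int, h::Int)
    owner = zeros(Int, h, w)            # owner[y, x] = index of the nearest centroid
    for x in 1:w, y in 1:h
        best, bestd = 0, Inf
        for (k, (cx, cy)) in enumerate(centroids)
            d = (x - cx)^2 + (y - cy)^2
            if d < bestd
                best, bestd = k, d
            end
        end
        owner[y, x] = best
    end
    edges = Set{Tuple{Int, Int}}()
    for x in 1:w, y in 1:h
        for (nx, ny) in ((x + 1, y), (x, y + 1))   # right and lower neighbours
            (nx > w || ny > h) && continue
            a, b = owner[y, x], owner[ny, nx]
            a != b && push!(edges, (min(a, b), max(a, b)))
        end
    end
    return sort(collect(edges))
end

# Three centroids on a 20 x 10 region: left, middle and right.
voronoi_edges = grid_voronoi_edges([(3.0, 5.0), (10.0, 5.0), (17.0, 5.0)], 20, 10)
# [(1, 2), (2, 3)]: the left and right cells do not touch
```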
JHistint.tess_dataframe_to_adjacency_matrix_weight
tess_dataframe_to_adjacency_matrix_weight(df_total_labels::DataFrame, df_edges::DataFrame, edges::Vector{Any})
Converts a DataFrame representation of edges and region labels into an adjacency matrix with weighted edges.
Arguments:
df_total_labels::DataFrame: A DataFrame containing information about all regions, including labels.
df_edges::DataFrame: A DataFrame containing edge information, including origin, destination, and edge weight.
edges::Vector{Any}: A vector of pairs representing the connected regions.
Return value:
adjacency_matrix::Matrix{Int}: An adjacency matrix representing the connectivity of regions with weighted edges.
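As a minimal sketch of this conversion, assuming an undirected graph with integer weights and edges given as (origin, destination, weight) tuples instead of DataFrame rows, region labels can be mapped to matrix indices and each weight written symmetrically. The function name edges_to_weighted_adjacency is hypothetical.

```julia
# Hypothetical sketch: weighted, undirected adjacency matrix from an edge list.
# Edges are (origin_label, destination_label, weight) tuples instead of DataFrame rows.
function edges_to_weighted_adjacency(labels::Vector{Int},
                                     edges::Vector{Tuple{Int,Int,Int}})
    idx = Dict(l => i for (i, l) in enumerate(labels))   # region label -> row/column index
    n = length(labels)
    A = zeros(Int, n, n)
    for (orig, dest, w) in edges
        i, j = idx[orig], idx[dest]
        A[i, j] = w
        A[j, i] = w          # undirected graph: keep the matrix symmetric
    end
    return A
end

A = edges_to_weighted_adjacency([10, 20, 30], [(10, 20, 5), (20, 30, 7)])
```

Labels need not be contiguous integers, which is why they go through a dictionary; with thousands of regions, a sparse matrix from the SparseArrays standard library would save memory.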
Support Functions for Noise Reduction (noiseManager.jl)
JHistint.compute_centroid_total_cells — Method
compute_centroid_total_cells(s::SegmentedImage,
df_label::DataFrame,
min_threshold::Float32)
Computes the centroids of total cells within regions in a segmented image s and associates them with labels in the provided DataFrame df_label.
Arguments:
s::SegmentedImage: The segmented image containing regions.
df_label::DataFrame: A DataFrame with information about the regions, including labels and other attributes.
min_threshold::Float32: Minimum area (in pixels) a segment must exceed to be considered.
Return value:
df_label::DataFrame: The input DataFrame df_label with an additional centroid column containing the computed centroids of total cells.
Notes:
The compute_centroid_total_cells function iterates through the pixels in the segmented image s to identify total cells within regions. It calculates the centroids of these total cells and associates them with their corresponding labels in the df_label DataFrame. The function marks pixels as visited to avoid redundant calculations and applies a manual threshold to exclude noise by considering only regions with pixel counts greater than the specified threshold (default = 300 pixels).
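The centroid of a region is the mean row and column index of its pixels. A minimal sketch, assuming a plain integer label matrix (with 0 as background) in place of a SegmentedImage, accumulates per-label sums in a single pass instead of the visited-pixel bookkeeping described above, and keeps only regions whose pixel count exceeds min_threshold. The name centroids_above_threshold is hypothetical.

```julia
# Hypothetical sketch: centroids from an integer label matrix (0 = background),
# keeping only regions whose pixel count exceeds min_threshold.
function centroids_above_threshold(labels::AbstractMatrix{<:Integer}, min_threshold::Real)
    npix = Dict{Int,Int}()          # label -> number of pixels
    sumr = Dict{Int,Float64}()      # label -> sum of row indices
    sumc = Dict{Int,Float64}()      # label -> sum of column indices
    for c in axes(labels, 2), r in axes(labels, 1)   # column-major traversal
        l = labels[r, c]
        l == 0 && continue          # skip background pixels
        npix[l] = get(npix, l, 0) + 1
        sumr[l] = get(sumr, l, 0.0) + r
        sumc[l] = get(sumc, l, 0.0) + c
    end
    # centroid = (mean row, mean column) of the region's pixels
    return Dict(l => (sumr[l] / n, sumc[l] / n) for (l, n) in npix if n > min_threshold)
end

labels = [1 1 0;
          1 1 0;
          0 0 2]
cents = centroids_above_threshold(labels, 2)
```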
JHistint.compute_centroid_cells — Method
compute_centroid_cells(s::SegmentedImage,
df_label::DataFrame,
max_threshold::Float32)
Computes the centroids of only the cells associated with nuclei within regions in a segmented image s and associates them with labels in the provided DataFrame df_label.
Arguments:
s::SegmentedImage: The segmented image containing regions.
df_label::DataFrame: A DataFrame with information about the regions, including labels and other attributes.
max_threshold::Float32: Maximal area threshold (in pixels) for considering segments.
Return value:
df_label::DataFrame: The input DataFrame df_label with an additional centroid column containing the computed centroids of cells.
JHistint.compute_centroid_noisy_cells — Method
compute_centroid_noisy_cells(s::SegmentedImage,
df_label::DataFrame,
min_threshold::Float32,
max_threshold::Float32)
Computes the centroids of extra or noisy elements within regions in a segmented image s and associates them with labels in the provided DataFrame df_label.
Arguments:
s::SegmentedImage: The segmented image containing regions.
df_label::DataFrame: A DataFrame with information about the regions, including labels and other attributes.
min_threshold::Float32: Minimal area threshold (in pixels) for considering segments.
max_threshold::Float32: Maximal area threshold (in pixels) for considering segments.
Return value:
df_label::DataFrame: The input DataFrame df_label with an additional centroid column containing the computed centroids of extra cells.
JHistint.filter_dataframe_cells — Method
filter_dataframe_cells(df_label::DataFrame, max_threshold::Float32)
Filters a DataFrame containing information about regions to retain only cells with areas greater than a specified threshold.
Arguments:
df_label::DataFrame: The input DataFrame containing region information.
max_threshold::Float32: Maximal area threshold (in pixels); only regions with a larger area are retained.
Return value:
df_filtered::DataFrame: A filtered DataFrame containing information about the retained cells, including labels, positions, colors, areas, and centroids.
Notes:
The filter_dataframe_cells function takes a DataFrame df_label as input, which should contain information about regions, including labels, positions, colors, areas, and centroids. It filters this DataFrame to retain only those regions (cells) with areas greater than a specified threshold (default = 3000 pixels).
JHistint.filter_dataframe_extras — Method
filter_dataframe_extras(df_label::DataFrame,
min_threshold::Float32,
max_threshold::Float32)
Filters a DataFrame containing information about regions to retain only extra elements (not cells) with areas between two specified thresholds. By default, these thresholds are 300 and 3000 pixels.
Arguments:
df_label::DataFrame: The input DataFrame containing region information.
min_threshold::Float32: Minimal area threshold (in pixels).
max_threshold::Float32: Maximal area threshold (in pixels).
Return value:
df_filtered::DataFrame: A filtered DataFrame containing information about the retained extra elements, including labels, positions, colors, areas, and centroids.
Notes:
The filter_dataframe_extras function takes a DataFrame df_label as input, which should contain information about regions, including labels, positions, colors, areas, and centroids. It filters this DataFrame to retain only those regions that are considered "extras" (not cells) and have areas between two specified thresholds.
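The two filters partition regions by area. A minimal sketch, assuming regions are NamedTuples with label and area fields instead of DataFrame rows, and assuming an exclusive lower bound and an inclusive upper bound for extras (the source does not state how the boundaries are handled); split_by_area is a hypothetical name.

```julia
# Hypothetical sketch: regions as NamedTuples (label, area) instead of DataFrame rows.
# Cells: area > max_threshold. Extras: min_threshold < area <= max_threshold.
# Anything at or below min_threshold is treated as noise and dropped.
function split_by_area(regions::AbstractVector, min_threshold::Real, max_threshold::Real)
    cells  = filter(r -> r.area > max_threshold, regions)
    extras = filter(r -> min_threshold < r.area <= max_threshold, regions)
    return cells, extras
end

regions = [(label = 1, area = 5000), (label = 2, area = 1000), (label = 3, area = 100)]
cells, extras = split_by_area(regions, 300f0, 3000f0)
```

With DataFrames.jl, the same predicates can be applied to a single column, for example filter(:area => a -> a > max_threshold, df).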
Support Functions for DataBase (dbManager.jl)
JHistint.insert_record_DB — Method
insert_record_DB(col_name::AbstractString,
cas_name::AbstractString,
tcga_case_id::AbstractString,
sin_cas_name::AbstractString,
tcga_slide_id::AbstractString,
link_slide::AbstractString,
filepath_zip::AbstractString,
filepath_svs::AbstractString)
Stores in the JHistint_DB database the information associated with each slide downloaded from the Cancer Digital Slide Archive (CDSA).
Arguments
col_name::AbstractString = Collection name.
cas_name::AbstractString = Case name, which corresponds to the CASE-NAME displayed by the package.
tcga_case_id::AbstractString = ID used by TCGA to identify the case.
sin_cas_name::AbstractString = Name of the individual slide, which corresponds to the SLIDE-ID displayed by the package.
tcga_slide_id::AbstractString = ID used by TCGA to identify the slide.
link_slide::AbstractString = Link to the API for downloading the slide.
filepath_zip::AbstractString = Path where the .zip file is stored.
filepath_svs::AbstractString = Path where the .tif file is stored.
Notes
The JHistint_DB database is used for storing the information associated with each slide downloaded from the CDSA. The function receives the slide information through its arguments and stores it in the database. Data available in the JHistint_DB database for each slide:
collection_name TEXT = Name of the collection.
case_name TEXT = Name of the case.
TCGA_caseID TEXT = ID used by TCGA to identify the case.
slide_ID TEXT = Name of the individual slide.
TCGA_slideID TEXT UNIQUE = ID used by TCGA to identify the slide; the UNIQUE constraint prevents duplicate records.
slide_path_folder_zip TEXT = Path where the .zip file is stored.
slide_path_folder_svs TEXT = Path where the .tif file is stored.
slide_path_api TEXT = Link to the API for downloading the slide.
slide_path_folder_seg TEXT = Path where the segmented .tif file is stored.
slide_svs BLOB = Histopathological slide (image).
slide_info_TSS TEXT = Slide information - Tissue Source Site.
slide_info_participant_code TEXT = Slide information - Participant Code, an alphanumeric string.
slide_info_sample_type TEXT = Slide information - Sample Type. Values 01-09 indicate tumor samples, 10-19 non-diseased normal samples, and 20-29 control samples.
slide_info_vial TEXT = Slide information - Vial. Relates to the ordering of the sample in the sequence of samples; values range from A to Z.
slide_info_portion TEXT = Slide information - Portion. Relates to the ordering of the analyzed portions associated with a sample; values range from 01 to 99.
slide_info_type TEXT = Slide information - Image Type. The possible values are TS (Top Slide), BS (Bottom Slide), and MS (Middle Slide); the alphanumeric character that follows indicates the ordering of the slide.
slide_path_folder_matrix TEXT = Path where the adjacency matrix .txt file is stored.
matrix_data BLOB = Adjacency matrix.
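The slide_info_* columns correspond to the dash-separated fields of a TCGA slide barcode. A minimal sketch, assuming a six-field layout of the form TCGA-TSS-Participant-SampleVial-Portion-SlideType (the example barcode below is hypothetical, built from a case ID shown earlier), splits a barcode into those fields and classifies the sample type with the ranges listed above. The function name parse_slide_barcode is hypothetical and is not part of JHistint.

```julia
# Hypothetical sketch: split a TCGA slide barcode such as "TCGA-2H-A9GF-01A-01-TS1"
# into the fields stored in the slide_info_* columns. The 6-field layout is an assumption.
function parse_slide_barcode(barcode::AbstractString)
    parts = split(barcode, '-')
    length(parts) == 6 ||
        throw(ArgumentError("expected 6 '-'-separated fields, got $(length(parts))"))
    _, tss, participant, sample_vial, portion, slide = parts
    sample_type = String(sample_vial[1:2])   # e.g. "01"
    code = parse(Int, sample_type)
    category = if 1 <= code <= 9
        "tumor"
    elseif 10 <= code <= 19
        "normal"
    elseif 20 <= code <= 29
        "control"
    else
        "other"
    end
    return (TSS = String(tss),
            participant_code = String(participant),
            sample_type = sample_type,
            category = category,
            vial = String(sample_vial[3:end]),   # e.g. "A"
            portion = String(portion),          # e.g. "01"
            slide_type = String(slide[1:2]),    # "TS", "BS" or "MS"
            order = String(slide[3:end]))       # e.g. "1"
end

info = parse_slide_barcode("TCGA-2H-A9GF-01A-01-TS1")
```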
JHistint.query_extract_slide_svs — Method
query_extract_slide_svs(collection_name::AbstractString)
The function queries the JHistint_DB and extracts the list of slides associated with the collection name provided as an argument.
Arguments
collection_name::AbstractString: Name of the slide collection to search for in the JHistint_DB.
Return value
slide_list: List of tuples, each containing the ID of the slide, the .svs file of the slide, and the path of the folder containing the .svs file.
JHistint.load_seg_slide — Method
load_seg_slide(filepath_seg::AbstractString, filepath_matrix::AbstractString, matrix::Matrix{Int64}, slide_id::AbstractString)
The function updates the JHistint_DB with the path of the segmented image file, the path of the adjacency matrix file in text format, and the matrix itself.
Arguments
filepath_seg::AbstractString: Path of the segmented image file to add to the DB.
filepath_matrix::AbstractString: Path of the adjacency matrix file.
matrix::Matrix{Int64}: Adjacency matrix.
slide_id::AbstractString: ID of the slide to update with the segmented image information.
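The adjacency matrix is kept both as a text file and as a BLOB. A minimal sketch of both encodings, assuming whitespace-separated text with one row per line and Julia's Serialization standard library for the byte representation (the package's actual formats are not documented here); all function names are hypothetical.

```julia
using Serialization

# Hypothetical sketch: store an adjacency matrix as whitespace-separated text
# (one row per line) and as a byte vector suitable for a BLOB column.
function write_matrix_txt(path::AbstractString, M::AbstractMatrix{<:Integer})
    open(path, "w") do io
        for row in eachrow(M)
            println(io, join(row, ' '))
        end
    end
    return path
end

function read_matrix_txt(path::AbstractString)
    rows = [parse.(Int, split(line)) for line in eachline(path) if !isempty(strip(line))]
    return permutedims(reduce(hcat, rows))   # hcat stacks rows as columns, so transpose back
end

function matrix_to_blob(M::Matrix{Int})
    io = IOBuffer()
    serialize(io, M)        # Julia-specific binary format
    return take!(io)        # Vector{UInt8}
end

blob_to_matrix(bytes::Vector{UInt8})::Matrix{Int} = deserialize(IOBuffer(bytes))

M = [0 1 0; 1 0 2; 0 2 0]
path = write_matrix_txt(tempname() * ".txt", M)
```

The Serialization format is not guaranteed to be readable across Julia versions, so the text file (or another portable encoding) is the safer long-term copy.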
SOPHYSM Support Functions for DataBase (dbManager.jl)
JHistint.insert_record_DB_SOPHYSM — Method
insert_record_DB_SOPHYSM(col_name::AbstractString,
cas_name::AbstractString,
tcga_case_id::AbstractString,
sin_cas_name::AbstractString,
tcga_slide_id::AbstractString,
link_slide::AbstractString,
filepath_zip::AbstractString,
filepath_svs::AbstractString,
path_download_db::AbstractString)
Stores in the JHistint_DB database the information associated with each slide downloaded from the Cancer Digital Slide Archive (CDSA). Unlike the standard insert_record_DB, this function stores only the information on the downloaded slides and takes the path of the database file as an additional argument.
Arguments
col_name::AbstractString = Collection name.
cas_name::AbstractString = Case name, which corresponds to the CASE-NAME displayed by the package.
tcga_case_id::AbstractString = ID used by TCGA to identify the case.
sin_cas_name::AbstractString = Name of the individual slide, which corresponds to the SLIDE-ID displayed by the package.
tcga_slide_id::AbstractString = ID used by TCGA to identify the slide.
link_slide::AbstractString = Link to the API for downloading the slide.
filepath_zip::AbstractString = Path where the .zip file is stored.
filepath_svs::AbstractString = Path where the .tif file is stored.
path_download_db::AbstractString = Path where the DB file is stored.
Notes
The JHistint_DB database is used for storing the information associated with each slide downloaded from the CDSA. The function receives the slide information through its arguments and stores it in the database. Data available in the JHistint_DB database for each slide:
collection_name TEXT = Name of the collection.
case_name TEXT = Name of the case.
TCGA_caseID TEXT = ID used by TCGA to identify the case.
slide_ID TEXT = Name of the individual slide.
TCGA_slideID TEXT UNIQUE = ID used by TCGA to identify the slide; the UNIQUE constraint prevents duplicate records.
slide_path_folder_zip TEXT = Path where the .zip file is stored.
slide_path_folder_svs TEXT = Path where the .tif file is stored.
slide_path_api TEXT = Link to the API for downloading the slide.
slide_path_folder_seg TEXT = Path where the segmented .tif file is stored.
slide_svs BLOB = Histopathological slide (image).
slide_info_TSS TEXT = Slide information - Tissue Source Site.
slide_info_participant_code TEXT = Slide information - Participant Code, an alphanumeric string.
slide_info_sample_type TEXT = Slide information - Sample Type. Values 01-09 indicate tumor samples, 10-19 non-diseased normal samples, and 20-29 control samples.
slide_info_vial TEXT = Slide information - Vial. Relates to the ordering of the sample in the sequence of samples; values range from A to Z.
slide_info_portion TEXT = Slide information - Portion. Relates to the ordering of the analyzed portions associated with a sample; values range from 01 to 99.
slide_info_type TEXT = Slide information - Image Type. The possible values are TS (Top Slide), BS (Bottom Slide), and MS (Middle Slide); the alphanumeric character that follows indicates the ordering of the slide.
slide_path_folder_matrix TEXT = Path where the adjacency matrix .txt file is stored.
matrix_data BLOB = Adjacency matrix.
API Support Functions (apiManager.jl)
JHistint.download_collection_values — Method
download_collection_values(filepath::AbstractString)
Downloads the data on the collections available in TCGA.
Arguments
filepath::AbstractString = Path where the .json file obtained from the CDSA API is saved.
Notes
The API requires the definition of parentType and parentId; parentId specifies the identifier of the collection. The collection of images associated with TCGA is identified by the code 5b9ef8e3e62914002e454c39. Setting limit=0 removes the limit on the number of returned records, ensuring that the complete list is downloaded. The API belongs to the category for managing the folders stored in the repository. The downloaded file is in .json format.
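The query parameters above can be assembled into a request URL. A minimal sketch, assuming a Girder-style /folder endpoint under the base URL https://api.digitalslidearchive.org/api/v1 and parentType=collection for the TCGA collection; the base URL, endpoint path, parentType value, and the function name folder_query_url are all assumptions, not taken from JHistint.

```julia
# Hypothetical sketch: build a Girder-style folder query URL.
# The base URL and the /folder endpoint are assumptions, not taken from JHistint.
const CDSA_API = "https://api.digitalslidearchive.org/api/v1"

function folder_query_url(; parentType::AbstractString, parentId::AbstractString,
                          name::Union{Nothing,AbstractString} = nothing,
                          limit::Integer = 0, base::AbstractString = CDSA_API)
    params = ["parentType=$parentType", "parentId=$parentId"]
    name === nothing || push!(params, "name=$name")   # optional filter by folder name
    push!(params, "limit=$limit")                     # limit=0: no cap on returned records
    return "$base/folder?" * join(params, '&')        # values are assumed URL-safe
end

url = folder_query_url(parentType = "collection", parentId = "5b9ef8e3e62914002e454c39")
```

The resulting URL could then be fetched with Downloads.download(url, filepath) from Julia's Downloads standard library. The same builder also covers the name-filtered and parentType=folder queries described for download_project_infos and getCasesForProject.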
JHistint.extract_collection_values — Method
extract_collection_values(filepath::AbstractString)
Function to extract the values of data collections from the .json file downloaded by the download_collection_values function.
Arguments
filepath::AbstractString = Path where the collectionlist.json file is stored.
Return value
collection_values::Array{String} = List of data collections available in TCGA.
JHistint.download_project_infos — Method
download_project_infos(filepath::AbstractString, collection_name::AbstractString)
Function to download the metadata associated with the collection selected at startup.
Arguments
filepath::AbstractString = Path where the .json file associated with the collection is saved. The file is named collection_name.json.
collection_name::AbstractString = Name of the collection from which to download the slides.
Notes
The API requires the definition of parentType, parentId, and name. The name attribute identifies the name of the collection from which you want to retrieve data (e.g., chol, esca, etc.). The API belongs to the category for managing the folders stored in the repository. The downloaded file is .json.
JHistint.extract_project_id — Method
extract_project_id(filepath::AbstractString)
Function to extract the id value from the metadata of the collection selected at startup.
Arguments
filepath::AbstractString = Path where the collection_name.json file is stored.
Return value
project_id = id of the collection.
JHistint.getCasesForProject — Method
getCasesForProject(filepath_case::AbstractString, project_id::AbstractString)
Function to download the metadata associated with the cases of the collection selected at startup.
Arguments
filepath_case::AbstractString = Path where the .json file associated with the cases of the collection is saved. The file is named collection_name.json.
project_id::AbstractString = id of the collection.
Return values
casesID_values = List of the id of every case in the collection.
casesNAME_values = List of the name of every case in the collection.
Notes
The API requires the definition of parentType and parentId. Given the structure of the repository, the parentType attribute is set to folder, and parentId is set to the identifier of the chosen collection. The downloaded file is in .json format.
JHistint.download_zip — Method
download_zip(link::AbstractString, filepath::AbstractString)
Function for downloading the histological slides, in .zip format, associated with the cases of the collection selected at startup.
Arguments
link::AbstractString = URL of the API used to download the slide.
filepath::AbstractString = Path where the .zip file is saved.
ZIP Support Functions (zipManager.jl)
JHistint.extract_slide — Method
extract_slide(filepath_zip::AbstractString)
Function to extract the contents of the .zip files downloaded from CDSA.
Arguments
filepath_zip::AbstractString = Path where the .zip file for the individual case is saved.