JHistint.jl - Julia Histopathology Interface
Julia interface for implementing the REST APIs available on the Cancer Slide Digital Archive (CDSA) portal for downloading histological images available in The Cancer Genome Atlas (TCGA). The Cancer Slide Digital Archive (CDSA) is a web platform for support, sharing, and analysis of digital pathological data. Currently, it hosts over 23,000 images associated with the data available on "The Cancer Genome Atlas" Data Portal. The library includes functions for managing image-processing algorithms for cellular and nuclei segmentation, constructing graph and the corresponding adjacency matrix, building tessellation and interfacing with J-Space.jl package to simulate the spatial growth and the genomic evolution of a cell population and the experiment of sequencing the genome of the sampled cells.
Link GitHub repository: JHistint.jl
Link GitHub repository, avaiable on spatial-input
branch: J-Space.jl
CDSA Portal: Click Here
Repository containing the data mapped in the portal: Click Here
Guide to using the APIs: Click Here
Package Structure
- The
case
andcollection
folders store metadata in.json
format for individual cases and collections available on the TCGA Data Portal. Thecollection
folder is structured as follows:collectionlist.json
= Stores access data (metadata) for collections (Projects in TCGA).collection_name.json
= Stores access data (metadata) for a single collection. The.json
file is generated based on the collection chosen by the user.
- The
case
folder is structured as follows:collection_name.json
= Stores all metadata related to cases associated with the collection selected by the user.
- The
slides
folder stores histological images related to individual cases. The images are organized based on collection (TCGA-chol
,TCGA-esca
, etc.), and the individual case being analyzed (TCGA-2H-A9GF
,TCGA-2H-A9GG
, etc.). Within each folder related to the case, the slides are stored in compressed.zip
files. The format of each individual slide is.tif
. The folder names related to the cases correspond to the values of theCase ID
field listed in the TCGA Data Portal. The names of the.zip
files located in each folder refer to theSample ID
attribute associated with the patient. The slide name is given by concatenating theSlide ID
andSlide UUID
attributes that can be found in the lower section of the web page dedicated to the generic caseTCGA-XX-YYYY
.
Example: TCGA-02-0001-01C-01-TS1.zip
- 02 = refers to the TSS (Tissue Source Site).
- 0001 = refers to the code associated with the Participant, an alphanumeric string.
- 01 = refers to the Sample Type. The values associated with tumor samples are in the range 01-09. 10-19 indicates the range for non-diseased normal samples. 20-29 indicates samples currently under control.
- C = refers to the Vial field related to the ordering of the sample in the sample sequence. Values range from A-Z.
- 01 = refers to the Portion field related to the ordering of the analyzed portions associated with a sample. It takes values in the range 01-99.
- TS1 = refers to the Slide field related to the type of image. The values that can be assumed are TS (Top Slide), BS (Bottom Slide), and MS (Middle Slide). The alphanumeric value indicates the slide ordering.
JHistint Collections
The available collections are:
- TCGA-BRCA = Breast Invasive Carcinoma (Breast)
- TCGA-OV = Ovarian Serous Cystadenocarcinoma (Ovary)
- TCGA-LUAD = Lung Adenocarcinoma (Bronchus and Lung)
- TCGA-UCEC = Uterine Corpus Endometrial Carcinoma (Corpus uteri)
- TCGA-GBM = Glioblastoma Multiforme (Brain)
- TCGA-HSNC = Head and Neck Squamous Cell Carcinoma (Larynx, Lip, Tonsil, Gum, Other and unspecified parths of mouth)
- TCGA-KIRC = Kidney Renal Clear Cell Carcinoma (Kidney)
- TCGA-LGG = Brain Lower Grade Glioma (Brain)
- TCGA-LUSC = Lung Squamous Cell Carcinoma (Bronchus and lung)
- TCGA-TCHA = Thyroid Carcinoma (Thyroid gland)
- TCGA-PRAD = Prostate Adenocarcinoma (Prostate gland)
- TCGA-SKCM = Skin Cutaneous Melanoma (Skin)
- TCGA-COAD = Colon Adenocarcinoma (Colon)
- TCGA-STAD = Stomach Adenocarcinoma (Stomach)
- TCGA-BLCA = Bladder Urothelial Carcinoma (Bladder)
- TCGA-LIHC = Liver Hepatocellular Carcinoma (Liver and intrahepatic bile ducts)
- TCGA-CESC = Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (Cervix uteri)
- TCGA-KIRP = Kidney Renal Papillary Cell Carcinoma (Kidney)
- TCGA-SARC = Sarcoma (Various)
- TCGA-ESCA = Esophageal Carcinoma (Esophagus)
- TCGA-PAAD = Pancreatic Adenocarcinoma (Pancreas)
- TCGA-READ = Rectum Adenocarcinoma (Rectum)
- TCGA-PCPG = Pheochromocytoma and Paraganglioma (Adrenal gland)
- TCGA-TGCT = Testicular Germ Cell Tumors (Testis)
- TCGA-THYM = Thymoma (Thymus)
- TCGA-ACC = Adrenocortical Carcinoma -Adenomas and Adenocarcinomas (Adrenal gland)
- TCGA-MESO = Mesothelioma (Heart, mediastinum and pleura)
- TCGA-UVM = Uveal Melanoma (Eye and adnexa)
- TCGA-KICH = Kidney Chromophobe (Kidney)
- TCGA-UCS = Uterine Carcinosarcoma (Uterus, NOS)
- TCGA-CHOL = Cholangiocarcinoma (Liver and intrahepatic bile ducts, Other and unspecified part of biliary track)
- TCGA-DLBC = Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (Various)
To download a specific collection, just indicate the name of the collection: BRCA
, OV
, LUAD
.
Package Installation
- Step 1 - Install
J-Space
fromspatial-input
branch:
(@v1.8) pkg > add https://github.com/BIMIB-DISCo/J-Space.jl.git#spatial-input
- Step 2 - Install
JHistint
from GitHub Repository:
(@v1.8) pkg > add https://github.com/niccolo99mandelli/JHistint.jl.git
julia > using JHistint
Package Installation Julia Registries (In Progress)
The JHistint
package is available in the Julia Registries and can be installed as follows:
julia > using Pkg
julia > Pkg.add("JHistint")
julia > using JHistint
Otherwise, type ]
in the Julia REPL and execute:
(@v1.8) pkg > add JHistint
julia > using JHistint
Download Slides Main Functions (JHistint.jl)
JHistint.download_single_collection
— Methoddownload_single_collection(collection_name::AbstractString)
Function for downloading histological slides associated with a collection available in TCGA.
Arguments
collection_name::AbstractString
= Collection of TCGA data to download
the histological slides.
Notes
The function evaluates the collection_name
argument, and in case of an invalid collection, considers the configuration in the Config.toml
file. The value set in the package is default
.
# Examples with valid input
julia> JHistint.download_single_collection("acc")
julia> JHistint.download_single_collection("bLca")
# Examples with invalid input
julia> JHistint.download_single_collection("ac")
julia> JHistint.download_single_collection("")
JHistint.download_all_collection
— Methoddownload_all_collection()
Function for downloading histological slides associated with all collections available in TCGA.
# Examples with valid input
julia> JHistint.download_all_collection()
Download Slides SOPHYSM Functions (JHistint.jl)
JHistint.download_single_collection_SOPHYSM
— Methoddownload_single_collection_SOPHYSM(collection_name::AbstractString, path_to_save::AbstractString)
Function for downloading histological slides in SOPYHSM_app associated with a collection available in TCGA.
Arguments
collection_name::AbstractString
= Collection of TCGA data to download the
histological slides.
path_to_save::AbstractString
= Local folder path for saving
histological slides.
Notes
The function evaluates the collection_name
argument, and in case of an invalid collection, considers the configuration in the Config.toml
file. The value set in the package is default
.
# Examples with valid input
julia> JHistint.download_single_collection_SOPHYSM("acc", "C:\...")
julia> JHistint.download_single_collection_SOPHYSM("bLca", "C:\...")
# Examples with invalid input
julia> JHistint.download_single_collection_SOPHYSM("ac", "C:\...")
julia> JHistint.download_single_collection_SOPHYSM("", "C:\...")
JHistint.download_all_collection_SOPHYSM
— Methoddownload_all_collection_SOPHYSM(path_to_save::AbstractString)
Function for downloading histological slides associated with all collections available in TCGA.
Arguments
path_to_save::AbstractString
= Local folder path for saving
histological slides.
# Examples with valid input
julia> JHistint.download_all_collection_SOPHYSM("C:\...")
Cell Segmentation Slides Main Functions (JHistint.jl)
JHistint.slide_cell_segmentation_without_download
— Methodslide_cell_segmentation_without_download(collection_name::AbstractString)
Function for performing cell segmentation on histopathological slides present in the JHistint_DB
database associated with the collection name provided as an argument. After generating the segmented slide, the function proceeds with constructing and saving the corresponding graph and adjacency matrix.
Arguments
collection_name::AbstractString
= Collection of TCGA data to download
the histological slides.
Notes
The function utilizes the JHistint_DB
database for performing cell segmentation on the histopathological slides associated with the provided collection name. It generates a segmented slide and constructs a corresponding graph and adjacency matrix. The output files are saved in a user-defined directory. The function may take a considerable amount of time to complete, depending on the size of the slides and the complexity of the segmentation algorithm. For each slide in the database, cell segmentation is performed using the apply_segmentation_without_download
function, and the path where the result is saved is stored in the database using the load_seg_slide
function. The segmentation process is defined in 4 steps:
- LOAD SLIDE ... (slide_id)
- APPLY SEGMENTATION ... (slide_id)
- BUILD GRAPH ... (slide_id)
- BUILD & SAVE ADJACENCY MATRIX ... (slide_id)
- J-SPACE features ... (slide_id)
The adjacency matrix is saved in the same directory as the original image in text format. Finally, a confirmation message is printed for each segmented slide. Unlike the slide_cell_segmentation_with_download
function, this function does not involve the creation and download of the segmented image.
# Examples with valid input
julia> JHistint.slide_cell_segmentation_without_download("acc")
julia> JHistint.slide_cell_segmentation_without_download("bLca")
# Examples with invalid input
julia> JHistint.slide_cell_segmentation_without_download("ac")
julia> JHistint.slide_cell_segmentation_without_download("")
JHistint.slide_cell_segmentation_with_download
— Methodslide_cell_segmentation_with_download(collection_name::AbstractString)
Function for performing cell segmentation on histopathological slides present in the JHistint_DB
database associated with the collection name provided as an argument. The function downloads the segmented slide, which is placed in the same directory as the original slide. After generating the segmented slide, the function proceeds with constructing and saving the corresponding graph and adjacency matrix.
Arguments
collection_name::AbstractString
= TCGA data collection for which to
perform cell segmentation.
Notes
The function utilizes the JHistint_DB
database for performing cell segmentation on the histopathological slides associated with the provided collection name. It generates a segmented slide and constructs a corresponding graph and adjacency matrix. The output files are saved in a user-defined directory. The function may take a considerable amount of time to complete, depending on the size of the slides and the complexity of the segmentation algorithm. For each slide in the database, cell segmentation is performed using the apply_segmentation_with_download
function, and the path where the result is saved is stored in the database using the load_seg_slide
function. The segmentation process is similar to that described in the slide_cell_segmentation_without_download
function, with the added step of downloading the segmented image and placing it in the same directory as the original slide. The segmentation process is defined in 6 steps:
- LOAD SLIDE ... (slide_id)
- APPLY SEGMENTATION ... (slide_id)
- BUILD SEGMENTED SLIDE ... (slide_id)
- BUILD GRAPH ... (slide_id)
- BUILD & SAVE ADJACENCY MATRIX ... (slide_id)
- SAVE SEGMENTED SLIDE ... (slide_id)
- J-SPACE features ... (slide_id)
The adjacency matrix is saved in text format in the same directory as both the original and segmented images. Finally, a confirmation message is printed for each segmented slide.
# Examples with valid input
julia> JHistint.slide_cell_segmentation_with_download("acc")
julia> JHistint.slide_cell_segmentation_with_download("bLca")
# Examples with invalid input
julia> JHistint.slide_cell_segmentation_with_download("ac")
julia> JHistint.slide_cell_segmentation_with_download("")
SOPHYSM Main Function (JHistint.jl)
JHistint.start_segmentation_SOPHYSM_tessellation
— Methodstart_segmentation_SOPHYSM_tessellation(filepath_input::AbstractString,
filepath_output::AbstractString,
thresholdGray::Float64,
thresholdMarker::Float64,
min_threshold::Float32,
max_threshold::Float32)
Initiates the SOPHYSM segmentation process for histological image using tessellation process.
Arguments
filepath_input::AbstractString
: The file path to the input histological
image to be segmented.
filepath_output::AbstractString
: The file path where the segmented results
and related data will be saved.
thresholdGray::Float64
: The grayscale threshold used for initial
image processing.
thresholdMarker::Float64
: The marker threshold for identifying
cellular structures.
min_threshold
: Minimal threshold for considering segments area.max_threshold
: Maximal threshold for considering segments area.
JHistint.start_segmentation_SOPHYSM_graph
— Methodstart_segmentation_SOPHYSM_graph(filepath_input::AbstractString,
filepath_output::AbstractString,
thresholdGray::Float64,
thresholdMarker::Float64,
min_threshold::Float32,
max_threshold::Float32)
Initiates the SOPHYSM segmentation process for histological image using graph construction.
Arguments
filepath_input::AbstractString
: The file path to the input histological
image to be segmented.
filepath_output::AbstractString
: The file path where the segmented results
and related data will be saved.
thresholdGray::Float64
: The grayscale threshold used for initial
image processing.
thresholdMarker::Float64
: The marker threshold for identifying
cellular structures.
min_threshold
: Minimal threshold for considering segments area.max_threshold
: Maximal threshold for considering segments area.
Support Functions for Cell Segmentation (segmentationManager.jl)
JHistint.apply_segmentation_without_download
— Methodapply_segmentation_without_download(slide_info::Tuple{String, Vector{UInt8}, String})
The function performs the segmentation of a histological image, generates its corresponding graph, and translates it into a symmetric adjacency matrix with only 0s and 1s. Define, also, the dataframe associated with labels and edges.
Arguments
slide_info::Tuple{String, Vector{UInt8}, String}
: A tuple containing the
slide ID, the image obtained from the DB, and the path of the original image file.
Return values
filepath_matrix
: The path where the adjacency matrix is stored in.txt
format.matrix
: The adjacency matrix constructed from the segmentation.
Notes
The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using a feature transformation of the image (feature_transform
) and labeling of connected components. The distance between the different regions is then calculated, and an adjacency graph of the regions is constructed using the region_adjacency_graph
function. The resulting graph is then transformed into an Int adjacency matrix using the weighted_graph_to_adjacency_matrix
function and saved to the path of the original image.
JHistint.apply_segmentation_with_download
— Methodapply_segmentation_with_download(slide_info::Tuple{String, Vector{UInt8}, String})
The function performs segmentation of a histological image, saves the segmented image in .png
format, generates the corresponding graph, and translates it into an adjacency matrix. Define, also, the dataframe associated with labels and edges.
Arguments
slide_info::Tuple{String, Vector{UInt8}, String}
: Tuple containing the
slide ID, the image itself obtained from the DB, and the path of the original image file.
Return values
filepath_seg
: The path where the segmented image is stored in.tif
format.filepath_matrix
: The path where the graph is stored in.txt
format.matrix
: The adjacency matrix constructed from the segmentation.
Notes
The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using an image feature transformation (feature_transform
) and connected component labeling. The distance between different regions is then calculated, and an adjacency graph of the regions is constructed using the region_adjacency_graph
function. The obtained graph is transformed into an adjacency matrix using the weighted_graph_to_adjacency_matrix
function, which is saved in the path of the original image. Finally, a segmented .png
image is saved, and the path of the segmented slide file is returned. It performs also the visualizations of the graph with vertices and edges saved as "graphvertex.png" and "graphedges.png" in the output directory.
JHistint.get_random_color
— Methodget_random_color(seed)
Function to return a random 8-bit RGB format color, using a specified seed.
Arguments
seed
: An integer used to initialize the random number generator.
If two calls to the function use the same seed, the same color will be generated.
Return value
The function returns a random 8-bit RGB format color.
SOPHYSM Support Function for Cell Segmentation (segmentationManager.jl)
JHistint.apply_segmentation_SOPHYSM_tessellation
— Methodapply_segmentation_SOPHYSM_tessellation(filepath_input::AbstractString,
filepath_output::AbstractString,
thresholdGray::Float64,
thresholdMarker::Float64,
min_threshold::Float32,
max_threshold::Float32)
The function performs segmentation of a histological slide, saves the segmented image in .png
format, generates the corresponding tessellation, and translates it into an adjacency matrix and build the corresponding dataframe.
Arguments
filepath_input::AbstractString
: The file path to the input image.filepath_output::AbstractString
: The file path where the output files will be saved.thresholdGray::Float64
: The grayscale threshold for image binarization.thresholdMarker::Float64
: The threshold for marker-based segmentation.min_threshold
: Minimal threshold for considering segments area.max_threshold
: Maximal threshold for considering segments area.
Notes
The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using an image feature transformation (feature_transform
) and connected component labeling. The apply_segmentation_SOPHYSM
function reads an .tif
image, applies the SOPHYSM segmentation algorithm, and generates the following outputs:
- Segmented image saved as "_seg.png" in the output directory.
- Dataframe containing label information saved as "dataframelabels.csv"
in the output directory. Also, the dataframe contaning extra information about the segment and label computed by the segmentation algorithm.
- Adjacency matrix saved as ".txt" in the output directory.
- Dataframe containing edge information saved as "dataframeedges.csv"
in the output directory.
- Visualizations of the graph with vertices and edges saved as
"graphvertex.png" and "graphedges.png" in the output directory.
JHistint.apply_segmentation_SOPHYSM_graph
— Methodapply_segmentation_SOPHYSM_graph(filepath_input::AbstractString,
filepath_output::AbstractString,
thresholdGray::Float64,
thresholdMarker::Float64,
min_threshold::Float32,
max_threshold::Float32)
The function performs segmentation of a histological slide, saves the segmented image in .png
format, generates the corresponding graph, and translates it into an adjacency matrix and build the corresponding dataframe.
Arguments
filepath_input::AbstractString
: The file path to the input image.filepath_output::AbstractString
: The file path where the output files will be saved.thresholdGray::Float64
: The grayscale threshold for image binarization.thresholdMarker::Float64
: The threshold for marker-based segmentation.min_threshold
: Minimal threshold for considering segments area.max_threshold
: Maximal threshold for considering segments area.
Notes
The function uses the watershed segmentation algorithm to segment the image into different groups of pixels. Segmentation is performed using an image feature transformation (feature_transform
) and connected component labeling. The apply_segmentation_SOPHYSM
function reads an .tif
image, applies the SOPHYSM segmentation algorithm, and generates the following outputs:
- Segmented image saved as "_seg.png" in the output directory.
- Dataframe containing label information saved as "dataframelabels.csv"
in the output directory. Also, the dataframe contaning extra information about the segment and label computed by the segmentation algorithm.
- Adjacency matrix saved as ".txt" in the output directory.
- Dataframe containing edge information saved as "dataframeedges.csv"
in the output directory.
- Visualizations of the graph with vertices and edges saved as
"graphvertex.png" and "graphedges.png" in the output directory.
Support Functions for Graph (graphManager.jl)
JHistint.region_adjacency_graph_JHistint
— Methodregion_adjacency_graph_JHistint(s::SegmentedImage, weight_fn::Function,
min_threshold::Float32, max_threshold::Float32)
Constructs a region adjacency graph (RAG) from the SegmentedImage
. It returns the RAGalong with a Dict(label=>vertex) map and a dataframe containing the information about the label. weight_fn
is used to assign weights to the edges.
Arguments :
s::SegmentedImage
: The input segmented image containing regions.weight_fn::Function
: A function that calculates the weight between
two adjacent regions. The function should accept two region labels as arguments and return a numeric value representing the weight.
Return value :
G::SimpleWeightedGraph
: The adjacency graph between regions with weights
on the edges.
vert_map::Dict{Int, Int}
: A dictionary that maps region labels to nodes
in the graph.
df_label::DataFrame
: A DataFrame containing information about regions,
including their identifiers, positions, colors, and areas.
Notes:
weight_fn(label1, label2): Returns a real number corresponding to the weight of the edge between label1 and label2.
JHistint.weighted_graph_to_adjacency_matrix
— Methodweighted_graph_to_adjacency_matrix(G::SimpleWeightedGraph{Int64, Float64}, n::Int64)
Converts a weighted graph represented as a SimpleWeightedGraph
into an unweighted boolean adjacency matrix.
Arguments:
G::SimpleWeightedGraph{Int64, Float64}
: Weighted graph represented as a
SimpleWeightedGraph
with integer vertex labels and floating-point edge weights.
n::Int64
: Number of nodes in the adjacency matrix.
Return value:
adjacency_matrix
:Matrix{Int64}
boolean adjacency matrix.
Notes:
The function returns an n
x n
adjacency matrix representing the unweighted graph. If nodes i
and j
are adjacent,the adjacency matrix will contain a value of 1
at position (i,j)
and (j,i)
. Otherwise, the adjacency matrix will contain a value of 0
.
JHistint.weighted_graph_to_adjacency_matrix_weight
— Methodweighted_graph_to_adjacency_matrix_weight(G::SimpleWeightedGraph{Int64, Float64}, n::Int64)
Converts a weighted graph represented as a SimpleWeightedGraph
into an weighted adjacency matrix.
Arguments:
G::SimpleWeightedGraph{Int64, Float64}
: Weighted graph represented as a
SimpleWeightedGraph
with integer vertex labels and floating-point edge weights.
n::Int64
: Number of nodes in the adjacency matrix.
Return value:
adjacency_matrix
:Matrix{Float32}
boolean adjacency matrix.
Notes:
The function returns an n
x n
adjacency matrix representing the weighted graph. If nodes i
and j
are adjacent,the adjacency matrix will contain a value associated with the edge weight at position (i,j)
and (j,i)
. Otherwise, the adjacency matrix will contain a value of -1
.
JHistint.build_dataframe_as_edgelist
— Methodbuild_dataframe_as_edgelist(mat::Matrix{Int64}, label_list::Vector{Int64})
Builds a DataFrame representing an edge list from an input adjacency matrix mat
and a list of labels label_list
.
Arguments:
mat::Matrix{Int64}
: The input adjacency matrix where elements represent
connections between nodes.
label_list::Vector{Int64}
: A list of node labels to consider when
constructing the edge list.
Return value:
df::DataFrame
: A DataFrame representing the edges in the graph,
with columns 'origin', 'destination', and 'weight' indicating the source node, target node, and edge weight, respectively.
Notes:
The build_dataframe_as_edgelist
function constructs a DataFrame that represents the edges in a graph based on the adjacency matrix mat
. It iterates through the upper triangular part of the matrix and adds edges to the DataFrame for non-zero values while considering only nodes with labels present in label_list
.
JHistint.build_dataframe_as_edgelist
— Methodbuild_dataframe_as_edgelist(mat::Matrix{Float32}, label_list::Vector{Int64})
Builds a DataFrame representing an edge list from an input adjacency matrix mat
and a list of labels label_list
.
Arguments:
mat::Matrix{Float32}
: The input adjacency matrix where elements represent
connections between nodes. In this case, the matrix has Float value.
label_list::Vector{Int64}
: A list of node labels to consider when
constructing the edge list.
Return value:
df::DataFrame
: A DataFrame representing the edges in the graph,
with columns 'origin', 'destination', and 'weight' indicating the source node, target node, and edge weight, respectively.
Notes:
The build_dataframe_as_edgelist
function constructs a DataFrame that represents the edges in a graph based on the adjacency matrix mat
. It iterates through the upper triangular part of the matrix and adds edges to the DataFrame for values different from -1
while considering only nodes with labels present in label_list
.
JHistint.save_adjacency_matrix
— Methodsave_adjacency_matrix(matrix::Matrix{Int64}, filepath_matrix::AbstractString)
Function to save an adjacency matrix represented as an integer matrix to a text file.
Arguments:
matrix::Matrix{Int64}
: The integer matrix representing the adjacency matrix.filepath_matrix::AbstractString
: The file path represented as a string
indicating where to save the matrix.
Notes
The function opens the file specified by the filepath_matrix
path in write mode and writes the matrix in adjacency matrix format, where each row represents the adjacent nodes of a node. The numbers in the matrix are separated by spaces.
JHistint.save_adjacency_matrix
— Methodsave_adjacency_matrix(matrix::Matrix{Float32}, filepath_matrix::AbstractString)
Function to save an adjacency matrix represented as a float matrix to a text file.
Arguments:
matrix::Matrix{Float32}
: The float matrix representing the adjacency matrix.filepath_matrix::AbstractString
: The file path represented as a string
indicating where to save the matrix.
Notes
The function opens the file specified by the filepath_matrix
path in write mode and writes the matrix in adjacency matrix format, where each row represents the adjacent nodes of a node. The numbers in the matrix are separated by spaces.
JHistint.extract_vertex_position
— Methodextract_vertex_position(G::MetaGraph)
Extracts the vertex positions from a MetaGraph G
.
Arguments:
G::MetaGraph
: The input MetaGraph containing vertices with
associated positions.
Return value:
position_array::Vector{Luxor.Point}
: An array containingLuxor.Points
representing the positions of the vertices in G
.
JHistint.extract_vertex_color
— Methodextract_vertex_color(G::MetaGraph)
Extracts the vertex colors from a MetaGraph G
.
Arguments:
G::MetaGraph
: The input MetaGraph containing vertices with associated colors.
Return value:
color_array::Vector{Any}
: An array containing the extracted color
information associated with the vertices in G
.
Support Functions for Tessellation (tessellationManager.jl)
JHistint.build_df_label
— Methodbuild_df_label(s::SegmentedImage, min_threshold::Float32, max_threshold::Float32)
Builds DataFrames containing information about regions in a segmented image, including labels, positions, colors, and areas, and separates them into noisy, filtered, and total regions based on pixel count.
Arguments :
s::SegmentedImage
: The segmented image containing regions.min_threshold
: Minimal threshold for considering segments area.max_threshold
: Maximal threshold for considering segments area.
Return value :
df_label::DataFrame
: A DataFrame containing information about filtered regions
, regions associated with cell or nuclei.
df_noisy_label::DataFrame
: A DataFrame containing information about noisy regions.df_total_label::DataFrame
: A DataFrame containing information about all regions.
JHistint.add_column_is_cell
— Methodadd_column_is_cell(df_labels::DataFrame, df_noisy_labels::DataFrame, df_total_labels::DataFrame)
Adds a boolean column 'is_cell' to the total labels DataFrame, indicating whether each region is a cell or not.
Arguments:
df_labels::DataFrame
: A DataFrame containing information about filtered
regions (default : areas > 3000 pixels).
df_noisy_labels::DataFrame
: A DataFrame containing information about
noisy regions (default : areas between 300 and 3000 pixels).
df_total_labels::DataFrame
: A DataFrame containing information about
all regions (default : areas > 300 pixels).
Return value:
df_total_labels::DataFrame
: Thedf_total_labels
DataFrame with
the additional is_cell
column.
Notes:
JHistint.build_dataframe_edges_from_grid
— Methodbuild_dataframe_edges_from_grid(edge_list::Vector{Any}, df_total_labels::DataFrame)
Builds a DataFrame containing edge information from a given list of grid-based edges and a DataFrame of total region labels.
Arguments:
edge_list::Vector{Any}
: A list of grid-based edges represented as pairs
of indices.
df_total_labels::DataFrame
: A DataFrame containing information about all
regions, including labels and areas.
Return value:
df::DataFrame
: A DataFrame containing information about edges,
including origin, destination, and edge weight.
JHistint.build_graph_from_tessellation
— Methodbuild_graph_from_tessellation(df_labels::DataFrame,
df_noisy_labels::DataFrame,
df_total_labels::DataFrame,
w::Int64, h::Int64,
filepath_total_tess::AbstractString,
filepath_cell_tess::AbstractString)
Builds a graph rapresentation of the segmented image based on Voronoi tessellations and saves visualizations.
Arguments:
df_labels::DataFrame
: A DataFrame containing information about
nuclei and centroids.
df_noisy_labels::DataFrame
: A DataFrame containing information
about noisy nuclei centroids.
df_total_labels::DataFrame
: A DataFrame containing information
about all centroids.
w::Int64
: Width of the tessellation region.h::Int64
: Height of the tessellation region.filepath_total_tess::AbstractString
: The file path to save the
visualization of the total tessellation.
filepath_cell_tess::AbstractString
: The file path to save the
visualization of the cell-based tessellation.
Return value:
df_edges::DataFrame
: A DataFrame containing edge information between regions.edges::Vector{Any}
: A vector of pairs representing the connected
regions based on Voronoi edges.
Notes:
The build_graph_from_tessellation
function performs the following steps:
- Extracts centroid positions from the provided DataFrames.
- Performs Voronoi tessellations for both the total and cell-based centroids.
- Plots the tessellations, including centroids and labels.
- Saves the visualizations to the specified file paths.
- Constructs an edge DataFrame based on Voronoi edges.
JHistint.tess_dataframe_to_adjacency_matrix_weight
— Methodtess_dataframe_to_adjacency_matrix_weight(df_total_labels::DataFrame, df_edges::DataFrame, edges::Vector{Any})
Converts a DataFrame representation of edges and region labels into an adjacency matrix with weighted edges.
Arguments:
df_total_labels::DataFrame
: A DataFrame containing information
about all regions, including labels.
df_edges::DataFrame
: A DataFrame containing edge information,
including origin, destination, and edge weight.
edges::Vector{Any}
: A vector of pairs representing the connected regions.
Return value:
adjacency_matrix::Matrix{Int}
: An adjacency matrix representing the
connectivity of regions with weighted edges.
Support Functions for Noise Reduction (noiseManager.jl)
JHistint.compute_centroid_total_cells
— Methodcompute_centroid_total_cells(s::SegmentedImage,
df_label::DataFrame,
min_threshold::Float32)
Computes the centroids of total cells within regions in a segmented image s
and associates them with labels in the provided DataFrame df_label
.
Arguments:
s::SegmentedImage
: The segmented image containing regions.df_label::DataFrame
: A DataFrame with information about the regions,
including labels and other attributes.
min_threshold
: Minimal threshold for considering segments area.
Return value:
df_label::DataFrame
: The input DataFramedf_label
with an additional
centroid
column containing the computed centroids of total cells.
Notes:
The compute_centroid_total_cells
function iterates through the pixels in the segmented image s
to identify total cells within regions. It calculates the centroids of these total cells and associates them with their corresponding labels in the df_label
DataFrame. The function marks pixels as visited to avoid redundant calculations and applies a manual threshold to exclude noise by considering only regions with pixel counts greater than the specified threshold (default = 300 pixels).
JHistint.compute_centroid_cells
— Methodcompute_centroid_cells(s::SegmentedImage,
df_label::DataFrame,
max_threshold::Float32)
Computes the centroids of only cells associated to nuclei within regions in a segmented image s
and associates them with labels in the provided DataFrame df_label
.
Arguments:
s::SegmentedImage
: The segmented image containing regions.df_label::DataFrame
: A DataFrame with information about the regions,
including labels and other attributes.
max_threshold
: Maximal threshold for considering segments area.
Return value:
df_label::DataFrame
: The input DataFramedf_label
with an additional
centroid
column containing the computed centroids of cells.
JHistint.compute_centroid_noisy_cells
— Methodcompute_centroid_noisy_cells(s::SegmentedImage,
df_label::DataFrame,
min_threshold::Float32,
max_threshold::Float32)
Computes the centroids of extra or noisy within regions in a segmented image s
and associates them with labels in the provided DataFrame df_label
.
Arguments:
s::SegmentedImage
: The segmented image containing regions.df_label::DataFrame
: A DataFrame with information about the regions,
including labels and other attributes.
min_threshold
: Minimal threshold for considering segments area.max_threshold
: Maximal threshold for considering segments area.
Return value:
df_label::DataFrame
: The input DataFramedf_label
with an additional
centroid
column containing the computed centroids of extra cells.
JHistint.filter_dataframe_cells
— Methodfilter_dataframe_cells(df_label::DataFrame, max_threshold::Float32)
Filters a DataFrame containing information about regions to retain only cells with areas greater than a specified threshold.
Arguments:
df_label::DataFrame
: The input DataFrame containing region information.max_threshold
: Maximal threshold for considering segments area.
Return value:
df_filtered::DataFrame
: A filtered DataFrame containing information
about the retained cells, including labels, positions, colors, areas, and centroids.
Notes:
The filter_dataframe_cells
function takes a DataFrame df_label
as input, which should contain information about regions, including labels, positions, colors, areas, and centroids. It filters this DataFrame to retain only those regions (cells) with areas greater a specified threshold (default = 3000 pixels).
JHistint.filter_dataframe_extras
— Methodfilter_dataframe_extras(df_label::DataFrame,
min_threshold::Float32,
max_threshold::Float32)
Filters a DataFrame containing information about regions to retain only extra elements (not cells) with areas between two specified thresholds. In the default case these are 300 and 3000 pixels.
Arguments:
df_label::DataFrame
: The input DataFrame containing region information.min_threshold
: Minimal threshold for considering segments area.max_threshold
: Maximal threshold for considering segments area.
Return value:
df_filtered::DataFrame
: A filtered DataFrame containing information
about the retained extra elements, including labels, positions, colors, areas, and centroids.
Notes:
The filter_dataframe_extras
function takes a DataFrame df_label
as input, which should contain information about regions, including labels, positions, colors, areas, and centroids. It filters this DataFrame to retain only those regions that are considered "extras" (not cells) and have areas between two specified thresholds.
Support Functions for DataBase (dbManager.jl)
JHistint.insert_record_DB
— Methodinsert_record_DB(col_name::AbstractString,
cas_name::AbstractString,
tcga_case_id::AbstractString,
sin_cas_name::AbstractString,
tcga_slide_id::AbstractString,
link_slide::AbstractString,
filepath_zip::AbstractString,
filepath_svs::AbstractString)
Function for storing in the JHistint_DB
database the information associated with each slide downloaded from the Cancer Digital Slide Archive (CDSA).
Argomenti
col_name::AbstractString
= Collection name.cas_name::AbstractString
= Case name, which corresponds to theCASE-NAME
displayed by the package.
tcga_case_id::AbstractString
= ID used by TCGA to identify the case.sin_cas_name::AbstractString
= Name of the individual slide, which
corresponds to the SLIDE-ID
displayed by the package.
tcga_slide_id::AbstractString
= ID used by TCGA to identify the slide.link_slide::AbstractString
= Link to the APIs for downloading the slide.filepath_zip::AbstractString
= Path where the.zip
file is stored.filepath_svs::AbstractString
= Path where the.tif
file is stored.
Notes
The JHistint_DB
database is used for storing the information associated with each slide downloaded from the CDSA. The function takes a dictionary containing the information associated with the slide and stores it in the database. Data available in the JHistint_DB
database for each slide:
collection_name TEXT
= Name of the collection.case_name TEXT
= Name of the case.TCGA_caseID TEXT
= ID used by TCGA to identify the case.slide_ID TEXT
= Name of the individual slide case.TCGA_slideID TEXT UNIQUE
= ID used by TCGA to identify the slide,UNIQUE
prevents duplicates from being generated.
slide_path_folder_zip TEXT
= Path where the.zip
file is stored.slide_path_folder_svs TEXT
= Path where the.tif
file is stored.slide_path_api TEXT
= Link to the API for downloading the slide.slide_path_folder_seg TEXT
= Path where the segmented.tif
file is stored.slide_svs BLOB
= Histopathological slide (image).slide_info_TSS TEXT
= Slide information - Tissue Source Site.slide_info_participant_code TEXT
= Slide information - Participant Code,
alphanumeric string.
slide_info_sample_type TEXT
= Slide information - Sample Type. The values
associated with tumor samples are in the range 01-09. 10-19 indicates the range for non-diseased normal samples. 20-29 indicates samples currently under control.
slide_info_vial TEXT
= Slide information - Vial. Related to the ordering of
the sample in the sequence of samples. The values range from A-Z.
slide_info_portion TEXT
= Slide information - Portion. Related to the
ordering of the analyzed portions associated with a sample. Takes values in the range 01-99.
slide_info_type TEXT
= Slide information - Image Type. The possible
values are TS (Top Slide), BS (Bottom Slide), and MS (Middle Slide). The alphanumeric value indicates the ordering of the slide.
slide_path_folder_matrix TEXT
= Path where the adjacency matrix.txt
file is stored.
matrix_data BLOB
= Adjacency matrix.
JHistint.query_extract_slide_svs
— Methodquery_extract_slide_svs(collection_name::AbstractString)
The function queries the JHistint_DB
and extracts the list of slides associated with the collection name provided as an argument.
Arguments
collection_name::AbstractString
: Name of the slide collection to search
for in the JHistint_DB
.
Return value
slide_list
: List of tuples, each of which contains the ID of the slide,
the .svs
file of the slide, and the path of the folder containing the .svs
file.
JHistint.load_seg_slide
— Methodload_seg_slide(filepath_seg::AbstractString, filepath_matrix::AbstractString, matrix::Matrix{Int64}, slide_id::AbstractString)
The function updates the JHistint_DB
with the path of the segmented image file, the path of the adjacency matrix file in text format, and the matrix itself.
Arguments
filepath_seg::AbstractString
: Path of the segmented
image file to add to the DB.
filepath_matrix::AbstractString
: Path of the adjacency matrix file.matrix::Matrix{Int64}
: Adjacency matrix.slide_id::AbstractString
: ID of the slide to update with
the segmented image information.
SOPHYSM Support Functions for DataBase (dbManager.jl)
JHistint.insert_record_DB_SOPHYSM
— Methodinsert_record_DB_SOPHYSM(col_name::AbstractString,
cas_name::AbstractString,
tcga_case_id::AbstractString,
sin_cas_name::AbstractString,
tcga_slide_id::AbstractString,
link_slide::AbstractString,
filepath_zip::AbstractString,
filepath_svs::AbstractString,
path_download_db::AbstractString)
Function for storing in the JHistint_DB
database the information associated with each slide downloaded from the Cancer Digital Slide Archive (CDSA). The function is different from the standard model. It stores only informations of the downloaded Slides.
Argomenti
col_name::AbstractString
= Collection name.cas_name::AbstractString
= Case name, which corresponds to theCASE-NAME
displayed by the package.
tcga_case_id::AbstractString
= ID used by TCGA to identify the case.sin_cas_name::AbstractString
= Name of the individual slide, which
corresponds to the SLIDE-ID
displayed by the package.
tcga_slide_id::AbstractString
= ID used by TCGA to identify the slide.link_slide::AbstractString
= Link to the APIs for downloading the slide.filepath_zip::AbstractString
= Path where the.zip
file is stored.filepath_svs::AbstractString
= Path where the.tif
file is stored.path_download_db::AbstractString
= Path where the DB file is stored.
Notes
The JHistint_DB
database is used for storing the information associated with each slide downloaded from the CDSA. The function takes a dictionary containing the information associated with the slide and stores it in the database. Data available in the JHistint_DB
database for each slide:
collection_name TEXT
= Name of the collection.case_name TEXT
= Name of the case.TCGA_caseID TEXT
= ID used by TCGA to identify the case.slide_ID TEXT
= Name of the individual slide case.TCGA_slideID TEXT UNIQUE
= ID used by TCGA to identify the slide,
UNIQUE
prevents duplicates from being generated.
slide_path_folder_zip TEXT
= Path where the.zip
file is stored.slide_path_folder_svs TEXT
= Path where the.tif
file is stored.slide_path_api TEXT
= Link to the API for downloading the slide.slide_path_folder_seg TEXT
= Path where the segmented.tif
file is stored.slide_svs BLOB
= Histopathological slide (image).slide_info_TSS TEXT
= Slide information - Tissue Source Site.slide_info_participant_code TEXT
= Slide information - Participant Code,
alphanumeric string.
slide_info_sample_type TEXT
= Slide information - Sample Type.
The values associated with tumor samples are in the range 01-09. 10-19 indicates the range for non-diseased normal samples. 20-29 indicates samples currently under control.
slide_info_vial TEXT
= Slide information - Vial. Related to the ordering
of the sample in the sequence of samples. The values range from A-Z.
slide_info_portion TEXT
= Slide information - Portion. Related to the
ordering of the analyzed portions associated with a sample. Takes values in the range 01-99.
slide_info_type TEXT
= Slide information - Image Type. The possible
values are TS (Top Slide), BS (Bottom Slide), and MS (Middle Slide). The alphanumeric value indicates the ordering of the slide.
slide_path_folder_matrix TEXT
= Path where the adjacency matrix
.txt
file is stored.
matrix_data BLOB
= Adjacency matrix.
API Support Functions (apiManager.jl)
JHistint.download_collection_values
— Methoddownload_collection_values(filepath::AbstractString)
Function for downloading data from collections available in TCGA.
Arguments
filepath::AbstractString
= Path where to save the obtained.json
file
from the API available in CDSA.
Notes
The API requires the definition of parentType
and parentId
. parentId
specifies the identifier of the collection. The collection of images associated with TCGA is identified by the code: 5b9ef8e3e62914002e454c39
. The use of limit=0
sets the absence of limits in the queried file, ensuring the complete download of the file. The API belongs to the category for managing the folders stored in the repository. The downloaded file is .json
.
JHistint.extract_collection_values
— Methodextract_collection_values(filepath::AbstractString)
Function to extract the values of data collections from the .json
file downloaded by the download_collection_values
function.
Arguments
filepath::AbstractString
= Path where thecollectionlist.json
file is stored.
Return value
collection_values::Array{String}
= List of data collections
available in TCGA.
JHistint.download_project_infos
— Methoddownload_project_infos(filepath::AbstractString, collection_name::AbstractString)
Function to download metadata associated with the selected collection at startup.
Arguments
filepath::AbstractString
= Path to save the.json
file associated with
the collection. The file is indicated with the wording collection_name.json
.
collection_name::AbstractString
= Name of the collection from which to
download the slides.
Notes
The API requires the definition of parentType
, parentId
, and name
. The name
attribute identifies the name of the collection from which you want to retrieve data (e.g., chol, esca, etc.). The API belongs to the category for managing the folders stored in the repository. The downloaded file is .json
.
JHistint.extract_project_id
— Methodextract_project_id(filepath::AbstractString)
Function to extract the id
value from the metadata of the collection selected at startup.
Arguments
filepath::AbstractString
= Path where thecollection_name.json
file
is stored.
Return value
project_id
=id
of the collection.
JHistint.getCasesForProject
— MethodgetCasesForProject(filepath_case::AbstractString, project_id::AbstractString)
Function to download metadata associated with the cases of the selected collection at startup.
Arguments
filepath::AbstractString
= Path where to save the.json
file
associated with the cases of the collection. The file is indicated with the term collection_name.json
.
project_id::AbstractString
=id
of the collection.
Return values
casesID_values
= List ofid
of all the cases in the collection.casesNAME_values
= List ofname
of all the cases in the collection.
Notes
The API requires the definition of parentType
and parentId
. The parentType
attribute is set to folder
given the structure of the repository. The parentId
is set by defining the identifier of the chosen collection. The downloaded file is .json
.
JHistint.download_zip
— Methoddownload_zip(link::AbstractString, filepath::AbstractString)
Function for downloading histological slides in .zip
format associated with the cases of the selected collection at startup.
Arguments
link::AbstractString
= URL to access the API for slide download.filepath::AbstractString
= Path to save the.zip
file.
ZIP Support Functions (zipManager.jl)
JHistint.extract_slide
— Methodextract_slide(filepath_zip::AbstractString)
Function to extract the contents of .zip
files downloaded from CDSA.
Arguments
filepath_zip::AbstractString
= Path where the.zip
file for
the individual case is saved.