The CytobankAPI package is designed to make interacting with Cytobank API endpoints easy via R. This document is an accompanying overview of the package to learn concepts and see basic examples. View the Cytobank API Endpoint Documentation for a comprehensive list of API endpoints for Cytobank.

Within the CytobankAPI package, there are endpoints to interact with advanced analyses via R. This documentation is an overview of the different ways to utilize advanced analyses. To find more general documentation on using the CytobankAPI package, view the Cytobank quickstart guide.

All advanced analyses are encapsulated within an object. This guide gives an overview of advanced analysis objects in two parts:

Advanced Analyses Objects: What advanced analysis objects are
Interactions: How to interact with advanced analysis objects

1 Advanced Analyses Objects

1.1 Representation

Every advanced analysis is represented as an object. Creating a new advanced analysis returns an object, which is then passed to the other endpoints for that analysis type.

Important information to note:

Each advanced analysis object returned can be edited directly
Each advanced analysis object is a local representation and is not necessarily the same as what is seen in the GUI. To get the most current settings, use the show endpoint; to overwrite the stored settings with those in the object, use the update endpoint
When the update method is called on the advanced analysis object, the existing record for the run on Cytobank is overwritten according to the settings in the object

viSNE_analysis <- visne.show(cyto_session, experiment_id=22, visne_id=214)
viSNE_analysis@name
#>  [1] "My viSNE analysis example"

# Update the viSNE analysis object name directly
viSNE_analysis@name <- "My updated viSNE analysis name"
# Update the viSNE analysis using the 'visne.update' endpoint
updated_viSNE <- visne.update(cyto_session, viSNE_analysis)
updated_viSNE@name
#>  [1] "My updated viSNE analysis name"
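
Because a local object can drift from what Cytobank stores, it can help to compare the local copy against a fresh result from the show endpoint before calling update. The following is a minimal sketch in base R, not part of CytobankAPI: `MockAnalysis` is a hypothetical stand-in class for an advanced analysis object, and `changed_slots` is a hypothetical helper.

```r
library(methods)

# Hypothetical stand-in for an advanced analysis object; the real classes
# come from CytobankAPI and have many more slots.
setClass("MockAnalysis", representation(name = "character",
                                        iterations = "numeric",
                                        compensation_id = "numeric"))

# Return the names of the slots whose values differ between two objects
# of the same class.
changed_slots <- function(local, remote) {
  stopifnot(identical(class(local), class(remote)))
  slots <- slotNames(local)
  differs <- vapply(slots, function(s) {
    !identical(slot(local, s), slot(remote, s))
  }, logical(1))
  slots[differs]
}

# 'remote' plays the role of what a show endpoint would return
remote <- new("MockAnalysis", name = "My viSNE analysis example",
              iterations = 1000, compensation_id = 22)
local <- remote
local@name <- "My updated viSNE analysis name"

changed_slots(local, remote)
#> [1] "name"
```

With real objects, the same comparison could be run between the object held in the R session and the one returned by the corresponding show endpoint, assuming both are S4 objects of the same class.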

1.2 Common features

There are common features for all advanced analyses:

Name: The name of the advanced analysis
Compensation ID: The compensation ID used for the advanced analysis
Channels: The channels being analyzed within the algorithm (clustering or for general analysis)
Source Experiment: The experiment the advanced analysis belongs to (all advanced analyses belong to an experiment)
Status: The state of the advanced analysis (new, running, done, canceled, etc.)
Available FCS files, channels, and populations: Available data that is useful for the advanced analysis (this is retrieved by the fcs_files.list, panels.list, and populations.list endpoints)

1.3 Unique features for each advanced analysis method

There are special settings that pertain to each advanced analysis algorithm. These settings affect how the advanced analysis algorithm is run. For each advanced analysis, you can view its settings and slots as shown below.

CITRUS_object <- citrus.new(cyto_session, experiment_id, citrus_name="My new Cytobank CITRUS analysis")

slotNames(CITRUS_object)
#>  [1] "citrus_id"                "population_id"           
#>  [3] "file_grouping"            "association_models"      
#>  [5] "cluster_characterization" "statistics_channels"     
#>  [7] "event_sampling_method"    "events_per_file"         
#>  [9] "minimum_cluster_size"     "cross_validation_folds"  
#> [11] "false_discovery_rate"     "normalize_scales"        
#> [13] "plot_theme"               "attachment_id"           
#> [15] "channels"                 "compensation_id"         
#> [17] "name"                     "source_experiment"       
#> [19] "status"                   ".available_channels"     
#> [21] ".available_files"         ".available_populations"

Learn more about CITRUS settings.

FlowSOM_object <- flowsom.new(cyto_session, experiment_id, flowsom_name="My new Cytobank FlowSOM analysis")

slotNames(FlowSOM_object)
#>  [1] "author"                                 
#>  [2] "type"                                   
#>  [3] "flowsom_id"                             
#>  [4] "selected_population_name"               
#>  [5] "population_id"                          
#>  [6] "num_fcs_files"                          
#>  [7] "fcs_files"                              
#>  [8] "event_sampling_method"                  
#>  [9] "desired_events_per_file"                
#> [10] "desired_total_events"                   
#> [11] "sampled_event_total"                    
#> [12] "num_events_to_actually_sample"          
#> [13] "random_seed"                            
#> [14] "som_creation_method"                    
#> [15] "clustering_method"                      
#> [16] "expected_metaclusters"                  
#> [17] "expected_clusters"                      
#> [18] "iterations"                             
#> [19] "normalize_scales"                       
#> [20] "created_experiment"                     
#> [21] "attachment_id"                          
#> [22] "auto_seed"                              
#> [23] "external_som_analysis_info"             
#> [24] "external_som_analysis_id"               
#> [25] "channels_to_plot"                       
#> [26] "cluster_size_type"                      
#> [27] "fixed_cluster_size"                     
#> [28] "gate_set_names_to_label"                
#> [29] "max_relative_cluster_size"              
#> [30] "output_file_type"                       
#> [31] "show_background_on_legend"              
#> [32] "show_background_on_channel_colored_msts"
#> [33] "show_background_on_population_pies"     
#> [34] "final_result"                           
#> [35] "completed"                              
#> [36] "canceled"                               
#> [37] "channels"                               
#> [38] "compensation_id"                        
#> [39] "name"                                   
#> [40] "source_experiment"                      
#> [41] "status"                                 
#> [42] ".available_channels"                    
#> [43] ".available_files"                       
#> [44] ".available_populations"

Learn more about FlowSOM settings.

SPADE_object <- spade.new(cyto_session, experiment_id=22, spade_name="My new Cytobank SPADE analysis")

slotNames(SPADE_object)
#>  [1] "created_experiment"         "down_sampled_events_target"
#>  [3] "down_sampled_events_type"   "fold_change_groups"        
#>  [5] "population_id"              "spade_id"                  
#>  [7] "target_number_nodes"        "channels"                  
#>  [9] "compensation_id"            "name"                      
#> [11] "source_experiment"          "status"                    
#> [13] ".available_channels"        ".available_files"          
#> [15] ".available_populations"

Learn more about SPADE settings.

viSNE_object <- visne.new(cyto_session, experiment_id, visne_name="My new Cytobank viSNE analysis")

slotNames(viSNE_object)
#>  [1] "created_experiment"     "iterations"            
#>  [3] "perplexity"             "population_selections" 
#>  [5] "sampling_total_count"   "sampling_target_type"  
#>  [7] "seed"                   "theta"                 
#>  [9] "visne_id"               "channels"              
#> [11] "compensation_id"        "name"                  
#> [13] "source_experiment"      "status"                
#> [15] ".available_channels"    ".available_files"      
#> [17] ".available_populations"

Learn more about viSNE settings.
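
The slot listings above can also be turned into a named list of current values for quick inspection. The sketch below assumes only base R and the methods package; `MockViSNE` is a hypothetical stand-in class (the real object comes from visne.new), and `settings_as_list` is a hypothetical helper that skips the dot-prefixed slots.

```r
library(methods)

# Hypothetical stand-in class; '.available_files' mimics the dot-prefixed
# slots shown in the listings above.
setClass("MockViSNE", representation(name = "character",
                                     perplexity = "numeric",
                                     theta = "numeric",
                                     .available_files = "data.frame"))

# Collect every slot whose name does not start with "." into a named list
settings_as_list <- function(object) {
  slots <- slotNames(object)
  slots <- slots[!startsWith(slots, ".")]
  stats::setNames(lapply(slots, function(s) slot(object, s)), slots)
}

mock_visne <- new("MockViSNE", name = "My new Cytobank viSNE analysis",
                  perplexity = 30, theta = 0.5,
                  .available_files = data.frame(id = c(44853, 44854)))

# Print a compact overview of the settings
str(settings_as_list(mock_visne))
```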

2 Interacting with advanced analyses objects

See each section below for instructions on how to interact with the object for each advanced analysis.

2.1 CITRUS

2.1.1 Updating general CITRUS settings

Directly update CITRUS settings via their slot names.

The following slots can be updated directly:

population_id
channels
- channels can be set as a list of channel IDs or a list of channel long names (each long name must correspond to a unique short channel name)
file_grouping (see the next section for how to update CITRUS file grouping)
compensation_id
association_models
cluster_characterization
event_sampling_method
events_per_file
minimum_cluster_size
cross_validation_folds
false_discovery_rate
normalize_scales
plot_theme

# Set a new plot theme, association models, and compensation
CITRUS_object@plot_theme <- "black"
CITRUS_object@association_models <- c("pamr", "glmnet")
CITRUS_object@compensation_id <- 22

# Bulk update the changes made to the CITRUS object
CITRUS_object <- citrus.update(cyto_session, CITRUS_object)

2.1.2 Updating CITRUS file grouping

The core functionality of CITRUS is establishing biological explanations for why samples in two or more groups differ from each other. CITRUS file grouping is used to categorize files into these groups. There is one important setting to pay attention to:

group_name: The group that each file is associated with
- The minimum number of samples per group is three
  - However, for more robust statistical analysis and to avoid spurious results, at least eight samples are recommended per group (see the CITRUS documentation for more information)
- There must be at least 2 groups in order to run a CITRUS analysis

Directly update CITRUS file grouping data.

# Set 'file1.fcs' through 'file4.fcs' to 'Group 1' and 'file5.fcs' through 'file8.fcs' to 'Group 2'
CITRUS_object@file_grouping[CITRUS_object@file_grouping$id <= 44856,]$group_name <- "Group 1"
CITRUS_object@file_grouping[is.element(CITRUS_object@file_grouping$id, c(44857, 44858, 44859, 44860)),]$group_name <- "Group 2"

View(CITRUS_object@file_grouping)

id	name	group_name
44853	file1.fcs	Group 1
44854	file2.fcs	Group 1
44855	file3.fcs	Group 1
44856	file4.fcs	Group 1
44857	file5.fcs	Group 2
44858	file6.fcs	Group 2
44859	file7.fcs	Group 2
44860	file8.fcs	Group 2
44861	file9.fcs	Unassigned

Learn more about CITRUS file grouping.
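
The grouping rules above (at least two groups, at least three samples per group, eight recommended) can be checked locally before calling citrus.update. This is a minimal sketch on a mock copy of the table shown above, not a CytobankAPI function: `check_citrus_groups` is a hypothetical helper, and it assumes that files without a group carry the label "Unassigned" as in that table.

```r
# Mock copy of the file grouping table shown above (not fetched from Cytobank)
file_grouping <- data.frame(
  id = 44853:44861,
  name = paste0("file", 1:9, ".fcs"),
  group_name = c(rep("Group 1", 4), rep("Group 2", 4), "Unassigned"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return a character vector describing rule violations
# (empty when the grouping passes)
check_citrus_groups <- function(grouping, min_samples = 3, min_groups = 2) {
  assigned <- grouping[grouping$group_name != "Unassigned", ]
  counts <- table(assigned$group_name)
  problems <- character(0)
  if (length(counts) < min_groups) {
    problems <- c(problems, sprintf("need at least %d groups, found %d",
                                    min_groups, length(counts)))
  }
  small <- names(counts)[counts < min_samples]
  if (length(small) > 0) {
    problems <- c(problems, sprintf("group '%s' has fewer than %d files",
                                    small, min_samples))
  }
  problems
}

# Passes the minimum requirements
check_citrus_groups(file_grouping)
#> character(0)

# Flags both groups when checked against the recommended eight samples
check_citrus_groups(file_grouping, min_samples = 8)
```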

2.2 FlowSOM

2.2.1 Updating general FlowSOM settings

Directly update FlowSOM settings via their slot names.

The following slots can be updated directly:

Required settings
- channels
  - channels can be set as a list of channel IDs or a list of channel long names (each long name must correspond to a unique short channel name)
- fcs_files
Event sampling settings
- event_sampling_method
- desired_events_per_file
- desired_total_events
Optional basic settings
- clustering_method
- compensation_id
- expected_clusters
- expected_metaclusters
- iterations
- normalize_scales
- population_id
- random_seed
- som_creation_method
- external_som_analysis_id
Optional advanced output settings
- channels_to_plot
- cluster_size_type
- fixed_cluster_size
- gate_set_names_to_label
- max_relative_cluster_size
- output_file_type
- show_background_on_legend
- show_background_on_channel_colored_msts
- show_background_on_population_pies

If the required channels and fcs_files slots are not present, updates will not occur to the FlowSOM analysis.

# Set a clustering method, expected number of clusters, and compensation
FlowSOM_object@clustering_method <- "kmeans"
FlowSOM_object@expected_clusters <- 144
FlowSOM_object@compensation_id <- 22

# Update FCS file selection to the first 4 files
FlowSOM_object@fcs_files <- FlowSOM_object@.available_files$id[1:4]

# Update channel selection
FlowSOM_object@channels <- list("CD3", "CD4")

# Bulk update the changes made to the FlowSOM object
FlowSOM_object <- flowsom.update(cyto_session, FlowSOM_object)
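
Since updates do not occur when the required channels and fcs_files slots are missing, a local check before calling flowsom.update can catch this early. The sketch below uses a hypothetical stand-in class `MockFlowSOM` and a hypothetical helper `missing_required`; it treats "not present" as "empty", which is an assumption about how the real object behaves, and the slot types are illustrative only.

```r
library(methods)

# Hypothetical stand-in for a FlowSOM object with only three slots
setClass("MockFlowSOM", representation(channels = "list",
                                       fcs_files = "numeric",
                                       expected_clusters = "numeric"))

# Return the names of required slots that are still empty
missing_required <- function(object, required = c("channels", "fcs_files")) {
  empty <- vapply(required, function(s) length(slot(object, s)) == 0,
                  logical(1))
  required[empty]
}

mock_flowsom <- new("MockFlowSOM", expected_clusters = 144)
missing_required(mock_flowsom)
#> [1] "channels"  "fcs_files"

mock_flowsom@channels <- list("CD3", "CD4")
mock_flowsom@fcs_files <- c(44853, 44854, 44855, 44856)
missing_required(mock_flowsom)
#> character(0)
```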

2.3 SPADE

2.3.1 Updating general SPADE settings

Directly update SPADE settings via their slot names.

The following slots can be updated directly:

population_id
channels
- channels can be set as a list of channel IDs or a list of channel long names (each long name must correspond to a unique short channel name)
compensation_id
target_number_nodes
down_sampled_events_target
fold_change_groups (see the next section for how to update SPADE fold change groups)

# Set a new population, target number of nodes, and compensation
SPADE_object@population_id <- 2
SPADE_object@target_number_nodes <- 150
SPADE_object@compensation_id <- 22

# Update channels
channel_ids_list <- list(2, 3, 5, 8)
SPADE_object@channels <- channel_ids_list

# Update channels by long channel names
channel_names_list <- list("channel1", "channel2", "channel3")
SPADE_object@channels <- channel_names_list

# Bulk update the changes made to the SPADE object
SPADE_object <- spade.update(cyto_session, SPADE_object)

SPADE_object@population_id
#> [1] 2
SPADE_object@target_number_nodes
#> [1] 150
SPADE_object@compensation_id
#> [1] 22
SPADE_object@channels
#> [[1]]
#> [1] "channel1"
#>
#> [[2]]
#> ...

2.3.2 Updating SPADE fold change groups

SPADE fold change groups are used to categorize different files into separate collections that will be compared amongst each other. There are 2 important settings to pay attention to:

group_name: The group a specific file belongs to
baseline: The file(s) used as the baseline in order to calculate fold change

Directly update SPADE fold change groups data.

# Set 'file6.fcs' and 'file7.fcs' as the baseline for 'Group 1'
SPADE_object@fold_change_groups[grep("file6|file7", 
    SPADE_object@fold_change_groups$name),]$baseline <- TRUE

# Set 'file2.fcs', 'file4.fcs', and 'file8.fcs' as part of 'Group 2', and set 'file2.fcs' as the baseline
SPADE_object@fold_change_groups[grep("file2|file4|file8", 
    SPADE_object@fold_change_groups$name),]$group_name <- "Group 2"
SPADE_object@fold_change_groups[SPADE_object@fold_change_groups$name=="file2.fcs",]$baseline <- TRUE

View(SPADE_object@fold_change_groups)

id	name	baseline	group_name
44853	file1.fcs	FALSE	Group 1
44854	file2.fcs	TRUE	Group 2
44855	file3.fcs	FALSE	Group 1
44856	file4.fcs	FALSE	Group 2
44857	file5.fcs	FALSE	Group 1
44858	file6.fcs	TRUE	Group 1
44859	file7.fcs	TRUE	Group 1
44860	file8.fcs	FALSE	Group 2

Learn more about SPADE fold change groups.
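
A pattern such as "file2" would also match a name like "file20.fcs", so exact matching with `%in%` can be a safer way to select files. The sketch below reproduces the table above on a mock data frame (not fetched from Cytobank) and then counts baseline files per group. The starting state, with every file in 'Group 1' and no baseline, is an assumption made for illustration.

```r
# Mock fold change groups table; assumed starting state for illustration
fold_change_groups <- data.frame(
  id = 44853:44860,
  name = paste0("file", 1:8, ".fcs"),
  baseline = FALSE,
  group_name = "Group 1",
  stringsAsFactors = FALSE
)

# Select files by exact name rather than by regular expression
group2_files <- c("file2.fcs", "file4.fcs", "file8.fcs")
baseline_files <- c("file2.fcs", "file6.fcs", "file7.fcs")

fold_change_groups$group_name[fold_change_groups$name %in% group2_files] <- "Group 2"
fold_change_groups$baseline[fold_change_groups$name %in% baseline_files] <- TRUE

# Number of baseline files in each group: 2 in 'Group 1', 1 in 'Group 2'
baselines_per_group <- tapply(fold_change_groups$baseline,
                              fold_change_groups$group_name, sum)
baselines_per_group
```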

2.4 viSNE

2.4.1 Updating general viSNE settings

Directly update viSNE settings via their slot names.

The following slots can be updated directly:

sampling_target_type
sampling_total_count
channels
- channels can be set as a list of channel IDs or a list of channel long names (each long name must correspond to a unique short channel name)
compensation_id
iterations
perplexity
theta
seed

The following slots must be updated via helper functions:

population_selections (see the next section for how to update viSNE population selections)
- visne.helper.set_populations

# Set a new sampling target type, sampling total count, and compensation
viSNE_object@sampling_target_type <- "equal"
viSNE_object@sampling_total_count <- 150000
viSNE_object@compensation_id <- 22

# Bulk update the changes made to the viSNE object
viSNE_object <- visne.update(cyto_session, viSNE_object)

2.4.2 Updating viSNE population selections

Adding viSNE population selections is slightly more difficult because the same file can be used in the analysis in combination with multiple populations. Because of this complexity, the visne.helper.set_populations helper function is used to set files for a selected population.

Parameters for visne.helper.set_populations:

visne: The viSNE object to set populations for
population_id: The gate set ID for the specified population (different than the actual population ID, and can be obtained by looking at the .available_populations slot)
fcs_files: A vector/list of FCS files to set for the population

Set files for a specific population through the visne.helper.set_populations helper function.

Setting files for a specific population will overwrite the files previously set for the population in question.

# Set files for different populations
viSNE_object <- visne.helper.set_populations(viSNE_object, population_id=1, fcs_files=c(44853))
viSNE_object <- visne.helper.set_populations(viSNE_object, population_id=2, fcs_files=c(44867,44868))
viSNE_object <- visne.helper.set_populations(viSNE_object, population_id=4, fcs_files=unlist(viSNE_object@.available_files[grep("file4|file5|file6", viSNE_object@.available_files$filename),]$id))
# Overwrite the 'population_id=2' FCS file selection; note that 'file1.fcs' is now in both 'Population 1' and 'Population 2', and 'file4.fcs' is in both 'Population 4' and 'Population 2'
viSNE_object <- visne.helper.set_populations(viSNE_object, population_id=2, fcs_files=c(44854, 44855, 44853, 44856))

# Update the changes made to viSNE population selections
viSNE_object <- visne.update(cyto_session, viSNE_object)

View(viSNE_object@population_selections)

id	name	samplingCount	eventCount	populationId	populationName
44853	file1.fcs	NA	NA	1	Population 1
44856	file4.fcs	NA	NA	4	Population 4
44857	file5.fcs	NA	NA	4	Population 4
44858	file6.fcs	NA	NA	4	Population 4
44854	file2.fcs	NA	NA	2	Population 2
44855	file3.fcs	NA	NA	2	Population 2
44853	file1.fcs	NA	NA	2	Population 2
44856	file4.fcs	NA	NA	2	Population 2

Learn more about selecting viSNE populations.
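
The overwrite behavior described above can be illustrated on a plain data frame. This is a sketch of the semantics only, not the implementation of visne.helper.set_populations: `set_population_mock` is a hypothetical function, and the table keeps just two of the columns shown above.

```r
# Hypothetical stand-in: drop the rows already set for the population,
# then append the new file selection
set_population_mock <- function(selections, population_id, fcs_files) {
  kept <- selections[selections$populationId != population_id, , drop = FALSE]
  added <- data.frame(id = fcs_files, populationId = population_id)
  rbind(kept, added)
}

selections <- data.frame(id = numeric(0), populationId = numeric(0))
selections <- set_population_mock(selections, 1, c(44853))
selections <- set_population_mock(selections, 2, c(44867, 44868))

# Setting population 2 again replaces its previous files;
# population 1 is left untouched
selections <- set_population_mock(selections, 2, c(44854, 44855))
selections
```

After the last call, files 44867 and 44868 no longer appear in the table, while the row for population 1 is unchanged.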
