- pathfindR.data for storing pathfindR data
- visualize_active_subnetworks() for visualizing graphs of active subnetworks
- combine_pathfindR_results() and combined_results_graph() for comparison of 2 pathfindR results and term-gene graph of the combined results, respectively
- get_pin_file() for obtaining organism-specific PIN data (only from BioGRID for now)
- get_gene_sets_list() for obtaining organism-specific gene sets list from KEGG, Reactome and MSigDB
- term_gene_heatmap() to create heatmap visualizations of enriched terms and the involved input genes. Rows are enriched terms and columns are involved input genes. If genes_df is provided, colors of the tiles indicate the change values
- UpSet_plot() to create UpSet plots of enriched terms
- cell_markers_gsets and cell_markers_descriptions
- parallel::makeCluster() in run_pathfindR() (#45)
- download_kegg_png() (#37, @rix133)
- RA_comparison_output of pathfindR results on another RA-related dataset (GSE84074)
- In visualize_hsa_KEGG(), fixed the issue where >1 Entrez IDs were returned for a gene symbol (the first one is kept)
- In visualize_hsa_KEGG(), implemented a tryCatch to avoid any issues when KEGGREST::color.pathway.by.objects() might fail (#28)
- In visualize_hsa_KEGG(), now limiting the number of genes passed onto KEGGREST::color.pathway.by.objects() to < 60 (presumably because the KEGG API now limits the number)
- term_gene_heatmap() (i.e. when genes_df is not provided) to binary colored heatmap (by default, “green” and “red”, controlled by low and high) by up-/down-regulation status
- get_pin_file() and get_gene_sets_list() and fixed a minor issue in the vignette (#46)
- In create_kappa_matrix(), when chance is 1, the metric is set to 0
- class(.) == * in cluster_graph_vis()
- max_to_plot to visualize_hsa_KEGG() and to run_pathfindR(). This argument controls the number of pathways to be visualized (default is NULL, i.e. no filter).
This was implemented so as not to slow down the runtime of run_pathfindR(), as downloading the PNG files is slow.
- enriched_terms.Rmd
- DESCRIPTION was updated
- annotate_pathway_DEGs(), calculate_pw_scores(), cluster_pathways(), fuzzy_pw_clustering(), hierarchical_pw_clustering(), visualize_pw_interactions() and visualize_pws() were renamed to annotate_term_DEGs(), score_terms(), cluster_enriched_terms(), fuzzy_term_clustering(), hierarchical_term_clustering(), visualize_term_interactions() and visualize_terms(), respectively
- enriched_pathways.Rmd was renamed to enriched_terms.Rmd
- term_gene_graph(), which creates a graph of enriched terms and the involved genes
- enrichment() and enrichment_analyses() to get enrichment results faster
- fetch_gene_set() for obtaining gene set data more easily
- min_gset_size, max_gset_size in fetch_gene_set() and run_pathfindR()
- gaCrossover during active subnetwork search, which controls the probability of a crossover in GA (default = 1, i.e. always perform crossover)
- testthat
- create_kappa_matrix()
- mmu_kegg_genes & mmu_kegg_descriptions: mmu KEGG gene sets data
- myeloma_input & myeloma_output: example mmu input and output data
- sig_gene_thr in subnetwork filtering via filterActiveSnws() now serves as the threshold proportion of significant genes in the active subnetwork.
For example, if there are 100 significant genes and sig_gene_thr = 0.03, subnetworks that contain at least 3 (100 x 0.03) significant genes will be accepted for further analysis.
- pathview dependency by implementing a colored pathway diagram visualization function using KEGGREST and KEGGgraph
- In hierarchical_term_clustering(), redefined the distance measure as 1 - kappa statistic
- cluster_graph_vis() (during the calculations for additional node colors)
- cluster_graph_vis()
- In active_snw_search(), unnecessary warnings during active subnetwork search were removed
- In enrichment_chart(), supplying fuzzy clustered results no longer raises an error
- input_testing() and input_processing() to ensure that both the initial input data frame and the processed input data frame for active subnetwork search contain at least 2 genes (to fix the corner case encountered in issue #17)
- In enrichment_chart(), ensuring that bubble sizes displayed in the legend (proportional to # of DEGs) are integers
- In enrichment_chart(), added the arguments num_bubbles (default is 4) to control the number of bubbles displayed in the legend and even_breaks (default is TRUE) to indicate whether even increments of breaks are required
- term_gene_graph() (create the igraph object as an undirected graph for better auto layout)
- visualize_term_interactions(). The legend no longer displays “Non-input Active Snw.
Genes” if they were not provided
- human_genes in run_pathfindR() and input_processing() was renamed as convert2alias
- top_terms to enrichment_chart(), controlling the number of top enriched terms to plot (default is 10)
- run_pathfindR into individual functions: active_snw_search, enrichment_analyses, summarize_enrichment_results, annotate_pathway_DEGs, visualize_pws.
- pathmap as visualize_hsa_KEGG, updated the function to produce different visualizations for inputs with binary change values (ordered) and no change values (the input_processing function assigns a change value of 100 to all).
- visualize_pw_interactions, which creates PNG files visualizing the interactions (in the selected PIN) of genes involved in the given pathways.
- create_kappa_matrix, hierarchical_pw_clustering, fuzzy_pw_clustering and cluster_pathways.
- cluster_graph_vis for visualizing graph diagrams of clustering results.
- score_quan_thr and sig_gene_thr for run_pathfindR were not being utilized.
- In run_pathfindR, added a message at the end of the run, reporting the number of enriched pathways.
- run_pathfindR now creates a variable org_dir that is the “path/to/original/working/directory”. org_dir is used in multiple functions to return to the original working directory if anything fails. This changes the previous behavior where, if a function stopped with an error, the directory was changed to “..”, i.e. the parent directory. This change was adopted so that the user is returned to the original working directory if they supply a recursive output folder (output_dir, e.g. “./ALL_RESULTS/RESULT_A”).
- In input_processing, added the argument human_genes to only perform alias symbol conversion when human gene symbols are provided.
- Updated the Rmd files used to create the report HTML files
- GO-All, all annotations in the GO database (BP+MF+CC)
- pathfindR - An R Package for Pathway Enrichment Analysis Utilizing Active Subnetworks to reflect the new functionalities.
- In plot_scores, added the argument label_cases to indicate whether or not to label the cases in the pathway scoring heatmap plot. Also added the argument case_control_titles, which allows the user to change the default “Case” and “Control” headers. Also added the arguments low and high, used to change the low- and high-end colors of the scoring color gradient.
- In plot_scores, reversed the color gradient to match the coloring scheme used by pathview (i.e. red for positive values, green for negative values)
- In parseActiveSnwSearch, replaced score_thr by score_quan_thr. This was done so that the scoring filter for active subnetworks could be based on the distribution of the current active subnetworks rather than on a constant empirical score threshold.
- In parseActiveSnwSearch, increased sig_gene_thr from 2 to 10, as we observed that in most cases this resulted in faster runs with comparable results.
- In choose_clusters, added the argument p_val_threshold to be used as the p value threshold for filtering the enriched pathways prior to clustering.
- pathview.

## Minor changes and bug fixes

- In choose_clusters, added an option to use pathway names instead of pathway IDs when visualizing the clustering dendrogram and heatmap.
- run_pathfindR. For this, the gene_sets argument should be set to “Custom” and custom_genes and custom_pathways should be provided.
- calculate_pw_scores where, if there was one DEG, subsetting the experiment matrix failed
- calculate_pw_scores. If there is none, the pathway is skipped.
- In calculate_pw_scores, if cases are provided, the pathways are reordered according to their activity in cases before plotting the heat map and returning the matrix.
This way, “up” pathways are grouped together, and likewise “down” pathways.
- In calculate_pwd, if a pathway has perfect overlap with other pathways, the correlation value is set to 1 instead of NA.
- In choose_clusters, if result_df has fewer than 3 pathways, clustering is not performed.
- run_pathfindR checks whether the output directory (output_dir) already exists and, if it does, now appends “(1)” to output_dir and displays a warning message. This was implemented to prevent overwriting existing results.
- In run_pathfindR, recursive creation of the output directory (output_dir) is now supported.
- In run_pathfindR, if no pathways are found, the function returns an empty data frame instead of raising an error.

Implemented the (per subject) pathway scoring function calculate_pw_scores and the function plot_scores to plot the heatmap of pathway scores per subject.
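The sig_gene_thr and score_quan_thr entries above describe two filters on active subnetworks: a score cut-off taken as a quantile of the current subnetworks' scores, and a minimum number of significant genes expressed as a proportion of all significant genes. The R sketch below illustrates that logic on a toy list of subnetworks. filter_subnetworks() and the list structure (genes, score) are hypothetical, and the exact comparison operators and rounding used by filterActiveSnws() may differ.

```r
# Hypothetical sketch: filter_subnetworks() is NOT a pathfindR function.
# Each subnetwork is assumed to be a list with `genes` (character) and `score` (numeric).
filter_subnetworks <- function(snws, sig_genes, score_quan_thr, sig_gene_thr) {
  scores <- vapply(snws, function(s) s$score, numeric(1))
  # Score cut-off derived from the distribution of the current subnetworks
  score_cutoff <- quantile(scores, probs = score_quan_thr, names = FALSE)
  # Minimum count of significant genes, e.g. 100 x 0.03 = 3
  min_sig <- sig_gene_thr * length(sig_genes)
  keep <- vapply(snws, function(s) {
    s$score > score_cutoff && sum(s$genes %in% sig_genes) >= min_sig
  }, logical(1))
  snws[keep]
}

# Usage: 100 significant genes, so a kept subnetwork needs at least 3 of them
sig_genes <- paste0("g", 1:100)
snws <- list(
  list(genes = c("g1", "g2", "g3"), score = 1),
  list(genes = "g1", score = 2),
  list(genes = c("g4", "g5", "g6", "g7"), score = 3),
  list(genes = c("g1", "g2", "g3"), score = 4),  # passes both filters
  list(genes = c("g1", "g2", "x1"), score = 5)   # high score, only 2 significant genes
)
kept <- filter_subnetworks(snws, sig_genes, score_quan_thr = 0.5, sig_gene_thr = 0.03)
```

Using a quantile rather than a fixed score makes the cut-off adapt to each run's score distribution, which is the motivation given for replacing score_thr by score_quan_thr.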
Added the auto parameter to choose_clusters. When auto == TRUE (default), the function chooses the optimal number of clusters k automatically, as the value which maximizes the average silhouette width. It then returns a data frame with the cluster assignments and the representative/member statuses of each pathway.
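Several entries describe the pieces of term clustering: a kappa statistic between terms (set to 0 when chance agreement is 1), a hierarchical clustering distance of 1 - kappa, no clustering for fewer than 3 terms, and automatic choice of k by maximum average silhouette width. The sketch below combines them under stated assumptions: kappa_stat() and cluster_terms() are hypothetical names, the kappa universe is assumed to be the union of all term genes, and average linkage is an assumption. It relies on cluster::silhouette() from the recommended cluster package.

```r
# Hypothetical sketch, not pathfindR's implementation.
# Cohen's kappa between two gene sets over a shared gene universe.
kappa_stat <- function(a, b, universe) {
  in_a <- universe %in% a
  in_b <- universe %in% b
  n <- length(universe)
  observed <- mean(in_a == in_b)  # fraction of genes on which both terms agree
  chance <- (sum(in_a) * sum(in_b) + sum(!in_a) * sum(!in_b)) / n^2
  if (chance == 1) return(0)      # kappa undefined here; set to 0
  (observed - chance) / (1 - chance)
}

# Hierarchical clustering on 1 - kappa; k chosen by maximum average silhouette width.
cluster_terms <- function(term_genes) {
  n_terms <- length(term_genes)
  if (n_terms < 3) return(setNames(rep(1L, n_terms), names(term_genes)))
  universe <- unique(unlist(term_genes))
  kappa_mat <- outer(seq_len(n_terms), seq_len(n_terms), Vectorize(function(i, j) {
    kappa_stat(term_genes[[i]], term_genes[[j]], universe)
  }))
  dimnames(kappa_mat) <- list(names(term_genes), names(term_genes))
  d <- as.dist(1 - kappa_mat)
  hc <- hclust(d, method = "average")
  ks <- 2:(n_terms - 1)
  avg_sil <- vapply(ks, function(k) {
    mean(cluster::silhouette(cutree(hc, k = k), d)[, "sil_width"])
  }, numeric(1))
  cutree(hc, k = ks[which.max(avg_sil)])
}

terms <- list(
  A1 = c("a1", "a2", "a3", "a4"), A2 = c("a1", "a2", "a3", "a5"),
  B1 = c("b1", "b2", "b3", "b4"), B2 = c("b1", "b2", "b3", "b5")
)
clusters <- cluster_terms(terms)  # A1/A2 and B1/B2 fall into separate clusters
```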
Added the Fold_Enrichment column to the resulting data frame of enrichment, and as a corollary to the resulting data frame of run_pathfindR.
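Fold enrichment compares the fraction of input genes that fall in a term with the fraction of background genes that do. A minimal sketch, assuming the common definition (k/n) / (M/N) and a one-sided hypergeometric p value from stats::phyper(); enrichment_stats() is hypothetical, and pathfindR's own enrichment() may define the background differently.

```r
# Hypothetical sketch: fold enrichment and hypergeometric p value for one term.
enrichment_stats <- function(term_genes, input_genes, background_genes) {
  term_genes <- intersect(term_genes, background_genes)
  input_genes <- intersect(input_genes, background_genes)
  n_bg <- length(background_genes)                    # N: background size
  n_term <- length(term_genes)                        # M: genes in the term
  n_input <- length(input_genes)                      # n: input genes
  n_overlap <- length(intersect(input_genes, term_genes))  # k: input genes in the term
  fold <- (n_overlap / n_input) / (n_term / n_bg)
  # P(X >= k) when drawing n genes from N, of which M belong to the term
  p <- phyper(n_overlap - 1, n_term, n_bg - n_term, n_input, lower.tail = FALSE)
  c(Fold_Enrichment = fold, p_value = p)
}

background <- paste0("g", 1:100)
res <- enrichment_stats(term_genes = paste0("g", 1:10),
                        input_genes = paste0("g", c(1:5, 11:15)),
                        background_genes = background)
# 5 of 10 input genes are in a term covering 10% of the background: fold = 5
```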
Added the option bubble to plot a bubble chart displaying the enrichment results in run_pathfindR using the helper function enrichment_chart. To plot the bubble chart set bubble = TRUE in run_pathfindR or use enrichment_chart(your_result_df).
Added the parameter silent_option to run_pathfindR. When silent_option == TRUE (default), the console outputs during active subnetwork search are printed to a file named “console_out.txt”. If silent_option == FALSE, the output is printed on the screen. The default was set to TRUE because, when running in parallel, multiple console outputs are printed simultaneously.
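The silent_option behavior amounts to redirecting console output to a file instead of the screen. The sketch below shows the R-level idea with base::sink(); run_with_optional_log() is hypothetical, and pathfindR may redirect the output of its active subnetwork search process differently.

```r
# Hypothetical sketch: optionally capture console output of `fun` in a log file.
run_with_optional_log <- function(fun, silent_option = TRUE,
                                  log_file = "console_out.txt") {
  if (silent_option) {
    sink(log_file)                   # divert console output to the file
    on.exit(sink(), add = TRUE)      # always restore the console, even on error
  }
  fun()
}

log_file <- tempfile(fileext = ".txt")
run_with_optional_log(function() cat("searching...\n"),
                      silent_option = TRUE, log_file = log_file)
```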
Added the list_active_snw_genes parameter to run_pathfindR. When list_active_snw_genes == TRUE, the function adds the column non_DEG_Active_Snw_Genes, which reports the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value.
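The non_DEG_Active_Snw_Genes column can be understood as: for each pathway, pick the subnetwork with the lowest enrichment p value and list its genes that are not DEGs. A minimal sketch, assuming a hypothetical per-subnetwork results table (ID, snw, p_value) and a named list of subnetwork genes; non_deg_snw_genes() is not a pathfindR function.

```r
# Hypothetical sketch of the non_DEG_Active_Snw_Genes logic.
non_deg_snw_genes <- function(enrich_df, snw_genes, degs) {
  # Sort so the lowest p value comes first within each pathway, then keep that row
  ordered <- enrich_df[order(enrich_df$ID, enrich_df$p_value), ]
  best <- ordered[!duplicated(ordered$ID), ]
  out <- vapply(as.character(best$snw), function(s) {
    paste(setdiff(snw_genes[[s]], degs), collapse = ", ")
  }, character(1))
  setNames(out, best$ID)
}

enrich_df <- data.frame(ID = c("P1", "P1", "P2"), snw = c("s1", "s2", "s1"),
                        p_value = c(0.01, 0.001, 0.05), stringsAsFactors = FALSE)
snw_genes <- list(s1 = c("A", "B", "X"), s2 = c("A", "Y", "Z"))
res <- non_deg_snw_genes(enrich_df, snw_genes, degs = c("A", "B"))
# P1 uses s2 (p = 0.001) -> "Y, Z"; P2 uses s1 -> "X"
```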
Added the data RA_clustered, which is the example output of the clustering workflow.
In run_pathfindR, added the argument output_dir, which specifies the directory to be created under the current working directory for storing the result HTML files. output_dir is “pathfindR_Results” by default.
run_pathfindR now checks whether the output directory (output_dir) already exists and if it exists, stops and displays an error message. This was implemented to prevent writing over existing results.
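Earlier versions stopped with an error when output_dir already existed; a later entry changes this to appending “(1)” with a warning, and output directories can be created recursively. A minimal sketch of the later behavior: unique_output_dir() is hypothetical, and continuing to “(2)”, “(3)” when “(1)” is also taken is an assumption not stated in the changelog.

```r
# Hypothetical sketch: pick a non-existing output directory and create it.
unique_output_dir <- function(output_dir) {
  candidate <- output_dir
  i <- 1
  while (dir.exists(candidate)) {          # assumed: keep incrementing the suffix
    candidate <- paste0(output_dir, "(", i, ")")
    i <- i + 1
  }
  if (candidate != output_dir) {
    warning(sprintf("'%s' already exists, using '%s' instead", output_dir, candidate))
  }
  dir.create(candidate, recursive = TRUE)  # supports nested paths
  candidate
}

base <- file.path(tempdir(), "ALL_RESULTS", "RESULT_A")
first <- unique_output_dir(base)                     # creates the nested directory
second <- suppressWarnings(unique_output_dir(base))  # existing, so "(1)" is appended
```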
genes_table.html now contains a second table displaying the input gene symbols for which there were no interactions in the PIN.
- gene_sets option in run_pathfindR to choose between different gene sets. Available gene sets are KEGG, Reactome, BioCarta and Gene Ontology gene sets (GO-BP, GO-CC and GO-MF)
- cluster_pathways automatically recognizes the ID type and chooses the gene sets accordingly
- input_processing
- In input_processing, genes for which no interactions are found in the PIN are now removed before active subnetwork search
- input_processing
- run_pathfindR returns to the user’s working directory.
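Removing input genes without PIN interactions, and listing them separately as genes_table.html does, reduces to a membership test against the genes in the PIN edge list. A minimal sketch, assuming the PIN is a two-column data frame of interacting gene symbols and the input has a hypothetical gene column; filter_genes_in_pin() is not a pathfindR function.

```r
# Hypothetical sketch: split input genes by presence in the PIN.
filter_genes_in_pin <- function(input_df, pin) {
  pin_genes <- unique(c(pin[[1]], pin[[2]]))  # every gene appearing in any edge
  in_pin <- input_df$gene %in% pin_genes
  list(kept = input_df[in_pin, , drop = FALSE],
       no_interaction = input_df$gene[!in_pin])
}

input_df <- data.frame(gene = c("A", "B", "C"), p_value = c(0.01, 0.02, 0.03),
                       stringsAsFactors = FALSE)
pin <- data.frame(g1 = c("A", "X"), g2 = c("Y", "B"), stringsAsFactors = FALSE)
res <- filter_genes_in_pin(input_df, pin)  # C has no interactions and is dropped
```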