The mosaic.find package provides a function (mosaic_find()) designed to find rhythmic and non-rhythmic trends in multi-omics time course data using model selection and joint modeling, a method called MOSAIC (Multi-Omics Selection with Amplitude Independent Criteria). To read more about our work on this project and cite us, see MOSAIC: A Joint Modeling Methodology for Combined Circadian and Non-Circadian Analysis of Multi-Omics Data by H. De los Santos et al. (2020)
Further, for users who prefer an interface more than coding, as well as built-in visualizations, our GitHub repository can be found here. There, you can find a Shiny application for finding trends and automatically visualizing results, with features such as summary visualizations, heat maps, gene expression plots (with or without replicates visualized), and parameter density graphs. It will also have the most up-to-date features. A FAQ for possible user errors can also be found there.
In this vignette, we’ll walk through an example of how to use mosaic.find, and how to choose from the several different built-in methods of preprocessing.
We’ll start by loading our library, which contains the mosaic_find() function. It also has two example datsets: expressions_rna, which contains synthetic RNA expression data; and expressions_pro, which contains synthetic protein expression data. We’ll be using both throughout this vignette. Here we’ll look at the first few rows and columns of our dataset.
## Sample_Name TP_2 TP_2.1 TP_2.2 TP_4
## 1 Sample_1 -0.73939338 -0.77703064 -0.4005528 -0.9975816
## 2 Sample_2 3.75402084 4.16813069 3.8756120 -0.1310355
## 3 Sample_3 2.22714192 2.33298725 2.1916027 2.4385300
## 4 Sample_4 0.51375623 -0.08142932 -0.5781523 -0.1006701
## 5 Sample_5 -0.01933717 -0.26711040 -0.6958891 -0.8056555
## 6 Sample_6 -3.02732977 -3.41256591 -2.9703498 -3.7241991
## Sample_Name TP_2 TP_2.1 TP_2.2 TP_4
## 1 Sample_1 -0.5936787 -3.3199488 -2.142538 -3.0727607
## 2 Sample_2 1.7327705 1.6354914 1.598475 2.0855611
## 3 Sample_3 -0.9830408 -0.9886902 -1.439174 -0.4246837
## 4 Sample_4 -2.9620445 -2.6023412 -2.402504 -2.8248408
## 5 Sample_5 -2.2724832 -2.3953960 -2.115315 -3.1454384
## 6 Sample_6 2.1182184 3.4492056 5.188103 5.8263585
Note the data format: its first column first column has gene labels/names, and all other columns have numerical expression data for RNA and protein, respectively. This expression data is ordered by time point then by replicate, and has evenly spaced time points. Any missing data has cells left blank (NA). Also note that the sample names/labels are the same between our RNA and protein datasets, to indicate corresponding expression data. Any expressions with a name not found in the other dataset will not be reflected in the results. In order to use the mosaic_find() function, data must be in this format.
Now, let’s look at one the expressions, Sample 2. Here we plot RNA in blue and protein in red. each of the replicates in a different color, then plot the difference between them in gray.
library(ggplot2)
# making a function to plot expression, since we're going to use it again
# samp: sample name that we want to visualize
# expressions_rna, expressions_pro: our original RNA and protein expression data frames
# results: data frame of mosaic results
plot_exp <- function(samp, expressions_rna, expressions_pro, results = data.frame()){
# color information
rna_blue_high <- "#268beb"
pro_red_high <- "#d61e1e"
rna_blue_low <- "#9ebfde"
pro_red_low <- "#d49d9d"
# making a function to create automatic palettes for rna and protein vizualizations
rna_col_func <- colorRampPalette(c(rna_blue_low, rna_blue_high))
pro_col_func <- colorRampPalette(c(pro_red_low, pro_red_high))
tp <- seq(2,48,by=2) # our time points
num_reps <- 3 # number of replicates
# set to plot either the mosaic results or the original data
if (nrow(results) == 0){
# we're plotting the original data
dat_rna <- expressions_rna[,-1]
dat_pro <- expressions_pro[,-1]
# logicals for the position of our desired sample in each dataset
gene_log_rna <- expressions_rna$Sample_Name == samp
gene_log_pro <- expressions_pro$Sample_Name == samp
} else {
# last number that is a parameter in the results dataset
end_num <- 33
# we're plotting the processed original data and the fitted data
dat_rna <- results[,(end_num+1):((end_num+1)+((length(tp)*num_reps)-1))]
fit_rna <- results[,(end_num+(length(tp)*num_reps)+1):((end_num+(length(tp)*num_reps)+1)+(length(tp)-1))]
dat_pro <- results[,(end_num+(length(tp)*num_reps)+1+length(tp)):(end_num+(length(tp)*num_reps*2)+1+length(tp)-1)]
fit_pro <- results[,(end_num+(length(tp)*num_reps*2)+1+length(tp)):ncol(results)]
# logicals for the position of our desired sample
gene_log_rna <- gene_log_pro <- results$Gene_Name == samp
}
# our visualization data frame
gg.df <- data.frame(
# maximum over time points
"rna_max" = sapply(seq(1,ncol(dat_rna), by = num_reps), function(x) max(dat_rna[gene_log_rna,x:(x+num_reps-1)], na.rm = T)),
"pro_max" = sapply(seq(1,ncol(dat_pro), by = num_reps), function(x) max(dat_pro[gene_log_pro,x:(x+num_reps-1)], na.rm = T)),
# minimum over time points
"rna_min" = sapply(seq(1,ncol(dat_rna), by = num_reps), function(x) min(dat_rna[gene_log_rna,x:(x+num_reps-1)], na.rm = T)),
"pro_min" = sapply(seq(1,ncol(dat_pro), by = num_reps), function(x) min(dat_pro[gene_log_pro,x:(x+num_reps-1)], na.rm = T)),
"timen" = tp # time points
)
# colors for lines and shading
col_vect <- c(
"Original RNA" = rna_blue_low,
"Original Protein" = pro_red_low
)
# only plot individual replicates if we're only plotting the original data
if (nrow(results) == 0){
# add replicate colors
# add two to provide more variability
rna_rep_col <- rna_col_func(num_reps+2)
pro_rep_col <- pro_col_func(num_reps+2)
# add replicates to df and color scale
for (i in 1:num_reps){
gg.df[,paste0("rna_rep",i)] <- as.numeric(dat_rna[gene_log_rna,seq(i, ncol(dat_rna), by=num_reps)])
gg.df[,paste0("pro_rep",i)] <- as.numeric(dat_pro[gene_log_pro,seq(i, ncol(dat_pro), by=num_reps)])
col_vect[paste0("RNA, Replicate ",i)] <- rna_rep_col[i+1]
col_vect[paste0("Protein, Replicate ",i)] <- pro_rep_col[i+1]
}
} else {
# otherwise, we plot the model fits
gg.df$rna_fit <- as.numeric(fit_rna[gene_log_rna,])
gg.df$pro_fit <- as.numeric(fit_pro[gene_log_pro,])
# add the colors
col_vect["Fit RNA"] <- rna_blue_high
col_vect["Fit Protein"] <- pro_red_high
}
# name breaks for the colors in the legend
col_name_breaks <- c(names(col_vect)[grepl("RNA", names(col_vect), fixed = T)],
names(col_vect)[grepl("Protein", names(col_vect), fixed = T)])
# visualize, with shading for each omics type
p <- ggplot(gg.df, aes(x = timen))+
# setting up scales and labels
scale_fill_manual("", values = col_vect, breaks = col_name_breaks)+
scale_color_manual("", values = col_vect, breaks = col_name_breaks)+
ylab("Expression")+
xlab("Time (Hours)")+
theme_bw()+
theme(
plot.title = element_text(hjust = .5),
legend.position = "bottom",
legend.direction = "horizontal"
)+
scale_x_continuous(expand = c(0, 0))+
# add shading for original rna and protein data
geom_ribbon(aes(ymax = rna_max, ymin = rna_min, fill = "Original RNA"), alpha = .5)+
geom_ribbon(aes(ymax = pro_max, ymin = pro_min, fill = "Original Protein"), alpha = .5)+
ggtitle(samp)+ # title
guides( # colors
color = guide_legend(nrow = 2, byrow = T),
fill = guide_legend(nrow = 2)
)+
NULL
# only plot individual replicates if we're only plotting the original data
if (nrow(results) == 0){
# add each of the replicates
for (i in 1:num_reps){
p <- p +
geom_line(aes_string(y = paste0("rna_rep",i), color = shQuote(paste0("RNA, Replicate ",i))))+
geom_line(aes_string(y = paste0("pro_rep",i), color = shQuote(paste0("Protein, Replicate ",i))))+
NULL
}
} else { # otherwise, plot the fits
p <- p +
geom_line(aes(y = rna_fit, color = "Fit RNA"), size = 1)+
geom_line(aes(y = pro_fit, color = "Fit Protein"), size = 1)+
NULL
}
return(p)
}
suppressWarnings(plot_exp("Sample_2", expressions_rna, expressions_pro)) # to ignore warnings for missing values
We can see that this sample manifests very differently depending on the omics type! In RNA, Sample 2 seems to be a rhythmic gene with damping, while in protein, Sample 2 is clearly linear. This difference in expression makes mosaic_find() the perfect function for our dataset.
So we begin by assigning our parameters and running the mosaic_find() function. In this first run, in addition to looking for trends, we want to look for rhythms between 20 and 28 hours, with no preprocessing, assigning these results to a new dataframe.
# run mosaic_find()
results <- mosaic_find(
rna = expressions_rna, # rna data
pro = expressions_pro, # protein data
begin = 2, # first time point (hours)
end = 48, # last time point (hours)
resol = 2, # resolution (hours)
num_reps = 3, # number of replicates
paired = F, # unpaired, or unrelated, replicates
low = 20, # lower limit when looking for rhythms (hours)
high = 28 # lower limit when looking for rhythms (hours)
)
# look at the results!
head(results[,1:16], 5)
## Gene_Name Best_Model_RNA Best_Model_Protein P_Value_RNA
## 1 Sample_1 Linear Exponential 7.343098e-28
## 2 Sample_10 Exponential Linear 6.159862e-13
## 3 Sample_100 Exponential Exponential 1.280193e-13
## 4 Sample_101 Linear Linear 1.005577e-24
## 5 Sample_102 Linear Linear 2.928965e-24
## BH_Adj_P_Value_RNA P_Value_Protein BH_Adj_P_Value_Protein P_Value_Joint
## 1 2.475202e-27 3.399446e-32 8.545638e-31 NA
## 2 7.347748e-13 2.936618e-20 6.013551e-20 NA
## 3 1.530111e-13 2.839392e-31 2.041703e-30 NA
## 4 1.704368e-24 1.296587e-12 1.568452e-12 NA
## 5 4.673880e-24 2.429743e-03 2.603296e-03 NA
## BH_Adj_P_Value_Joint P_Value_Linear_Slope_RNA
## 1 NA 8.727573e-56
## 2 NA NA
## 3 NA NA
## 4 NA 3.223569e-43
## 5 NA 1.656604e-40
## BH_Adj_P_Value_Linear_Slope_RNA P_Value_Linear_Slope_Protein
## 1 4.403457e-55 NA
## 2 NA 4.840777e-29
## 3 NA NA
## 4 4.835353e-43 1.993804e-15
## 5 2.270162e-40 1.431576e-03
## BH_Adj_P_Value_Linear_Slope_Protein AC_Coefficient_RNA
## 1 NA NA
## 2 1.063316e-28 NA
## 3 NA NA
## 4 2.641277e-15 NA
## 5 1.627943e-03 NA
## Oscillation_Type_RNA Period_RNA
## 1 <NA> NA
## 2 <NA> NA
## 3 <NA> NA
## 4 <NA> NA
## 5 <NA> NA
Looks like we’ve sucessfully run MOSAIC! Now we can see that the results data frame has information about which models fit each of the expressions in our omics types, the parameters for those models, and p-values. You’ll also notice that it reordered our sample names. For more information on the models and parameters available, please see MOSAIC: A Joint Modeling Methodology for Combined Circadian and Non-Circadian Analysis of Multi-Omics Data.
So, first, let’s take a look at the distributions of our models in each of our omics types!
# making a function, since we'll be using this again later
plot_dist <- function(results){
# first, we only want to consider significant expressions
sig_rna <- results$BH_Adj_P_Value_RNA < .05
sig_pro <- results$BH_Adj_P_Value_Protein < .05
# then get the significant models for each omics type
mod_rna <- results$Best_Model_RNA[sig_rna]
mod_pro <- results$Best_Model_Protein[sig_pro]
# form into the data frame for plotting
gg.df <- data.frame(
"Model" = c(mod_rna, mod_pro),
"Omics_Type" = c(rep("RNA", length(mod_rna)), rep("Protein",length(mod_pro)))
)
# colors
col_proper <- c("Linear" = "#b32515", # red
"Exponential" = "#e39f22", # orange
"ECHO" = "#3461eb", # blue
"ECHO Joint" = "#5c7de0", # light blue
"ECHO Linear" = "#7b2bcc", # purple
"ECHO Linear Joint" = "#945fc9" # light purple
)
# make a bar chart of distributions for each model type
ggplot(gg.df, aes(Omics_Type, fill = Model))+
geom_bar()+
theme_bw()+
ylab("Total Significant Expressions")+
xlab("Omics Type")+
theme_bw()+
theme(
plot.title = element_text(hjust = .5),
legend.position = "bottom",
legend.direction = "horizontal"
)+
ggtitle("MOSAIC Model Distributions")+ # title
guides( # colors
fill = guide_legend(nrow = 2)
)+
scale_y_continuous(expand = expand_scale(mult = c(0, .1)))+
scale_fill_manual("Model", values = col_proper)+
NULL
}
# plot our model distributions!
plot_dist(results)
So we can see that each of our omics types has roughly similar distributions, with mostly non-oscillatory models, Linear and Protein, making up most of the distribution, but with roughly 200 of the genes remaining oscillatory.
Let’s next look at how the fit turned out for our initial sample. Here, we add the fitted values to our plot and print the parameters to the console.
# plot our fitted data!
suppressWarnings(plot_exp("Sample_2", expressions_rna, expressions_pro, results)) # to ignore warnings for missing values
## Gene_Name Best_Model_RNA Best_Model_Protein P_Value_RNA
## 112 Sample_2 ECHO Linear 5.751267e-22
## BH_Adj_P_Value_RNA P_Value_Protein BH_Adj_P_Value_Protein
## 112 7.70259e-22 6.170966e-27 3.137779e-26
## P_Value_Joint BH_Adj_P_Value_Joint P_Value_Linear_Slope_RNA
## 112 NA NA NA
## BH_Adj_P_Value_Linear_Slope_RNA P_Value_Linear_Slope_Protein
## 112 NA 6.323223e-51
## BH_Adj_P_Value_Linear_Slope_Protein AC_Coefficient_RNA
## 112 8.125342e-49 0.1305151
## Oscillation_Type_RNA Period_RNA Hours_Shifted_RNA
## 112 Damped 28 25.88371
## Initial_Amplitude_RNA Radian_Frequency_RNA Phase_Shift_RNA
## 112 9.291564 0.2243995 0.4748946
## Growth_Rate_RNA Slope_RNA Equilibrium_Value_RNA AC_Coefficient_Protein
## 112 NA NA -0.9723169 NA
## Oscillation_Type_Protein Period_Protein Hours_Shifted_Protein
## 112 <NA> NA NA
## Initial_Amplitude_Protein Radian_Frequency_Protein Phase_Shift_Protein
## 112 NA NA NA
## Growth_Rate_Protein Slope_Protein Equilibrium_Value_Protein
## 112 NA 0.08934725 1.732718
## Processed_RNA_TP_2 Processed_RNA_TP_2.1 Processed_RNA_TP_2.2
## 112 3.754021 4.168131 3.875612
## Processed_RNA_TP_4 Processed_RNA_TP_4.1 Processed_RNA_TP_4.2
## 112 -0.1310355 NA 0.1837094
## Processed_RNA_TP_6 Processed_RNA_TP_6.1 Processed_RNA_TP_6.2
## 112 -2.880764 -2.269142 -1.774086
## Processed_RNA_TP_8 Processed_RNA_TP_8.1 Processed_RNA_TP_8.2
## 112 -4.724525 -5.125785 -4.780815
## Processed_RNA_TP_10 Processed_RNA_TP_10.1 Processed_RNA_TP_10.2
## 112 -4.919622 -5.271985 -4.453277
## Processed_RNA_TP_12 Processed_RNA_TP_12.1 Processed_RNA_TP_12.2
## 112 -4.984205 -5.637898 -5.89182
## Processed_RNA_TP_14 Processed_RNA_TP_14.1 Processed_RNA_TP_14.2
## 112 -3.756594 -4.147797 -5.199659
## Processed_RNA_TP_16 Processed_RNA_TP_16.1 Processed_RNA_TP_16.2
## 112 -2.600454 -2.85117 -2.860476
## Processed_RNA_TP_18 Processed_RNA_TP_18.1 Processed_RNA_TP_18.2
## 112 -1.882311 -2.357486 -1.938199
## Processed_RNA_TP_20 Processed_RNA_TP_20.1 Processed_RNA_TP_20.2
## 112 0.7918994 -0.5020944 0.001700021
## Processed_RNA_TP_22 Processed_RNA_TP_22.1 Processed_RNA_TP_22.2
## 112 0.1983453 0.4554254 1.209879
## Processed_RNA_TP_24 Processed_RNA_TP_24.1 Processed_RNA_TP_24.2
## 112 0.5684147 0.4036352 0.6004868
## Processed_RNA_TP_26 Processed_RNA_TP_26.1 Processed_RNA_TP_26.2
## 112 0.9273787 0.3436154 0.6093465
## Processed_RNA_TP_28 Processed_RNA_TP_28.1 Processed_RNA_TP_28.2
## 112 1.089579 0.2013894 NA
## Processed_RNA_TP_30 Processed_RNA_TP_30.1 Processed_RNA_TP_30.2
## 112 -0.5642648 -0.4228868 -2.258175
## Processed_RNA_TP_32 Processed_RNA_TP_32.1 Processed_RNA_TP_32.2
## 112 NA -0.6388114 -0.6754961
## Processed_RNA_TP_34 Processed_RNA_TP_34.1 Processed_RNA_TP_34.2
## 112 -2.360276 -2.078823 -1.529084
## Processed_RNA_TP_36 Processed_RNA_TP_36.1 Processed_RNA_TP_36.2
## 112 -1.825594 -2.589809 -2.467512
## Processed_RNA_TP_38 Processed_RNA_TP_38.1 Processed_RNA_TP_38.2
## 112 -2.713209 -2.574499 -2.012953
## Processed_RNA_TP_40 Processed_RNA_TP_40.1 Processed_RNA_TP_40.2
## 112 -1.853889 -1.564534 -0.3381466
## Processed_RNA_TP_42 Processed_RNA_TP_42.1 Processed_RNA_TP_42.2
## 112 -1.704297 -1.922834 -1.72339
## Processed_RNA_TP_44 Processed_RNA_TP_44.1 Processed_RNA_TP_44.2
## 112 -1.686813 -0.2003985 -1.179989
## Processed_RNA_TP_46 Processed_RNA_TP_46.1 Processed_RNA_TP_46.2
## 112 -0.4345448 -1.083279 -1.220444
## Processed_RNA_TP_48 Processed_RNA_TP_48.1 Processed_RNA_TP_48.2
## 112 -0.8149757 -1.000756 -0.3102249
## Fitted_RNA_TP2 Fitted_RNA_TP4 Fitted_RNA_TP6 Fitted_RNA_TP8
## 112 3.943954 0.4376398 -2.529325 -4.52069
## Fitted_RNA_TP10 Fitted_RNA_TP12 Fitted_RNA_TP14 Fitted_RNA_TP16
## 112 -5.384624 -5.217037 -4.286551 -2.944113
## Fitted_RNA_TP18 Fitted_RNA_TP20 Fitted_RNA_TP22 Fitted_RNA_TP24
## 112 -1.537816 -0.3478393 0.4508485 0.7973514
## Fitted_RNA_TP26 Fitted_RNA_TP28 Fitted_RNA_TP30 Fitted_RNA_TP32
## 112 0.730136 0.3569411 -0.181478 -0.7455091
## Fitted_RNA_TP34 Fitted_RNA_TP36 Fitted_RNA_TP38 Fitted_RNA_TP40
## 112 -1.22278 -1.543114 -1.682088 -1.655129
## Fitted_RNA_TP42 Fitted_RNA_TP44 Fitted_RNA_TP46 Fitted_RNA_TP48
## 112 -1.50545 -1.289503 -1.063284 -0.8718625
## Processed_Protein_TP_2 Processed_Protein_TP_2.1
## 112 1.73277 1.635491
## Processed_Protein_TP_2.2 Processed_Protein_TP_4
## 112 1.598475 2.085561
## Processed_Protein_TP_4.1 Processed_Protein_TP_4.2
## 112 1.961745 2.030575
## Processed_Protein_TP_6 Processed_Protein_TP_6.1
## 112 1.902162 2.649547
## Processed_Protein_TP_6.2 Processed_Protein_TP_8
## 112 2.442324 2.880684
## Processed_Protein_TP_8.1 Processed_Protein_TP_8.2
## 112 2.735317 2.508182
## Processed_Protein_TP_10 Processed_Protein_TP_10.1
## 112 2.640153 2.746428
## Processed_Protein_TP_10.2 Processed_Protein_TP_12
## 112 2.709603 2.618462
## Processed_Protein_TP_12.1 Processed_Protein_TP_12.2
## 112 2.660648 2.523894
## Processed_Protein_TP_14 Processed_Protein_TP_14.1
## 112 3.161411 2.803451
## Processed_Protein_TP_14.2 Processed_Protein_TP_16
## 112 3.093063 3.102046
## Processed_Protein_TP_16.1 Processed_Protein_TP_16.2
## 112 3.281512 3.549912
## Processed_Protein_TP_18 Processed_Protein_TP_18.1
## 112 3.153352 3.114335
## Processed_Protein_TP_18.2 Processed_Protein_TP_20
## 112 3.842172 3.373165
## Processed_Protein_TP_20.1 Processed_Protein_TP_20.2
## 112 3.105235 3.600638
## Processed_Protein_TP_22 Processed_Protein_TP_22.1
## 112 3.943721 3.928703
## Processed_Protein_TP_22.2 Processed_Protein_TP_24
## 112 3.467277 3.733599
## Processed_Protein_TP_24.1 Processed_Protein_TP_24.2
## 112 3.926123 3.689913
## Processed_Protein_TP_26 Processed_Protein_TP_26.1
## 112 3.862635 3.905335
## Processed_Protein_TP_26.2 Processed_Protein_TP_28
## 112 4.303546 4.2887
## Processed_Protein_TP_28.1 Processed_Protein_TP_28.2
## 112 4.156734 4.071896
## Processed_Protein_TP_30 Processed_Protein_TP_30.1
## 112 4.830815 4.775326
## Processed_Protein_TP_30.2 Processed_Protein_TP_32
## 112 4.630182 4.579518
## Processed_Protein_TP_32.1 Processed_Protein_TP_32.2
## 112 4.461469 4.384259
## Processed_Protein_TP_34 Processed_Protein_TP_34.1
## 112 5.066425 NA
## Processed_Protein_TP_34.2 Processed_Protein_TP_36
## 112 4.691526 5.381199
## Processed_Protein_TP_36.1 Processed_Protein_TP_36.2
## 112 NA 4.899999
## Processed_Protein_TP_38 Processed_Protein_TP_38.1
## 112 5.072838 4.914001
## Processed_Protein_TP_38.2 Processed_Protein_TP_40
## 112 5.235965 5.370474
## Processed_Protein_TP_40.1 Processed_Protein_TP_40.2
## 112 NA 5.444477
## Processed_Protein_TP_42 Processed_Protein_TP_42.1
## 112 5.625993 5.282117
## Processed_Protein_TP_42.2 Processed_Protein_TP_44
## 112 5.452276 5.408484
## Processed_Protein_TP_44.1 Processed_Protein_TP_44.2
## 112 5.707133 5.731252
## Processed_Protein_TP_46 Processed_Protein_TP_46.1
## 112 5.452682 5.437982
## Processed_Protein_TP_46.2 Processed_Protein_TP_48
## 112 5.971252 6.205994
## Processed_Protein_TP_48.1 Processed_Protein_TP_48.2 Fitted_Protein_TP2
## 112 NA 5.998865 1.911412
## Fitted_Protein_TP4 Fitted_Protein_TP6 Fitted_Protein_TP8
## 112 2.090107 2.268801 2.447496
## Fitted_Protein_TP10 Fitted_Protein_TP12 Fitted_Protein_TP14
## 112 2.62619 2.804885 2.983579
## Fitted_Protein_TP16 Fitted_Protein_TP18 Fitted_Protein_TP20
## 112 3.162274 3.340968 3.519663
## Fitted_Protein_TP22 Fitted_Protein_TP24 Fitted_Protein_TP26
## 112 3.698357 3.877052 4.055746
## Fitted_Protein_TP28 Fitted_Protein_TP30 Fitted_Protein_TP32
## 112 4.234441 4.413135 4.59183
## Fitted_Protein_TP34 Fitted_Protein_TP36 Fitted_Protein_TP38
## 112 4.770524 4.949219 5.127913
## Fitted_Protein_TP40 Fitted_Protein_TP42 Fitted_Protein_TP44
## 112 5.306608 5.485302 5.663997
## Fitted_Protein_TP46 Fitted_Protein_TP48
## 112 5.842691 6.021386
We can see that MOSAIC did a great job with these fits, with the fits matching closely to the trend. Looking at the subset of results corresponding to Sample 2, we can see that MOSAIC correctly designated our RNA data as an ECHO model with damping, and our protein data as a linear model with a positive slope. These fits match pretty closely to the trend, which are emphasized by their very low BH adjusted p-values.
Now let’s see how preprocessing affects the results. Here, we’re going to search for all possible periods, as well as allowing for all our preprocessing options: removing unexpressed genes, normalizing, and smoothing. These preprocessing methods are described in our paper.
# run mosaic_find() with preprocessing
results <- mosaic_find(
rna = expressions_rna, # rna data
pro = expressions_pro, # protein data
begin = 2, # first time point (hours)
end = 48, # last time point (hours)
resol = 2, # resolution (hours)
num_reps = 3, # number of replicates
paired = F, # unpaired, or unrelated, replicates
run_all_per = T, # now we're looking for any period within our time course
rem_unexpr = T, # remove any unexpressed genes (we don't have any, but some time courses might)
rem_unexpr_amt = 70, # if we have less than 70% expression, we want to remove the gene
rem_unexpr_amt_below = 0, # unexpressed genes are those with expression equal to 0
is_normal = T, # normalize data
is_smooth = T # smooth the data
)
# look at the results!
head(results[,1:16], 5)
## Gene_Name Best_Model_RNA Best_Model_Protein P_Value_RNA
## 1 Sample_1 Exponential Exponential 1.178262e-30
## 2 Sample_10 Exponential Linear 1.420713e-21
## 3 Sample_100 Exponential Exponential 7.165064e-20
## 4 Sample_101 Linear Linear 4.737161e-29
## 5 Sample_102 Linear Linear 8.696209e-30
## BH_Adj_P_Value_RNA P_Value_Protein BH_Adj_P_Value_Protein P_Value_Joint
## 1 4.062973e-30 3.004781e-32 5.916211e-31 NA
## 2 1.715146e-21 3.118392e-26 7.060511e-26 NA
## 3 8.563821e-20 2.839392e-31 1.665078e-30 NA
## 4 9.972970e-29 2.622474e-21 3.383838e-21 NA
## 5 2.278483e-29 6.746870e-07 7.164818e-07 NA
## BH_Adj_P_Value_Joint P_Value_Linear_Slope_RNA
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA 1.102292e-56
## 5 NA 5.491577e-60
## BH_Adj_P_Value_Linear_Slope_RNA P_Value_Linear_Slope_Protein
## 1 NA NA
## 2 NA 7.054902e-47
## 3 NA NA
## 4 1.336341e-56 6.836260e-30
## 5 7.419916e-60 1.692101e-07
## BH_Adj_P_Value_Linear_Slope_Protein AC_Coefficient_RNA
## 1 NA NA
## 2 1.492071e-46 NA
## 3 NA NA
## 4 8.115302e-30 NA
## 5 1.842287e-07 NA
## Oscillation_Type_RNA Period_RNA
## 1 <NA> NA
## 2 <NA> NA
## 3 <NA> NA
## 4 <NA> NA
## 5 <NA> NA
Let’s look at how this preprocessing affected our distribution and original sample!
# plot our fitted data!
suppressWarnings(plot_exp("Sample_2", expressions_rna, expressions_pro, results)) # to ignore warnings for missing values
## Gene_Name Best_Model_RNA Best_Model_Protein P_Value_RNA
## 112 Sample_2 ECHO Linear 2.174569e-26
## BH_Adj_P_Value_RNA P_Value_Protein BH_Adj_P_Value_Protein
## 112 3.020234e-26 8.048197e-31 4.273379e-30
## P_Value_Joint BH_Adj_P_Value_Joint P_Value_Linear_Slope_RNA
## 112 NA NA NA
## BH_Adj_P_Value_Linear_Slope_RNA P_Value_Linear_Slope_Protein
## 112 NA 1.894637e-69
## BH_Adj_P_Value_Linear_Slope_Protein AC_Coefficient_RNA
## 112 1.743066e-67 0.1140198
## Oscillation_Type_RNA Period_RNA Hours_Shifted_RNA
## 112 Damped 27.23337 25.4299
## Initial_Amplitude_RNA Radian_Frequency_RNA Phase_Shift_RNA
## 112 3.463709 0.2307164 0.4160901
## Growth_Rate_RNA Slope_RNA Equilibrium_Value_RNA AC_Coefficient_Protein
## 112 NA NA 0.1572618 NA
## Oscillation_Type_Protein Period_Protein Hours_Shifted_Protein
## 112 <NA> NA NA
## Initial_Amplitude_Protein Radian_Frequency_Protein Phase_Shift_Protein
## 112 NA NA NA
## Growth_Rate_Protein Slope_Protein Equilibrium_Value_Protein
## 112 NA 0.07294042 -1.757411
## Processed_RNA_TP_2 Processed_RNA_TP_2.1 Processed_RNA_TP_2.2
## 112 1.886177 2.016075 1.924318
## Processed_RNA_TP_4 Processed_RNA_TP_4.1 Processed_RNA_TP_4.2
## 112 0.8647523 NA 0.9387996
## Processed_RNA_TP_6 Processed_RNA_TP_6.1 Processed_RNA_TP_6.2
## 112 -0.5438456 -0.3999546 -0.283487
## Processed_RNA_TP_8 Processed_RNA_TP_8.1 Processed_RNA_TP_8.2
## 112 -1.25274 -1.347141 -1.265983
## Processed_RNA_TP_10 Processed_RNA_TP_10.1 Processed_RNA_TP_10.2
## 112 -1.674122 -1.75702 -1.564409
## Processed_RNA_TP_12 Processed_RNA_TP_12.1 Processed_RNA_TP_12.2
## 112 -1.556154 -1.709943 -1.769681
## Processed_RNA_TP_14 Processed_RNA_TP_14.1 Processed_RNA_TP_14.2
## 112 -1.152736 -1.244772 -1.492234
## Processed_RNA_TP_16 Processed_RNA_TP_16.1 Processed_RNA_TP_16.2
## 112 -0.663361 -0.7223447 -0.7245342
## Processed_RNA_TP_18 Processed_RNA_TP_18.1 Processed_RNA_TP_18.2
## 112 -0.05284582 -0.1646363 -0.06599417
## Processed_RNA_TP_20 Processed_RNA_TP_20.1 Processed_RNA_TP_20.2
## 112 0.7216151 0.4171885 0.5357118
## Processed_RNA_TP_22 Processed_RNA_TP_22.1 Processed_RNA_TP_22.2
## 112 0.8242308 0.8847117 1.062205
## Processed_RNA_TP_24 Processed_RNA_TP_24.1 Processed_RNA_TP_24.2
## 112 0.9850069 0.9462407 0.9925523
## Processed_RNA_TP_26 Processed_RNA_TP_26.1 Processed_RNA_TP_26.2
## 112 1.060243 0.9229059 0.9854222
## Processed_RNA_TP_28 Processed_RNA_TP_28.1 Processed_RNA_TP_28.2
## 112 0.9072927 0.698336 NA
## Processed_RNA_TP_30 Processed_RNA_TP_30.1 Processed_RNA_TP_30.2
## 112 0.5703559 0.6036167 0.1718444
## Processed_RNA_TP_32 Processed_RNA_TP_32.1 Processed_RNA_TP_32.2
## 112 NA 0.192927 0.1842965
## Processed_RNA_TP_34 Processed_RNA_TP_34.1 Processed_RNA_TP_34.2
## 112 -0.1979852 -0.1317704 -0.002438085
## Processed_RNA_TP_36 Processed_RNA_TP_36.1 Processed_RNA_TP_36.2
## 112 -0.2452876 -0.4250777 -0.3963061
## Processed_RNA_TP_38 Processed_RNA_TP_38.1 Processed_RNA_TP_38.2
## 112 -0.3510112 -0.3183782 -0.1862681
## Processed_RNA_TP_40 Processed_RNA_TP_40.1 Processed_RNA_TP_40.2
## 112 -0.2277257 -0.1596516 0.1288699
## Processed_RNA_TP_42 Processed_RNA_TP_42.1 Processed_RNA_TP_42.2
## 112 0.03596156 -0.01545159 0.03146974
## Processed_RNA_TP_44 Processed_RNA_TP_44.1 Processed_RNA_TP_44.2
## 112 -0.009526636 0.3401691 0.1097093
## Processed_RNA_TP_46 Processed_RNA_TP_46.1 Processed_RNA_TP_46.2
## 112 0.3986219 0.2459998 0.2137303
## Processed_RNA_TP_48 Processed_RNA_TP_48.1 Processed_RNA_TP_48.2
## 112 0.3056779 0.2474021 0.464009
## Fitted_RNA_TP2 Fitted_RNA_TP4 Fitted_RNA_TP6 Fitted_RNA_TP8
## 112 2.132247 0.7908356 -0.4026525 -1.241782
## Fitted_RNA_TP10 Fitted_RNA_TP12 Fitted_RNA_TP14 Fitted_RNA_TP16
## 112 -1.632458 -1.588677 -1.207706 -0.6338249
## Fitted_RNA_TP18 Fitted_RNA_TP20 Fitted_RNA_TP22 Fitted_RNA_TP24
## 112 -0.02013657 0.5035839 0.8518587 0.99142
## Fitted_RNA_TP26 Fitted_RNA_TP28 Fitted_RNA_TP30 Fitted_RNA_TP32
## 112 0.9371592 0.7393549 0.4664895 0.1879617
## Fitted_RNA_TP34 Fitted_RNA_TP36 Fitted_RNA_TP38 Fitted_RNA_TP40
## 112 -0.03985828 -0.1821468 -0.2281368 -0.1883473
## Fitted_RNA_TP42 Fitted_RNA_TP44 Fitted_RNA_TP46 Fitted_RNA_TP48
## 112 -0.08815724 0.0402555 0.1656794 0.2638598
## Processed_Protein_TP_2 Processed_Protein_TP_2.1
## 112 -1.689306 -1.742496
## Processed_Protein_TP_2.2 Processed_Protein_TP_4
## 112 -1.762736 -1.51788
## Processed_Protein_TP_4.1 Processed_Protein_TP_4.2
## 112 -1.568656 -1.540429
## Processed_Protein_TP_6 Processed_Protein_TP_6.1
## 112 -1.439902 -1.133409
## Processed_Protein_TP_6.2 Processed_Protein_TP_8
## 112 -1.218389 -0.9779194
## Processed_Protein_TP_8.1 Processed_Protein_TP_8.2
## 112 -1.037533 -1.130678
## Processed_Protein_TP_10 Processed_Protein_TP_10.1
## 112 -1.019353 -0.9757709
## Processed_Protein_TP_10.2 Processed_Protein_TP_12
## 112 -0.9908724 -0.9443907
## Processed_Protein_TP_12.1 Processed_Protein_TP_12.2
## 112 -0.9270907 -0.983172
## Processed_Protein_TP_14 Processed_Protein_TP_14.1
## 112 -0.6819309 -0.8287257
## Processed_Protein_TP_14.2 Processed_Protein_TP_16
## 112 -0.7099596 -0.6084489
## Processed_Protein_TP_16.1 Processed_Protein_TP_16.2
## 112 -0.5348519 -0.4247845
## Processed_Protein_TP_18 Processed_Protein_TP_18.1
## 112 -0.5296738 -0.5456741
## Processed_Protein_TP_18.2 Processed_Protein_TP_20
## 112 -0.2471974 -0.3413117
## Processed_Protein_TP_20.1 Processed_Protein_TP_20.2
## 112 -0.4511866 -0.2480278
## Processed_Protein_TP_22 Processed_Protein_TP_22.1
## 112 -0.1087616 -0.1149202
## Processed_Protein_TP_22.2 Processed_Protein_TP_24
## 112 -0.3041453 -0.0594274
## Processed_Protein_TP_24.1 Processed_Protein_TP_24.2
## 112 0.01952431 -0.07734237
## Processed_Protein_TP_26 Processed_Protein_TP_26.1
## 112 0.02463812 0.04214881
## Processed_Protein_TP_26.2 Processed_Protein_TP_28
## 112 0.2054501 0.3661904
## Processed_Protein_TP_28.1 Processed_Protein_TP_28.2
## 112 0.3120729 0.2772818
## Processed_Protein_TP_30 Processed_Protein_TP_30.1
## 112 0.5635398 0.5407845
## Processed_Protein_TP_30.2 Processed_Protein_TP_32
## 112 0.4812628 0.6607916
## Processed_Protein_TP_32.1 Processed_Protein_TP_32.2
## 112 0.6123813 0.5807184
## Processed_Protein_TP_34 Processed_Protein_TP_34.1
## 112 0.858675 NA
## Processed_Protein_TP_34.2 Processed_Protein_TP_36
## 112 0.7049336 1.056975
## Processed_Protein_TP_36.1 Processed_Protein_TP_36.2
## 112 NA 0.8596406
## Processed_Protein_TP_38 Processed_Protein_TP_38.1
## 112 1.052486 0.9873489
## Processed_Protein_TP_38.2 Processed_Protein_TP_40
## 112 1.119382 1.170371
## Processed_Protein_TP_40.1 Processed_Protein_TP_40.2
## 112 NA 1.200719
## Processed_Protein_TP_42 Processed_Protein_TP_42.1
## 112 1.376728 1.235709
## Processed_Protein_TP_42.2 Processed_Protein_TP_44
## 112 1.305489 1.297988
## Processed_Protein_TP_44.1 Processed_Protein_TP_44.2
## 112 1.420461 1.430351
## Processed_Protein_TP_46 Processed_Protein_TP_46.1
## 112 1.448152 1.442124
## Processed_Protein_TP_46.2 Processed_Protein_TP_48
## 112 1.660811 1.739331
## Processed_Protein_TP_48.1 Processed_Protein_TP_48.2 Fitted_Protein_TP2
## 112 NA 1.626077 -1.611531
## Fitted_Protein_TP4 Fitted_Protein_TP6 Fitted_Protein_TP8
## 112 -1.46565 -1.319769 -1.173888
## Fitted_Protein_TP10 Fitted_Protein_TP12 Fitted_Protein_TP14
## 112 -1.028007 -0.8821265 -0.7362456
## Fitted_Protein_TP16 Fitted_Protein_TP18 Fitted_Protein_TP20
## 112 -0.5903648 -0.444484 -0.2986032
## Fitted_Protein_TP22 Fitted_Protein_TP24 Fitted_Protein_TP26
## 112 -0.1527223 -0.006841486 0.1390393
## Fitted_Protein_TP28 Fitted_Protein_TP30 Fitted_Protein_TP32
## 112 0.2849202 0.430801 0.5766818
## Fitted_Protein_TP34 Fitted_Protein_TP36 Fitted_Protein_TP38
## 112 0.7225627 0.8684435 1.014324
## Fitted_Protein_TP40 Fitted_Protein_TP42 Fitted_Protein_TP44
## 112 1.160205 1.306086 1.451967
## Fitted_Protein_TP46 Fitted_Protein_TP48
## 112 1.597848 1.743729
We can see that the distributions are a little more even now between our omics types, with much of the same effects between them. We can also see the effects of our preprocessing – normalization put our expressions on the same scale, and smoothing has reduced the space between our replicates. Though the fit hasn’t changed very much, it’s important to note that smoothing has also reduced the amount of damping in the system: the amplitude change coefficient has decreased by about .02. More discussions on the effects of preprocessing can be found in the supplemental information of ECHO: an application for detection and analysis of oscillators identifies metabolic regulation on genome-wide circadian output.
Other selections are also available, including adjustments for the harmonic and overexpressed cutoffs for ECHO and ECHO linear modelscan be adjusted through the harm_cut and over_cut parameters, respectively, though the defaults are recommended.
Now that you understand the basics of using the mosaic_find() function, feel free to experiment and play around with this vignette and the example data. Good luck with your multi-omics trend searches!