Introduction to Using MOSAIC

Hannah De los Santos

2020-06-10

The mosaic.find package provides a function (mosaic_find()) designed to find rhythmic and non-rhythmic trends in multi-omics time course data using model selection and joint modeling, a method called MOSAIC (Multi-Omics Selection with Amplitude Independent Criteria). To read more about our work on this project and cite us, see MOSAIC: A Joint Modeling Methodology for Combined Circadian and Non-Circadian Analysis of Multi-Omics Data by H. De los Santos et al. (2020)

Further, for users who prefer an interface more than coding, as well as built-in visualizations, our GitHub repository can be found here. There, you can find a Shiny application for finding trends and automatically visualizing results, with features such as summary visualizations, heat maps, gene expression plots (with or without replicates visualized), and parameter density graphs. It will also have the most up-to-date features. A FAQ for possible user errors can also be found there.

In this vignette, we’ll walk through an example of how to use mosaic.find, and how to choose from the several different built-in methods of preprocessing.

Loading and Examining Data

We’ll start by loading our library, which contains the mosaic_find() function. It also has two example datsets: expressions_rna, which contains synthetic RNA expression data; and expressions_pro, which contains synthetic protein expression data. We’ll be using both throughout this vignette. Here we’ll look at the first few rows and columns of our dataset.

library(mosaic.find)

head(expressions_rna[,1:5])
head(expressions_pro[,1:5])
##   Sample_Name        TP_2      TP_2.1     TP_2.2       TP_4
## 1    Sample_1 -0.73939338 -0.77703064 -0.4005528 -0.9975816
## 2    Sample_2  3.75402084  4.16813069  3.8756120 -0.1310355
## 3    Sample_3  2.22714192  2.33298725  2.1916027  2.4385300
## 4    Sample_4  0.51375623 -0.08142932 -0.5781523 -0.1006701
## 5    Sample_5 -0.01933717 -0.26711040 -0.6958891 -0.8056555
## 6    Sample_6 -3.02732977 -3.41256591 -2.9703498 -3.7241991
##   Sample_Name       TP_2     TP_2.1    TP_2.2       TP_4
## 1    Sample_1 -0.5936787 -3.3199488 -2.142538 -3.0727607
## 2    Sample_2  1.7327705  1.6354914  1.598475  2.0855611
## 3    Sample_3 -0.9830408 -0.9886902 -1.439174 -0.4246837
## 4    Sample_4 -2.9620445 -2.6023412 -2.402504 -2.8248408
## 5    Sample_5 -2.2724832 -2.3953960 -2.115315 -3.1454384
## 6    Sample_6  2.1182184  3.4492056  5.188103  5.8263585

Note the data format: its first column first column has gene labels/names, and all other columns have numerical expression data for RNA and protein, respectively. This expression data is ordered by time point then by replicate, and has evenly spaced time points. Any missing data has cells left blank (NA). Also note that the sample names/labels are the same between our RNA and protein datasets, to indicate corresponding expression data. Any expressions with a name not found in the other dataset will not be reflected in the results. In order to use the mosaic_find() function, data must be in this format.

Now, let’s look at one the expressions, Sample 2. Here we plot RNA in blue and protein in red. each of the replicates in a different color, then plot the difference between them in gray.

library(ggplot2)

# making a function to plot expression, since we're going to use it again
# samp: sample name that we want to visualize
# expressions_rna, expressions_pro: our original RNA and protein expression data frames
# results: data frame of mosaic results
plot_exp <- function(samp, expressions_rna, expressions_pro, results = data.frame()){
  # color information
  rna_blue_high <- "#268beb"
  pro_red_high <- "#d61e1e"
  
  rna_blue_low <- "#9ebfde"
  pro_red_low <- "#d49d9d"
  
  # making a function to create automatic palettes for rna and protein vizualizations
  rna_col_func <- colorRampPalette(c(rna_blue_low, rna_blue_high))
  pro_col_func <- colorRampPalette(c(pro_red_low, pro_red_high))
  
  tp <- seq(2,48,by=2) # our time points
  num_reps <- 3 # number of replicates
  
  # set to plot either the mosaic results or the original data
  if (nrow(results) == 0){ 
    # we're plotting the original data
    dat_rna <- expressions_rna[,-1]
    dat_pro <- expressions_pro[,-1]
    
    # logicals for the position of our desired sample in each dataset
    gene_log_rna <- expressions_rna$Sample_Name == samp
    gene_log_pro <- expressions_pro$Sample_Name == samp
  } else {
    # last number that is a parameter in the results dataset
    end_num <- 33

    # we're plotting the processed original data and the fitted data
    dat_rna <- results[,(end_num+1):((end_num+1)+((length(tp)*num_reps)-1))]
    fit_rna <- results[,(end_num+(length(tp)*num_reps)+1):((end_num+(length(tp)*num_reps)+1)+(length(tp)-1))]
    dat_pro <- results[,(end_num+(length(tp)*num_reps)+1+length(tp)):(end_num+(length(tp)*num_reps*2)+1+length(tp)-1)]
    fit_pro <- results[,(end_num+(length(tp)*num_reps*2)+1+length(tp)):ncol(results)]
    
    # logicals for the position of our desired sample
    gene_log_rna <- gene_log_pro <- results$Gene_Name == samp
  }
  
  # our visualization data frame
  gg.df <- data.frame(
    # maximum over time points
    "rna_max" = sapply(seq(1,ncol(dat_rna), by = num_reps), function(x) max(dat_rna[gene_log_rna,x:(x+num_reps-1)], na.rm = T)),
    "pro_max" = sapply(seq(1,ncol(dat_pro), by = num_reps), function(x) max(dat_pro[gene_log_pro,x:(x+num_reps-1)], na.rm = T)),
    # minimum over time points
    "rna_min" = sapply(seq(1,ncol(dat_rna), by = num_reps), function(x) min(dat_rna[gene_log_rna,x:(x+num_reps-1)], na.rm = T)),
    "pro_min" = sapply(seq(1,ncol(dat_pro), by = num_reps), function(x) min(dat_pro[gene_log_pro,x:(x+num_reps-1)], na.rm = T)),
    "timen" = tp # time points
  )
  
  # colors for lines and shading
  col_vect <- c(
    "Original RNA" = rna_blue_low,
    "Original Protein" = pro_red_low
  )
  
  # only plot individual replicates if we're only plotting the original data
  if (nrow(results) == 0){
    # add replicate colors
    # add two to provide more variability
    rna_rep_col <- rna_col_func(num_reps+2)
    pro_rep_col <- pro_col_func(num_reps+2)
    
    # add replicates to df and color scale
    for (i in 1:num_reps){
      gg.df[,paste0("rna_rep",i)] <- as.numeric(dat_rna[gene_log_rna,seq(i, ncol(dat_rna), by=num_reps)])
      gg.df[,paste0("pro_rep",i)] <- as.numeric(dat_pro[gene_log_pro,seq(i, ncol(dat_pro), by=num_reps)])
    
      col_vect[paste0("RNA, Replicate ",i)] <- rna_rep_col[i+1]
      col_vect[paste0("Protein, Replicate ",i)] <- pro_rep_col[i+1]
    }
  } else {
    # otherwise, we plot the model fits
    gg.df$rna_fit <- as.numeric(fit_rna[gene_log_rna,])
    gg.df$pro_fit <- as.numeric(fit_pro[gene_log_pro,])
    
    # add the colors
    col_vect["Fit RNA"] <- rna_blue_high
    col_vect["Fit Protein"] <- pro_red_high
  }
  
  # name breaks for the colors in the legend
  col_name_breaks <- c(names(col_vect)[grepl("RNA", names(col_vect), fixed = T)],
                       names(col_vect)[grepl("Protein", names(col_vect), fixed = T)])
  
  # visualize, with shading for each omics type
  p <- ggplot(gg.df, aes(x = timen))+
    # setting up scales and labels
    scale_fill_manual("", values = col_vect, breaks = col_name_breaks)+
    scale_color_manual("", values = col_vect, breaks = col_name_breaks)+
    ylab("Expression")+
    xlab("Time (Hours)")+
    theme_bw()+
    theme(
      plot.title = element_text(hjust = .5),
      legend.position = "bottom",
      legend.direction = "horizontal"
    )+
    scale_x_continuous(expand = c(0, 0))+
    # add shading for original rna and protein data
    geom_ribbon(aes(ymax = rna_max, ymin = rna_min, fill = "Original RNA"), alpha = .5)+
    geom_ribbon(aes(ymax = pro_max, ymin = pro_min, fill = "Original Protein"), alpha = .5)+
    ggtitle(samp)+ # title
    guides( # colors
      color = guide_legend(nrow = 2, byrow = T),
      fill = guide_legend(nrow = 2)
    )+
    NULL
  
  
  # only plot individual replicates if we're only plotting the original data
  if (nrow(results) == 0){
    # add each of the replicates
    for (i in 1:num_reps){
      p <- p +
        geom_line(aes_string(y = paste0("rna_rep",i), color = shQuote(paste0("RNA, Replicate ",i))))+
        geom_line(aes_string(y = paste0("pro_rep",i), color = shQuote(paste0("Protein, Replicate ",i))))+
        NULL
    }
  } else { # otherwise, plot the fits
    p <- p +
      geom_line(aes(y = rna_fit, color = "Fit RNA"), size = 1)+
      geom_line(aes(y = pro_fit, color = "Fit Protein"), size = 1)+
      NULL
  }
  
  return(p)
}

suppressWarnings(plot_exp("Sample_2", expressions_rna, expressions_pro)) # to ignore warnings for missing values

We can see that this sample manifests very differently depending on the omics type! In RNA, Sample 2 seems to be a rhythmic gene with damping, while in protein, Sample 2 is clearly linear. This difference in expression makes mosaic_find() the perfect function for our dataset.

Running mosaic_find()

So we begin by assigning our parameters and running the mosaic_find() function. In this first run, in addition to looking for trends, we want to look for rhythms between 20 and 28 hours, with no preprocessing, assigning these results to a new dataframe.

# run mosaic_find()
results <- mosaic_find(
  rna  = expressions_rna, # rna data
  pro = expressions_pro, # protein data
  begin = 2, # first time point (hours)
  end = 48, # last time point (hours)
  resol = 2, # resolution (hours)
  num_reps = 3, # number of replicates
  paired = F, # unpaired, or unrelated, replicates
  low = 20, # lower limit when looking for rhythms (hours)
  high = 28 # lower limit when looking for rhythms (hours)
)

# look at the results!
head(results[,1:16], 5)
##    Gene_Name Best_Model_RNA Best_Model_Protein  P_Value_RNA
## 1   Sample_1         Linear        Exponential 7.343098e-28
## 2  Sample_10    Exponential             Linear 6.159862e-13
## 3 Sample_100    Exponential        Exponential 1.280193e-13
## 4 Sample_101         Linear             Linear 1.005577e-24
## 5 Sample_102         Linear             Linear 2.928965e-24
##   BH_Adj_P_Value_RNA P_Value_Protein BH_Adj_P_Value_Protein P_Value_Joint
## 1       2.475202e-27    3.399446e-32           8.545638e-31            NA
## 2       7.347748e-13    2.936618e-20           6.013551e-20            NA
## 3       1.530111e-13    2.839392e-31           2.041703e-30            NA
## 4       1.704368e-24    1.296587e-12           1.568452e-12            NA
## 5       4.673880e-24    2.429743e-03           2.603296e-03            NA
##   BH_Adj_P_Value_Joint P_Value_Linear_Slope_RNA
## 1                   NA             8.727573e-56
## 2                   NA                       NA
## 3                   NA                       NA
## 4                   NA             3.223569e-43
## 5                   NA             1.656604e-40
##   BH_Adj_P_Value_Linear_Slope_RNA P_Value_Linear_Slope_Protein
## 1                    4.403457e-55                           NA
## 2                              NA                 4.840777e-29
## 3                              NA                           NA
## 4                    4.835353e-43                 1.993804e-15
## 5                    2.270162e-40                 1.431576e-03
##   BH_Adj_P_Value_Linear_Slope_Protein AC_Coefficient_RNA
## 1                                  NA                 NA
## 2                        1.063316e-28                 NA
## 3                                  NA                 NA
## 4                        2.641277e-15                 NA
## 5                        1.627943e-03                 NA
##   Oscillation_Type_RNA Period_RNA
## 1                 <NA>         NA
## 2                 <NA>         NA
## 3                 <NA>         NA
## 4                 <NA>         NA
## 5                 <NA>         NA

Looks like we’ve sucessfully run MOSAIC! Now we can see that the results data frame has information about which models fit each of the expressions in our omics types, the parameters for those models, and p-values. You’ll also notice that it reordered our sample names. For more information on the models and parameters available, please see MOSAIC: A Joint Modeling Methodology for Combined Circadian and Non-Circadian Analysis of Multi-Omics Data.

So, first, let’s take a look at the distributions of our models in each of our omics types!

# making a function, since we'll be using this again later
plot_dist <- function(results){
  # first, we only want to consider significant expressions
  sig_rna <- results$BH_Adj_P_Value_RNA < .05
  sig_pro <- results$BH_Adj_P_Value_Protein < .05
  
  # then get the significant models for each omics type
  mod_rna <- results$Best_Model_RNA[sig_rna]
  mod_pro <- results$Best_Model_Protein[sig_pro]
  
  # form into the data frame for plotting
  gg.df <- data.frame(
    "Model" = c(mod_rna, mod_pro),
    "Omics_Type" = c(rep("RNA", length(mod_rna)), rep("Protein",length(mod_pro)))
  )
  
  # colors
  col_proper <- c("Linear" = "#b32515", # red
           "Exponential" = "#e39f22", # orange
           "ECHO" = "#3461eb", # blue
           "ECHO Joint" = "#5c7de0", # light blue
           "ECHO Linear" = "#7b2bcc", # purple
           "ECHO Linear Joint" = "#945fc9" # light purple
  )
  
  # make a bar chart of distributions for each model type
  ggplot(gg.df, aes(Omics_Type, fill = Model))+
    geom_bar()+
    theme_bw()+
    ylab("Total Significant Expressions")+
    xlab("Omics Type")+
    theme_bw()+
    theme(
      plot.title = element_text(hjust = .5),
      legend.position = "bottom",
      legend.direction = "horizontal"
    )+
    ggtitle("MOSAIC Model Distributions")+ # title
    guides( # colors
      fill = guide_legend(nrow = 2)
    )+
    scale_y_continuous(expand = expand_scale(mult = c(0, .1)))+
    scale_fill_manual("Model", values = col_proper)+
    NULL
}

# plot our model distributions!
plot_dist(results)

So we can see that each of our omics types has roughly similar distributions, with mostly non-oscillatory models, Linear and Protein, making up most of the distribution, but with roughly 200 of the genes remaining oscillatory.

Let’s next look at how the fit turned out for our initial sample. Here, we add the fitted values to our plot and print the parameters to the console.

# plot our fitted data!
suppressWarnings(plot_exp("Sample_2", expressions_rna, expressions_pro, results)) # to ignore warnings for missing values

# print our parameters for our sample
results[results$Gene_Name == "Sample_2",]
##     Gene_Name Best_Model_RNA Best_Model_Protein  P_Value_RNA
## 112  Sample_2           ECHO             Linear 5.751267e-22
##     BH_Adj_P_Value_RNA P_Value_Protein BH_Adj_P_Value_Protein
## 112        7.70259e-22    6.170966e-27           3.137779e-26
##     P_Value_Joint BH_Adj_P_Value_Joint P_Value_Linear_Slope_RNA
## 112            NA                   NA                       NA
##     BH_Adj_P_Value_Linear_Slope_RNA P_Value_Linear_Slope_Protein
## 112                              NA                 6.323223e-51
##     BH_Adj_P_Value_Linear_Slope_Protein AC_Coefficient_RNA
## 112                        8.125342e-49          0.1305151
##     Oscillation_Type_RNA Period_RNA Hours_Shifted_RNA
## 112               Damped         28          25.88371
##     Initial_Amplitude_RNA Radian_Frequency_RNA Phase_Shift_RNA
## 112              9.291564            0.2243995       0.4748946
##     Growth_Rate_RNA Slope_RNA Equilibrium_Value_RNA AC_Coefficient_Protein
## 112              NA        NA            -0.9723169                     NA
##     Oscillation_Type_Protein Period_Protein Hours_Shifted_Protein
## 112                     <NA>             NA                    NA
##     Initial_Amplitude_Protein Radian_Frequency_Protein Phase_Shift_Protein
## 112                        NA                       NA                  NA
##     Growth_Rate_Protein Slope_Protein Equilibrium_Value_Protein
## 112                  NA    0.08934725                  1.732718
##     Processed_RNA_TP_2 Processed_RNA_TP_2.1 Processed_RNA_TP_2.2
## 112           3.754021             4.168131             3.875612
##     Processed_RNA_TP_4 Processed_RNA_TP_4.1 Processed_RNA_TP_4.2
## 112         -0.1310355                   NA            0.1837094
##     Processed_RNA_TP_6 Processed_RNA_TP_6.1 Processed_RNA_TP_6.2
## 112          -2.880764            -2.269142            -1.774086
##     Processed_RNA_TP_8 Processed_RNA_TP_8.1 Processed_RNA_TP_8.2
## 112          -4.724525            -5.125785            -4.780815
##     Processed_RNA_TP_10 Processed_RNA_TP_10.1 Processed_RNA_TP_10.2
## 112           -4.919622             -5.271985             -4.453277
##     Processed_RNA_TP_12 Processed_RNA_TP_12.1 Processed_RNA_TP_12.2
## 112           -4.984205             -5.637898              -5.89182
##     Processed_RNA_TP_14 Processed_RNA_TP_14.1 Processed_RNA_TP_14.2
## 112           -3.756594             -4.147797             -5.199659
##     Processed_RNA_TP_16 Processed_RNA_TP_16.1 Processed_RNA_TP_16.2
## 112           -2.600454              -2.85117             -2.860476
##     Processed_RNA_TP_18 Processed_RNA_TP_18.1 Processed_RNA_TP_18.2
## 112           -1.882311             -2.357486             -1.938199
##     Processed_RNA_TP_20 Processed_RNA_TP_20.1 Processed_RNA_TP_20.2
## 112           0.7918994            -0.5020944           0.001700021
##     Processed_RNA_TP_22 Processed_RNA_TP_22.1 Processed_RNA_TP_22.2
## 112           0.1983453             0.4554254              1.209879
##     Processed_RNA_TP_24 Processed_RNA_TP_24.1 Processed_RNA_TP_24.2
## 112           0.5684147             0.4036352             0.6004868
##     Processed_RNA_TP_26 Processed_RNA_TP_26.1 Processed_RNA_TP_26.2
## 112           0.9273787             0.3436154             0.6093465
##     Processed_RNA_TP_28 Processed_RNA_TP_28.1 Processed_RNA_TP_28.2
## 112            1.089579             0.2013894                    NA
##     Processed_RNA_TP_30 Processed_RNA_TP_30.1 Processed_RNA_TP_30.2
## 112          -0.5642648            -0.4228868             -2.258175
##     Processed_RNA_TP_32 Processed_RNA_TP_32.1 Processed_RNA_TP_32.2
## 112                  NA            -0.6388114            -0.6754961
##     Processed_RNA_TP_34 Processed_RNA_TP_34.1 Processed_RNA_TP_34.2
## 112           -2.360276             -2.078823             -1.529084
##     Processed_RNA_TP_36 Processed_RNA_TP_36.1 Processed_RNA_TP_36.2
## 112           -1.825594             -2.589809             -2.467512
##     Processed_RNA_TP_38 Processed_RNA_TP_38.1 Processed_RNA_TP_38.2
## 112           -2.713209             -2.574499             -2.012953
##     Processed_RNA_TP_40 Processed_RNA_TP_40.1 Processed_RNA_TP_40.2
## 112           -1.853889             -1.564534            -0.3381466
##     Processed_RNA_TP_42 Processed_RNA_TP_42.1 Processed_RNA_TP_42.2
## 112           -1.704297             -1.922834              -1.72339
##     Processed_RNA_TP_44 Processed_RNA_TP_44.1 Processed_RNA_TP_44.2
## 112           -1.686813            -0.2003985             -1.179989
##     Processed_RNA_TP_46 Processed_RNA_TP_46.1 Processed_RNA_TP_46.2
## 112          -0.4345448             -1.083279             -1.220444
##     Processed_RNA_TP_48 Processed_RNA_TP_48.1 Processed_RNA_TP_48.2
## 112          -0.8149757             -1.000756            -0.3102249
##     Fitted_RNA_TP2 Fitted_RNA_TP4 Fitted_RNA_TP6 Fitted_RNA_TP8
## 112       3.943954      0.4376398      -2.529325       -4.52069
##     Fitted_RNA_TP10 Fitted_RNA_TP12 Fitted_RNA_TP14 Fitted_RNA_TP16
## 112       -5.384624       -5.217037       -4.286551       -2.944113
##     Fitted_RNA_TP18 Fitted_RNA_TP20 Fitted_RNA_TP22 Fitted_RNA_TP24
## 112       -1.537816      -0.3478393       0.4508485       0.7973514
##     Fitted_RNA_TP26 Fitted_RNA_TP28 Fitted_RNA_TP30 Fitted_RNA_TP32
## 112        0.730136       0.3569411       -0.181478      -0.7455091
##     Fitted_RNA_TP34 Fitted_RNA_TP36 Fitted_RNA_TP38 Fitted_RNA_TP40
## 112        -1.22278       -1.543114       -1.682088       -1.655129
##     Fitted_RNA_TP42 Fitted_RNA_TP44 Fitted_RNA_TP46 Fitted_RNA_TP48
## 112        -1.50545       -1.289503       -1.063284      -0.8718625
##     Processed_Protein_TP_2 Processed_Protein_TP_2.1
## 112                1.73277                 1.635491
##     Processed_Protein_TP_2.2 Processed_Protein_TP_4
## 112                 1.598475               2.085561
##     Processed_Protein_TP_4.1 Processed_Protein_TP_4.2
## 112                 1.961745                 2.030575
##     Processed_Protein_TP_6 Processed_Protein_TP_6.1
## 112               1.902162                 2.649547
##     Processed_Protein_TP_6.2 Processed_Protein_TP_8
## 112                 2.442324               2.880684
##     Processed_Protein_TP_8.1 Processed_Protein_TP_8.2
## 112                 2.735317                 2.508182
##     Processed_Protein_TP_10 Processed_Protein_TP_10.1
## 112                2.640153                  2.746428
##     Processed_Protein_TP_10.2 Processed_Protein_TP_12
## 112                  2.709603                2.618462
##     Processed_Protein_TP_12.1 Processed_Protein_TP_12.2
## 112                  2.660648                  2.523894
##     Processed_Protein_TP_14 Processed_Protein_TP_14.1
## 112                3.161411                  2.803451
##     Processed_Protein_TP_14.2 Processed_Protein_TP_16
## 112                  3.093063                3.102046
##     Processed_Protein_TP_16.1 Processed_Protein_TP_16.2
## 112                  3.281512                  3.549912
##     Processed_Protein_TP_18 Processed_Protein_TP_18.1
## 112                3.153352                  3.114335
##     Processed_Protein_TP_18.2 Processed_Protein_TP_20
## 112                  3.842172                3.373165
##     Processed_Protein_TP_20.1 Processed_Protein_TP_20.2
## 112                  3.105235                  3.600638
##     Processed_Protein_TP_22 Processed_Protein_TP_22.1
## 112                3.943721                  3.928703
##     Processed_Protein_TP_22.2 Processed_Protein_TP_24
## 112                  3.467277                3.733599
##     Processed_Protein_TP_24.1 Processed_Protein_TP_24.2
## 112                  3.926123                  3.689913
##     Processed_Protein_TP_26 Processed_Protein_TP_26.1
## 112                3.862635                  3.905335
##     Processed_Protein_TP_26.2 Processed_Protein_TP_28
## 112                  4.303546                  4.2887
##     Processed_Protein_TP_28.1 Processed_Protein_TP_28.2
## 112                  4.156734                  4.071896
##     Processed_Protein_TP_30 Processed_Protein_TP_30.1
## 112                4.830815                  4.775326
##     Processed_Protein_TP_30.2 Processed_Protein_TP_32
## 112                  4.630182                4.579518
##     Processed_Protein_TP_32.1 Processed_Protein_TP_32.2
## 112                  4.461469                  4.384259
##     Processed_Protein_TP_34 Processed_Protein_TP_34.1
## 112                5.066425                        NA
##     Processed_Protein_TP_34.2 Processed_Protein_TP_36
## 112                  4.691526                5.381199
##     Processed_Protein_TP_36.1 Processed_Protein_TP_36.2
## 112                        NA                  4.899999
##     Processed_Protein_TP_38 Processed_Protein_TP_38.1
## 112                5.072838                  4.914001
##     Processed_Protein_TP_38.2 Processed_Protein_TP_40
## 112                  5.235965                5.370474
##     Processed_Protein_TP_40.1 Processed_Protein_TP_40.2
## 112                        NA                  5.444477
##     Processed_Protein_TP_42 Processed_Protein_TP_42.1
## 112                5.625993                  5.282117
##     Processed_Protein_TP_42.2 Processed_Protein_TP_44
## 112                  5.452276                5.408484
##     Processed_Protein_TP_44.1 Processed_Protein_TP_44.2
## 112                  5.707133                  5.731252
##     Processed_Protein_TP_46 Processed_Protein_TP_46.1
## 112                5.452682                  5.437982
##     Processed_Protein_TP_46.2 Processed_Protein_TP_48
## 112                  5.971252                6.205994
##     Processed_Protein_TP_48.1 Processed_Protein_TP_48.2 Fitted_Protein_TP2
## 112                        NA                  5.998865           1.911412
##     Fitted_Protein_TP4 Fitted_Protein_TP6 Fitted_Protein_TP8
## 112           2.090107           2.268801           2.447496
##     Fitted_Protein_TP10 Fitted_Protein_TP12 Fitted_Protein_TP14
## 112             2.62619            2.804885            2.983579
##     Fitted_Protein_TP16 Fitted_Protein_TP18 Fitted_Protein_TP20
## 112            3.162274            3.340968            3.519663
##     Fitted_Protein_TP22 Fitted_Protein_TP24 Fitted_Protein_TP26
## 112            3.698357            3.877052            4.055746
##     Fitted_Protein_TP28 Fitted_Protein_TP30 Fitted_Protein_TP32
## 112            4.234441            4.413135             4.59183
##     Fitted_Protein_TP34 Fitted_Protein_TP36 Fitted_Protein_TP38
## 112            4.770524            4.949219            5.127913
##     Fitted_Protein_TP40 Fitted_Protein_TP42 Fitted_Protein_TP44
## 112            5.306608            5.485302            5.663997
##     Fitted_Protein_TP46 Fitted_Protein_TP48
## 112            5.842691            6.021386

We can see that MOSAIC did a great job with these fits, with the fits matching closely to the trend. Looking at the subset of results corresponding to Sample 2, we can see that MOSAIC correctly designated our RNA data as an ECHO model with damping, and our protein data as a linear model with a positive slope. These fits match pretty closely to the trend, which are emphasized by their very low BH adjusted p-values.

Now let’s see how preprocessing affects the results. Here, we’re going to search for all possible periods, as well as allowing for all our preprocessing options: removing unexpressed genes, normalizing, and smoothing. These preprocessing methods are described in our paper.

# run mosaic_find() with preprocessing
results <- mosaic_find(
  rna  = expressions_rna, # rna data
  pro = expressions_pro, # protein data
  begin = 2, # first time point (hours)
  end = 48, # last time point (hours)
  resol = 2, # resolution (hours)
  num_reps = 3, # number of replicates
  paired = F, # unpaired, or unrelated, replicates
  run_all_per = T, # now we're looking for any period within our time course
  rem_unexpr = T, # remove any unexpressed genes (we don't have any, but some time courses might)
  rem_unexpr_amt = 70, # if we have less than 70% expression, we want to remove the gene
  rem_unexpr_amt_below = 0, # unexpressed genes are those with expression equal to 0
  is_normal = T, # normalize data
  is_smooth = T # smooth the data
)

# look at the results!
head(results[,1:16], 5)
##    Gene_Name Best_Model_RNA Best_Model_Protein  P_Value_RNA
## 1   Sample_1    Exponential        Exponential 1.178262e-30
## 2  Sample_10    Exponential             Linear 1.420713e-21
## 3 Sample_100    Exponential        Exponential 7.165064e-20
## 4 Sample_101         Linear             Linear 4.737161e-29
## 5 Sample_102         Linear             Linear 8.696209e-30
##   BH_Adj_P_Value_RNA P_Value_Protein BH_Adj_P_Value_Protein P_Value_Joint
## 1       4.062973e-30    3.004781e-32           5.916211e-31            NA
## 2       1.715146e-21    3.118392e-26           7.060511e-26            NA
## 3       8.563821e-20    2.839392e-31           1.665078e-30            NA
## 4       9.972970e-29    2.622474e-21           3.383838e-21            NA
## 5       2.278483e-29    6.746870e-07           7.164818e-07            NA
##   BH_Adj_P_Value_Joint P_Value_Linear_Slope_RNA
## 1                   NA                       NA
## 2                   NA                       NA
## 3                   NA                       NA
## 4                   NA             1.102292e-56
## 5                   NA             5.491577e-60
##   BH_Adj_P_Value_Linear_Slope_RNA P_Value_Linear_Slope_Protein
## 1                              NA                           NA
## 2                              NA                 7.054902e-47
## 3                              NA                           NA
## 4                    1.336341e-56                 6.836260e-30
## 5                    7.419916e-60                 1.692101e-07
##   BH_Adj_P_Value_Linear_Slope_Protein AC_Coefficient_RNA
## 1                                  NA                 NA
## 2                        1.492071e-46                 NA
## 3                                  NA                 NA
## 4                        8.115302e-30                 NA
## 5                        1.842287e-07                 NA
##   Oscillation_Type_RNA Period_RNA
## 1                 <NA>         NA
## 2                 <NA>         NA
## 3                 <NA>         NA
## 4                 <NA>         NA
## 5                 <NA>         NA

Let’s look at how this preprocessing affected our distribution and original sample!

# plot our model distributions!
plot_dist(results)

# plot our fitted data!
suppressWarnings(plot_exp("Sample_2", expressions_rna, expressions_pro, results)) # to ignore warnings for missing values

# print our parameters for our sample
results[results$Gene_Name == "Sample_2",]
##     Gene_Name Best_Model_RNA Best_Model_Protein  P_Value_RNA
## 112  Sample_2           ECHO             Linear 2.174569e-26
##     BH_Adj_P_Value_RNA P_Value_Protein BH_Adj_P_Value_Protein
## 112       3.020234e-26    8.048197e-31           4.273379e-30
##     P_Value_Joint BH_Adj_P_Value_Joint P_Value_Linear_Slope_RNA
## 112            NA                   NA                       NA
##     BH_Adj_P_Value_Linear_Slope_RNA P_Value_Linear_Slope_Protein
## 112                              NA                 1.894637e-69
##     BH_Adj_P_Value_Linear_Slope_Protein AC_Coefficient_RNA
## 112                        1.743066e-67          0.1140198
##     Oscillation_Type_RNA Period_RNA Hours_Shifted_RNA
## 112               Damped   27.23337           25.4299
##     Initial_Amplitude_RNA Radian_Frequency_RNA Phase_Shift_RNA
## 112              3.463709            0.2307164       0.4160901
##     Growth_Rate_RNA Slope_RNA Equilibrium_Value_RNA AC_Coefficient_Protein
## 112              NA        NA             0.1572618                     NA
##     Oscillation_Type_Protein Period_Protein Hours_Shifted_Protein
## 112                     <NA>             NA                    NA
##     Initial_Amplitude_Protein Radian_Frequency_Protein Phase_Shift_Protein
## 112                        NA                       NA                  NA
##     Growth_Rate_Protein Slope_Protein Equilibrium_Value_Protein
## 112                  NA    0.07294042                 -1.757411
##     Processed_RNA_TP_2 Processed_RNA_TP_2.1 Processed_RNA_TP_2.2
## 112           1.886177             2.016075             1.924318
##     Processed_RNA_TP_4 Processed_RNA_TP_4.1 Processed_RNA_TP_4.2
## 112          0.8647523                   NA            0.9387996
##     Processed_RNA_TP_6 Processed_RNA_TP_6.1 Processed_RNA_TP_6.2
## 112         -0.5438456           -0.3999546            -0.283487
##     Processed_RNA_TP_8 Processed_RNA_TP_8.1 Processed_RNA_TP_8.2
## 112           -1.25274            -1.347141            -1.265983
##     Processed_RNA_TP_10 Processed_RNA_TP_10.1 Processed_RNA_TP_10.2
## 112           -1.674122              -1.75702             -1.564409
##     Processed_RNA_TP_12 Processed_RNA_TP_12.1 Processed_RNA_TP_12.2
## 112           -1.556154             -1.709943             -1.769681
##     Processed_RNA_TP_14 Processed_RNA_TP_14.1 Processed_RNA_TP_14.2
## 112           -1.152736             -1.244772             -1.492234
##     Processed_RNA_TP_16 Processed_RNA_TP_16.1 Processed_RNA_TP_16.2
## 112           -0.663361            -0.7223447            -0.7245342
##     Processed_RNA_TP_18 Processed_RNA_TP_18.1 Processed_RNA_TP_18.2
## 112         -0.05284582            -0.1646363           -0.06599417
##     Processed_RNA_TP_20 Processed_RNA_TP_20.1 Processed_RNA_TP_20.2
## 112           0.7216151             0.4171885             0.5357118
##     Processed_RNA_TP_22 Processed_RNA_TP_22.1 Processed_RNA_TP_22.2
## 112           0.8242308             0.8847117              1.062205
##     Processed_RNA_TP_24 Processed_RNA_TP_24.1 Processed_RNA_TP_24.2
## 112           0.9850069             0.9462407             0.9925523
##     Processed_RNA_TP_26 Processed_RNA_TP_26.1 Processed_RNA_TP_26.2
## 112            1.060243             0.9229059             0.9854222
##     Processed_RNA_TP_28 Processed_RNA_TP_28.1 Processed_RNA_TP_28.2
## 112           0.9072927              0.698336                    NA
##     Processed_RNA_TP_30 Processed_RNA_TP_30.1 Processed_RNA_TP_30.2
## 112           0.5703559             0.6036167             0.1718444
##     Processed_RNA_TP_32 Processed_RNA_TP_32.1 Processed_RNA_TP_32.2
## 112                  NA              0.192927             0.1842965
##     Processed_RNA_TP_34 Processed_RNA_TP_34.1 Processed_RNA_TP_34.2
## 112          -0.1979852            -0.1317704          -0.002438085
##     Processed_RNA_TP_36 Processed_RNA_TP_36.1 Processed_RNA_TP_36.2
## 112          -0.2452876            -0.4250777            -0.3963061
##     Processed_RNA_TP_38 Processed_RNA_TP_38.1 Processed_RNA_TP_38.2
## 112          -0.3510112            -0.3183782            -0.1862681
##     Processed_RNA_TP_40 Processed_RNA_TP_40.1 Processed_RNA_TP_40.2
## 112          -0.2277257            -0.1596516             0.1288699
##     Processed_RNA_TP_42 Processed_RNA_TP_42.1 Processed_RNA_TP_42.2
## 112          0.03596156           -0.01545159            0.03146974
##     Processed_RNA_TP_44 Processed_RNA_TP_44.1 Processed_RNA_TP_44.2
## 112        -0.009526636             0.3401691             0.1097093
##     Processed_RNA_TP_46 Processed_RNA_TP_46.1 Processed_RNA_TP_46.2
## 112           0.3986219             0.2459998             0.2137303
##     Processed_RNA_TP_48 Processed_RNA_TP_48.1 Processed_RNA_TP_48.2
## 112           0.3056779             0.2474021              0.464009
##     Fitted_RNA_TP2 Fitted_RNA_TP4 Fitted_RNA_TP6 Fitted_RNA_TP8
## 112       2.132247      0.7908356     -0.4026525      -1.241782
##     Fitted_RNA_TP10 Fitted_RNA_TP12 Fitted_RNA_TP14 Fitted_RNA_TP16
## 112       -1.632458       -1.588677       -1.207706      -0.6338249
##     Fitted_RNA_TP18 Fitted_RNA_TP20 Fitted_RNA_TP22 Fitted_RNA_TP24
## 112     -0.02013657       0.5035839       0.8518587         0.99142
##     Fitted_RNA_TP26 Fitted_RNA_TP28 Fitted_RNA_TP30 Fitted_RNA_TP32
## 112       0.9371592       0.7393549       0.4664895       0.1879617
##     Fitted_RNA_TP34 Fitted_RNA_TP36 Fitted_RNA_TP38 Fitted_RNA_TP40
## 112     -0.03985828      -0.1821468      -0.2281368      -0.1883473
##     Fitted_RNA_TP42 Fitted_RNA_TP44 Fitted_RNA_TP46 Fitted_RNA_TP48
## 112     -0.08815724       0.0402555       0.1656794       0.2638598
##     Processed_Protein_TP_2 Processed_Protein_TP_2.1
## 112              -1.689306                -1.742496
##     Processed_Protein_TP_2.2 Processed_Protein_TP_4
## 112                -1.762736               -1.51788
##     Processed_Protein_TP_4.1 Processed_Protein_TP_4.2
## 112                -1.568656                -1.540429
##     Processed_Protein_TP_6 Processed_Protein_TP_6.1
## 112              -1.439902                -1.133409
##     Processed_Protein_TP_6.2 Processed_Protein_TP_8
## 112                -1.218389             -0.9779194
##     Processed_Protein_TP_8.1 Processed_Protein_TP_8.2
## 112                -1.037533                -1.130678
##     Processed_Protein_TP_10 Processed_Protein_TP_10.1
## 112               -1.019353                -0.9757709
##     Processed_Protein_TP_10.2 Processed_Protein_TP_12
## 112                -0.9908724              -0.9443907
##     Processed_Protein_TP_12.1 Processed_Protein_TP_12.2
## 112                -0.9270907                 -0.983172
##     Processed_Protein_TP_14 Processed_Protein_TP_14.1
## 112              -0.6819309                -0.8287257
##     Processed_Protein_TP_14.2 Processed_Protein_TP_16
## 112                -0.7099596              -0.6084489
##     Processed_Protein_TP_16.1 Processed_Protein_TP_16.2
## 112                -0.5348519                -0.4247845
##     Processed_Protein_TP_18 Processed_Protein_TP_18.1
## 112              -0.5296738                -0.5456741
##     Processed_Protein_TP_18.2 Processed_Protein_TP_20
## 112                -0.2471974              -0.3413117
##     Processed_Protein_TP_20.1 Processed_Protein_TP_20.2
## 112                -0.4511866                -0.2480278
##     Processed_Protein_TP_22 Processed_Protein_TP_22.1
## 112              -0.1087616                -0.1149202
##     Processed_Protein_TP_22.2 Processed_Protein_TP_24
## 112                -0.3041453              -0.0594274
##     Processed_Protein_TP_24.1 Processed_Protein_TP_24.2
## 112                0.01952431               -0.07734237
##     Processed_Protein_TP_26 Processed_Protein_TP_26.1
## 112              0.02463812                0.04214881
##     Processed_Protein_TP_26.2 Processed_Protein_TP_28
## 112                 0.2054501               0.3661904
##     Processed_Protein_TP_28.1 Processed_Protein_TP_28.2
## 112                 0.3120729                 0.2772818
##     Processed_Protein_TP_30 Processed_Protein_TP_30.1
## 112               0.5635398                 0.5407845
##     Processed_Protein_TP_30.2 Processed_Protein_TP_32
## 112                 0.4812628               0.6607916
##     Processed_Protein_TP_32.1 Processed_Protein_TP_32.2
## 112                 0.6123813                 0.5807184
##     Processed_Protein_TP_34 Processed_Protein_TP_34.1
## 112                0.858675                        NA
##     Processed_Protein_TP_34.2 Processed_Protein_TP_36
## 112                 0.7049336                1.056975
##     Processed_Protein_TP_36.1 Processed_Protein_TP_36.2
## 112                        NA                 0.8596406
##     Processed_Protein_TP_38 Processed_Protein_TP_38.1
## 112                1.052486                 0.9873489
##     Processed_Protein_TP_38.2 Processed_Protein_TP_40
## 112                  1.119382                1.170371
##     Processed_Protein_TP_40.1 Processed_Protein_TP_40.2
## 112                        NA                  1.200719
##     Processed_Protein_TP_42 Processed_Protein_TP_42.1
## 112                1.376728                  1.235709
##     Processed_Protein_TP_42.2 Processed_Protein_TP_44
## 112                  1.305489                1.297988
##     Processed_Protein_TP_44.1 Processed_Protein_TP_44.2
## 112                  1.420461                  1.430351
##     Processed_Protein_TP_46 Processed_Protein_TP_46.1
## 112                1.448152                  1.442124
##     Processed_Protein_TP_46.2 Processed_Protein_TP_48
## 112                  1.660811                1.739331
##     Processed_Protein_TP_48.1 Processed_Protein_TP_48.2 Fitted_Protein_TP2
## 112                        NA                  1.626077          -1.611531
##     Fitted_Protein_TP4 Fitted_Protein_TP6 Fitted_Protein_TP8
## 112           -1.46565          -1.319769          -1.173888
##     Fitted_Protein_TP10 Fitted_Protein_TP12 Fitted_Protein_TP14
## 112           -1.028007          -0.8821265          -0.7362456
##     Fitted_Protein_TP16 Fitted_Protein_TP18 Fitted_Protein_TP20
## 112          -0.5903648           -0.444484          -0.2986032
##     Fitted_Protein_TP22 Fitted_Protein_TP24 Fitted_Protein_TP26
## 112          -0.1527223        -0.006841486           0.1390393
##     Fitted_Protein_TP28 Fitted_Protein_TP30 Fitted_Protein_TP32
## 112           0.2849202            0.430801           0.5766818
##     Fitted_Protein_TP34 Fitted_Protein_TP36 Fitted_Protein_TP38
## 112           0.7225627           0.8684435            1.014324
##     Fitted_Protein_TP40 Fitted_Protein_TP42 Fitted_Protein_TP44
## 112            1.160205            1.306086            1.451967
##     Fitted_Protein_TP46 Fitted_Protein_TP48
## 112            1.597848            1.743729

We can see that the distributions are a little more even now between our omics types, with much of the same effects between them. We can also see the effects of our preprocessing – normalization put our expressions on the same scale, and smoothing has reduced the space between our replicates. Though the fit hasn’t changed very much, it’s important to note that smoothing has also reduced the amount of damping in the system: the amplitude change coefficient has decreased by about .02. More discussions on the effects of preprocessing can be found in the supplemental information of ECHO: an application for detection and analysis of oscillators identifies metabolic regulation on genome-wide circadian output.

Other selections are also available, including adjustments for the harmonic and overexpressed cutoffs for ECHO and ECHO linear modelscan be adjusted through the harm_cut and over_cut parameters, respectively, though the defaults are recommended.

Now that you understand the basics of using the mosaic_find() function, feel free to experiment and play around with this vignette and the example data. Good luck with your multi-omics trend searches!