This vignette shows how and why to use the derived attributes functionality of the pepr package. For background, see the basic information about the PEP concept on the project website and the broader theoretical description in the derived attributes documentation section.
The example below demonstrates how to use derived attributes to flexibly define sample attributes, here the file_path column of the sample_table.csv file, so that they match the file names in your project. Consider the following sample table for reference:
| sample_name | protocol | organism | time | file_path | 
|---|---|---|---|---|
| pig_0h | RRBS | pig | 0 | data/lab/project/pig_0h.fastq | 
| pig_1h | RRBS | pig | 1 | data/lab/project/pig_1h.fastq | 
| frog_0h | RRBS | frog | 0 | data/lab/project/frog_0h.fastq | 
| frog_1h | RRBS | frog | 1 | data/lab/project/frog_1h.fastq | 
As the name suggests, the values of the specified attributes (here: file_path) can be derived from other attributes. How the derivation is carried out is stated explicitly in the project_config.yaml file (presented below). The name of the column to derive is set in the sample_modifiers.derive.attributes key, whereas the patterns used to construct the values are set in sample_modifiers.derive.sources. Note that each key under sources (here: source1 and source2) has to exactly match the values used in the file_path column of the modified sample_table.csv (presented below).
   pep_version: 2.0.0
   sample_table: sample_table.csv
   output_dir: $HOME/hello_looper_results
   sample_modifiers:
      derive:
          attributes: file_path
          sources:
              source1: $HOME/data/lab/project/{organism}_{time}h.fastq
              source2: /path/from/collaborator/weirdNamingScheme_{external_id}.fastq

Let's introduce a few modifications to the original sample_table.csv file so that the derived column (file_path) refers to the appropriate data source keys defined in project_config.yaml:
| sample_name | protocol | organism | time | file_path | 
|---|---|---|---|---|
| pig_0h | RRBS | pig | 0 | source1 | 
| pig_1h | RRBS | pig | 1 | source1 | 
| frog_0h | RRBS | frog | 0 | source1 | 
| frog_1h | RRBS | frog | 1 | source1 | 
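The substitution described above can be sketched in base R before involving pepr at all. This is only an illustration of the idea, not how pepr implements it: the helpers `expand_template` and `derive_attribute` are hypothetical names, and the source template omits the `$HOME` prefix so that the result does not depend on the environment.

```r
# Hypothetical helper (not part of pepr): expand "{column}" placeholders
# in a template using the values of a single sample (a one-row data.frame).
expand_template <- function(template, sample_row) {
  for (col in names(sample_row)) {
    template <- gsub(
      paste0("{", col, "}"),
      as.character(sample_row[[col]]),
      template,
      fixed = TRUE
    )
  }
  template
}

# Hypothetical helper (not part of pepr): replace each source key found in
# column `attr` with the expanded template registered under that key.
derive_attribute <- function(samples, attr, sources) {
  for (i in seq_len(nrow(samples))) {
    key <- samples[[attr]][i]
    if (key %in% names(sources)) {
      samples[[attr]][i] <- expand_template(sources[[key]], samples[i, ])
    }
  }
  samples
}

# The modified sample table: file_path holds source keys, not paths.
samples <- data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  organism = c("pig", "pig", "frog", "frog"),
  time = c(0, 1, 0, 1),
  file_path = "source1",
  stringsAsFactors = FALSE
)

# Assumed template, mirroring source1 from the config without $HOME.
sources <- list(
  source1 = "data/lab/project/{organism}_{time}h.fastq"
)

derived <- derive_attribute(samples, "file_path", sources)
print(derived$file_path)
```

Each row keeps its own organism and time values, so a single template is enough to reproduce all four paths from the original table. Values in file_path that do not match any key under sources are left untouched in this sketch.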
Load pepr and read in the project metadata by specifying the path to the project_config.yaml:
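library(pepr)
# Branch of the example PEPs bundled with pepr ("master" in the output below)
branch = "master"
# Locate the example project config shipped with the package
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive",
  "project_config.yaml",
  package = "pepr"
)
# Create the Project object from the config file
p = Project(projectConfig)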
#> Loading config file: /private/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/RtmpF0yVmb/Rinstbd5d643c109c/pepr/extdata/example_peps-master/example_derive/project_config.yaml

And inspect it:
sampleTable(p)
#>    sample_name protocol organism time
#> 1:      pig_0h     RRBS      pig    0
#> 2:      pig_1h     RRBS      pig    1
#> 3:     frog_0h     RRBS     frog    0
#> 4:     frog_1h     RRBS     frog    1
#>                                            file_path
#> 1:  /Users/mstolarczyk/data/lab/project/pig_0h.fastq
#> 2:  /Users/mstolarczyk/data/lab/project/pig_1h.fastq
#> 3: /Users/mstolarczyk/data/lab/project/frog_0h.fastq
#> 4: /Users/mstolarczyk/data/lab/project/frog_1h.fastq

As you can see, the resulting samples are annotated the same way as if they were read from the original, unwieldy annotations file.
What is more, the p object contains all the information from the project config file (project_config.yaml). Run the following line to explore it:
config(p)
#> Config object. Class: Config
#>  pep_version: 2.0.0
#>  sample_table: 
#> /private/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/RtmpF0yVmb/Rinstbd5d643c109c/pepr/extdata/example_peps-master/example_derive/sample_table.csv
#>  output_dir: /Users/mstolarczyk/hello_looper_results
#>  sample_modifiers:
#>     derive:
#>         attributes: file_path
#>         sources:
#>             source1: 
#> /Users/mstolarczyk/data/lab/project/{organism}_{time}h.fastq
#>             source2: 
#> /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
#>  name: example_derive