AdhereR
from Python 3
While AdhereR
is written in R
and makes extensive use of various R
packages and techniques (such as data.table
and parallel processing), it is possible to use it from other programming languages and applications. This is accomplished through a very generic mechanism that only requires the caller to be able to read and write files in a location of its choice and to invoke an external command with a set of arguments. These requirements are widely available in programming languages (such as C
/C++
, Java
, Python 2
and 3
, and R
itself), can be accomplished from the scripting available in several applications (e.g., VBA
in Microsoft Excel, STATA
scripting or SAS
programs), and works in similar ways across the major Operating Systems (Linux
flavors, macOS
/Mac OS X
/OS X
and Microsoft Windows
).
We present here this generic mechanism using its reference implementation for Python 3
. While this reference implementation is definitely usable in production environments, it can probably be improved both in terms of calling and passing data between Python
and R
, as well as in terms of the “pythonicity” of the Python
side of the implementation. Nevertheless, we hope this implementation will be useful to users of Python
that would like to access AdhereR
without switching to R
, and will provide a template and working example for further implementations that aim to make AdhereR
available to other programming languages and platforms.
The mechanism is very general, and is based on a wrapper being available on the caller platform (here, Python 3
) that performs the following general tasks:
AdhereR
that are of interest to the platform’s users;AdhereR
; in particular, it saves any datasets to be processed (as TAB-separated CSV files
) and writes the argument values to text file
(in a standardised format), all in a directory of its choice (henceforth, the data sharing directory);shell
mechanism to call R
(as it is installed on the caller system) and instructs it to execute a simple sequence of R
commands;R
commands load the AdhereR
package and execute the callAdhereR()
function from the package, passing it the path to the data sharing directory as its only argument;callAdhereR()
:
AdhereR
method(s) with the appropriate arguments,CSV
files or image files;R
has finished executing, and:
The full protocol is detailed in Appendix I.
AdhereR
from Python 3
We will use here a macOS
setup for illustration purposes, but this is very similar on the other supported OS
s. Essentially, the Python 3
wrapper creates the input files parameters.log
and dataset.csv
in the data sharing directory (let us denote it as DATA_SHARING_DIRECTORY
, by default, a unique temporary directory). Let’s assume that DATA_SHARING_DIRECTORY
is set to /var/folders/kx/bphryt7j5tz1n_fcjk5809940000gn/T/adherer-qmx4pw7t
; then, before calling AdhereR
, this directory should contain the files:
~
|-parameters.log
\-dataset.csv
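As a rough illustration of how the dataset file could be produced, the sketch below writes a small table as a TAB-separated file named dataset.csv into a directory. It is only a sketch under stated assumptions: it uses the standard csv module (the reference wrapper works with pandas tables), the helper name write_dataset is hypothetical, and the format of parameters.log (described in Appendix I) is deliberately not reproduced here.

```python
import csv
import os
import tempfile


def write_dataset(rows, column_names, data_sharing_directory):
    """Write rows as a TAB-separated file called dataset.csv (hypothetical helper)."""
    path = os.path.join(data_sharing_directory, "dataset.csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(column_names)  # header row with the column names
        writer.writerows(rows)         # one line per event
    return path


if __name__ == "__main__":
    # Temporary directory used only for this illustration
    with tempfile.TemporaryDirectory(prefix="adherer-") as directory:
        columns = ["PATIENT_ID", "DATE", "PERDAY", "CATEGORY", "DURATION"]
        events = [[1, "04/26/2033", 4, "medA", 50], [1, "07/04/2033", 4, "medB", 30]]
        print(write_dataset(events, columns, directory))
```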
Please note that R
must be properly installed on the system such that Rscript
(or Rscript.exe
on Windows) does exist and works; the Python 3
wrapper tries to locate it using a variety of strategies (in order, which
, followed by a set of standard locations on macOS
and Linux
or a set of standard Windows Registry Keys on MS Windows
) but if this fails or if the user wants to use a non-standard R
installation, the wrapper allows this through the exported function set_rscript_path()
. Let’s assume for now that Rscript
is located in /usr/local/bin/Rscript
and its automatic detection was successful (let us denote this path as RSCRIPT_PATH
).
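The detection strategy described above can be sketched as follows for macOS. This is a minimal illustration and not the wrapper's actual code: the function name find_rscript and its parameters are hypothetical, the candidate paths are the standard macOS locations listed in this document, and the Windows Registry lookup is left out.

```python
import os
import shutil

# Standard macOS locations mentioned in this document
MACOS_CANDIDATES = [
    "/usr/bin/Rscript",
    "/usr/local/bin/Rscript",
    "/opt/local/bin/Rscript",
    "/Library/Frameworks/R.framework/Versions/Current/Resources/bin/Rscript",
]


def find_rscript(candidates=MACOS_CANDIDATES, which=shutil.which):
    """Return a path to Rscript, or None if it cannot be located (hypothetical helper)."""
    found = which("Rscript")  # strategy 1: search the PATH, like the 'which' command
    if found:
        return found
    for path in candidates:   # strategy 2: probe the standard locations in order
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None               # the caller should then ask the user to set the path
```

Passing the lookup function and the candidate list as parameters keeps the sketch easy to try on machines where R is not installed.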
With these path variables automatically or manually set, the Python 3
wrapper is ready to call AdhereR
:
import subprocess # allow shell calls
[...]
# Call AdhereR:
rscript_cmd = '"' + RSCRIPT_PATH + '"' + ' --vanilla -e ' + \
              '"library(AdhereR); ' + \
              'callAdhereR(\'' + DATA_SHARING_DIRECTORY + '\')"'
return_code = subprocess.call(rscript_cmd, shell=True)
When the Rscript
process returns, return_code
should be 0
for success (in the sense of calling AdhereR
, not in the sense that AdhereR
also succeeded in the task it was assigned to do) or something else for errors.
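For comparison, the same call can be expressed without going through the shell by passing the command as a list of arguments, which avoids the nested quoting. This is a sketch of a possible variation and not what the reference implementation does; the function names build_rscript_command and call_adherer are hypothetical.

```python
import subprocess


def build_rscript_command(rscript_path, data_sharing_directory):
    """Build the Rscript invocation as an argument list (no shell quoting needed)."""
    # Escape backslashes and single quotes so the path is a valid R string literal
    escaped = data_sharing_directory.replace("\\", "\\\\").replace("'", "\\'")
    r_code = "library(AdhereR); callAdhereR('" + escaped + "')"
    return [rscript_path, "--vanilla", "-e", r_code]


def call_adherer(rscript_path, data_sharing_directory):
    """Run Rscript and return its exit code (0 means that the call itself worked)."""
    return subprocess.call(build_rscript_command(rscript_path, data_sharing_directory))
```

Only call_adherer() actually launches R; build_rscript_command() can be inspected on its own to see what would be executed.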
If return_code != 0
, the process returns with a warning. Otherwise, an attempt is made to read the messages produced by AdhereR
(available in the Adherer-results.txt
file in the DATA_SHARING_DIRECTORY
directory) and to check whether the last line begins with OK:
. If it does not, a warning containing the messages is thrown and the process returns.
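The check on the messages file can be sketched as below. The file name Adherer-results.txt and the OK: prefix come from the description above; the function name read_adherer_messages is hypothetical, and ignoring trailing blank lines is an assumption of this sketch rather than documented behaviour.

```python
import os
import warnings


def read_adherer_messages(data_sharing_directory, results_file="Adherer-results.txt"):
    """Return (ok, messages), where ok is True if the last message starts with 'OK:'."""
    path = os.path.join(data_sharing_directory, results_file)
    with open(path, "r") as handle:
        messages = handle.readlines()
    # Assumption: blank lines at the end of the file are not meaningful
    non_empty = [line for line in messages if line.strip()]
    ok = bool(non_empty) and non_empty[-1].startswith("OK:")
    if not ok:
        warnings.warn("AdhereR reported a problem:\n" + "".join(messages))
    return ok, messages
```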
If it does, the appropriate output files are read, parsed and loaded (depending on the invoked function, these files might differ). For example, after successfully invoking CMA1
, the DATA_SHARING_DIRECTORY
might look like:
~
|-parameters.log
|-dataset.csv
|-Adherer-results.txt
\-CMA.csv
In this example, the wrapper would parse and load CMA.csv
as a pandas
table:
import os # build file paths
import pandas # load pandas
[...]
# Expecting CMA.csv
ret_val['CMA'] = pandas.read_csv(os.path.join(path_to_data_directory,
'CMA.csv'), sep='\t', header=0)
If plotting was requested, the resulting plot is also loaded using the PIL
/Pillow
library:
from PIL import Image # load images
[...]
# Load the produced image (if any):
ret_val['plot'] = Image.open(os.path.join((plot_save_to
if not (plot_save_to is None) else
DATA_SHARING_DIRECTORY),
'adherer-plot' + '.' + plot_save_as))
where plot_save_to
and plot_save_as
may specify where the plots are to be saved and in which format.
Python 3
wrapper: the adherer
module
The reference implementation is contained in a single file (adherer.py
) included with the R
AdhereR
package and whose location can be obtained using the function getCallerWrapperLocation(full.path=TRUE)
from AdhereR
(N.B. it is located in the directory where the AdhereR
package is installed on the system, subdirectory wrappers/python3/adherer.py
; for example, on the example macOS
machine this is /Library/Frameworks/R.framework/Versions/3.4/Resources/library/AdhereR/wrappers/python3/adherer.py
). In the future, as more wrappers are added, the argument callig.platform
will allow the selection of the desired wrapper (now it is implicitly set to python3
).
This file can be either:
Python
’s “module search paths” (a list of directories comprising, in order, the directory containing the input script, the environment variable PYTHONPATH
, and a default location; see here for details), in which case it can be simply imported using the standard Python
syntax (e.g. import adherer
or import adherer as ad
), orPYTHONPATH
environment variable [the recommended solution], or
On the example macOS
machine, this can be achieved by adding:
# Add AdhereR to PYTHONPATH
export PYTHONPATH=$PYTHONPATH:/Library/Frameworks/[...]/AdhereR/wrappers/python3
to the .bash_profile
file in the user’s home folder (if this file does not exist, then it can be created using a text editor such as nano
; please note that the [...]
are for shortening the path and should be replaced by the actual path given in full above). The process should be very similar on Linux
, while on MS Windows
one should use the system’s “Environment Variables” settings (for example, see here for details).
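When changing environment variables is not convenient, Python's module search path can also be extended at run-time from within a script. The sketch below is a generic Python technique and not something specific to AdhereR; the helper name add_to_module_search_path is hypothetical, and the directory is the one given in full above for the example macOS machine.

```python
import sys

# Directory holding adherer.py on the example macOS machine (as given in full above)
WRAPPER_DIR = ("/Library/Frameworks/R.framework/Versions/3.4/Resources/"
               "library/AdhereR/wrappers/python3")


def add_to_module_search_path(directory):
    """Append a directory to sys.path unless it is already there."""
    if directory not in sys.path:
        sys.path.append(directory)
    return sys.path


add_to_module_search_path(WRAPPER_DIR)
# From here on, "import adherer as ad" should work, provided that the
# directory exists and contains adherer.py.
```

Unlike the PYTHONPATH solution, this change only lasts for the current Python session.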
NOTE: we will consistently use AdhereR
to refer to the R
package, and adherer
to refer to the Python 3
module.
adherer
module and initializations
Thus, the reference implementation is technically a module
called adherer
that can be imported in your code (we assume here the recommended solution, but see above for other ways of doing it):
When the adherer
module is imported for the first time, it runs the following initialization code:
R
is installed on the system. More precisely, it looks for Rscript
(or Rscript.exe
on Windows) using several strategies, in order: which
, followed by a set of standard locations on macOS
(/usr/bin/Rscript
, /usr/local/bin/Rscript
, /opt/local/bin/Rscript
and /Library/Frameworks/R.framework/Versions/Current/Resources/bin/Rscript
) and Linux
(/usr/bin/Rscript
, /usr/local/bin/Rscript
, /opt/local/bin/Rscript
and ~/bin/Rscript
) or a set of standard Windows Registry Keys on MS Windows
(HKEY_CURRENT_USER\SOFTWARE\R-core\R
, HKEY_CURRENT_USER\SOFTWARE\R-core\R32
, HKEY_CURRENT_USER\SOFTWARE\R-core\R64
, HKEY_LOCAL_MACHINE\SOFTWARE\R-core\R
, HKEY_LOCAL_MACHINE\SOFTWARE\R-core\R32
and HKEY_LOCAL_MACHINE\SOFTWARE\R-core\R64
) which should contain Current Version
and InstallPath
(with the added complexity that 64-bit Windows hosts both 32-bit and 64-bit registries). This procedure is inspired by the way RStudio
checks for R
: - if this process fails, a warning is thrown instructing the user to manually set the path using the set_rscript_path()
function exposed by the adherer
module, and sets the internal variable _RSCRIPT_PATH
to None
(which ensures that all future calls to AdhereR
will fail); - if the process succeeds, it checks if the AdhereR
package is installed for the detected R
and has a correct version:
_RSCRIPT_PATH
is set to None
;adherer-
) with read and write access for the current user: - if this fails, it throws a warning instructing the user to manually set this to a directory with read & write access using the set_data_sharing_directory()
function, and sets the internal variable _DATA_SHARING_DIRECTORY
to None
(ensuring that calls to AdhereR
will fail); - if it succeeds, the initialization code is considered to have finished successfully; also, on exit this temporary _DATA_SHARING_DIRECTORY
is automatically deleted.
The adherer
module tries to emulate the same philosophy as the AdhereR
package, where various types of CMAs (“continuous multiple-interval measures of medication availability/gaps”) that implement different ways of computing adherence encapsulate the data on which they were computed, the various relevant parameter values used, as well as the results of the computation.
Here, we implemented this through the following class hierarchy (image generated with pyreverse, not showing the private attributes):
We will discuss now the main classes in turn.
CallAdhereRError
exception class
Errors in the adherer
code are signalled by throwing CallAdhereRError
exceptions (the red class shown in the bottom right corner).
CMA0
All classes that implement ways to compute adherence (CMA
s) are derived from CMA0
. CMA0
does not in itself compute any sort of adherence, but instead provides the infrastructure for storing the data, parameter values and results (including errors), and for interacting with AdhereR
. Please note that in the “higher” CMA
s, the class constructor __init__()
implicitly performs the actual computation of the CMA
and saves the results (for CMA0
there are no such computations and __init__() only saves the parameters internally)!
CMA0
allows the user to set various parameters through the constructor __init__()
, parameters that are stored for later use, printing, saving, and for facilitating easy reconstruction of all types of computations. By main groups, these are (please, see the manual entry for ?CMA0
in the AdhereR
package and the full protocol in Appendix I as well):
- `dataset` stores the primary data (as a `Pandas` table with various columns) containing the actual events; must be given;
- `id_colname`, `event_date_colname`, `event_duration_colname`, `event_daily_dose_colname` and `medication_class_colname`: these give the *names* of the columns in the `dataset` table containing important information about the events (the first three are required, the last two are optional);
- `carryover_within_obs_window`, `carryover_into_obs_window`, `carry_only_for_same_medication`, `consider_dosage_change`, `medication_change_means_new_treatment_episode`, `maximum_permissible_gap`, `maximum_permissible_gap_unit`: optional parameters defining the types of carry-over, changes and treatment episode triggers;
- `followup_window_start_type`, `followup_window_start`, `followup_window_start_unit`, `followup_window_duration_type`, `followup_window_duration`, `followup_window_duration_unit`, `observation_window_start_type`, `observation_window_start`, `observation_window_start_unit`, `observation_window_duration_type`, `observation_window_duration`, `observation_window_duration_unit`: optional parameters defining the follow-up and observation windows;
- `sliding_window_start_type`, `sliding_window_start`, `sliding_window_start_unit`, `sliding_window_duration_type`, `sliding_window_duration`, `sliding_window_duration_unit`, `sliding_window_step_duration_type`, `sliding_window_step_duration`, `sliding_window_step_unit`, `sliding_window_no_steps`: optional parameters defining the sliding windows;
- `cma_to_apply`: optional parameter specifying which "simple" `CMA` is to be used when computing sliding windows and treatment episodes;
- `date_format`: optional parameter describing the format of column dates in `dataset` (defaults to month/day/year);
- `event_interval_colname`, `gap_days_colname`: optional parameters allowing the user to change the names of the columns where these computed data are stored in the resulting table;
- `force_na_cma_for_failed_patients`, `keep_window_start_end_dates`, `remove_events_outside_followup_window`, `keep_event_interval_for_all_events`: optional parameters governing the content of the resulting table;
- `parallel_backend`, `parallel_threads`: these optional parameters control the parallelism of the computations (if any); see **PARALLEL PROCESSING** for details;
- `suppress_warnings`: should all the internal warnings be suppressed?
- `save_event_info`: should this "advanced" info be also made available?
- `na_symbol_numeric`, `na_symbol_string`, `logical_symbol_true`, `logical_symbol_false`, `colnames_dot_symbol`, `colnames_start_dot`: these optional parameters allow `AdhereR` to adapt to "non-`R`" conventions concerning the data format for missing values, logicals and column names;
- `path_to_rscript`, `path_to_data_directory`: these parameters allow the user to override the `_RSCRIPT_PATH` and `_DATA_SHARING_DIRECTORY` variables;
- `print_adherer_messages`: should the important messages be printed to the user as well?
- `get_dataset()`: returns the internally saved `Pandas` table `dataset`;
- `get_cma()`: returns the computed `CMA` (if any);
- `get_event_info()`: returns the computed event information (if any);
- `get_treatment_episodes()`: returns the computed treatment episodes information (if any);
- `get_computation_results()`: returns the results of the last computation (if any); more precisely, a `dictionary` containing the numeric `code` returned by `AdhereR` and the string `messages` written by `AdhereR` during the computation;
computing event interval and treatment episode info: this can be done by explicitly calling the compute_event_int_gaps()
and compute_treatment_episodes()
functions;
plotting:
- *static plotting*: this is realized by the `plot()` function that takes several plotting-specific parameters:
+ `patients_to_plot`: should a subset of the patients present in the `dataset` be plotted (by default, all will be)?
+ `save_to`, `save_as`, `width`, `height`, `quality`, `dpi`: where should the plot be saved, in what format, dimensions and quality?
+ `duration`, `align_all_patients`, `align_first_event_at_zero`, `show_period`, `period_in_days`: duration to plot and alignment of patients;
+ `show_legend`, `legend_x`, `legend_y`, `legend_bkg_opacity`: legend parameters;
+ `cex`, `cex_axis`, `cex_lab`: the relative size of various text elements;
+ `show_cma`, `print_cma`, `plot_cma`, `plot_cma_as_histogram`, `cma_plot_ratio`, `cma_plot_col`, `cma_plot_border`, `cma_plot_bkg`, `cma_plot_text`: should the cma be shown and how?
+ `unspecified_category_label`: implicit label of unlabelled categories?
+ `lty_event`, `lwd_event`, `pch_start_event`, `pch_end_event`, `show_event_intervals`, `col_na`, `col_continuation`, `lty_continuation`, `lwd_continuation`: visual aspects of events and continuations;
+ `highlight_followup_window`, `followup_window_col`, `highlight_observation_window`, `observation_window_col`, `observation_window_density`, `observation_window_angle`, `show_real_obs_window_start`, `real_obs_window_density`, `real_obs_window_angle`: visual appearance of the follow-up, observation and "real observation" windows (the latter for `CMA`s that adjust it);
+ `bw_plot`: produce a grayscale plot?
- *interactive plotting*: the `plot_interactive()` function launches a [Shiny](https://shiny.rstudio.com/)-powered interactive plot using the system's web browser; the only parameter `patient_to_plot` may specify which patient to show initially, as all the relevant parameters can be interactively altered at run-time;
printing: the __repr__()
function implements a very simple printing mechanism showing the CMA
type and a summary of the dataset
;
calling AdhereR
: the private function _call_adherer()
is the real workhorse that manages all the interaction with the R
AdhereR
package as described above. This function can take many parameters covering all that AdhereR
can do, but it is not intended to be directly called by the end-user but instead to be internally called by various exposed functions such as plot()
, compute_event_int_gaps()
and __init__()
. Roughly, after some checks, it creates the files needed for communication, calls AdhereR
, analyses any errors, warnings and messages that it might have generated, and packs the results in a manageable format.
To preserve the generality of the interaction with AdhereR
, all the CMA
classes define a private static member _adherer_function
which is the name of the corresponding S3
class as implemented in AdhereR
.
CMA1
and its daughter classes CMA2
, CMA3
and CMA4
CMA1
is derived from CMA0
by redefining the __init__()
constructor to (a) take only a subset of arguments relevant for the CMA
s 1–4 (see the AdhereR
help for them), and (b) to internally call _call_adherer()
with these parameters. It checks if the result of _call_adherer()
signals an error, in which case it throws a CallAdhereRError
exception, otherwise packing the code, messages, cma and (possibly) event information in the corresponding member variables for later access.
Due to the generic mechanism implemented by _adherer_function
, CMA2
, CMA3
and CMA4
are derived directly from CMA1
but only redefine _adherer_function
appropriately.
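The inheritance pattern described above can be illustrated with the stripped-down sketch below. It is not the code of the adherer module: the Sketch* class names are hypothetical, _call_adherer() is replaced by a stub that never talks to R, and the "function" entry in the returned dictionary exists only to make the mechanism visible.

```python
class SketchCMA0:
    """Stores data and parameters but computes nothing (stand-in for CMA0)."""
    _adherer_function = "CMA0"  # name of the corresponding AdhereR function

    def __init__(self, dataset, **parameters):
        self._dataset = dataset
        self._parameters = parameters
        self._computation_results = None

    def _call_adherer(self):
        # Stub: the real function writes the files, runs R and parses the output
        return {"function": self._adherer_function, "code": 0,
                "messages": ["OK: (stub, R was not called)"]}

    def get_computation_results(self):
        return self._computation_results


class SketchCMA1(SketchCMA0):
    """Redefines the constructor so that the computation happens on creation."""
    _adherer_function = "CMA1"

    def __init__(self, dataset, **parameters):
        super().__init__(dataset, **parameters)
        result = self._call_adherer()
        if result["code"] != 0:
            raise RuntimeError("the call failed")  # CallAdhereRError in the real module
        self._computation_results = result


class SketchCMA2(SketchCMA1):
    """Only the function name changes; the constructor is inherited."""
    _adherer_function = "CMA2"
```

Because the constructor looks up _adherer_function on the instance, SketchCMA2 reuses all the logic of SketchCMA1 while requesting a different computation.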
CMA5
and its daughter classes CMA6
, CMA7
, CMA8
and CMA9
The same story applies here, with CMA5
being derived from CMA0
and redefining __init__()
, with CMA6
–CMA9
only using the _adherer_function
mechanism. Compared with CMA1
, CMA5
defines new required arguments related to medication type and dosage.
CMAPerEpisode
and CMASlidingWindow
Just like CMA1
and CMA5
, these two require specific parameters and are thus derived directly from CMA0
(but, in contrast, they don’t have their own derived classes).
Below we show some examples of using the Python 3
reference wrapper. We are using IPython
from the Spyder 3
environment; the In [n]:
represents the input prompt, the ...:
the continuation of the input on the following line(s), and Out[n]:
the produced output.
adherer
and checking autodetection
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31)
Type "copyright", "credits" or "license" for more information.
IPython 6.3.0 -- An enhanced Interactive Python.
In [1]: # Import adherer as ad:
...: import adherer as ad
In [2]: # Show the _DATA_SHARING_DIRECTORY (should be set automatically to a temporary location):
...: ad._DATA_SHARING_DIRECTORY.name
Out[2]: '/var/folders/kx/bphryt7j5tz1n_fcjk5809940000gn/T/adherer-05hdq6el'
In [3]: # Show the _RSCRIPT_PATH (should be detected automatically):
...: ad._RSCRIPT_PATH
Out[3]: '/usr/local/bin/Rscript'
Everything seems fine!
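The .name attribute used above and the automatic deletion on exit are consistent with Python's tempfile.TemporaryDirectory, although we only assume here that this is how the directory is managed. A minimal sketch of creating such a directory with the adherer- prefix and checking that it is readable and writable (the function name make_data_sharing_directory is hypothetical):

```python
import os
import tempfile


def make_data_sharing_directory():
    """Create a unique temporary directory, or return None if it is not usable."""
    directory = tempfile.TemporaryDirectory(prefix="adherer-")
    if not os.access(directory.name, os.R_OK | os.W_OK):
        directory.cleanup()  # remove it again, it cannot be used for data sharing
        return None
    return directory


if __name__ == "__main__":
    shared = make_data_sharing_directory()
    print(shared.name)  # path of the newly created directory
    shared.cleanup()    # explicit deletion; it also happens when the object is discarded
```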
R
and import it in Python
Let’s export the sample dataset med.events
included in AdhereR
as a TAB-separated CSV file in a location for use here (please note that this must be done from an R
console, such as from RStudio
, and not from Python
!):
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(AdhereR) # load the AdhereR package
> head(med.events) # see how the included med.events dataset looks like
PATIENT_ID DATE PERDAY CATEGORY DURATION
286 1 04/26/2033 4 medA 50
287 1 07/04/2033 4 medB 30
288 1 08/03/2033 4 medB 30
289 1 08/17/2033 4 medB 30
291 1 10/13/2033 4 medB 30
290 1 10/16/2033 4 medB 30
> write.table(med.events, file="~/Temp/med-events.csv", quote=FALSE, sep="\t", row.names=FALSE, col.names=TRUE) # save med.events as TAB-separated CSV file in a location (here, in the Temp folder)
>
Now, back to Python
:
In [4]: # Import Pandas as pd:
...: import pandas as pd
In [5]: # Load the test dataset
...: df = pd.read_csv('~/Temp/med-events.csv', sep='\t', header=0)
In [6]: # Let's look at first 6 rows (it should match the R output above except for the row names):
...: df.head(6)
Out[6]:
PATIENT_ID DATE PERDAY CATEGORY DURATION
0 1 04/26/2033 4 medA 50
1 1 07/04/2033 4 medB 30
2 1 08/03/2033 4 medB 30
3 1 08/17/2033 4 medB 30
4 1 10/13/2033 4 medB 30
5 1 10/16/2033 4 medB 30
All good so far, the data was imported successfully as a Pandas
table.
Now let’s compute CMA8
on these data in Python
:
In [7]: # Compute CMA8 as a test:
...: cma8 = ad.CMA8(df,
...: id_colname='PATIENT_ID',
...: event_date_colname='DATE',
...: event_duration_colname='DURATION',
...: event_daily_dose_colname='PERDAY',
...: medication_class_colname='CATEGORY')
...:
Adherer returned code 0 and said:
AdhereR 0.2.0 on R 3.4.3 started at 2018-06-04 22:27:10:
OK: the results were exported successfully (but there might be warnings and messages above worth paying attention to)!
We can see that things went pretty well, as no exceptions were thrown and the message starts with a reassuring Adherer returned code 0
, followed by precisely what AdhereR
said:
AdhereR 0.2.0 on R 3.4.3 started at 2018-06-04 22:27:10:
: first, self-identification (its own version and R
’s version), followed by the date and time the processing was initiated;
OK: the results were exported successfully (but there might be warnings and messages above worth paying attention to)!
, which means that basically all seems all right but that there might still be some messages or warnings displayed above that could be informative or point to subtler issues.
Let’s see what these results look like:
In [8]: # Summary of cma8:
...: cma8
Out[8]: CMA object of type CMA8 (on 1080 rows).
In [9]: # The return value and messages:
...: cma8.get_computation_results()
Out[9]:
{'code': 0,
'messages': ['AdhereR 0.2.0 on R 3.4.3 started at 2018-06-04 22:27:10:\n',
'OK: the results were exported successfully (but there might be warnings and messages above worth paying attention to)!\n']}
In [10]: # The CMA (the first 6 rows out of all 100):
...: cma8.get_cma().head(6)
Out[10]:
PATIENT_ID CMA
0 1 0.947945
1 2 0.616438
2 3 0.994521
3 4 0.379452
4 5 0.369863
5 6 0.406849
In [11]: # Plot it (statically):
...: cma8.plot(patients_to_plot=['1', '2', '3'],
...: align_all_patients=True,
...: period_in_days=30,
...: cex=0.5)
...:
Adherer returned code 0 and said:
AdhereR 0.2.0 on R 3.4.3 started at 2018-06-05 13:22:36:
OK: the results were exported successfully (but there might be warnings and messages above worth paying attention to)!
Out[11]:
The output produced in this case (Out[11]
) consists of the actual image plotted in the IPython
console (thanks to the PIL
/Pillow
package) and reproduced below:
Now, we turn again to R
:
> # Compute the same CMA8 in R:
> cma8 <- CMA8(data=med.events,
+ ID.colname="PATIENT_ID",
+ event.date.colname="DATE",
+ event.duration.colname="DURATION",
+ medication.class.colname="CATEGORY",
+ event.daily.dose.colname="PERDAY"
+ )
>
> # The computed CMA:
> head(getCMA(cma8))
PATIENT_ID CMA
1 1 0.9479452
2 2 0.6164384
3 3 0.9945205
4 4 0.3794521
5 5 0.3698630
6 6 0.4068493
>
> # Plot the cma:
> plot(cma8,
+ patients.to.plot=c("1", "2", "3"),
+ align.all.patients=TRUE,
+ period.in.days=30,
+ cex=0.5)
>
Again, the output is an image (here, shown in the “Plots” panel in RStudio
):
It can be seen that, except for the slightly different dimensions (and x/y ratio and quality) due to the actual plotting and exporting, the images show identical patterns.
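The numeric results can be compared as well. The sketch below copies the first six CMA8 estimates printed above by Python and by R and checks that they agree up to the rounding used when printing; the tolerance of 1e-6 is our choice for this illustration.

```python
import math

# First six CMA8 estimates as printed above
python_cma = [0.947945, 0.616438, 0.994521, 0.379452, 0.369863, 0.406849]
r_cma = [0.9479452, 0.6164384, 0.9945205, 0.3794521, 0.3698630, 0.4068493]


def same_estimates(first, second, tolerance=1e-6):
    """True if both lists have the same length and agree within the tolerance."""
    return len(first) == len(second) and all(
        math.isclose(a, b, abs_tol=tolerance) for a, b in zip(first, second))


print(same_estimates(python_cma, r_cma))
```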
We will now initiate an interactive plotting session from Python
:
The output is represented by an interactive session in the default browser; below is a screenshot of this session in Firefox
: The interactive session ends by pressing the “Exit” button in the browser (and optionally also closing the browser tab/window), at which point the usual text output is provided to Python
and a True
value signalling success is returned:
Adherer returned code 0 and said:
AdhereR 0.2.0 on R 3.4.3 started at 2018-06-05 18:01:28:
OK: the results were exported successfully (but there might be warnings and messages above worth paying attention to)!
Out[12]: True
Please note that it does not matter how the interactive session is started, as it only needs access to the base CMA0
object and, more precisely, the raw dataset; all relevant parameters, including the CMA type can be changed interactively (this is why the CMA shown in the screenshot is CMA1
even if the function cma8.plot_interactive()
was initiated from a CMA8
object).
AdhereR
uses R
’s parallel processing capacities to split expensive computations and distribute them across multiple CPUs/cores in a single computer or even across a network of computers. As an example, we will compute here CMA1
across sliding windows on the whole dataset, first in R
and then in Python 3
.
The default mode of computation uses just a single CPU/core on the local machine.
R
> # Sliding windows with CMA1 (single thread, locally):
> cma1w.1l <- CMA_sliding_window(CMA.to.apply="CMA1",
+ data=med.events,
+ ID.colname='PATIENT_ID',
+ event.date.colname='DATE',
+ event.duration.colname='DURATION',
+ event.daily.dose.colname='PERDAY',
+ medication.class.colname='CATEGORY',
+ sliding.window.duration=30,
+ sliding.window.step.duration=30,
+ parallel.backend="none",
+ parallel.threads=1)
> head(getCMA(cma1w.1l))
PATIENT_ID window.ID window.start window.end CMA
1 1 1 2033-04-26 2033-05-26 NA
2 1 2 2033-05-26 2033-06-25 NA
3 1 3 2033-06-25 2033-07-25 NA
4 1 4 2033-07-25 2033-08-24 2.142857
5 1 5 2033-08-24 2033-09-23 NA
6 1 6 2033-09-23 2033-10-23 10.000000
>
Python 3
In [13]: # Sliding windows with CMA1 (single thread, locally):
...: cma1w_1l = ad.CMASlidingWindow(dataset=df,
...: cma_to_apply="CMA1",
...: id_colname='PATIENT_ID',
...: event_date_colname='DATE',
...: event_duration_colname='DURATION',
...: event_daily_dose_colname='PERDAY',
...: medication_class_colname='CATEGORY',
...: sliding_window_duration=30,
...: sliding_window_step_duration=30,
...: parallel_backend="none",
...: parallel_threads=1)
...:
Adherer returned code 0 and said:
AdhereR 0.2.0 on R 3.4.3 started at 2018-06-06 15:19:07:
OK: the results were exported successfully (but there might be warnings and messages above worth paying attention to)!
In [14]: cma1w_1l.get_cma().head(6)
Out[14]:
PATIENT_ID window.ID window.start window.end CMA
0 1 1 04/26/2033 05/26/2033 NaN
1 1 2 05/26/2033 06/25/2033 NaN
2 1 3 06/25/2033 07/25/2033 NaN
3 1 4 07/25/2033 08/24/2033 2.142857
4 1 5 08/24/2033 09/23/2033 NaN
5 1 6 09/23/2033 10/23/2033 10.000000
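Note that the window dates are printed as 2033-04-26 by R but as 04/26/2033 in the table returned to Python (month/day/year, the default date format mentioned earlier). If ISO dates are preferred on the Python side, the strings can be converted with the standard library; the helper name to_iso_date is hypothetical.

```python
from datetime import datetime


def to_iso_date(text, date_format="%m/%d/%Y"):
    """Convert a month/day/year string into the ISO year-month-day form."""
    return datetime.strptime(text, date_format).date().isoformat()


print(to_iso_date("04/26/2033"))  # the start of the first window shown above
```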
If the local machine has multiple CPUs/cores (even with hyperthreading), it might make sense to use them for lengthy computations. AdhereR
can use several backends (as provided by the parallel
package in R
), of which the most used are “multicore” (preferred on Linux
and macOS
but currently not available on Windows
) and “SNOW” (on all three OS’s). AdhereR
is smart enough to use “SNOW” on Windows
even if “multicore” was requested.
R
> # Sliding windows with CMA1 (two threads, multicore, locally):
> cma1w.2ml <- CMA_sliding_window(CMA.to.apply="CMA1",
+ data=med.events,
+ ID.colname='PATIENT_ID',
+ event.date.colname='DATE',
+ event.duration.colname='DURATION',
+ event.daily.dose.colname='PERDAY',
+ medication.class.colname='CATEGORY',
+ sliding.window.duration=30,
+ sliding.window.step.duration=30,
+ parallel.backend="multicore", # <--- multicore
+ parallel.threads=2)
> head(getCMA(cma1w.2ml))
PATIENT_ID window.ID window.start window.end CMA
1 1 1 2033-04-26 2033-05-26 NA
2 1 2 2033-05-26 2033-06-25 NA
3 1 3 2033-06-25 2033-07-25 NA
4 1 4 2033-07-25 2033-08-24 2.142857
5 1 5 2033-08-24 2033-09-23 NA
6 1 6 2033-09-23 2033-10-23 10.000000
>
> cma1w.2sl <- CMA_sliding_window(CMA.to.apply="CMA1",
+ data=med.events,
+ ID.colname='PATIENT_ID',
+ event.date.colname='DATE',
+ event.duration.colname='DURATION',
+ event.daily.dose.colname='PERDAY',
+ medication.class.colname='CATEGORY',
+ sliding.window.duration=30,
+ sliding.window.step.duration=30,
+ parallel.backend="snow", # <--- SNOW
+ parallel.threads=2)
> head(getCMA(cma1w.2sl))
PATIENT_ID window.ID window.start window.end CMA
1 1 1 2033-04-26 2033-05-26 NA
2 1 2 2033-05-26 2033-06-25 NA
3 1 3 2033-06-25 2033-07-25 NA
4 1 4 2033-07-25 2033-08-24 2.142857
5 1 5 2033-08-24 2033-09-23 NA
6 1 6 2033-09-23 2033-10-23 10.000000
>
Python 3
In [15]: # Sliding windows with CMA1 (two threads, multicore, locally):
...: cma1w_2ml = ad.CMASlidingWindow(dataset=df,
...: cma_to_apply="CMA1",
...: id_colname='PATIENT_ID',
...: event_date_colname='DATE',
...: event_duration_colname='DURATION',
...: event_daily_dose_colname='PERDAY',
...: medication_class_colname='CATEGORY',
...: sliding_window_duration=30,
...: sliding_window_step_duration=30,
...: parallel_backend="multicore", # <--- multicore
...: parallel_threads=2)
...:
Adherer returned code 0 and said:
AdhereR 0.2.0 on R 3.4.3 started at 2018-06-07 11:44:49:
OK: the results were exported successfully (but there might be warnings and messages above worth paying attention to)!
In [16]: cma1w_2ml.get_cma().head(6)
Out[16]:
PATIENT_ID window.ID window.start window.end CMA
0 1 1 04/26/2033 05/26/2033 NaN
1 1 2 05/26/2033 06/25/2033 NaN
2 1 3 06/25/2033 07/25/2033 NaN
3 1 4 07/25/2033 08/24/2033 2.142857
4 1 5 08/24/2033 09/23/2033 NaN
5 1 6 09/23/2033 10/23/2033 10.000000
In [17]: # Sliding windows with CMA1 (two threads, snow, locally):
...: cma1w_2sl = ad.CMASlidingWindow(dataset=df,
...: cma_to_apply="CMA1",
...: id_colname='PATIENT_ID',
...: event_date_colname='DATE',
...: event_duration_colname='DURATION',
...: event_daily_dose_colname='PERDAY',
...: medication_class_colname='CATEGORY',
...: sliding_window_duration=30,
...: sliding_window_step_duration=30,
...: parallel_backend="snow", # <--- SNOW
...: parallel_threads=2)
...:
Adherer returned code 0 and said:
AdhereR 0.2.0 on R 3.4.3 started at 2018-06-07 11:44:49:
OK: the results were exported successfully (but there might be warnings and messages above worth paying attention to)!
In [18]: cma1w_2sl.get_cma().head(6)
Out[18]:
PATIENT_ID window.ID window.start window.end CMA
0 1 1 04/26/2033 05/26/2033 NaN
1 1 2 05/26/2033 06/25/2033 NaN
2 1 3 06/25/2033 07/25/2033 NaN
3 1 4 07/25/2033 08/24/2033 2.142857
4 1 5 08/24/2033 09/23/2033 NaN
5 1 6 09/23/2033 10/23/2033 10.000000
Sometimes it is better to use one or more powerful machines over a network to do very expensive computations, usually, a Linux
cluster from a Windows
/macOS
laptop. AdhereR
leverages the power of R
’s snow
package (as exposed through the parallel
package) to distribute workloads across a network of computing nodes. There are several types of “Simple Network of Workstations” (snow
), described in the package’s manual. For example, one may use an already existing MPI
(Message Passing Interface) cluster, but an even simpler setup (and the one that we will illustrate here) involves a collection of machines running Linux
and connected to a network (local or even over the Internet).
The machines are called workhorse1
and workhorse2
, have different hardware configurations (both sport quad-core i7 CPUs of different generations with 16 GB RAM) but run the same version of Ubuntu 16.04
and R 3.4.2
(not a requirement, as the architecture can be seriously heterogeneous, combining different OS’s and versions of R
). These two machines are connected to the same WiFi router (but they could be on different networks or even across the Internet). The “master” is the same macOS
laptop used before, connected to the same WiFi router (not a requirement).
As prerequisites, the worker machines should allow SSH
access (for convenience, we use passwordless SSH access from the “master” here; see for example here for this setup) and should have the snow
package installed in R
. Let’s assume that the username allowing ssh
into the workers is user
, so that logging in to each worker over SSH with this username
works with no password needed. With these, we can distribute our processing to the two “workers” (two parallel threads for each, totalling 4 parallel threads):
R
> # Sliding windows with CMA1 (two remote machines with two threads each):
> # First, we need to specify the workers
> # This is a list of lists!
> # rep(,2) means that we generate two threads on each worker
> workers <- c(rep(list(list(host="workhorse1", # hostname (make sure this works from the "master", otherwise use the IP-address)
+ user="user", # the username that can ssh into the worker (passwordless highly recommended)
+ rscript="/usr/local/bin/Rscript", # the location of Rscript on the worker
+ snowlib="/usr/local/lib64/R/library/")), # the location of the snow package on the worker
+ 2),
+ rep(list(list(host="workhorse2",
+ user="user",
+ rscript="/usr/local/bin/Rscript",
+ snowlib="/usr/local/lib64/R/library/")),
+ 2));
>
> cma1w.2sw <- CMA_sliding_window(CMA="CMA1",
+ data=med.events,
+ ID.colname="PATIENT_ID",
+ event.date.colname="DATE",
+ event.duration.colname="DURATION",
+ event.daily.dose.colname="PERDAY",
+ medication.class.colname="CATEGORY",
+ carry.only.for.same.medication=FALSE,
+ consider.dosage.change=FALSE,
+ sliding.window.duration=30,
+ sliding.window.step.duration=30,
+ parallel.backend="snow",
+ parallel.threads=workers)
> head(getCMA(cma1w.2sw))
PATIENT_ID window.ID window.start window.end CMA
1 1 1 2033-04-26 2033-05-26 NA
2 1 2 2033-05-26 2033-06-25 NA
3 1 3 2033-06-25 2033-07-25 NA
4 1 4 2033-07-25 2033-08-24 2.142857
5 1 5 2033-08-24 2033-09-23 NA
6 1 6 2033-09-23 2033-10-23 10.000000
>
Python 3
A quick note for Python
is that, due to the communication protocol between the wrapper and AdhereR
, the specification of the computer cluster must be a one-line string literally containing the R
code defining it; this string will be parsed and interpreted verbatim by AdhereR
:
In [19]: # Sliding windows with CMA1 (two remote machines with two threads each):
...: # The workers are defined as *literal R code* that is sent verbatim to AdhereR for parsing and interpretation
...: # Please note, however, that this string should not contain line breaks (i.e., it should be a one-liner):
...: workers = 'c(rep(list(list(host="workhorse1", user="user", rscript="/usr/local/bin/Rscript", snowlib="/usr/local/lib64/R/library/")), 2), rep(list(list(host="workhorse2", user="user", rscript="/usr/local/bin/Rscript", snowlib="/usr/local/lib64/R/library/")), 2))'
In [20]: cma1w_2sw = ad.CMASlidingWindow(dataset=df,
...: cma_to_apply="CMA1",
...: id_colname='PATIENT_ID',
...: event_date_colname='DATE',
...: event_duration_colname='DURATION',
...: event_daily_dose_colname='PERDAY',
...: medication_class_colname='CATEGORY',
...: sliding_window_duration=30,
...: sliding_window_step_duration=30,
...: parallel_backend="snow",
...: parallel_threads=workers)
Adherer returned code 0 and said:
AdhereR 0.2.0 on R 3.4.3 started at 2018-06-07 13:22:21:
OK: the results were exported successfully (but there might be warnings and messages above worth paying attention to)!
In [21]: cma1w_2sw.get_cma().head(6)
Out[21]:
PATIENT_ID window.ID window.start window.end CMA
0 1 1 04/26/2033 05/26/2033 NaN
1 1 2 05/26/2033 06/25/2033 NaN
2 1 3 06/25/2033 07/25/2033 NaN
3 1 4 07/25/2033 08/24/2033 2.142857
4 1 5 08/24/2033 09/23/2033 NaN
5 1 6 09/23/2033 10/23/2033 10.000000
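Because the cluster specification must reach AdhereR as a single line of R code, it may be convenient to assemble that string programmatically instead of typing it by hand. The following is only a minimal sketch and not part of the adherer module: the helper names r_worker and r_workers_spec are hypothetical, and the sketch assumes that none of the values contains quotes, backslashes or line breaks.

```python
def r_worker(host, user, rscript, snowlib):
    """Return the R code describing one worker: list(host="...", ...)."""
    fields = {"host": host, "user": user, "rscript": rscript, "snowlib": snowlib}
    for name, value in fields.items():
        # Keep the generated R code trivially safe: refuse anything that
        # would need escaping inside an R string or would break the one-liner.
        if any(ch in value for ch in ('"', "\\", "\n", "\r")):
            raise ValueError(f"unsupported character in {name!r}: {value!r}")
    args = ", ".join(f'{name}="{value}"' for name, value in fields.items())
    return f"list({args})"


def r_workers_spec(hosts, threads_per_host, user, rscript, snowlib):
    """Return a one-line R expression listing each host threads_per_host times."""
    parts = [
        f"rep(list({r_worker(host, user, rscript, snowlib)}), {int(threads_per_host)})"
        for host in hosts
    ]
    return "c(" + ", ".join(parts) + ")"


# Same cluster as in the example above: two hosts, two threads each.
workers = r_workers_spec(
    hosts=["workhorse1", "workhorse2"],
    threads_per_host=2,
    user="user",
    rscript="/usr/local/bin/Rscript",
    snowlib="/usr/local/lib64/R/library/",
)
```

The resulting workers string can then be passed as parallel_threads=workers, exactly like the hand-written one-liner.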
While this is a very good way to transparently distribute processing to more powerful nodes over a network, there are several (potential) issues one must be aware of:
it may be very hard to debug failures: such failures might result from network issues, firewalls blocking connections, an incorrect SSH
setup on the “workers” or errors in accessing the “workers” with the given user accounts; see, for example, the discussions here and here in case you need to solve such problems;
latency over the network: starting the “workers” and especially transmitting the data to the “workers” and the results back to the “master” may take a non-negligible time, especially on slow networks (such as the Internet) and for large datasets; therefore, the best scenarios would involve relatively large computations (but not too large; see below) distributed to several nodes over a fast network;
you need to wait for the results: this process assumes that the “master” will wait for the “workers” to finish and return their results; thus, putting the “master” to sleep, shutting it down or disconnecting it from the network will probably result in not being able to collect the results back. If one needs very long computations (say 3+ hours) or offline mobility, or if the network is unreliable, we would suggest setting up a separate compute process (that may itself parallelise computations) on the remote machines using, for example, screen
, nohup
or a more specialised cluster management platform such as Son of a Grid Engine (SGE).
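Since many of the failures listed above come down to SSH access, it can help to verify from the “master” that each worker accepts a passwordless login before starting a long computation. The sketch below is one possible pre-flight check and is not part of AdhereR or of the adherer module; the function names are hypothetical, and it assumes that the OpenSSH client (ssh) is installed on the “master” and that the workers run a Unix-like system providing the true command.

```python
import subprocess


def ssh_check_command(host, user, timeout_s=5):
    """Build a non-interactive ssh command that runs `true` on the worker."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # fail instead of asking for a password
        "-o", f"ConnectTimeout={timeout_s}",   # do not hang on unreachable hosts
        f"{user}@{host}",
        "true",
    ]


def can_ssh(host, user, timeout_s=5):
    """Return True if a passwordless ssh login to user@host succeeds."""
    try:
        completed = subprocess.run(
            ssh_check_command(host, user, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing, or the connection attempt took too long
        return False
    return completed.returncode == 0


# Example use (hostnames and username as in the text above):
# for host in ("workhorse1", "workhorse2"):
#     print(host, "reachable" if can_ssh(host, "user") else "NOT reachable")
```

A successful check only shows that SSH works; it does not verify that Rscript or the snow package is available on the worker.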
All arguments are written to the text file parameters.log
; the input data are in the TAB-separated no quotes file dataset.csv
. The call returns any errors, warnings and messages in the text file Adherer-results.txt
, and the actual results as TAB-separated no quotes files (not all necessarily produced, depending on the specific methods called) CMA.csv
, EVENTINFO.csv
and TREATMENTEPISODES.csv
, and various image file(s). The argument values in the parameters.log
are contained between single (' '
) or double (" "
) quotes.
Some parameters are required and must be explicitly defined, but most have default values (i.e., if the user doesn’t set them explicitly, we may simply omit them from the parameters.log
file and the default values in AdhereR
will be used).
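To make the output side of this exchange concrete, the sketch below shows how a caller might read the results back from the data sharing directory. It is an illustration only, not the code used by the reference implementation: the function name read_adherer_results is hypothetical, it relies on the rule (stated in the return-value tables of this document) that the last line of Adherer-results.txt begins with OK: on success, and it assumes UTF-8 encoded files.

```python
import csv
import os


def read_adherer_results(directory):
    """Return (ok, messages, cma_rows) for a finished AdhereR call.

    ok        -- True if the last message line starts with "OK:"
    messages  -- the non-empty lines of Adherer-results.txt
    cma_rows  -- rows of CMA.csv as dictionaries (empty if not produced)
    """
    results_path = os.path.join(directory, "Adherer-results.txt")
    with open(results_path, encoding="utf-8") as f:
        messages = [line.rstrip("\n") for line in f if line.strip()]
    ok = bool(messages) and messages[-1].startswith("OK:")

    cma_rows = []
    cma_path = os.path.join(directory, "CMA.csv")
    if ok and os.path.exists(cma_path):
        with open(cma_path, newline="", encoding="utf-8") as f:
            # TAB-separated, no quoting, first row holds the column names
            reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
            cma_rows = list(reader)
    return ok, messages, cma_rows
```

All values come back as strings; a real wrapper would typically convert them (for example with pandas, as in the examples in this vignette) and would read EVENTINFO.csv and TREATMENTEPISODES.csv in the same way when they are present.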
PARAMETER | MEANING | DEFAULT VALUE IF MISSING | PYTHON 3 | STATA |
---|---|---|---|---|
NA.SYMBOL.NUMERIC | the numeric missing data symbol | NA | NA | . |
NA.SYMBOL.STRING | the string missing data symbol | NA | NA | "" |
LOGICAL.SYMBOL.TRUE | the logical TRUE symbol | TRUE | TRUE | 1 |
LOGICAL.SYMBOL.FALSE | the logical FALSE symbol | FALSE | FALSE | 0 |
COLNAMES.DOT.SYMBOL | can we use . in column names, and if not, what to replace it with? | . | . | _ |
COLNAMES.START.DOT | can column names begin with . (or equivalent symbol), and if not, what to replace it with? | . | . | internal_ |
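As an illustration of how a caller might apply these symbols, the following sketch converts native Python values into the text symbols listed in the PYTHON 3 column before they are written out. It is a simplified assumption about how such a conversion could look, not the code of the reference implementation; the names PYTHON_SYMBOLS and to_adherer_symbol are hypothetical.

```python
import math

# Symbols taken from the PYTHON 3 column of the table above
PYTHON_SYMBOLS = {
    "NA.SYMBOL.NUMERIC": "NA",
    "NA.SYMBOL.STRING": "NA",
    "LOGICAL.SYMBOL.TRUE": "TRUE",
    "LOGICAL.SYMBOL.FALSE": "FALSE",
    "COLNAMES.DOT.SYMBOL": ".",
    "COLNAMES.START.DOT": ".",
}


def to_adherer_symbol(value):
    """Convert a Python value to the text symbol expected on the R side."""
    # Missing data: None or a floating-point NaN
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return PYTHON_SYMBOLS["NA.SYMBOL.NUMERIC"]
    # Logical values (checked before the generic case, as bool is also an int)
    if isinstance(value, bool):
        key = "LOGICAL.SYMBOL.TRUE" if value else "LOGICAL.SYMBOL.FALSE"
        return PYTHON_SYMBOLS[key]
    # Everything else is passed on as its string representation
    return str(value)
```

For example, to_adherer_symbol(False) gives "FALSE" and to_adherer_symbol(None) gives "NA"; numbers and strings pass through unchanged as text.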
Possible values are: CMA0, CMA1 … CMA9, CMA_per_episode, CMA_sliding_window, compute.event.int.gaps, compute.treatment.episodes and plot_interactive_cma.
For all the CMA functions (i.e., CMA0, CMA1 … CMA9, CMA_per_episode, CMA_sliding_window) one can ask for a plot of (a subset of) the patients, in which case the parameter plot.show must be TRUE, and there are several plotting-specific parameters that can be set:
PARAMETER | REQUIRED | DEFAULT_VALUE | POSSIBLE_VALUES |
---|---|---|---|
function | YES | "CMA0" | can also be "CMA0" for plotting! |
plot.show | NO | "FALSE" | [do the plotting? If TRUE, save the resulting dataset with a "-plotted" suffix to avoid overwriting previous results] |
plot.save.to | NO | "" | [the folder where to save the plots (by default, same folder as the results)] |
plot.save.as | NO | "jpg" | "jpg", "png", "tiff", "eps", "pdf" [the type of image to save] |
plot.width | NO | "7" | [plot width in inches] |
plot.height | NO | "7" | [plot height in inches] |
plot.quality | NO | "90" | [plot quality (applies only to some types of plots)] |
plot.dpi | NO | "150" | [plot DPI (applies only to some types of plots)] |
plot.patients.to.plot | NO | "" | [the patient IDs to plot (if missing, all patients) given as "id1;id2; .. ;idn"] |
plot.duration | NO | "" | [duration to plot in days (if missing, determined from the data)] |
plot.align.all.patients | NO | "FALSE" | [should all patients be aligned? and, if so, place the first event as the horizontal 0?] |
plot.align.first.event.at.zero | NO | "TRUE" | |
plot.show.period | NO | "days" | "dates", "days" [draw vertical bars at regular interval as dates or days?] |
plot.period.in.days | NO | "90" | [the interval (in days) at which to draw vertical lines] |
plot.show.legend | NO | "TRUE" | [legend params and position] |
plot.legend.x | NO | "bottom right" | |
plot.legend.y | NO | "" | |
plot.legend.bkg.opacity | NO | "0.5" | [background opacity] |
plot.cex | NO | "1.0" | [various plotting font sizes] |
plot.cex.axis | NO | "0.75" | |
plot.cex.lab | NO | "1.0" | |
plot.show.cma | NO | "TRUE" | [show the CMA type] |
plot.unspecified.category.label | NO | "drug" | [the label of the unspecified category of medication] |
plot.lty.event | NO | "solid" | [style parameters controlling the plotting of events] |
plot.lwd.event | NO | "2" | |
plot.pch.start.event | NO | "15" | |
plot.pch.end.event | NO | "16" | |
plot.show.event.intervals | NO | "TRUE" | [show the actual prescription intervals] |
plot.col.na | NO | "lightgray" | [colour for missing data] |
plot.col.continuation | NO | "black" | [colour, style and width of the continuation lines connecting consecutive events] |
plot.lty.continuation | NO | "dotted" | |
plot.lwd.continuation | NO | "1" | |
plot.print.CMA | NO | "TRUE" | [print CMA next to the participant’s ID?] |
plot.plot.CMA | NO | "TRUE" | [plot the CMA next to the participant ID?] |
plot.plot.CMA.as.histogram | NO | "TRUE" | [plot CMA as a histogram or as a density plot?] |
plot.CMA.plot.ratio | NO | "0.10" | [the proportion of the total horizontal plot to be taken by the CMA plot] |
plot.CMA.plot.col | NO | "lightgreen" | [attributes of the CMA plot] |
plot.CMA.plot.border | NO | "darkgreen" | |
plot.CMA.plot.bkg | NO | "aquamarine" | |
plot.CMA.plot.text | NO | "" | [by default, the same as plot.CMA.plot.border] |
plot.highlight.followup.window | NO | "TRUE" | |
plot.followup.window.col | NO | "green" | |
plot.highlight.observation.window | NO | "TRUE" | |
plot.observation.window.col | NO | "yellow" | |
plot.observation.window.density | NO | "35" | |
plot.observation.window.angle | NO | "-30" | |
plot.show.real.obs.window.start | NO | "TRUE" | [for some CMAs, the real observation window starts at a different date] |
plot.real.obs.window.density | NO | "35" | |
plot.real.obs.window.angle | NO | "30" | |
plot.bw.plot | NO | "FALSE" | [if TRUE, override all user-given colours and replace them with a scheme suitable for grayscale plotting] |
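As a small illustration of how a caller might prepare a few of these plotting parameters, the sketch below builds a dictionary of parameter names and text values, joining the patient IDs with semicolons as described for plot.patients.to.plot. The function name plot_parameters is hypothetical and the selection of parameters is arbitrary; how the dictionary is then written to parameters.log is left out.

```python
ALLOWED_IMAGE_TYPES = ("jpg", "png", "tiff", "eps", "pdf")


def plot_parameters(patient_ids=None, save_as="jpg", width=7, height=7):
    """Return a dict of plotting parameters with all values given as text."""
    if save_as not in ALLOWED_IMAGE_TYPES:
        raise ValueError(f"plot.save.as must be one of {ALLOWED_IMAGE_TYPES}")
    params = {
        "plot.show": "TRUE",          # plotting must be requested explicitly
        "plot.save.as": save_as,
        "plot.width": str(width),     # in inches
        "plot.height": str(height),   # in inches
    }
    if patient_ids:
        # "id1;id2; .. ;idn"; if omitted, all patients are plotted
        params["plot.patients.to.plot"] = ";".join(str(p) for p in patient_ids)
    return params


params = plot_parameters(patient_ids=["1", "2", "3"], save_as="png")
```

Parameters that are left out of the dictionary simply keep the defaults listed in the table.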
CMA1, CMA2, CMA3, CMA4
The parameters for these functions are (N.B.: the plotting parameters can also appear if plotting is required):
PARAMETER | REQUIRED | DEFAULT_VALUE | POSSIBLE_VALUES |
---|---|---|---|
ID.colname | YES | | |
event.date.colname | YES | | |
event.duration.colname | YES | | |
followup.window.start.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.start | NO | 0 | |
followup.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
followup.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.duration | NO | "365 * 2" | |
followup.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.start.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.start | NO | 0 | |
observation.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.duration | NO | "365 * 2" | |
observation.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
date.format | NO | "%m/%d/%Y" | |
event.interval.colname | NO | "event.interval" | |
gap.days.colname | NO | "gap.days" | |
force.NA.CMA.for.failed.patients | NO | "TRUE" | |
parallel.backend | NO | "none" | "none", "multicore", "snow", "snow(SOCK)", "snow(MPI)", "snow(NWS)" |
parallel.threads | NO | "auto" | |
suppress.warnings | NO | "FALSE" | |
save.event.info | NO | "FALSE" | |

RETURN VALUE(S) | FILE | OBSERVATIONS |
---|---|---|
Errors, warnings and other messages | Adherer-results.txt | Possibly more than one line; if the processing was successful, the last line must begin with OK: |
The computed CMAs, as a TAB-separated no quotes CSV file | CMA.csv | Always generated in case of successful processing |
The gap days and event info data, as a TAB-separated no quotes CSV file | EVENTINFO.csv | Only by explicit request (i.e., save.event.info = "TRUE") |
CMA5, CMA6, CMA7, CMA8, CMA9
The parameters for these functions are (N.B.: the plotting parameters can also appear if plotting is required):
PARAMETER | REQUIRED | DEFAULT_VALUE | POSSIBLE_VALUES |
---|---|---|---|
ID.colname | YES | | |
event.date.colname | YES | | |
event.duration.colname | YES | | |
event.daily.dose.colname | YES | | |
medication.class.colname | YES | | |
carry.only.for.same.medication | NO | "FALSE" | |
consider.dosage.change | NO | "FALSE" | |
followup.window.start.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.start | NO | 0 | |
followup.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
followup.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.duration | NO | "365 * 2" | |
followup.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.start.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.start | NO | 0 | |
observation.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.duration | NO | "365 * 2" | |
observation.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
date.format | NO | "%m/%d/%Y" | |
event.interval.colname | NO | "event.interval" | |
gap.days.colname | NO | "gap.days" | |
force.NA.CMA.for.failed.patients | NO | "TRUE" | |
parallel.backend | NO | "none" | "none", "multicore", "snow", "snow(SOCK)", "snow(MPI)", "snow(NWS)" |
parallel.threads | NO | "auto" | |
suppress.warnings | NO | "FALSE" | |
save.event.info | NO | "FALSE" | |

RETURN VALUE(S) | FILE | OBSERVATIONS |
---|---|---|
Errors, warnings and other messages | Adherer-results.txt | Possibly more than one line; if the processing was successful, the last line must begin with OK: |
The computed CMAs, as a TAB-separated no quotes CSV file | CMA.csv | Always generated in case of successful processing |
The gap days and event info data, as a TAB-separated no quotes CSV file | EVENTINFO.csv | Only by explicit request (i.e., save.event.info = "TRUE") |
CMA_per_episode
The parameters for this function are (N.B.: the plotting parameters can also appear if plotting is required):
PARAMETER | REQUIRED | DEFAULT_VALUE | POSSIBLE_VALUES |
---|---|---|---|
CMA.to.apply | YES | CMA1, CMA2, CMA3, CMA4, CMA5, CMA6, CMA7, CMA8, CMA9 | |
ID.colname | YES | | |
event.date.colname | YES | | |
event.duration.colname | YES | | |
event.daily.dose.colname | YES | | |
medication.class.colname | YES | | |
carry.only.for.same.medication | NO | "FALSE" | |
consider.dosage.change | NO | "FALSE" | |
medication.change.means.new.treatment.episode | NO | "TRUE" | |
maximum.permissible.gap | NO | "90" | |
maximum.permissible.gap.unit | NO | "days" | "days", "weeks", "months", "years", "percent" |
followup.window.start.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.start | NO | 0 | |
followup.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
followup.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.duration | NO | "365 * 2" | |
followup.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.start.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.start | NO | 0 | |
observation.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.duration | NO | "365 * 2" | |
observation.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
date.format | NO | "%m/%d/%Y" | |
event.interval.colname | NO | "event.interval" | |
gap.days.colname | NO | "gap.days" | |
force.NA.CMA.for.failed.patients | NO | "TRUE" | |
parallel.backend | NO | "none" | "none", "multicore", "snow", "snow(SOCK)", "snow(MPI)", "snow(NWS)" |
parallel.threads | NO | "auto" | |
suppress.warnings | NO | "FALSE" | |
save.event.info | NO | "FALSE" | |

RETURN VALUE(S) | FILE | OBSERVATIONS |
---|---|---|
Errors, warnings and other messages | Adherer-results.txt | Possibly more than one line; if the processing was successful, the last line must begin with OK: |
The computed CMAs, as a TAB-separated no quotes CSV file | CMA.csv | Always generated in case of successful processing |
The gap days and event info data, as a TAB-separated no quotes CSV file | EVENTINFO.csv | Only by explicit request (i.e., save.event.info = "TRUE") |
CMA_sliding_window
The parameters for this function are (N.B.: the plotting parameters can also appear if plotting is required):
PARAMETER | REQUIRED | DEFAULT_VALUE | POSSIBLE_VALUES |
---|---|---|---|
CMA.to.apply | YES | CMA1, CMA2, CMA3, CMA4, CMA5, CMA6, CMA7, CMA8, CMA9 | |
ID.colname | YES | | |
event.date.colname | YES | | |
event.duration.colname | YES | | |
event.daily.dose.colname | YES | | |
medication.class.colname | YES | | |
carry.only.for.same.medication | NO | "FALSE" | |
consider.dosage.change | NO | "FALSE" | |
followup.window.start.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.start | NO | 0 | |
followup.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
followup.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.duration | NO | "365 * 2" | |
followup.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.start.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.start | NO | 0 | |
observation.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.duration | NO | "365 * 2" | |
observation.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
sliding.window.start.type | NO | "numeric" | "numeric", "character", "date" |
sliding.window.start | NO | 0 | |
sliding.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
sliding.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
sliding.window.duration | NO | "90" | |
sliding.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
sliding.window.step.duration.type | NO | "numeric" | "numeric", "character" |
sliding.window.step.duration | NO | "30" | |
sliding.window.step.unit | NO | "days" | "days", "weeks", "months", "years" |
sliding.window.no.steps | NO | "-1" | |
date.format | NO | "%m/%d/%Y" | |
event.interval.colname | NO | "event.interval" | |
gap.days.colname | NO | "gap.days" | |
force.NA.CMA.for.failed.patients | NO | "TRUE" | |
parallel.backend | NO | "none" | "none", "multicore", "snow", "snow(SOCK)", "snow(MPI)", "snow(NWS)" |
parallel.threads | NO | "auto" | |
suppress.warnings | NO | "FALSE" | |
save.event.info | NO | "FALSE" | |

RETURN VALUE(S) | FILE | OBSERVATIONS |
---|---|---|
Errors, warnings and other messages | Adherer-results.txt | Possibly more than one line; if the processing was successful, the last line must begin with OK: |
The computed CMAs, as a TAB-separated no quotes CSV file | CMA.csv | Always generated in case of successful processing |
The gap days and event info data, as a TAB-separated no quotes CSV file | EVENTINFO.csv | Only by explicit request (i.e., save.event.info = "TRUE") |
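The REQUIRED column of the CMA tables above lends itself to a simple check on the caller side before anything is sent to AdhereR. The sketch below encodes the required parameters listed in those tables and reports which ones are missing from a given set of arguments. It is only an illustration: the names REQUIRED and missing_required are hypothetical, and the reference implementation may validate its arguments differently.

```python
# Required parameters, as listed in the tables above
_BASE = ("ID.colname", "event.date.colname", "event.duration.colname")
_WITH_DOSE = _BASE + ("event.daily.dose.colname", "medication.class.colname")

REQUIRED = {}
for n in range(1, 5):                      # CMA1 ... CMA4
    REQUIRED[f"CMA{n}"] = _BASE
for n in range(5, 10):                     # CMA5 ... CMA9
    REQUIRED[f"CMA{n}"] = _WITH_DOSE
for name in ("CMA_per_episode", "CMA_sliding_window"):
    REQUIRED[name] = ("CMA.to.apply",) + _WITH_DOSE


def missing_required(function, params):
    """Return the required parameters of `function` that are absent or empty."""
    return [name for name in REQUIRED[function] if not params.get(name)]


# Example: a sliding-window call that forgot which CMA to apply
args = {
    "ID.colname": "PATIENT_ID",
    "event.date.colname": "DATE",
    "event.duration.colname": "DURATION",
    "event.daily.dose.colname": "PERDAY",
    "medication.class.colname": "CATEGORY",
}
problems = missing_required("CMA_sliding_window", args)
```

Here problems contains only "CMA.to.apply"; the same arguments would pass the check for any of CMA1 to CMA9.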
compute_event_int_gaps
This function is intended for advanced users only; the parameters for this function are:
PARAMETER | REQUIRED | DEFAULT_VALUE | POSSIBLE_VALUES |
---|---|---|---|
ID.colname | YES | | |
event.date.colname | YES | | |
event.duration.colname | YES | | |
event.daily.dose.colname | NO | | |
medication.class.colname | NO | | |
carryover.within.obs.window | NO | "FALSE" | |
carryover.into.obs.window | NO | "FALSE" | |
carry.only.for.same.medication | NO | "FALSE" | |
consider.dosage.change | NO | "FALSE" | |
followup.window.start.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.start | NO | "0" | |
followup.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
followup.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.duration | NO | "365 * 2" | |
followup.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.start.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.start | NO | "0" | |
observation.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.duration | NO | "365 * 2" | |
observation.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
date.format | NO | "%m/%d/%Y" | |
keep.window.start.end.dates | NO | "FALSE" | |
remove.events.outside.followup.window | NO | "TRUE" | |
keep.event.interval.for.all.events | NO | "FALSE" | |
event.interval.colname | NO | "event.interval" | |
gap.days.colname | NO | "gap.days" | |
force.NA.CMA.for.failed.patients | NO | "TRUE" | |
parallel.backend | NO | "none" | "none", "multicore", "snow", "snow(SOCK)", "snow(MPI)", "snow(NWS)" |
parallel.threads | NO | "auto" | |
suppress.warnings | NO | "FALSE" | |

RETURN VALUE(S) | FILE | OBSERVATIONS |
---|---|---|
Errors, warnings and other messages | Adherer-results.txt | Possibly more than one line; if the processing was successful, the last line must begin with OK: |
The gap days and event info data, as a TAB-separated no quotes CSV file | EVENTINFO.csv | In this case, always returned if successful |
compute_treatment_episodes
This function is intended for advanced users only; the parameters for this function are:
PARAMETER | REQUIRED | DEFAULT_VALUE | POSSIBLE_VALUES |
---|---|---|---|
ID.colname | YES | | |
event.date.colname | YES | | |
event.duration.colname | YES | | |
event.daily.dose.colname | NO | | |
medication.class.colname | NO | | |
carryover.within.obs.window | NO | "FALSE" | |
carryover.into.obs.window | NO | "FALSE" | |
carry.only.for.same.medication | NO | "FALSE" | |
consider.dosage.change | NO | "FALSE" | |
medication.change.means.new.treatment.episode | NO | "TRUE" | |
maximum.permissible.gap | NO | "90" | |
maximum.permissible.gap.unit | NO | "days" | "days", "weeks", "months", "years", "percent" |
followup.window.start.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.start | NO | 0 | |
followup.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
followup.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
followup.window.duration | NO | "365 * 2" | |
followup.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.start.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.start | NO | 0 | |
observation.window.start.unit | NO | "days" | "days", "weeks", "months", "years" |
observation.window.duration.type | NO | "numeric" | "numeric", "character", "date" |
observation.window.duration | NO | "365 * 2" | |
observation.window.duration.unit | NO | "days" | "days", "weeks", "months", "years" |
date.format | NO | "%m/%d/%Y" | |
keep.window.start.end.dates | NO | "FALSE" | |
remove.events.outside.followup.window | NO | "TRUE" | |
keep.event.interval.for.all.events | NO | "FALSE" | |
event.interval.colname | NO | "event.interval" | |
gap.days.colname | NO | "gap.days" | |
force.NA.CMA.for.failed.patients | NO | "TRUE" | |
parallel.backend | NO | "none" | "none", "multicore", "snow", "snow(SOCK)", "snow(MPI)", "snow(NWS)" |
parallel.threads | NO | "auto" | |
suppress.warnings | NO | "FALSE" | |

RETURN VALUE(S) | FILE | OBSERVATIONS |
---|---|---|
Errors, warnings and other messages | Adherer-results.txt | Possibly more than one line; if the processing was successful, the last line must begin with OK: |
The treatment episodes data, as a TAB-separated no quotes CSV file | TREATMENTEPISODES.csv | Always if successful |
plot_interactive_cma
This function initiates the interactive plotting in AdhereR
using Shiny
: all the plotting will be done in the current internet browser and there are no results expected (except for errors, warnings and other messages). This function ignores the argument plot.show = "TRUE"
and takes very few arguments of its own, as most of the relevant parameters can be set interactively through the Shiny
interface.
PARAMETER | REQUIRED | DEFAULT_VALUE | POSSIBLE_VALUES |
---|---|---|---|
patient_to_plot | NO | defaults to the first patient in the dataset | |
ID.colname | YES | | |
event.date.colname | YES | | |
event.duration.colname | YES | | |
event.daily.dose.colname | NO | | |
medication.class.colname | NO | | |
date.format | NO | "%m/%d/%Y" | |
followup.window.start.max | NO | integer >0 | |
followup.window.duration.max | NO | integer >0 | |
observation.window.start.max | NO | integer >0 | |
observation.window.duration.max | NO | integer >0 | |
maximum.permissible.gap.max | NO | integer >0 | |
sliding.window.start.max | NO | integer >0 | |
sliding.window.duration.max | NO | integer >0 | |
sliding.window.step.duration.max | NO | integer >0 | |
Python 3 code
This annex lists the Python 3 code included in this vignette in an easy-to-run form (i.e., no In [], Out [] and prompts):
# Import adherer as ad:
import adherer as ad
# Show the _DATA_SHARING_DIRECTORY (should be set automatically to a temporary location):
ad._DATA_SHARING_DIRECTORY.name
# Show the _RSCRIPT_PATH (should be detected automatically):
ad._RSCRIPT_PATH
# Import Pandas as pd:
import pandas as pd
# Load the test dataset
df = pd.read_csv('~/Temp/med-events.csv', sep='\t', header=0)
# Let's look at first 6 rows (it should match the R output above except for the row names):
df.head(6)
# Compute CMA8 as a test:
cma8 = ad.CMA8(df,
id_colname='PATIENT_ID',
event_date_colname='DATE',
event_duration_colname='DURATION',
event_daily_dose_colname='PERDAY',
medication_class_colname='CATEGORY')
# Summary of cma8:
cma8
# The return value and messages:
cma8.get_computation_results()
# The CMA (the first 6 rows out of all 100):
cma8.get_cma().head(6)
# Plot it (statically):
cma8.plot(patients_to_plot=['1', '2', '3'],
align_all_patients=True,
period_in_days=30,
cex=0.5)
# Interactive plotting:
cma8.plot_interactive()
# Sliding windows with CMA1 (single thread, locally):
cma1w_1l = ad.CMASlidingWindow(dataset=df,
cma_to_apply="CMA1",
id_colname='PATIENT_ID',
event_date_colname='DATE',
event_duration_colname='DURATION',
event_daily_dose_colname='PERDAY',
medication_class_colname='CATEGORY',
sliding_window_duration=30,
sliding_window_step_duration=30,
parallel_backend="none",
parallel_threads=1)
cma1w_1l.get_cma().head(6)
# Sliding windows with CMA1 (two threads, multicore, locally):
cma1w_2ml = ad.CMASlidingWindow(dataset=df,
cma_to_apply="CMA1",
id_colname='PATIENT_ID',
event_date_colname='DATE',
event_duration_colname='DURATION',
event_daily_dose_colname='PERDAY',
medication_class_colname='CATEGORY',
sliding_window_duration=30,
sliding_window_step_duration=30,
parallel_backend="multicore", # <--- multicore
parallel_threads=2)
cma1w_2ml.get_cma().head(6)
# Sliding windows with CMA1 (two threads, snow, locally):
cma1w_2sl = ad.CMASlidingWindow(dataset=df,
cma_to_apply="CMA1",
id_colname='PATIENT_ID',
event_date_colname='DATE',
event_duration_colname='DURATION',
event_daily_dose_colname='PERDAY',
medication_class_colname='CATEGORY',
sliding_window_duration=30,
sliding_window_step_duration=30,
parallel_backend="snow", # <--- SNOW
parallel_threads=2)
cma1w_2sl.get_cma().head(6)
# Sliding windows with CMA1 (two remote machines with two threads each):
# The workers are defined as *literal R code* that is sent verbatim to AdhereR for parsing and interpretation
# Please note, however, that this string should not contain line breaks (i.e., it should be a one-liner):
workers = 'c(rep(list(list(host="workhorse1", user="user", rscript="/usr/local/bin/Rscript", snowlib="/usr/local/lib64/R/library/")), 2), rep(list(list(host="workhorse2", user="user", rscript="/usr/local/bin/Rscript", snowlib="/usr/local/lib64/R/library/")), 2))'
cma1w_2sw = ad.CMASlidingWindow(dataset=df,
cma_to_apply="CMA1",
id_colname='PATIENT_ID',
event_date_colname='DATE',
event_duration_colname='DURATION',
event_daily_dose_colname='PERDAY',
medication_class_colname='CATEGORY',
sliding_window_duration=30,
sliding_window_step_duration=30,
parallel_backend="snow",
parallel_threads=workers)
cma1w_2sw.get_cma().head(6)
COMMENTS
Everything on a line following /// or # is considered a comment and ignored (except when included within quotes " " or ' ').
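To illustrate this rule, the sketch below removes a trailing comment from a single line while leaving /// and # untouched when they occur inside quotes. It is one possible reading of the rule, not the parser used by AdhereR: the function name strip_comment is hypothetical, and the sketch assumes that quotes are never escaped or nested.

```python
def strip_comment(line):
    """Return `line` without its trailing comment (/// or #), if any."""
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(line):
        if quote is not None:
            # Inside quotes: only the matching quote character is special
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch == "#" or line.startswith("///", i):
            # Comment marker outside quotes: drop the rest of the line
            return line[:i].rstrip()
    return line.rstrip()


cleaned = strip_comment('plot.col.na = "#D3D3D3" # colour for missing data')
```

Here cleaned is the text up to and including the closing quote: the # inside the quotes is kept, while the comment after it is dropped.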