Distance Profiling:
distToNearest.
Renamed the groupUsingOnlyIGH argument of distToNearest to onlyHeavy.

Backwards Incompatible Changes:
Functions that previously used V_CALL (Change-O) as the default to identify the field that stores the V gene calls now use v_call (AIRR). Scripts that relied on the previous default (v_call="V_CALL") will fail unless the function calls are updated to match the data. If the data are in the Change-O format, the current default v_call="v_call" will not find the V gene calls, because the column v_call does not exist. In this case, specify v_call="V_CALL" in the function call.
ExampleDb converted to the AIRR Rearrangement standard and examples updated accordingly.
labels slot of IMGT_V has changed from CDR_R, CDR_S, FWR_R and FWR_S to cdr_r, cdr_s, fwr_r and fwr_s, respectively.
CODON_TABLE and the different MUTATION_SCHEMES change from R, S and Stop to r, s and stop, respectively.
MU_COUNT_SEQ to mu_count_seq.
calcBaseline and related function output columns and S4 object slots. For example, from PVALUE, REGION and BASELINE_CI_PVALUE to pvalue, region and baseline_ci_pvalue, respectively.
createSubstitutionMatrix, createMutabilityMatrix and createTargetingModel, changed from model=c("S","RS") to model=c("s","rs").

General:
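For scripts that must handle both Change-O and AIRR data, the column name can be chosen at run time and then passed as v_call, as described in the note on the changed default above. The following is a minimal sketch in base R. The helper pick_v_call() and the toy data frames are hypothetical and not part of the package; only the v_call argument name and the two column names come from the note.

```r
# Hypothetical helper (not part of the package): choose the V gene call
# column name based on which naming convention the data frame uses.
pick_v_call <- function(db) {
    if ("v_call" %in% names(db)) {
        "v_call"   # AIRR Rearrangement naming (the new default)
    } else if ("V_CALL" %in% names(db)) {
        "V_CALL"   # Change-O naming (the previous default)
    } else {
        stop("No V gene call column found (expected 'v_call' or 'V_CALL').")
    }
}

# Toy data frames standing in for real repertoire data
airr_db <- data.frame(v_call = "IGHV1-2*01", stringsAsFactors = FALSE)
changeo_db <- data.frame(V_CALL = "IGHV1-2*01", stringsAsFactors = FALSE)

pick_v_call(airr_db)
pick_v_call(changeo_db)
```

The returned string would then be supplied explicitly, for example as v_call=pick_v_call(db), in any call that accepts the v_call argument.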
Targeting Models:
createMutabilityMatrix, extendMutabilityMatrix, createTargetingMatrix, and createTargetingModel now also return the numbers of silent and replacement mutations used for estimating the 5-mer mutabilities. These numbers are recorded in the numMutS and numMutR slots in the newly defined MutabilityModel, MutabilityModelWithSource, and TargetingMatrix classes.

Mutation Profiling:
shmulateSeq now also supports specifying the frequency of mutations to be introduced. (Previously, only the number of mutations was supported.)

General:
General:
Distance Calculation:
Fixed a bug in distToNearest that could potentially cause sequences from different partitions to be used for distance calculation.

General:
Distance Calculation:
plotDensityThreshold for negative densities.
distToNearest for performing subsampling while calculating cross-group nearest neighbor distances.
distToNearest now supports, via a new argument VJthenLen, either a 2-stage partitioning (first by V gene and J gene, then by junction length), or a 1-stage partitioning (simultaneously by V gene, J gene, and junction length). For 1-stage partitioning, distToNearest supports export of the partitioning information as a new column via keepVJLgroup.
distToNearest now supports single-cell input data with the addition of new arguments cellIdColumn, locusColumn, and groupUsingOnlyIGH.

Mutation Profiling:
shmulateTree has new arguments, start and end, to specify the region in the sequence where mutations can be introduced.

Selection Analysis:
consensusSequence which can be used to build a consensus sequence using a variety of methods.

General:
TargetingModel and RegionDefinition S4 classes.

General:
Added subsample argument to distToNearest function.
alakazam. Specifically, progressBar, getBaseTheme and checkColumns.
clearConsole, getnproc, and getPlatform functions.

Distance Calculation:
findThreshold method to density.
density method by retuning the bandwidth detection process. The density method should now also yield more consistent thresholds, on average.
subsample argument to findThreshold now applies to both the density and gmm methods. Subsampling of distance is not performed by default.
Fixed a bug in plotDensityThreshold and plotGmmThreshold wherein the breaks argument was ignored when specifying xmax and/or xmin.

Selection Analysis:
Fixed a bug in plotBaselineDensity arising when the groupColumn and idColumn arguments were set to the same column.
Added sizeElement argument to plotBaselineDensity to control line size.
Renamed the field_name argument to field in editBaseline.

Selection Analysis:
Fixed a bug in plotBaselineDensity which caused an empty plot to be generated if there was only a single value in the idColumn.
Fixed a bug in calcBaseline which caused a crash in summarizeBaseline and groupBaseline when the input baseline is based on only 1 sequence (i.e. when nrow(baseline@db) is 1).
plot call on a Baseline object to plotBaselineDensity.
getBaselineStats function.
summary method for Baseline objects that calls summarizeBaseline and returns a data.frame.

Mutation Profiling:
Fixed a bug in shmulateSeq which caused a crash when the input sequence contains gaps (.).
Renamed the argument mutations in shmulateSeq to numMutations.
shmulateSeq and shmulateTree.
calcExpectedMutations will now treat non-ACTG characters as Ns rather than produce an error.
RegionDefinition objects for the full V segment as single region (IMGT_V_BY_SEGMENTS) and the V segment with each codon as a separate region (IMGT_V_BY_CODONS).

Targeting Models:
calculateMutability function which computes the aggregate mutability for sequences.
Fixed a bug that caused createSubstitutionMatrix to fail for data containing only a single V family.
model="S") in createSubstitutionMatrix, createSubstitutionMatrix and createTargetingModel.
plot call on a TargetingModel object to plotMutability.

General:
Distance Calculation:
"gmm" method of findThreshold() that allows users to choose a mixture of two univariate density distribution functions among four available combinations: "norm-norm", "norm-gamma", "gamma-norm", or "gamma-gamma".
"gmm" method of findThreshold() from the best average sensitivity and specificity, the curve intersection or user defined sensitivity or specificity.
Renamed the cutEdge argument of findThreshold() to edge.

Mutation Profiling:
collapseClones(), adding various deterministic and stochastic methods to obtain effective clonal sequences, support for including ambiguous IUPAC characters in output, as well as extensive documentation. Removed calcClonalConsensus() from exported functions.
observedMutations() and calcObservedMutations().
calcObservedMutations() for sequences with non-triplet overhang at the tail.
Renamed the columns for observed mutations (previously OBSERVED) and expected mutations (previously EXPECTED) returned by observedMutations() and expectedMutations() to MU_COUNT and MU_EXPECTED, respectively.

Selection Analysis:
calcBaseline() no longer calls collapseClones() automatically if a CLONE column is present. As indicated by the documentation for calcBaseline(), users are advised to obtain effective clonal sequences (for example, by calling collapseClones()) before running calcBaseline().
calcBaseline().

Mutation Profiling:
Fixed a bug in collapseClones() that prevented it from running when nproc is greater than 1.

General:
Mutation Profiling:
Fixed a bug in collapseClones() that resulted in erroneous CLONAL_SEQUENCE and CLONAL_GERMLINE being returned.
observedMutations was running.

General:
Selection Analysis:
summarizeBaseline(). The returned p-value can now be either positive or negative. Its magnitude (without the sign) should be interpreted as per normal. Its sign indicates the direction of the selection detected. A positive p-value indicates positive selection, whereas a negative p-value indicates negative selection.
Added editBaseline() to exported functions, and a corresponding section in the vignette.
calcBaseline().

Targeting Models:
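The signed p-value convention described for summarizeBaseline() above can be unpacked with a few lines of base R. This is only an illustrative sketch: interpret_signed_pvalue() is a hypothetical helper, not a package function, and the input values are made up.

```r
# Hypothetical helper (not part of the package): split a signed p-value
# into its magnitude and the direction of selection that the sign encodes.
interpret_signed_pvalue <- function(p) {
    data.frame(
        pvalue = abs(p),  # magnitude, read as an ordinary p-value
        direction = ifelse(p > 0, "positive",
                           ifelse(p < 0, "negative", "undetermined")),
        stringsAsFactors = FALSE
    )
}

# Made-up values for illustration
interpret_signed_pvalue(c(0.01, -0.03))
```

Note that filtering on the raw signed value (for example, p < 0.05) would select every negative entry regardless of its magnitude, so the comparison should be made on abs(p).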
Added numMutationsOnly argument to createSubstitutionMatrix(), enabling parameter tuning for minNumMutations.
minNumMutationsTune() and minNumSeqMutationsTune() to tune for parameters minNumMutations and minNumSeqMutations in functions createSubstitutionMatrix() and createMutabilityMatrix() respectively. Also added function plotTune() which helps visualize parameter tuning using the abovementioned two new functions.
HKL_S5F).
HS5FModel as HH_S5F, MRS5NFModel as MK_RS5NF, and U5NModel as U5N.
HH_S1F), human kappa and lambda light chain, silent, 1-mer, functional substitution model (HKL_S1F), and mouse kappa light chain, replacement and silent, 1-mer, non-functional substitution model (MK_RS1NF).
makeDegenerate5merSub and makeDegenerate5merMut which make degenerate 5-mer substitution and mutability models respectively based on the 1-mer models. Also added makeAverage1merSub and makeAverage1merMut which make 1-mer substitution and mutability models respectively by averaging over the 5-mer models.

Mutation Profiling:
Added returnRaw argument to calcObservedMutations(), which if true returns the positions of point mutations and their corresponding mutation types, as opposed to counts of mutations (hence "raw").
slideWindowSeq() and slideWindowDb() which implement a sliding window approach towards filtering a single sequence or sequences in a data.frame which contain(s) equal to or more than a given number of mutations in a given number of consecutive nucleotides.
slideWindowTune() which allows for parameter tuning for using slideWindowSeq() and slideWindowDb().
slideWindowTunePlot() which visualizes parameter tuning by slideWindowTune().

Distance Calculation:
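The sliding window criterion mentioned for slideWindowSeq() and slideWindowDb() above (at least a given number of mutations within a given number of consecutive nucleotides) can be sketched in base R as follows. This is not the package's implementation: has_dense_window() and its argument names are hypothetical, the input is assumed to be a logical vector marking mutated positions, and treating a sequence shorter than the window as a single window is a choice made for this sketch.

```r
# Hypothetical helper (not the package's implementation): TRUE if any run
# of `window_size` consecutive positions holds at least `mut_thresh`
# mutations. `mutated` is a logical vector, one element per position.
has_dense_window <- function(mutated, mut_thresh, window_size) {
    n <- length(mutated)
    # Assumption for this sketch: a short sequence counts as one window
    if (n < window_size) {
        return(sum(mutated) >= mut_thresh)
    }
    # Cumulative sums give the mutation count of every window in one pass
    cs <- c(0, cumsum(as.integer(mutated)))
    window_counts <- cs[(window_size + 1):(n + 1)] - cs[1:(n - window_size + 1)]
    any(window_counts >= mut_thresh)
}

# Toy example: mutations at positions 2, 3 and 7 of a 10-nucleotide sequence
mutated <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
has_dense_window(mutated, mut_thresh = 2, window_size = 3)
has_dense_window(mutated, mut_thresh = 3, window_size = 3)
```

In the toy example the first call flags the sequence, because positions 1 to 3 contain two mutations, while the second call does not, because no window of three positions contains three mutations.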
Fixed a bug in distToNearest wherein normalize="length" for 5-mer models was resulting in distances normalized by junction length squared instead of raw junction length.
Fixed a bug in distToNearest wherein symmetry="min" was calculating the minimum of the total distance between two sequences instead of the minimum distance at each mutated position.
Added findThreshold function to infer clonal distance threshold from nearest neighbor distances returned by distToNearest.
Renamed the length option for the normalize argument of distToNearest to len so it matches Change-O.
Deprecated the HS1FDistance and M1NDistance distance models, which have been renamed to hs1f_compat and m1n_compat in the model argument of distToNearest. These deprecated models should be used for compatibility with DefineClones in Change-O v0.3.3. These models have been replaced by hh_s1f and mk_rs1nf, which are supported by Change-O v0.3.4.
Renamed the hs5f model in distToNearest to hh_s5f.
MK_RS5NF models to distToNearest.
calcTargetingDistance() to enable calculation of a symmetric distance matrix given a 1-mer substitution matrix normalized by row, such as HH_S1F.
findThreshold. The previous smoothed density method is available via the method="density" argument and the new GMM method is available via method="gmm".
plotGmmThreshold and plotDensityThreshold to plot the threshold detection results from findThreshold for the "gmm" and "density" methods, respectively.

Region Definition:
IMGT_V_NO_CDR3 and IMGT_V_BY_REGIONS_NO_CDR3. Updated IMGT_V and IMGT_V_BY_REGIONS so that neither includes CDR3 now.

Selection Analysis:
Targeting Models:
Added numSeqMutationsOnly argument to createMutabilityMatrix(), enabling parameter tuning for minNumSeqMutations.

General:
Removed the InfluenzaDb data object, in favor of the updated ExampleDb provided in alakazam 0.2.4.

Distance Calculation:
Added cross argument to distToNearest() which allows restriction of distances to only distances across samples (i.e., excludes within-sample distances).
Added mst flag to distToNearest(), which will return all distances to neighboring nodes in a minimum spanning tree.
aa model of distToNearest().
aa model of distToNearest().

Mutation Profiling:
MutationDefinition VOLUME_MUTATIONS.
shmulateSeq() and shmulateTree() to simulate mutations on sequences and lineage trees, respectively, using a 5-mer targeting model.
Renamed collapseByClone, calcDbExpectedMutations and calcDbObservedMutations to collapseClones, expectedMutations, and observedMutations, respectively.

Selection Analysis:
Baseline object through groupBaseline() multiple times resulted in incorrect normalization.
title options to plotBaselineSummary() and plotBaselineDensity().
plotBaselineSummary() and plotBaselineDensity().
testBaseline() function to test the significance of differences between two selection distributions.

General:
InfluenzaDb.
dplyr::tbl_df object instead of a data.frame.

Distance Calculation:
Fixed a bug wherein distToNearest() did not return the nearest neighbor with a non-zero distance.

Targeting Models:
createSubstitutionMatrix(), createMutabilityMatrix(), and plotMutability().
plotMutability().

Mutation Profiling:
MutationDefinition objects MUTATIONS_CHARGE, MUTATIONS_HYDROPATHY, MUTATIONS_POLARITY providing alternate approaches to defining replacement and silent annotations to mutations when calling calcDBObservedMutations() and calcDBExpectedMutations().
regionDefinition=NULL consistent for all mutation profiling functions. Now the entire sequence is used as the region and calculations are made accordingly.
calcDBObservedMutations() returns R and S mutations also when regionDefinition=NULL. Older versions reported the sum of R and S mutations. The function will add the columns OBSERVED_SEQ_R and OBSERVED_SEQ_S when frequency=FALSE, and MU_FREQ_SEQ_R and MU_FREQ_SEQ_S when frequency=TRUE.

General:
Distance Calculation:
Added symmetry parameter to distToNearest to change how asymmetric distances (A->B != B->A) are combined to get the distance between A and B.

Mutation Profiling:
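To illustrate what combining asymmetric distances means for the symmetry parameter described above, here is a small base R sketch. The helper combine_distances() is hypothetical and not part of the package. The option names "avg" and "min" are assumed here to mirror the choices offered by the symmetry parameter; only "min" is named elsewhere in this changelog.

```r
# Hypothetical helper (not part of the package): combine the two
# directional distances A->B and B->A into one symmetric distance.
combine_distances <- function(d_ab, d_ba, symmetry = c("avg", "min")) {
    symmetry <- match.arg(symmetry)
    switch(symmetry,
           avg = (d_ab + d_ba) / 2,   # mean of the two directions
           min = pmin(d_ab, d_ba))    # smaller of the two directions
}

# Toy values: the distance from A to B is 2, from B to A is 4
combine_distances(2, 4, symmetry = "avg")
combine_distances(2, 4, symmetry = "min")
```

Because pmin() and the arithmetic mean both work elementwise, the same helper also accepts vectors of per-position distances.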
Selection Analysis:
Targeting Models:
Added minNumMutations parameter to createSubstitutionMatrix. This is the minimum number of observed 5-mers required for the substitution model. The substitution rate of 5-mers with fewer observed mutations will be inferred from other 5-mers.
Added minNumSeqMutations parameter to createMutabilityMatrix. This is the minimum number of mutations required in sequences containing the 5-mers of interest. The mutability of 5-mers with fewer observed mutations in the sequences will be inferred.
Added returnModel parameter to createSubstitutionMatrix. This gives the user the option to return a 1-mer or 5-mer model.
Added returnSource parameter to createMutabilityMatrix. If TRUE, the code will return a data frame indicating whether each 5-mer mutability is observed or inferred.

Initial public release.
General:
Influenza.tab file did not load on Mac OS X.
citation("shazam") command.

Distance Calculation:
HS1FDistance, based on the Yaari et al, 2013 data.
hs1f as the default distance model for distToNearest().
distToNearest().

Mutation Profiling:
calcDBClonalConsensus() so that the function now works correctly when called with the argument collapseByClone=FALSE.
Added frequency argument to calcObservedMutations() and calcDBObservedMutations(), which enables return of mutation frequencies rather than the default of mutation counts.

Targeting Models:
M3NModel and all options for using said model.
Fixed a bug in createSubstitutionMatrix() and createMutabilityMatrix() where IMGT gaps were not being handled.

General:
Targeting Models:
Targeting Models:
U5NModel, which is a uniform 5-mer model.
plotMutability() output.

Prerelease for review.