Distance Profiling:
distToNearest.
Renamed the groupUsingOnlyIGH argument of distToNearest to onlyHeavy.
Backwards Incompatible Changes:
Functions that used V_CALL (Change-O) as the default to identify the field that stored the V gene calls now use v_call (AIRR). That means scripts that relied on default values (previously, v_call="V_CALL") will now fail if calls to the functions are not updated to reflect the correct value for the data. If data are in the Change-O format, the current default value v_call="v_call" will fail to identify the column with the V gene calls, as the column v_call doesn't exist. In this case, v_call="V_CALL" needs to be specified in the function call.
ExampleDb was converted to the AIRR Rearrangement standard and examples were updated accordingly.
The labels slot of IMGT_V has changed from CDR_R, CDR_S, FWR_R and FWR_S to cdr_r, cdr_s, fwr_r and fwr_s, respectively.
CODON_TABLE and the different MUTATION_SCHEMES changed from R, S and Stop to r, s and stop, respectively.
Renamed MU_COUNT_SEQ to mu_count_seq.
Output columns and S4 object slots of calcBaseline and related functions were renamed, for example, from PVALUE, REGION and BASELINE_CI_PVALUE to pvalue, region and baseline_ci_pvalue, respectively.
The model argument of createSubstitutionMatrix, createMutabilityMatrix and createTargetingModel changed from model=c("S","RS") to model=c("s","rs").
General:
Targeting Models:
createMutabilityMatrix, extendMutabilityMatrix, createTargetingMatrix, and createTargetingModel now also return the numbers of silent and replacement mutations used for estimating the 5-mer mutabilities. These numbers are recorded in the numMutS and numMutR slots of the newly defined MutabilityModel, MutabilityModelWithSource, and TargetingMatrix classes.
Mutation Profiling:
shmulateSeq now also supports specifying the frequency of mutations to be introduced. (Previously, only the number of mutations was supported.)
General:
General:
Distance Calculation:
Fixed a bug in distToNearest that could potentially cause sequences from different partitions to be used for distance calculation.
General:
Distance Calculation:
plotDensityThreshold for negative densities.
distToNearest for performing subsampling while calculating cross-group nearest neighbor distances.
distToNearest now supports, via a new argument VJthenLen, either a 2-stage partitioning (first by V gene and J gene, then by junction length) or a 1-stage partitioning (simultaneously by V gene, J gene, and junction length). For 1-stage partitioning, distToNearest supports export of the partitioning information as a new column via keepVJLgroup.
distToNearest now supports single-cell input data with the addition of new arguments cellIdColumn, locusColumn, and groupUsingOnlyIGH.
Mutation Profiling:
shmulateTree has new arguments, start and end, to specify the region in the sequence where mutations can be introduced.
Selection Analysis:
consensusSequence, which can be used to build a consensus sequence using a variety of methods.
General:
TargetingModel and RegionDefinition S4 classes.
General:
subsample argument to the distToNearest function.
alakazam. Specifically, progressBar, getBaseTheme and checkColumns.
clearConsole, getnproc, and getPlatform functions.
Distance Calculation:
findThreshold method to density.
density method by re-tuning the bandwidth detection process. The density method should now also yield more consistent thresholds, on average.
The subsample argument to findThreshold now applies to both the density and gmm methods. Subsampling of distances is not performed by default.
Fixed a bug in plotDensityThreshold and plotGmmThreshold wherein the breaks argument was ignored when specifying xmax and/or xmin.
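The entries above concern the density method of findThreshold, which reads a clonal distance threshold off a smoothed distribution of nearest-neighbor distances. As a rough illustration only, the Python sketch below places the threshold at the first valley that follows the first mode of a Gaussian kernel density estimate. The fixed bandwidth, the grid size and the helper names (kde_on_grid, density_valley_threshold) are assumptions for this example; they do not reproduce shazam's bandwidth detection or its exact threshold rule.

```python
import math
from typing import List, Optional, Sequence


def kde_on_grid(values: Sequence[float], grid: Sequence[float],
                bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate of `values`, evaluated at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
            for x in grid]


def density_valley_threshold(distances: Sequence[float], bandwidth: float,
                             n_grid: int = 200) -> Optional[float]:
    """Grid point of the first local minimum after the first mode of the
    smoothed density, or None if the density has no such valley.

    Assumption: a fixed, user-supplied bandwidth (no automatic selection).
    """
    if bandwidth <= 0 or n_grid < 3:
        raise ValueError("bandwidth must be positive and n_grid at least 3")
    lo, hi = min(distances), max(distances)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = kde_on_grid(distances, grid, bandwidth)
    # Treat the left edge as the first mode if the density starts out falling.
    seen_mode = dens[0] >= dens[1]
    for i in range(1, n_grid - 1):
        if not seen_mode and dens[i - 1] < dens[i] >= dens[i + 1]:
            seen_mode = True
        elif seen_mode and dens[i - 1] > dens[i] <= dens[i + 1]:
            return grid[i]
    return None
```

Nearest-neighbor distance distributions are typically bimodal, with the first mode made up of sequences that have a close clonal relative, so a valley-based rule like this one separates the two modes; how sensitive the result is to the bandwidth is exactly what a bandwidth detection step has to manage.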
Selection Analysis:
Fixed a bug in plotBaselineDensity arising when the groupColumn and idColumn arguments were set to the same column.
Added the sizeElement argument to plotBaselineDensity to control line size.
Renamed the field_name argument to field in editBaseline.
Selection Analysis:
Fixed a bug in plotBaselineDensity which caused an empty plot to be generated if there was only a single value in the idColumn.
Fixed a bug in calcBaseline which caused a crash in summarizeBaseline and groupBaseline when the input baseline is based on only 1 sequence (i.e., when nrow(baseline@db) is 1).
plot call on a Baseline object to plotBaselineDensity.
getBaselineStats function.
summary method for Baseline objects that calls summarizeBaseline and returns a data.frame.
Mutation Profiling:
Fixed a bug in shmulateSeq which caused a crash when the input sequence contained gaps (.).
Renamed the mutations argument of shmulateSeq to numMutations.
shmulateSeq and shmulateTree.
calcExpectedMutations will now treat non-ACTG characters as Ns rather than produce an error.
RegionDefinition objects for the full V segment as a single region (IMGT_V_BY_SEGMENTS) and the V segment with each codon as a separate region (IMGT_V_BY_CODONS).
Targeting Models:
calculateMutability function, which computes the aggregate mutability for sequences.
Fixed a bug causing createSubstitutionMatrix to fail for data containing only a single V family.
model="S") in createSubstitutionMatrix, createMutabilityMatrix and createTargetingModel.
plot call on a TargetingModel object to plotMutability.
General:
Distance Calculation:
"gmm" method of findThreshold() that allows users to choose a mixture of two univariate density distribution functions among four available combinations: "norm-norm", "norm-gamma", "gamma-norm", or "gamma-gamma".
"gmm" method of findThreshold() from the best average sensitivity and specificity, the curve intersection, or user-defined sensitivity or specificity.
Renamed the cutEdge argument of findThreshold() to edge.
Mutation Profiling:
collapseClones(), adding various deterministic and stochastic methods to obtain effective clonal sequences, support for including ambiguous IUPAC characters in output, as well as extensive documentation. Removed calcClonalConsensus() from exported functions.
observedMutations() and calcObservedMutations().
calcObservedMutations() for sequences with non-triplet overhang at the tail.
Renamed the columns for observed mutations (previously OBSERVED) and expected mutations (previously EXPECTED) returned by observedMutations() and expectedMutations() to MU_COUNT and MU_EXPECTED, respectively.
Selection Analysis:
calcBaseline() no longer calls collapseClones() automatically if a CLONE column is present. As indicated by the documentation for calcBaseline(), users are advised to obtain effective clonal sequences (for example, by calling collapseClones()) before running calcBaseline().
calcBaseline().
Mutation Profiling:
Fixed a bug in collapseClones() that prevented it from running when nproc is greater than 1.
General:
Mutation Profiling:
Fixed a bug in collapseClones() that resulted in erroneous CLONAL_SEQUENCE and CLONAL_GERMLINE being returned.
observedMutations was running.
General:
Selection Analysis:
summarizeBaseline(). The returned p-value can now be either positive or negative. Its magnitude (without the sign) should be interpreted as usual. Its sign indicates the direction of the selection detected: a positive p-value indicates positive selection, whereas a negative p-value indicates negative selection.
Added editBaseline() to exported functions, along with a corresponding section in the vignette.
calcBaseline().
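A downstream script that consumes the signed p-values returned by summarizeBaseline() could split each value into a direction and an ordinary p-value, following the convention stated above. This Python sketch is illustrative: the helper and field names, the 0.05 cutoff, and the choice to report an exact 0 or a missing value (NaN) as having no direction are assumptions, not shazam behavior.

```python
import math
from typing import NamedTuple


class SelectionCall(NamedTuple):
    direction: str     # "positive", "negative", or "undetermined"
    magnitude: float   # the p-value without its sign, read as usual
    significant: bool  # magnitude < alpha


def interpret_signed_pvalue(p: float, alpha: float = 0.05) -> SelectionCall:
    """Split a signed p-value into the direction of selection and its magnitude."""
    if math.isnan(p):
        return SelectionCall("undetermined", float("nan"), False)
    magnitude = abs(p)
    if magnitude > 1.0:
        raise ValueError("the magnitude of a p-value cannot exceed 1")
    if p > 0:
        direction = "positive"
    elif p < 0:
        direction = "negative"
    else:
        # Assumption: an exact 0 carries no sign, so no direction is reported.
        direction = "undetermined"
    return SelectionCall(direction, magnitude, magnitude < alpha)
```

For example, -0.01 reads as significant negative selection at the assumed 0.05 cutoff, while 0.3 reads as positive but not significant.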
Targeting Models:
Added the numMutationsOnly argument to createSubstitutionMatrix(), enabling parameter tuning for minNumMutations.
Added functions minNumMutationsTune() and minNumSeqMutationsTune() to tune the parameters minNumMutations and minNumSeqMutations of createSubstitutionMatrix() and createMutabilityMatrix(), respectively. Also added function plotTune(), which helps visualize parameter tuning using the two new functions above.
HKL_S5F).
Renamed HS5FModel to HH_S5F, MRS5NFModel to MK_RS5NF, and U5NModel to U5N.
Added the human heavy chain, silent, 1-mer, functional substitution model (HH_S1F), human kappa and lambda light chain, silent, 1-mer, functional substitution model (HKL_S1F), and mouse kappa light chain, replacement and silent, 1-mer, non-functional substitution model (MK_RS1NF).
Added makeDegenerate5merSub and makeDegenerate5merMut, which make degenerate 5-mer substitution and mutability models, respectively, based on the 1-mer models. Also added makeAverage1merSub and makeAverage1merMut, which make 1-mer substitution and mutability models, respectively, by averaging over the 5-mer models.
Mutation Profiling:
Added the returnRaw argument to calcObservedMutations(), which, if TRUE, returns the positions of point mutations and their corresponding mutation types, as opposed to counts of mutations (hence “raw”).
Added slideWindowSeq() and slideWindowDb(), which implement a sliding window approach towards filtering a single sequence or sequences in a data.frame that contain(s) at least a given number of mutations within a given number of consecutive nucleotides.
Added slideWindowTune(), which allows for tuning the parameters of slideWindowSeq() and slideWindowDb().
Added slideWindowTunePlot(), which visualizes parameter tuning by slideWindowTune().
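The sliding-window criterion described for slideWindowSeq() and slideWindowDb() can be sketched in plain Python: a sequence is flagged when some window of window_size consecutive nucleotides holds at least mut_thresh mismatches against its germline. The helper names, their parameters, and the decision to ignore gap ('.', '-') and N positions are assumptions for illustration, not shazam's implementation.

```python
from typing import List

GAP_OR_N = set(".-N")  # assumption: these positions are never counted as mutations


def mutation_positions(input_seq: str, germline_seq: str) -> List[int]:
    """0-based positions where an aligned input sequence differs from its germline."""
    if len(input_seq) != len(germline_seq):
        raise ValueError("input and germline must be aligned to the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(input_seq.upper(), germline_seq.upper()))
        if a != b and a not in GAP_OR_N and b not in GAP_OR_N
    ]


def has_dense_mutations(input_seq: str, germline_seq: str,
                        mut_thresh: int, window_size: int) -> bool:
    """True if some window of window_size consecutive nucleotides contains
    at least mut_thresh mutations."""
    if mut_thresh < 1 or window_size < 1:
        raise ValueError("mut_thresh and window_size must be positive")
    pos = mutation_positions(input_seq, germline_seq)
    left = 0
    # Two-pointer scan: keep pos[left..right] within a span of window_size
    # nucleotides and check whether that span holds enough mutations.
    for right, p in enumerate(pos):
        while p - pos[left] >= window_size:
            left += 1
        if right - left + 1 >= mut_thresh:
            return True
    return False
```

Applying has_dense_mutations row by row over a table of aligned sequences gives the data.frame variant of the filter. Tuning then amounts to rerunning it over a grid of mut_thresh and window_size values.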
Distance Calculation:
Fixed a bug in distToNearest wherein normalize="length" for 5-mer models was resulting in distances normalized by junction length squared instead of raw junction length.
Fixed a bug in distToNearest wherein symmetry="min" was calculating the minimum of the total distance between two sequences instead of the minimum distance at each mutated position.
Added the findThreshold function to infer the clonal distance threshold from nearest neighbor distances returned by distToNearest.
Renamed the length option for the normalize argument of distToNearest to len so it matches Change-O.
Deprecated the HS1FDistance and M1NDistance distance models, which have been renamed to hs1f_compat and m1n_compat in the model argument of distToNearest. These deprecated models should be used for compatibility with DefineClones in Change-O v0.3.3. These models have been replaced by hh_s1f and mk_rs1nf, which are supported by Change-O v0.3.4.
Renamed the hs5f model in distToNearest to hh_s5f.
MK_RS5NF models to distToNearest.
calcTargetingDistance() to enable calculation of a symmetric distance matrix given a 1-mer substitution matrix normalized by row, such as HH_S1F.
findThreshold. The previous smoothed density method is available via the method="density" argument and the new GMM method is available via method="gmm".
Added plotGmmThreshold and plotDensityThreshold to plot the threshold detection results from findThreshold for the "gmm" and "density" methods, respectively.
Region Definition:
IMGT_V_NO_CDR3 and IMGT_V_BY_REGIONS_NO_CDR3. Updated IMGT_V and IMGT_V_BY_REGIONS so that neither now includes CDR3.
Selection Analysis:
Targeting Models:
Added the numSeqMutationsOnly argument to createMutabilityMatrix(), enabling parameter tuning for minNumSeqMutations.
General:
InfluenzaDb data object, in favor of the updated ExampleDb provided in alakazam 0.2.4.
Distance Calculation:
Added the cross argument to distToNearest(), which allows restriction of distances to only distances across samples (i.e., excluding within-sample distances).
Added the mst flag to distToNearest(), which will return all distances to neighboring nodes in a minimum spanning tree.
aa model of distToNearest().
aa model of distToNearest().
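The cross argument described above restricts each sequence's nearest-neighbor search to sequences from other samples. The Python sketch below contrasts the two modes using plain Hamming distance between junctions of equal length. The Hamming model, grouping by junction length alone (rather than by V gene, J gene and junction length), and the helper names are simplifying assumptions. Following the entry in this file about returning the nearest neighbor with a non-zero distance, the sketch skips zero distances (identical junctions).

```python
from typing import List, Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def nearest_distances(junctions: Sequence[str], samples: Sequence[str],
                      cross: bool = False) -> List[Optional[int]]:
    """Distance from each junction to its nearest non-identical neighbor.

    cross=False: neighbors are all other junctions of the same length.
    cross=True:  neighbors must also come from a different sample, so
                 within-sample distances are excluded.
    Returns None for a junction with no eligible neighbor.
    """
    result: List[Optional[int]] = []
    for i, (seq_i, sample_i) in enumerate(zip(junctions, samples)):
        best: Optional[int] = None
        for j, (seq_j, sample_j) in enumerate(zip(junctions, samples)):
            if i == j or len(seq_i) != len(seq_j):
                continue
            if cross and sample_i == sample_j:
                continue
            d = hamming(seq_i, seq_j)
            if d > 0 and (best is None or d < best):
                best = d
        result.append(best)
    return result
```

With junctions ACGT, ACGA, TCGA and ACGT from samples s1, s1, s2 and s2, every within-data nearest distance is 1. With cross=True, however, the first ACGT only sees the s2 junctions, and its nearest non-zero distance rises to 2.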
Mutation Profiling:
MutationDefinition VOLUME_MUTATIONS.
Added shmulateSeq() and shmulateTree() to simulate mutations on sequences and lineage trees, respectively, using a 5-mer targeting model.
Renamed collapseByClone, calcDbExpectedMutations and calcDbObservedMutations to collapseClones, expectedMutations, and observedMutations, respectively.
Selection Analysis:
Baseline object through groupBaseline() multiple times resulted in incorrect normalization.
title options to plotBaselineSummary() and plotBaselineDensity().
plotBaselineSummary() and plotBaselineDensity().
Added the testBaseline() function to test the significance of differences between two selection distributions.
General:
InfluenzaDb.
dplyr::tbl_df object instead of a data.frame.
Distance Calculation:
Fixed a bug where distToNearest() did not return the nearest neighbor with a non-zero distance.
Targeting Models:
createSubstitutionMatrix(), createMutabilityMatrix(), and plotMutability().
plotMutability().
Mutation Profiling:
MutationDefinition objects MUTATIONS_CHARGE, MUTATIONS_HYDROPATHY, and MUTATIONS_POLARITY, providing alternate approaches to defining replacement and silent annotations to mutations when calling calcDBObservedMutations() and calcDBExpectedMutations().
regionDefinition=NULL consistent for all mutation profiling functions. Now the entire sequence is used as the region and calculations are made accordingly.
calcDBObservedMutations() also returns R and S mutations when regionDefinition=NULL. Older versions reported the sum of R and S mutations. The function will add the columns OBSERVED_SEQ_R and OBSERVED_SEQ_S when frequency=FALSE, and MU_FREQ_SEQ_R and MU_FREQ_SEQ_S when frequency=TRUE.
General:
Distance Calculation:
Added the symmetry parameter to distToNearest to control how asymmetric distances (A->B != B->A) are combined into the distance between A and B.
Mutation Profiling:
Selection Analysis:
Targeting Models:
Added the minNumMutations parameter to createSubstitutionMatrix. This is the minimum number of observed 5-mers required for the substitution model. The substitution rate of 5-mers with fewer observed mutations will be inferred from other 5-mers.
Added the minNumSeqMutations parameter to createMutabilityMatrix. This is the minimum number of mutations required in sequences containing the 5-mers of interest. The mutability of 5-mers with fewer observed mutations in the sequences will be inferred.
Added the returnModel parameter to createSubstitutionMatrix. This gives the user the option to return a 1-mer or 5-mer model.
Added the returnSource parameter to createMutabilityMatrix. If TRUE, the function will return a data frame indicating whether each 5-mer mutability is observed or inferred.
Initial public release.
General:
Fixed a bug where the Influenza.tab file did not load on Mac OS X.
citation("shazam") command.
Distance Calculation:
HS1FDistance, based on the Yaari et al., 2013 data.
hs1f as the default distance model for distToNearest().
distToNearest().
Mutation Profiling:
calcDBClonalConsensus() so that the function now works correctly when called with the argument collapseByClone=FALSE.
Added the frequency argument to calcObservedMutations() and calcDBObservedMutations(), which enables return of mutation frequencies rather than the default mutation counts.
Targeting Models:
M3NModel and all options for using said model.
Fixed a bug in createSubstitutionMatrix() and createMutabilityMatrix() where IMGT gaps were not being handled.
General:
Targeting Models:
Targeting Models:
U5NModel, which is a uniform 5-mer model.
plotMutability() output.
Prerelease for review.