The relative age of individuals can be very useful in pedigree reconstruction, both to filter potential relatives and during assignment. Age difference based probability ratios for different relationships are calculated based on a skeleton parentage-only pedigree and the birth years provided in LifeHistData
(Figure 1.1).
This ‘ageprior’ (Glossary in Table 1.1; please note that this is not an official term, nor a proper prior) was calculated quietly in the background up until version 1.3.3, but from version 2.0 you will see heatmaps and messages such as
These are not warnings, but inform you what is going on what is going on behind the scenes. This will hopefully minimise unintended age priors as a source of error. There are four parts to the message:In addition MakeAgePrior
allows more explicit control from version 2.0 onwards, so that ‘hand-tailoring’ will only be required in rare cases. Arguments can be passed to MakeAgePrior
when running sequoia
via args.AP
.
Figure 1.1: Pipeline overview
The ageprior matrix now has 5 columns (more in sequoia versions before 2.0), corresponding to parent-offspring and siblings (Table 2.1). The number of rows equals the maximum age of parents plus one; the first row is for individuals born in the same year (age difference \(A=0\)), the second row for individuals born one year apart, etc. Some values are fixed to zero: parents and their offspring can never be exactly the same age.
Based on the species’ age at first and last reproduction, some age differences between parent and offspring or between siblings are more likely than others, and some are downright impossible. Age differences are calculated from the birth years (or other time unit) in LifeHistData
, and are used both to filter potential relatives (and reduce the number of time-consuming computations) and during assignment, for example to distinguish between half-siblings, grandparent–grand-offspring and full avuncular pairs.
By default, for the parentage assignment only a 0/1 coded ‘flat’ ageprior is used to indicate which age-differences are possible. If any parents with known age are assigned, the age prior is updated before use during sibship clustering and grandparent assignment (Figure 1.1). The ageprior is not updated during or after the sibship clustering.
If an ageprior is provided as argument to sequoia
(as an element of SeqList
), this ageprior will be used instead of calling MakeAgePrior
(details further down).
Symbol | Term | Definition |
---|---|---|
\(BY_i\) | Birth year | Time unit (year, decade, month) of \(i\)’s birth/ hatching |
\(A_{i,j}\) | Age difference | \(BY_i - BY_j\), i.e. if \(j\) is older than \(i\), than \(A_{i,j}\) is positive |
\(R_{i,j}\) | Relationship | Relationship between \(i\) and \(j\), e.g. parent-offspring |
\(\alpha_{A,R}\) | Ageprior | Probability ratio of relationship \(R\) given \(A\), versus \(R\) across all age differences |
The ‘AgePrior’ matrix is a set of age-difference based probability ratios which can be interpreted as
For example, in a species where females lay large clutches but only breed for a few years, two individuals \(i\) and \(j\) born in the same year (\(BY_i=BY_j\), \(A_{i,j}=0\)) may be more likely to be maternal siblings (\(R_{i,j} =\)MS, relationships in Table 2.1) than two randomly picked individuals (\(\alpha_{0,MS} > 1\)). In contrast, in a species where singletons are the norm and twins are rare, individuals born in the same year are unlikely to be maternal siblings (\(\alpha_{0,MS} < 1\)).
The ageprior is defined as \[\begin{equation} \alpha_{A,R} = \frac{P(R|A)}{P(R)} \text{ .} \label{eq:AP} \end{equation}\]
Using Bayes’ rule that
\[\begin{equation}
P(R|A) = \frac{P(A|R) P(R)}{P(A)}
\end{equation}\]
it follows that \[\begin{equation} \alpha_{A,R} = \frac{P(R|A)}{P(R)} = \frac{P(A|R)}{P(A)} \text{ ,} \label{eq:AP2} \end{equation}\] When a pedigree and birth years are provided, \(P(A)\) and \(P(A|R)\) are simply calculated as the observed proportions of individual pairs with age-difference \(A\) (and relationship \(R\)) (for an example, see further).
When using the ageprior for relationships with and between dummy individuals (‘stand-ins’ for unsampled parents of sibship clusters), the assumption is made that the effect of age on relationship probability doesn’t differ (much) between sampled and unsampled individuals:
\[\begin{equation} \frac{P(R|A, \text{sampled})}{P(R| \text{sampled})} \approx \frac{P(R|A | \text{unsampled})}{P(R | \text{unsampled})} \label{eq:APsampled} \end{equation}\]
This does mean that sampling probability is assumed to be independent of \(A\): any sampling period is finite, thus there are always fewer pairs with large age differences than with small age differences. It does assume that the probability to be sampled is independent of relationship \(R\), so that
\[\begin{equation} \frac{P(A|R, \text{sampled})}{P(A| \text{sampled})} \approx \frac{P(A|R, \text{unsampled})}{P(A| \text{unsampled})} \text{.} \end{equation}\]
During pedigree reconstruction, this ageprior can simply be multiplied by the genotypes-dependend probability,
\[\begin{equation} P(R_{i,j}| G_i, G_j, A_{i,j}) = P(R_{i,j} | G_i, G_j) \times P(R_{i,j} | A_{i,j}) \text{ ,} \end{equation}\] as it can be reasonably assumed that conditional on the relationship \(R\), the genotypes \(G\) and age difference \(A\) are independent (at least on the time scales relevant for pedigree reconstruction). The calculation of \(P(R_{i,j} | G_i, G_j)\) is described in detail in (Huisman, 2017).
The main challenge is when there are very few or no sampled pairs with age difference \(A\), while that age difference is biologically plausible for one or more relationships in Table 2.1. In other words, if \(P(A|\text{sampled}) = 0\) in equation (??), this leads to division by zero and undefined \(\alpha_{A,R}\), while we know that \(P(R|A) >0\).
Or, as another extreme example, when no candidate fathers were sampled (\(P(A | R=P, \text{sampled}) = 0\)) then \(\alpha_{A=0,R=P} =0\). The dummy fathers that will stand in for these unsampled fathers will only be assigned when any \(\alpha_{A,R=P} >0\). Consequently, when simply using the ‘raw’ ratio from the observed data, no dummy fathers will be assigned.
To mitigate these issues, the function uses two types of small sample correction, adjustable with the arguments ‘Flatten’ and ‘Smooth’. See section Small sample correction for details.
Column | Meaning | Version |
---|---|---|
M | Mother - offspring | all |
P | Father - offspring | all |
FS | Full siblings | from 1.0 |
MS | Maternal siblings (full + half) | all |
PS | Paternal siblings (full + half) | all |
MGM | Maternal grandmother | up to 1.3 * |
PGF | Paternal grandfather | up to 1.3 * |
MGF | Maternal grandfather (+ paternal grandmother) | up to 1.3 * |
UA | avuncular (niece/nephew – aunt/uncle, full + half) | up to 1.3 |
(M/P)(M/P/F)A | Mother’s/Father’s Maternal/Paternal/Full sibling | from 2.0 * |
The implementation of MakeAgePrior
differs considerably depending on whether or not a pedigree is provided as input, and whether discrete or overlapping generations are assumed. The details are described for each of these situations, and a worked example is given for a pedigree with overlapping generations.
When generations do not overlap, it is often most straightforward to use generation number instead of birth year, such that parents and offspring are born 1 time unit apart. We can initialise this ageprior as follows:
MakeAgePrior(Discrete = TRUE)
## Ageprior: Default 0/1, discrete generations, MaxAgeParent=1,1
Figure 3.1: Ageprior for discrete generations
## M P FS MS PS
## 0 0 0 1 1 1
## 1 1 1 0 0 0
and ‘1’ (pale green in heatmap) indicates that a age-relationship combination is possible, and 0 (black) the impossible combinations.
It is possible to use discrete generations with a larger time interval. For example, pink salmon have a strict two-year life-cycle with odd-year and even-year populations:
MakeAgePrior(Discrete = TRUE, MaxAgeParent = 2, Plot=FALSE)
## Ageprior: Default 0/1, discrete generations, MaxAgeParent=2,2
## M P FS MS PS
## 0 0 0 1 1 1
## 1 0 0 0 0 0
## 2 1 1 0 0 0
When generations (may) overlap, and there is no pedigree yet, you must either provide an estimate of the maximum age of parents, or LifeHistData
with a representative set of birth years.
For example, if females breed to maximum age 3, and males to maximum age 2,
MakeAgePrior(MaxAgeParent = c(3, 2))
## Ageprior: Default 0/1, overlapping generations, MaxAgeParent=3,2
Figure 3.2: Ageprior for overlapping generations, MaxAgeParent is 3 for dams and 2 for sires.
## M P FS MS PS
## 0 0 0 1 1 1
## 1 1 1 1 1 1
## 2 1 1 0 1 0
## 3 1 0 0 1 0
When MaxAgeParent
is not provided, it is set to the maximum age difference observed in LifeHistData
.
Note that when assuming overlapping generations, aunts and uncles may be born in the same year as their nieces/nephews (\(\alpha_{A=0,R=UA} >0\)). Considering these relationship alteratives (maternal/paternal full/half avuncular) may lower the assignment rate of siblings compared to a situation where discrete generations are assumed – both correct and incorrect ones.
When generations do not overlap the ageprior as estimated from a pedigree will look indistinguishable from the ageprior defined without any pedigree.
When a pedigree is provided and , a check is done to confirm that all parent-offspring pairs do indeed have an age difference of 1 (typically), and that all sibling pairs have an age difference of 0 (strictly); any violations result in an error. When (the default), and it is found that these conditions are met, is set to TRUE internally, and thereby and are set to FALSE, but only if there are at least 20 pairs each of mother-offspring, father-offspring, maternal siblings, and paternal siblings (full siblings count here as both maternal and as paternal siblings).
When a large number of parents with known age is assigned, the age difference distribution per relationship can simply be calculated from a table of counts. When the number is small, a weighed average is taken between these observed proportions and the default flat 0/1 prior (see section Flatten). Even with moderate sample sizes bumps and gaps can exist in the empirical distribution purely by chance, which are ‘Smoothened’ out by default (see section Smooth).
The process is illustrated using an imaginary population of griffins were each year exactly 20 baby griffins are born. From 2001 to 2010, all juveniles are sampled, and parents assigned.
data(Ped_griffin)
tail(Ped_griffin)
## id dam sire birthyear
## 195 i195_2010_M i157_2008_F i163_2009_M 2010
## 196 i196_2010_M i148_2008_F i127_2007_M 2010
## 197 i197_2010_F i170_2009_F i127_2007_M 2010
## 198 i198_2010_M i166_2009_F i178_2009_M 2010
## 199 i199_2010_F i165_2009_F i141_2008_M 2010
## 200 i200_2010_F i166_2009_F i142_2008_M 2010
We can calculate the ageprior matrix, and return all the intermediate steps as well:
AP.griffin <- MakeAgePrior(Ped_griffin, Smooth=FALSE, Flatten=FALSE, Return="all")
## Ageprior: Pedigree-based, overlapping generations, MaxAgeParent = 3,3
Figure 3.3: Ageprior for overlapping generations (griffin example)
Were we see that both dams and sires have a maximum age of 3 years, and that for dams (M: Maternal) age 1 is most common (darkest green), while for sires (P:Paternal) age 2 is most common.
names(AP.griffin)
## [1] "BirthYearRange" "MaxAgeParent" "tblA.R"
## [4] "PA.R" "Weights" "LR.RU.A.unweighed"
## [7] "LR.RU.A" "Specs.AP"
The list element tblA.R
contains the raw counts of each age \(A\) per relationship \(R\):
AP.griffin[["tblA.R"]]
## M P FS MS PS X
## 0 0 0 3 89 59 3800
## 1 105 58 2 103 85 3600
## 2 54 83 0 24 15 3200
## 3 8 22 0 0 0 2800
## 4 0 0 0 0 0 2400
## 5 0 0 0 0 0 2000
## 6 0 0 0 0 0 1600
## 7 0 0 0 0 0 1200
## 8 0 0 0 0 0 800
## 9 0 0 0 0 0 400
Maternal ages in this table are calculated as follows:
for (x in c("id", "dam", "sire")) {
Ped_griffin[,x] <- as.character(Ped_griffin[,x])
}
Ped.g2 <- merge(Ped_griffin,
setNames(Ped_griffin[, c("id", "birthyear")], c("dam", "BY.dam")),
all.x=TRUE)
Ped.g2$Age.dam <- with(Ped.g2, birthyear - BY.dam)
table(Ped.g2$Age.dam)
##
## 1 2 3
## 105 54 8
and analogous for paternal age.
Siblings are identified using function GetRelCat
, and all pairwise age differences are calculated using outer
:
# this is part of the code in 'MakeAgePrior':
Ped.R <- setNames(Ped_griffin, c("id", "dam", "sire", "BirthYear"))
RCM <- sapply(seq_along(Ped.R$id), GetRelCat, Ped.R, GenBack=1)
# NOTE: GetRelCat is slow for very large pedigrees
AgeDifM <- outer(Ped.R$BirthYear, Ped.R$BirthYear, "-")
diag(AgeDifM) <- NA
MaxT <- 9 # determined internally based on LifeHistData
RR <- c("M", "P", "FS", "MS", "PS")
tblA.R <- matrix(NA, MaxT+1, length(RR)+1, dimnames=list(0:MaxT, c(RR, "X")))
tblA.R[, "FS"] <- table(factor(abs(AgeDifM[RCM == "FS"]), levels=0:MaxT)) /2
tblA.R[, "MS"] <- table(factor(abs(AgeDifM[RCM %in% c("FS", "MHS")]), levels=0:MaxT)) /2
tblA.R[, "PS"] <- table(factor(abs(AgeDifM[RCM %in% c("FS", "PHS")]), levels=0:MaxT)) /2
# Reference: age difference distribution across all pairs of individuals
tblA.R[, "X"] <- table(factor(abs(AgeDifM), levels=0:MaxT))/2
tblA.R
## M P FS MS PS X
## 0 NA NA 3 89 59 1900
## 1 NA NA 2 103 85 3600
## 2 NA NA 0 24 15 3200
## 3 NA NA 0 0 0 2800
## 4 NA NA 0 0 0 2400
## 5 NA NA 0 0 0 2000
## 6 NA NA 0 0 0 1600
## 7 NA NA 0 0 0 1200
## 8 NA NA 0 0 0 800
## 9 NA NA 0 0 0 400
Taking the absolute value and dividing by 2 ensures that each pair \(i\), \(j\) is counted exactly once, both when they are born in different years and when they are born in the same year. Using factor
with pre-set levels ensures that the sub-tables are lined up properly, and speeds up table
enormously.
From the count data \(P(A|R)\) is calculated by simply dividing by the column sums:
tblA.R <- AP.griffin[["tblA.R"]] # with dam & sire columns
PA.R <- sweep(tblA.R, 2, STATS=colSums(tblA.R), FUN="/")
round(PA.R, 2)
## M P FS MS PS X
## 0 0.00 0.00 0.6 0.41 0.37 0.17
## 1 0.63 0.36 0.4 0.48 0.53 0.17
## 2 0.32 0.51 0.0 0.11 0.09 0.15
## 3 0.05 0.13 0.0 0.00 0.00 0.13
## 4 0.00 0.00 0.0 0.00 0.00 0.11
## 5 0.00 0.00 0.0 0.00 0.00 0.09
## 6 0.00 0.00 0.0 0.00 0.00 0.07
## 7 0.00 0.00 0.0 0.00 0.00 0.06
## 8 0.00 0.00 0.0 0.00 0.00 0.04
## 9 0.00 0.00 0.0 0.00 0.00 0.02
and the age-difference based probability ratio is calculated by dividing each relationship column by the X
column
LR.A.R <- sweep(PA.R[, 1:5], 1, STATS=PA.R[,"X"], FUN="/")
round(LR.A.R, 2)
## M P FS MS PS
## 0 0.00 0.00 3.44 2.36 2.13
## 1 3.81 2.15 2.42 2.89 3.24
## 2 2.20 3.47 0.00 0.76 0.64
## 3 0.37 1.05 0.00 0.00 0.00
## 4 0.00 0.00 0.00 0.00 0.00
## 5 0.00 0.00 0.00 0.00 0.00
## 6 0.00 0.00 0.00 0.00 0.00
## 7 0.00 0.00 0.00 0.00 0.00
## 8 0.00 0.00 0.00 0.00 0.00
## 9 0.00 0.00 0.00 0.00 0.00
This is the non-flattened, non-smoothed ageprior, stored in list element [["LR.RU.A.unweighed"]]
. Since in this example Smooth=FALSE
and Flatten=FALSE
, it is identical to list element [["LR.RU.A"]]
, which is the default output.
The ageprior (or any similar matrix) can be visualised using PlotAgePrior
, or using for example image
in base R or ggplot2
’s geom_tile()
.
The ageprior distribution for avuncular pairs (aunts/uncles) has changed considerably from sequoia v1.3.3 to v2.0. Earlier, it was assumed that the ageprior for maternal and paternal full- and half- aunts and uncles was approximately similar, and approximately symmetrical around zero. From version 2.0 both these assumptions have been dropped, and there are now 6 avuncular classes (FS, MS, or PS of dam or sire) with agepriors that are not symmetrical around zero. The latter allows for the fact that aunts/uncles are often older than their nieces/nephews, but not necessarily so.
The agepriors for grandparents and avuncular relationships have always been estimated from parent and sibling age distributions, but this estimation has now been moved from MakeAgePrior
in R (up to v.1.3.3) to the Fortran part. The extra columns are calculated just before parentage assignment or sibship clustering. This is in contrast to the ‘regular’ ageprior, which is updated just after parentage assignment (see also pipeline in Figure 1.1).
The probability that individual \(i\) and say its maternal grandfather \(j\) have an age difference \(A_{i,j}\) is the sum over the possible ages of \(i\)’s ‘in-between’ mother \(k\), weighed by the agepriors for \(k\) being \(i\)’s mother, and \(j\) being \(k\)’s father, given their age differences.
Or, in mathematical notation:
\[\begin{equation} P(A_{i,j} = y | j \text{ MGF of }i) \propto \sum_{x=0}^{Tmax} \sum_{z=0}^{Tmax} I(x + z = y) P(A_{i,k} = x | k \text{ dam of } i) P(A_{k,j} = z | j \text{ sire of } k) \end{equation}\]
where \[\begin{align} P(A_{i,k} = x | k \text{ dam of } i) = \frac{\alpha_{x, dam}}{\sum \alpha_{., dam}} \\ P(A_{k,j} = z | j \text{ sire of } k) = \frac{\alpha_{z, sire}}{\sum \alpha_{., sire}} \end{align}\]
and \(I(x + z = y)\) is an indicator that equals \(1\) if \(x+z=y\), and \(0\) otherwise, and \(Tmax\) is the maximum age of parents.
For avuncular pairs, we consider that \(j\) might be either an older sibling or a younger sibling of \(i\)’s parent \(k\):
\[\begin{equation} P(A_{i,j} = y | j \text{ MFA of }i) \propto \sum_{x=0}^{Tmax} \sum_{z= -Tmax}^{Tmax} I(x + z = y) P(A_{i,k} = x | k \text{ dam of } i) P(A_{k,j} = z | j \text{ full sib of } k) \end{equation}\]
The grandparent and avuncular agepriors are then scaled to allow direct comparison with the other agepriors. The scaling factor used is the average of the sums of the mother and father agepriors; these may differ somewhat from each other if their range of possible ages differs.
The calculation of the extra columns can (currently) only be done by running sequoia
.
GenoS <- SimGeno(Ped_griffin, nSnp=400, ParMis=0.4)
Ped_griffin$id <- as.character(Ped_griffin$id)
griffin.sex <- sapply(Ped_griffin$ID, function(x) substr(x, start=nchar(x), stop=nchar(x)))
LH.griffin <- data.frame(ID = Ped_griffin$ID,
Sex = ifelse(griffin.sex=="F", 1, 2),
BirthYear = Ped_griffin$birthyear)
SeqOUT.griffin <- sequoia(GenoS, LH.griffin,
MaxSibIter = 10,
args.AP = list(Smooth = FALSE))
data(SeqOUT_griffin) # example output included with package
PlotAgePrior(SeqOUT_griffin$AgePriorExtra)
Figure 4.1: Ageprior with extra columns for Griffin example
This AgePriorExtra
(Figure 4.1) illustrates the following universal properties:
as well as subtle differences between the different avuncular classes, which may be more pronounced for some life histories (abbreviations: e.g. MPA = maternal-paternal avuncular, or paternal half-sibling of mother).
With small to moderate sample sizes some age/relationship combinations may be over- or under- represented by chance, or even completely absent. In the latter case, the uncorrected ageprior \(\alpha_{A,R} = P(A|R)/P(A) = 0\) specifies for subsequent pedigree reconstruction specifies that this age/relationship combination is always impossible, including for unsampled pairs and for individuals of unknown (estimated) age. This can result of all kinds of false negatives, as well as false positives: e.g. when half-siblings \(i\) and \(j\) can (wrongly-presumed) not be maternal siblings, they must be paternal siblings, and subsequently more individuals may be added to this erroneous paternal sibship. Alternatively, certain age-differences may be biologically plausible but do not occur at all in the sample, causing the denominator in Equation (??) \(P(A)=0\), and \(\alpha_{A,R}\) to be undefined.
To minimise these problems, MakeAgePrior
applies two generic, one-size-fits-most corrections: a fairly subtle Smooth
ing of dips and padding of the tails, and a more rigorous Flatten
ing approach.
By default, MakeAgePrior
smooths the distribution, by ‘stretching’ the tails and smoothing over any dips. It is set to TRUE
by default, as in many cases only a subset of the possible parental ages is sampled, and there is only rarely a detrimental effect of smoothing when none is required. Exception is when generations are discrete (non-overlapping); when this is detected or declared by the user, Smooth
is set to FALSE
automatically.
There is no way to change the behaviour of Smooth
, except turning it on or off. If for example a longer tail is required at either end, or more thorough smoothing is needed, either consider using Flatten=TRUE
, or edit the ageprior matrix by hand and supply it as an argument to sequoia
:
MyAgePrior <- MakeAgePrior(LifeHistData) # with or without scaffold/old pedigree
# (...) # edit by hand
SeqOUT <- sequoia(Genotypes, LifeHistData, SeqList(AgePriors = MyAgePrior))
The ageprior is ‘flattened’ by taking a weighed average between the flat, default 0/1 ageprior without any pedigree information, and the ageprior based on the pedigree and birth years:
\[\begin{equation} \alpha* = W_R \times \frac{P(A|R, \text{sampled})}{P(A, \text{sampled})} + (1-W_R) \times I(A,R) \end{equation}\]
where \(W_R\) is a weight based on \(N_{R}\), the number of pairs with relationship \(R\) and known age difference, and \(I(A,R)\) is a 0/1 indicator whether the age-relationship combination is biologically possible (0 for \(A=0\) and parent-offspring, 1 otherwise).
The weighing factor \(W_R\) follows a sigmoid curve, calculated as \(W_R = 1 - \exp(-\lambda_{N,W} * N_R)\). By default \(\lambda_{N,W}=-\log(0.5)/100\), which corresponds to \(W_R< 0.5\) if \(N_R < 100\), but this can be altered with argument (see Figure 5.1). When \(\lambda_{N,W}\) is large (say \(>0.2\)), Flatten
is effectively only applied for relationships with very small sample size and the ‘raw’ pedigree-based estimate used otherwise, while when \(\lambda_{N,W}\) is small (say \(<0.0001\)) there is effectively no contribution of the pedigree.
Figure 5.1: Weights versus number of pairs with relationship R, for weighed average between pedigree-derived ageprior and flat 0/1 ageprior
In theory, the sampling probability per relationship could be estimated, and used to calculate some correction factor for equation (??). In practice, this requires some strong assumptions about the number of assignable parents (i.e., distinction between pedigree founders and non-founders) and the number of sibling pairs (i.e. distribution of sibship sizes). This correction factor would depend on the total number of sampled individuals in a non-nonsensical way: removing individuals without assigned parents from the pedigree would ‘magically’ increase our faith in the ageprior estimated from the remainder of the pedigree. Therefore, the degree of correction by Flatten
depends only on the number of individuals per relationship with known age difference, \(N_R\).
In the smooth
ed to the original ageprior, future parents may be age 4 (with an ageprior half that of age 3), and very rarely even age 5,
(Figure 5.2), while in the scaffold pedigree they are never older than age 3 (see also section With pedigree, overlapping generations). Resultingly, the age difference between siblings can be larger too. If the observed parental ages had started at say age 7, the ageprior for age 6 would be set to a small value (0.001) too, but not age 5: it is assumed that larger age differences are easier to have been missed than smaller age differences.
Figure 5.2: Smoothed & Flattened agepriors for overlapping generations (griffin example). Note the different y-axes; non-displayed age differences have an ageprior of zero for all R.
With Flatten=TRUE
the ageprior looks rather different, with (as is typical) negligible additional effect of Smooth
or not (bottom row in Figure 5.2). The weight for maternal siblings (MS) is larger than for paternal siblings (PS),
AP.griffin$Weights
## M P FS MS PS
## 0.6857 0.6769 0.0341 0.7762 0.6678
as by chance there is a large maternal sibship (see SummarySeq(Ped_griffin)
), resulting in more maternal than paternal sibling pairs:
N.R <- colSums(AP.griffin$tblA.R)
N.R
## M P FS MS PS X
## 167 163 5 216 159 21800
and we can use these counts of pairs to double check the weights:
lambdaNW <- -log(0.5)/100 # the default
W.R <- 1 - exp(-lambdaNW * N.R)
round(W.R, 4)
## M P FS MS PS X
## 0.6857 0.6769 0.0341 0.7762 0.6678 1.0000
In some study systems, it is very difficult or too expensive to genotype the (candidate/known) parents of one sex. In this example we assume that no candidate fathers are genotyped in a population with discrete generations, but the same principle holds when mothers are not genotyped and/or generations overlap.
data(Ped_HSg5, LH_HSg5)
# Simulate data: 20% of mothers non-genotyped & 100% of fathers
GM <- SimGeno(Ped_HSg5, nSnp=400, ParMis=c(0.2, 1.0))
ParOUT <- sequoia(GM, LH_HSg5, MaxSibIter = 0, Plot=FALSE)
PlotAgePrior(ParOUT$AgePriors)
Figure 5.3: Example: no fathers genotyped
In the output the maximum age of parents is stated as 1 for mothers (from the assigned mother-offspring pairs), and 5 for fathers (using as best guess the observed age range in the lifehistory data). Since there are relationships with fewer than 20 pairs with known age difference (P, PS and FS), Flatten
is automatically set to TRUE
.
The ageprior for fathers \(\alpha_{A,R=P}\) equals \(1\) for age differences \(A\) between 1 and MaxAgeParent
, inclusive. Since there is no convincing evidence that the generations are non-overlapping, the default Smooth=TRUE
is used, which adds a ‘tail’ of 2 extra possible years of age differences for all relationship categories.
If we know for sure that fathers cannot be older than 3 years, but want to estimate the maximum age of mothers from the data, we can use
ParOUT <- sequoia(GM, LH_HSg5, MaxSibIter = 0,
args.AP = list(MaxAgeParent=c(NA, 3),
Smooth = FALSE))
Here Flatten=FALSE
would simply be ignored, as the ageprior would be undefined. Using Flatten=FALSE
with 5–20 pairs for some relationships is discouraged and results in a warning.