1 In short

The relative age of individuals can be very useful in pedigree reconstruction, both to filter potential relatives and during assignment. Age difference based probability ratios for different relationships are calculated based on a skeleton parentage-only pedigree and the birth years provided in LifeHistData (Figure 1.1).

This ‘ageprior’ (Glossary in Table 1.1; please note that this is not an official term, nor a proper prior) was calculated quietly in the background up until version 1.3.3, but from version 2.0 you will see heatmaps and messages such as

These are not warnings, but inform you what is going on what is going on behind the scenes. This will hopefully minimise unintended age priors as a source of error. There are four parts to the message:

In addition MakeAgePrior allows more explicit control from version 2.0 onwards, so that ‘hand-tailoring’ will only be required in rare cases. Arguments can be passed to MakeAgePrior when running sequoia via args.AP.

Pipeline overview

Figure 1.1: Pipeline overview

The ageprior matrix now has 5 columns (more in sequoia versions before 2.0), corresponding to parent-offspring and siblings (Table 2.1). The number of rows equals the maximum age of parents plus one; the first row is for individuals born in the same year (age difference \(A=0\)), the second row for individuals born one year apart, etc. Some values are fixed to zero: parents and their offspring can never be exactly the same age.

1.1 Why & how: use of age in pedigree reconstruction

Based on the species’ age at first and last reproduction, some age differences between parent and offspring or between siblings are more likely than others, and some are downright impossible. Age differences are calculated from the birth years (or other time unit) in LifeHistData, and are used both to filter potential relatives (and reduce the number of time-consuming computations) and during assignment, for example to distinguish between half-siblings, grandparent–grand-offspring and full avuncular pairs.

By default, for the parentage assignment only a 0/1 coded ‘flat’ ageprior is used to indicate which age-differences are possible. If any parents with known age are assigned, the age prior is updated before use during sibship clustering and grandparent assignment (Figure 1.1). The ageprior is not updated during or after the sibship clustering.

If an ageprior is provided as argument to sequoia (as an element of SeqList), this ageprior will be used instead of calling MakeAgePrior (details further down).

Table 1.1: Terms and abbreviations
Symbol Term Definition
\(BY_i\) Birth year Time unit (year, decade, month) of \(i\)’s birth/ hatching
\(A_{i,j}\) Age difference \(BY_i - BY_j\), i.e. if \(j\) is older than \(i\), than \(A_{i,j}\) is positive
\(R_{i,j}\) Relationship Relationship between \(i\) and \(j\), e.g. parent-offspring
\(\alpha_{A,R}\) Ageprior Probability ratio of relationship \(R\) given \(A\), versus \(R\) across all age differences

1.2 Interpretation

The ‘AgePrior’ matrix is a set of age-difference based probability ratios which can be interpreted as

For example, in a species where females lay large clutches but only breed for a few years, two individuals \(i\) and \(j\) born in the same year (\(BY_i=BY_j\), \(A_{i,j}=0\)) may be more likely to be maternal siblings (\(R_{i,j} =\)MS, relationships in Table 2.1) than two randomly picked individuals (\(\alpha_{0,MS} > 1\)). In contrast, in a species where singletons are the norm and twins are rare, individuals born in the same year are unlikely to be maternal siblings (\(\alpha_{0,MS} < 1\)).

2 Maths

The ageprior is defined as \[\begin{equation} \alpha_{A,R} = \frac{P(R|A)}{P(R)} \text{ .} \label{eq:AP} \end{equation}\]

Using Bayes’ rule that
\[\begin{equation} P(R|A) = \frac{P(A|R) P(R)}{P(A)} \end{equation}\]

it follows that \[\begin{equation} \alpha_{A,R} = \frac{P(R|A)}{P(R)} = \frac{P(A|R)}{P(A)} \text{ ,} \label{eq:AP2} \end{equation}\] When a pedigree and birth years are provided, \(P(A)\) and \(P(A|R)\) are simply calculated as the observed proportions of individual pairs with age-difference \(A\) (and relationship \(R\)) (for an example, see further).

When using the ageprior for relationships with and between dummy individuals (‘stand-ins’ for unsampled parents of sibship clusters), the assumption is made that the effect of age on relationship probability doesn’t differ (much) between sampled and unsampled individuals:

\[\begin{equation} \frac{P(R|A, \text{sampled})}{P(R| \text{sampled})} \approx \frac{P(R|A | \text{unsampled})}{P(R | \text{unsampled})} \label{eq:APsampled} \end{equation}\]

This does mean that sampling probability is assumed to be independent of \(A\): any sampling period is finite, thus there are always fewer pairs with large age differences than with small age differences. It does assume that the probability to be sampled is independent of relationship \(R\), so that

\[\begin{equation} \frac{P(A|R, \text{sampled})}{P(A| \text{sampled})} \approx \frac{P(A|R, \text{unsampled})}{P(A| \text{unsampled})} \text{.} \end{equation}\]

During pedigree reconstruction, this ageprior can simply be multiplied by the genotypes-dependend probability,

\[\begin{equation} P(R_{i,j}| G_i, G_j, A_{i,j}) = P(R_{i,j} | G_i, G_j) \times P(R_{i,j} | A_{i,j}) \text{ ,} \end{equation}\] as it can be reasonably assumed that conditional on the relationship \(R\), the genotypes \(G\) and age difference \(A\) are independent (at least on the time scales relevant for pedigree reconstruction). The calculation of \(P(R_{i,j} | G_i, G_j)\) is described in detail in (Huisman, 2017).

2.1 When & why small sample correction is needed

The main challenge is when there are very few or no sampled pairs with age difference \(A\), while that age difference is biologically plausible for one or more relationships in Table 2.1. In other words, if \(P(A|\text{sampled}) = 0\) in equation (??), this leads to division by zero and undefined \(\alpha_{A,R}\), while we know that \(P(R|A) >0\).

Or, as another extreme example, when no candidate fathers were sampled (\(P(A | R=P, \text{sampled}) = 0\)) then \(\alpha_{A=0,R=P} =0\). The dummy fathers that will stand in for these unsampled fathers will only be assigned when any \(\alpha_{A,R=P} >0\). Consequently, when simply using the ‘raw’ ratio from the observed data, no dummy fathers will be assigned.

To mitigate these issues, the function uses two types of small sample correction, adjustable with the arguments ‘Flatten’ and ‘Smooth’. See section Small sample correction for details.

Table 2.1: AgePriors column names. *: From sequoia v2.0 only in AgePriorExtra
Column Meaning Version
M Mother - offspring all
P Father - offspring all
FS Full siblings from 1.0
MS Maternal siblings (full + half) all
PS Paternal siblings (full + half) all
MGM Maternal grandmother up to 1.3 *
PGF Paternal grandfather up to 1.3 *
MGF Maternal grandfather (+ paternal grandmother) up to 1.3 *
UA avuncular (niece/nephew – aunt/uncle, full + half) up to 1.3
(M/P)(M/P/F)A Mother’s/Father’s Maternal/Paternal/Full sibling from 2.0 *

3 Implementation

The implementation of MakeAgePrior differs considerably depending on whether or not a pedigree is provided as input, and whether discrete or overlapping generations are assumed. The details are described for each of these situations, and a worked example is given for a pedigree with overlapping generations.

3.1 No pedigree, discrete generations

When generations do not overlap, it is often most straightforward to use generation number instead of birth year, such that parents and offspring are born 1 time unit apart. We can initialise this ageprior as follows:

MakeAgePrior(Discrete = TRUE)
## Ageprior: Default 0/1, discrete generations, MaxAgeParent=1,1
Ageprior for discrete generations

Figure 3.1: Ageprior for discrete generations

##   M P FS MS PS
## 0 0 0  1  1  1
## 1 1 1  0  0  0

and ‘1’ (pale green in heatmap) indicates that a age-relationship combination is possible, and 0 (black) the impossible combinations.

It is possible to use discrete generations with a larger time interval. For example, pink salmon have a strict two-year life-cycle with odd-year and even-year populations:

MakeAgePrior(Discrete = TRUE, MaxAgeParent = 2, Plot=FALSE)
## Ageprior: Default 0/1, discrete generations, MaxAgeParent=2,2
##   M P FS MS PS
## 0 0 0  1  1  1
## 1 0 0  0  0  0
## 2 1 1  0  0  0

3.2 No pedigree, (potentially) overlapping generations

When generations (may) overlap, and there is no pedigree yet, you must either provide an estimate of the maximum age of parents, or LifeHistData with a representative set of birth years.

For example, if females breed to maximum age 3, and males to maximum age 2,

MakeAgePrior(MaxAgeParent = c(3, 2))
## Ageprior: Default 0/1, overlapping generations, MaxAgeParent=3,2
Ageprior for overlapping generations, MaxAgeParent is 3 for dams and 2 for sires.

Figure 3.2: Ageprior for overlapping generations, MaxAgeParent is 3 for dams and 2 for sires.

##   M P FS MS PS
## 0 0 0  1  1  1
## 1 1 1  1  1  1
## 2 1 1  0  1  0
## 3 1 0  0  1  0

When MaxAgeParent is not provided, it is set to the maximum age difference observed in LifeHistData.

Note that when assuming overlapping generations, aunts and uncles may be born in the same year as their nieces/nephews (\(\alpha_{A=0,R=UA} >0\)). Considering these relationship alteratives (maternal/paternal full/half avuncular) may lower the assignment rate of siblings compared to a situation where discrete generations are assumed – both correct and incorrect ones.

3.3 With pedigree, discrete generations

When generations do not overlap the ageprior as estimated from a pedigree will look indistinguishable from the ageprior defined without any pedigree.

When a pedigree is provided and , a check is done to confirm that all parent-offspring pairs do indeed have an age difference of 1 (typically), and that all sibling pairs have an age difference of 0 (strictly); any violations result in an error. When (the default), and it is found that these conditions are met, is set to TRUE internally, and thereby and are set to FALSE, but only if there are at least 20 pairs each of mother-offspring, father-offspring, maternal siblings, and paternal siblings (full siblings count here as both maternal and as paternal siblings).

3.4 With pedigree, overlapping generations

When a large number of parents with known age is assigned, the age difference distribution per relationship can simply be calculated from a table of counts. When the number is small, a weighed average is taken between these observed proportions and the default flat 0/1 prior (see section Flatten). Even with moderate sample sizes bumps and gaps can exist in the empirical distribution purely by chance, which are ‘Smoothened’ out by default (see section Smooth).

The process is illustrated using an imaginary population of griffins were each year exactly 20 baby griffins are born. From 2001 to 2010, all juveniles are sampled, and parents assigned.

data(Ped_griffin)
tail(Ped_griffin)
##              id         dam        sire birthyear
## 195 i195_2010_M i157_2008_F i163_2009_M      2010
## 196 i196_2010_M i148_2008_F i127_2007_M      2010
## 197 i197_2010_F i170_2009_F i127_2007_M      2010
## 198 i198_2010_M i166_2009_F i178_2009_M      2010
## 199 i199_2010_F i165_2009_F i141_2008_M      2010
## 200 i200_2010_F i166_2009_F i142_2008_M      2010

We can calculate the ageprior matrix, and return all the intermediate steps as well:

AP.griffin <- MakeAgePrior(Ped_griffin, Smooth=FALSE, Flatten=FALSE, Return="all")
## Ageprior: Pedigree-based, overlapping generations, MaxAgeParent = 3,3
Ageprior for overlapping generations (griffin example)

Figure 3.3: Ageprior for overlapping generations (griffin example)

Were we see that both dams and sires have a maximum age of 3 years, and that for dams (M: Maternal) age 1 is most common (darkest green), while for sires (P:Paternal) age 2 is most common.

names(AP.griffin)
## [1] "BirthYearRange"    "MaxAgeParent"      "tblA.R"           
## [4] "PA.R"              "Weights"           "LR.RU.A.unweighed"
## [7] "LR.RU.A"           "Specs.AP"

3.4.1 Counting the number of pairs per age & relationship

The list element tblA.R contains the raw counts of each age \(A\) per relationship \(R\):

AP.griffin[["tblA.R"]]
##     M  P FS  MS PS    X
## 0   0  0  3  89 59 3800
## 1 105 58  2 103 85 3600
## 2  54 83  0  24 15 3200
## 3   8 22  0   0  0 2800
## 4   0  0  0   0  0 2400
## 5   0  0  0   0  0 2000
## 6   0  0  0   0  0 1600
## 7   0  0  0   0  0 1200
## 8   0  0  0   0  0  800
## 9   0  0  0   0  0  400

Maternal ages in this table are calculated as follows:

for (x in c("id", "dam", "sire")) {
  Ped_griffin[,x] <- as.character(Ped_griffin[,x])
}
Ped.g2 <- merge(Ped_griffin, 
                setNames(Ped_griffin[, c("id", "birthyear")], c("dam", "BY.dam")),
                all.x=TRUE)
Ped.g2$Age.dam <- with(Ped.g2, birthyear - BY.dam)
table(Ped.g2$Age.dam)
## 
##   1   2   3 
## 105  54   8

and analogous for paternal age.

Siblings are identified using function GetRelCat, and all pairwise age differences are calculated using outer:

# this is part of the code in 'MakeAgePrior':
Ped.R <- setNames(Ped_griffin, c("id", "dam", "sire", "BirthYear"))  
RCM <- sapply(seq_along(Ped.R$id), GetRelCat, Ped.R, GenBack=1)  
# NOTE: GetRelCat is slow for very large pedigrees
AgeDifM <- outer(Ped.R$BirthYear, Ped.R$BirthYear, "-")
diag(AgeDifM) <- NA

MaxT <- 9   # determined internally based on LifeHistData
RR <- c("M", "P", "FS", "MS", "PS")
tblA.R <- matrix(NA, MaxT+1, length(RR)+1, dimnames=list(0:MaxT, c(RR, "X")))

tblA.R[, "FS"] <- table(factor(abs(AgeDifM[RCM == "FS"]), levels=0:MaxT)) /2
tblA.R[, "MS"] <- table(factor(abs(AgeDifM[RCM %in% c("FS", "MHS")]), levels=0:MaxT)) /2
tblA.R[, "PS"] <- table(factor(abs(AgeDifM[RCM %in% c("FS", "PHS")]), levels=0:MaxT)) /2

# Reference: age difference distribution across all pairs of individuals
tblA.R[, "X"] <- table(factor(abs(AgeDifM), levels=0:MaxT))/2
  
tblA.R
##    M  P FS  MS PS    X
## 0 NA NA  3  89 59 1900
## 1 NA NA  2 103 85 3600
## 2 NA NA  0  24 15 3200
## 3 NA NA  0   0  0 2800
## 4 NA NA  0   0  0 2400
## 5 NA NA  0   0  0 2000
## 6 NA NA  0   0  0 1600
## 7 NA NA  0   0  0 1200
## 8 NA NA  0   0  0  800
## 9 NA NA  0   0  0  400

Taking the absolute value and dividing by 2 ensures that each pair \(i\), \(j\) is counted exactly once, both when they are born in different years and when they are born in the same year. Using factor with pre-set levels ensures that the sub-tables are lined up properly, and speeds up table enormously.

3.4.2 Calculating ageprior P(A|R)/P(A)

From the count data \(P(A|R)\) is calculated by simply dividing by the column sums:

tblA.R <- AP.griffin[["tblA.R"]]    # with dam & sire columns
PA.R <- sweep(tblA.R, 2, STATS=colSums(tblA.R), FUN="/")
round(PA.R, 2)
##      M    P  FS   MS   PS    X
## 0 0.00 0.00 0.6 0.41 0.37 0.17
## 1 0.63 0.36 0.4 0.48 0.53 0.17
## 2 0.32 0.51 0.0 0.11 0.09 0.15
## 3 0.05 0.13 0.0 0.00 0.00 0.13
## 4 0.00 0.00 0.0 0.00 0.00 0.11
## 5 0.00 0.00 0.0 0.00 0.00 0.09
## 6 0.00 0.00 0.0 0.00 0.00 0.07
## 7 0.00 0.00 0.0 0.00 0.00 0.06
## 8 0.00 0.00 0.0 0.00 0.00 0.04
## 9 0.00 0.00 0.0 0.00 0.00 0.02

and the age-difference based probability ratio is calculated by dividing each relationship column by the X column

LR.A.R <- sweep(PA.R[, 1:5], 1, STATS=PA.R[,"X"], FUN="/")
round(LR.A.R, 2)
##      M    P   FS   MS   PS
## 0 0.00 0.00 3.44 2.36 2.13
## 1 3.81 2.15 2.42 2.89 3.24
## 2 2.20 3.47 0.00 0.76 0.64
## 3 0.37 1.05 0.00 0.00 0.00
## 4 0.00 0.00 0.00 0.00 0.00
## 5 0.00 0.00 0.00 0.00 0.00
## 6 0.00 0.00 0.00 0.00 0.00
## 7 0.00 0.00 0.00 0.00 0.00
## 8 0.00 0.00 0.00 0.00 0.00
## 9 0.00 0.00 0.00 0.00 0.00

This is the non-flattened, non-smoothed ageprior, stored in list element [["LR.RU.A.unweighed"]]. Since in this example Smooth=FALSE and Flatten=FALSE, it is identical to list element [["LR.RU.A"]], which is the default output.

3.4.3 visualisation

The ageprior (or any similar matrix) can be visualised using PlotAgePrior, or using for example image in base R or ggplot2 ’s geom_tile().

4 Grandparents and avuncular relationships

4.1 change from v1.3.3 to v2.0

The ageprior distribution for avuncular pairs (aunts/uncles) has changed considerably from sequoia v1.3.3 to v2.0. Earlier, it was assumed that the ageprior for maternal and paternal full- and half- aunts and uncles was approximately similar, and approximately symmetrical around zero. From version 2.0 both these assumptions have been dropped, and there are now 6 avuncular classes (FS, MS, or PS of dam or sire) with agepriors that are not symmetrical around zero. The latter allows for the fact that aunts/uncles are often older than their nieces/nephews, but not necessarily so.

The agepriors for grandparents and avuncular relationships have always been estimated from parent and sibling age distributions, but this estimation has now been moved from MakeAgePrior in R (up to v.1.3.3) to the Fortran part. The extra columns are calculated just before parentage assignment or sibship clustering. This is in contrast to the ‘regular’ ageprior, which is updated just after parentage assignment (see also pipeline in Figure 1.1).

4.2 Mathematical details

The probability that individual \(i\) and say its maternal grandfather \(j\) have an age difference \(A_{i,j}\) is the sum over the possible ages of \(i\)’s ‘in-between’ mother \(k\), weighed by the agepriors for \(k\) being \(i\)’s mother, and \(j\) being \(k\)’s father, given their age differences.

Or, in mathematical notation:

\[\begin{equation} P(A_{i,j} = y | j \text{ MGF of }i) \propto \sum_{x=0}^{Tmax} \sum_{z=0}^{Tmax} I(x + z = y) P(A_{i,k} = x | k \text{ dam of } i) P(A_{k,j} = z | j \text{ sire of } k) \end{equation}\]

where \[\begin{align} P(A_{i,k} = x | k \text{ dam of } i) = \frac{\alpha_{x, dam}}{\sum \alpha_{., dam}} \\ P(A_{k,j} = z | j \text{ sire of } k) = \frac{\alpha_{z, sire}}{\sum \alpha_{., sire}} \end{align}\]

and \(I(x + z = y)\) is an indicator that equals \(1\) if \(x+z=y\), and \(0\) otherwise, and \(Tmax\) is the maximum age of parents.

For avuncular pairs, we consider that \(j\) might be either an older sibling or a younger sibling of \(i\)’s parent \(k\):

\[\begin{equation} P(A_{i,j} = y | j \text{ MFA of }i) \propto \sum_{x=0}^{Tmax} \sum_{z= -Tmax}^{Tmax} I(x + z = y) P(A_{i,k} = x | k \text{ dam of } i) P(A_{k,j} = z | j \text{ full sib of } k) \end{equation}\]

The grandparent and avuncular agepriors are then scaled to allow direct comparison with the other agepriors. The scaling factor used is the average of the sums of the mother and father agepriors; these may differ somewhat from each other if their range of possible ages differs.

4.3 Griffin example

The calculation of the extra columns can (currently) only be done by running sequoia.

GenoS <- SimGeno(Ped_griffin, nSnp=400, ParMis=0.4)

Ped_griffin$id <- as.character(Ped_griffin$id)
griffin.sex <- sapply(Ped_griffin$ID, function(x) substr(x, start=nchar(x), stop=nchar(x)))
LH.griffin <- data.frame(ID = Ped_griffin$ID,
                         Sex = ifelse(griffin.sex=="F", 1, 2),
                         BirthYear = Ped_griffin$birthyear)

SeqOUT.griffin <- sequoia(GenoS, LH.griffin,
                  MaxSibIter = 10,
                  args.AP = list(Smooth = FALSE)) 
data(SeqOUT_griffin)   # example output included with package
PlotAgePrior(SeqOUT_griffin$AgePriorExtra)
Ageprior with extra columns for Griffin example

Figure 4.1: Ageprior with extra columns for Griffin example

This AgePriorExtra (Figure 4.1) illustrates the following universal properties:

  • the ageprior for siblings is symmetrical around 0;
  • grandparents are at least 2 time units older than their grand-offspring;
  • aunts/uncles tend to be older than their nieces/nephews, but may be younger;

as well as subtle differences between the different avuncular classes, which may be more pronounced for some life histories (abbreviations: e.g. MPA = maternal-paternal avuncular, or paternal half-sibling of mother).

5 Small sample correction

With small to moderate sample sizes some age/relationship combinations may be over- or under- represented by chance, or even completely absent. In the latter case, the uncorrected ageprior \(\alpha_{A,R} = P(A|R)/P(A) = 0\) specifies for subsequent pedigree reconstruction specifies that this age/relationship combination is always impossible, including for unsampled pairs and for individuals of unknown (estimated) age. This can result of all kinds of false negatives, as well as false positives: e.g. when half-siblings \(i\) and \(j\) can (wrongly-presumed) not be maternal siblings, they must be paternal siblings, and subsequently more individuals may be added to this erroneous paternal sibship. Alternatively, certain age-differences may be biologically plausible but do not occur at all in the sample, causing the denominator in Equation (??) \(P(A)=0\), and \(\alpha_{A,R}\) to be undefined.

To minimise these problems, MakeAgePrior applies two generic, one-size-fits-most corrections: a fairly subtle Smoothing of dips and padding of the tails, and a more rigorous Flattening approach.

5.1 Smooth

By default, MakeAgePrior smooths the distribution, by ‘stretching’ the tails and smoothing over any dips. It is set to TRUE by default, as in many cases only a subset of the possible parental ages is sampled, and there is only rarely a detrimental effect of smoothing when none is required. Exception is when generations are discrete (non-overlapping); when this is detected or declared by the user, Smooth is set to FALSE automatically.

There is no way to change the behaviour of Smooth, except turning it on or off. If for example a longer tail is required at either end, or more thorough smoothing is needed, either consider using Flatten=TRUE, or edit the ageprior matrix by hand and supply it as an argument to sequoia:

MyAgePrior <- MakeAgePrior(LifeHistData)   # with or without scaffold/old pedigree 
# (...)  # edit by hand
SeqOUT <- sequoia(Genotypes, LifeHistData, SeqList(AgePriors = MyAgePrior))

5.2 Flatten

The ageprior is ‘flattened’ by taking a weighed average between the flat, default 0/1 ageprior without any pedigree information, and the ageprior based on the pedigree and birth years:

\[\begin{equation} \alpha* = W_R \times \frac{P(A|R, \text{sampled})}{P(A, \text{sampled})} + (1-W_R) \times I(A,R) \end{equation}\]

where \(W_R\) is a weight based on \(N_{R}\), the number of pairs with relationship \(R\) and known age difference, and \(I(A,R)\) is a 0/1 indicator whether the age-relationship combination is biologically possible (0 for \(A=0\) and parent-offspring, 1 otherwise).

The weighing factor \(W_R\) follows a sigmoid curve, calculated as \(W_R = 1 - \exp(-\lambda_{N,W} * N_R)\). By default \(\lambda_{N,W}=-\log(0.5)/100\), which corresponds to \(W_R< 0.5\) if \(N_R < 100\), but this can be altered with argument (see Figure 5.1). When \(\lambda_{N,W}\) is large (say \(>0.2\)), Flatten is effectively only applied for relationships with very small sample size and the ‘raw’ pedigree-based estimate used otherwise, while when \(\lambda_{N,W}\) is small (say \(<0.0001\)) there is effectively no contribution of the pedigree.

Weights versus number of pairs with relationship R, for weighed average between pedigree-derived ageprior and flat 0/1 ageprior

Figure 5.1: Weights versus number of pairs with relationship R, for weighed average between pedigree-derived ageprior and flat 0/1 ageprior

5.2.1 Why not use the sampling probability

In theory, the sampling probability per relationship could be estimated, and used to calculate some correction factor for equation (??). In practice, this requires some strong assumptions about the number of assignable parents (i.e., distinction between pedigree founders and non-founders) and the number of sibling pairs (i.e. distribution of sibship sizes). This correction factor would depend on the total number of sampled individuals in a non-nonsensical way: removing individuals without assigned parents from the pedigree would ‘magically’ increase our faith in the ageprior estimated from the remainder of the pedigree. Therefore, the degree of correction by Flatten depends only on the number of individuals per relationship with known age difference, \(N_R\).

5.3 Example: griffins

In the smoothed to the original ageprior, future parents may be age 4 (with an ageprior half that of age 3), and very rarely even age 5, (Figure 5.2), while in the scaffold pedigree they are never older than age 3 (see also section With pedigree, overlapping generations). Resultingly, the age difference between siblings can be larger too. If the observed parental ages had started at say age 7, the ageprior for age 6 would be set to a small value (0.001) too, but not age 5: it is assumed that larger age differences are easier to have been missed than smaller age differences.

Smoothed & Flattened agepriors for overlapping generations (griffin example). Note the different y-axes; non-displayed age differences have an ageprior of zero for all R.

Figure 5.2: Smoothed & Flattened agepriors for overlapping generations (griffin example). Note the different y-axes; non-displayed age differences have an ageprior of zero for all R.

With Flatten=TRUE the ageprior looks rather different, with (as is typical) negligible additional effect of Smooth or not (bottom row in Figure 5.2). The weight for maternal siblings (MS) is larger than for paternal siblings (PS),

AP.griffin$Weights
##      M      P     FS     MS     PS 
## 0.6857 0.6769 0.0341 0.7762 0.6678

as by chance there is a large maternal sibship (see SummarySeq(Ped_griffin)), resulting in more maternal than paternal sibling pairs:

N.R <- colSums(AP.griffin$tblA.R)
N.R
##     M     P    FS    MS    PS     X 
##   167   163     5   216   159 21800

and we can use these counts of pairs to double check the weights:

lambdaNW <- -log(0.5)/100   # the default
W.R <-  1 - exp(-lambdaNW * N.R)
round(W.R, 4)
##      M      P     FS     MS     PS      X 
## 0.6857 0.6769 0.0341 0.7762 0.6678 1.0000

5.4 Example 2: No candidate fathers genotyped

In some study systems, it is very difficult or too expensive to genotype the (candidate/known) parents of one sex. In this example we assume that no candidate fathers are genotyped in a population with discrete generations, but the same principle holds when mothers are not genotyped and/or generations overlap.

data(Ped_HSg5, LH_HSg5)
# Simulate data: 20% of mothers non-genotyped & 100% of fathers
GM <- SimGeno(Ped_HSg5, nSnp=400, ParMis=c(0.2, 1.0))  

ParOUT <- sequoia(GM, LH_HSg5, MaxSibIter = 0, Plot=FALSE)
PlotAgePrior(ParOUT$AgePriors)
Example: no fathers genotyped

Figure 5.3: Example: no fathers genotyped

In the output the maximum age of parents is stated as 1 for mothers (from the assigned mother-offspring pairs), and 5 for fathers (using as best guess the observed age range in the lifehistory data). Since there are relationships with fewer than 20 pairs with known age difference (P, PS and FS), Flatten is automatically set to TRUE.

The ageprior for fathers \(\alpha_{A,R=P}\) equals \(1\) for age differences \(A\) between 1 and MaxAgeParent, inclusive. Since there is no convincing evidence that the generations are non-overlapping, the default Smooth=TRUE is used, which adds a ‘tail’ of 2 extra possible years of age differences for all relationship categories.

If we know for sure that fathers cannot be older than 3 years, but want to estimate the maximum age of mothers from the data, we can use

ParOUT <- sequoia(GM, LH_HSg5, MaxSibIter = 0,
                  args.AP = list(MaxAgeParent=c(NA, 3),
                                 Smooth = FALSE))

Here Flatten=FALSE would simply be ignored, as the ageprior would be undefined. Using Flatten=FALSE with 5–20 pairs for some relationships is discouraged and results in a warning.