Zooarchaeology is concerned with the analysis and inference of faunal remains recovered from archaeological sites. The zooaRch package provides analytical tools for zooarchaeological data. Functions in this package allow users to:
Functions surv.func and mort.func create confidence intervals around survivorship and mortality data respectively. surv.func takes bootstrap samples of the original dataset, calculates the Kaplan-Meier Estimator (KME) for each, and then calculates the confidence interval. The function mort.func bootstraps the mortality profiles. Defaults are 1000 iterations for the bootstrap and 95% for the confidence interval.
The data must be coded in the long way (i.e., each specimen must have its own entry). The dataframe should include two columns: one called “Genus” and one called “Ageclass.” For example, the first six entries of the Marj Rabba sheep/goat data from Price et al.2 is:
Genus | Ageclass |
---|---|
Ovis/Capra | 1 |
Ovis/Capra | 2 |
Ovis/Capra | 3 |
Ovis/Capra | 3 |
Ovis/Capra | 3 |
Ovis/Capra | 3 |
Note: the Ageclass must be numeric (i.e., you cannot use “A”, “B”, “C”, etc. for age classes).
Function arguments are very similar for both surv.func and mort.func. Here we describe the surv.func arguments. See help(mort.func) and help(surv.func) for more details on the arguments for mortality profile and survivorship bootstrapping.
surv.func(SurviveData, labels = NULL, models = NULL, usermod = NULL, ci = 95, plot = TRUE, iter = 1000)
Labels The names of the age classes. By default, they are numeric from 1 to x, where x is the total number of age classes. One can define labels following this example for nine age classes A-I: Labels.ageclass<-c(“A”, “B”, “C”, “D”, “E”, “F”, “G”, “H”, “I”)
models User-defined models. Already existing models include:
model.security: .904, .904, .645, .38, .25, .239, .171, .118, 0
model.milk: .47, .42, .39, .35, .28, .23, .19, .1, 0
model.wool: .85, .75, .65, .63, .57, .50, .43, .20, 0
For nine age classes (Payne 1973; Marom and Bar-Oz 2009 3)
model.catastrophe: .86, .73, .60, .49, .39, .31, .23, .16, .11, .07, .03, .01, 0
model.attritional: .76, .53, .48, .46, .45, .42, .39, .34, .29, .20, .11, .03, 0
For 13 age classes (Stiner 1990; Speth 1983 4)
usermod The user can define new models following this example, which uses five age classes (note: models must be a list):
new.model <- c(.9, .7, .5, .2, .1) new.model2 <- c(.9, .8, .4, .3, .1) models <- list(Model1=new.model, Model2=new.model2) surv.func(SurviveData, models=NULL, usermod=models) This graphs the two new models and labels them as “Model1” and “Model2” in the legend.
ci The confidence interval. Change to, e.g., ci = 90 or ci = 68 as desired)
plot If TRUE, it plots the data on a survivorship curve. If FALSE, it does not.
iter The number of bootstrap iterations. Change to, e.g., iter = 10000.
The function calculates the KME values for the original dataset.
> survivorcurve.Eq4 <- function(data)
{
vector <- rep(1, N.ages+1)
for(i in 1:N.ages+1)
{
vector[i] <- vector[i-1]*(sum(data >= (i-1)) - sum(data == > (i-1)))/sum(data >= (i-1))
}
vector[is.na(vector)] <- 0
round(vector,4)
}
It then takes x samples of the dataset, with x equal to the number of iterations (Default = 1000). The samples are initially structured in the same way as the original dataset, with each sample size equal to the size of the original (e.g., in the Marj Rabba example, there are 42 entries in each sample). The function calculates the KME for each sample
survive.matrix <- matrix(NA, ncol = N.ages+1, nrow = 1000)
survive.matrix[1,] <- survivorcurve.Eq4(SurviveData$Ageclass)
for(i in 2:1000)
{
bootstrap <- sample(1:nrow(SurviveData), nrow(SurviveData), replace = TRUE)
survive.matrix[i,] <-
survivorcurve.Eq4(SurviveData$Ageclass[bootstrap])
}
Plotting and the calculation of the confidence intervals is straightforward:
plot(x = (1:N.ages), y = survive.matrix[1,-1], type = "l", lwd = 2, xlab = "Age Class",
ylab = "Proportion Survived", axes = FALSE, ylim =
c(0,1))
axis(side = 1, at = 1:N.ages,
labels = Labels.ageclass)
axis(side = 2)
lines(x = 1:N.ages, y = apply(survive.matrix[,-1], MARGIN = 2, FUN = quantile, probs = 0.025), lty = "dashed", ylim = c(0,1))
lines(x = 1:N.ages, y = apply(survive.matrix[,-1], MARGIN = 2, FUN = quantile, probs = 0.975), lty = "dashed", ylim = c(0,1))
Plotting models and a legend is also straightforward:
lines(x = 1:N.ages, y = model.catastrophe, col = "blue", lwd = 2, lty = "dotted")
lines(x = 1:N.ages, y = model.attritional, col = "red", lwd = 2, lty = "dotdash")
legend(x = "topright", cex = .75, lwd = c(2,1,2,2,2),lty =
c("solid", "dashed", "dotted", "dotdash"),
col = c("black", "black", "blue", "red"),
legend = c("Survivorship", "95% Confidence Interval",
"Catastrophe", "Attritional"))
Finally, the function makes a table of the results:
data.LowerCI <- apply(survive.matrix[,-1], MARGIN = 2, FUN =
quantile, prob = 0.025)
data.PointValue <- survive.matrix[1,-1]
data.UpperCI <- apply(survive.matrix[,-1], MARGIN = 2, FUN = quantile, prob = 0.975)
Taxon <- rep(unique(SurviveData$Genus), times = N.ages)
Output.Matrix <- data.frame(Taxon = Taxon, AgeClassLabs =
Labels.ageclass, LowerCI = data.LowerCI, PointValue =
data.PointValue, UpperCI = data.UpperCI)
Output.Matrix
The function calculates mortality profile values using the age distribution as a proportion from the original dataset.
> mortprof <- function(data){
vector <- rep(NA, N.ages)
for(i in 1:N.ages) {
vector[i] <- sum(data==i)/length(data)
}
vector[is.na(vector)] <- 0
round(vector,4)
}
It then takes x samples of the dataset, with x equal to the number of iterations (Default iter = 1000). The samples are initially structured in the same way as the original dataset, with each sample size equal to the size of the original (e.g., in the Marj Rabba example, there are 42 entries in each sample). The function calculates the mortality profile using the proportional age distribution for each sample.
mortality.matrix <- matrix(NA, ncol = N.ages, nrow = iter)
mortality.matrix[1,] <- mortprof(data)
for(i in 2:iter){
bootstrap <- sample(1:length(data), length(data), replace = TRUE)
mortality.matrix[i,] <- unlist(mortprof(data[bootstrap]))
}
Plotting and the calculation of the confidence intervals is straightforward:
bar<-barplot(mortality.matrix[1,],ylim=c(0,(max(upCI)+.1)),
names=Labels.ageclass,ylab=ylab,
xlab=xlab,beside=T)
g<-(max(bar)-min(bar))/110
for (i in 1:length(bar)) {
lines(c(bar[i],bar[i]),c(upCI[i],loCI[i]))
lines(c(bar[i]-g,bar[i]+g),c(upCI[i],upCI[i]))
lines(c(bar[i]-g,bar[i]+g),c(loCI[i],loCI[i]))
}
Plotting models and a legend and making a table of the results is also a sstraight forward as shown for surv.func.
The fuse.func function creates confidence intervals for the survivorship based on epiphyseal fusion, using the %Fused method. This method takes the ratio of fused to fused-plus-unfused bones for each fusion group (a fusion group is a group of elements that fuse around the same time). It is important to note that this is not equivalent to survivorship based on age classes. Fusion groups are not age classes, since the specimens pertinent to each fusion group are from animals of different ages. In effect, each fusion group is really just a survivorship curve with two age classes - fused and unfused.
The fuse.func function requires the user to load a .csv file with (“Data”, below). The user is then prompted to enter the number of fusion groups, the number of elements in each fusion group, and finally the names of the elements in each fusion group. The function then takes bootstrap samples of the original dataset for the %Fused in each fusion group. It then calculates the confidence interval. Defaults are 1000 iterations for the bootstrap and 95% for the confidence interval.
Note: the graphical output requires the package ggplot25. The code automatically installs and loads ggplot2 if not already loaded.
The data must be coded in the long way (i.e., each specimen must have its own entry) in a .csv file. The dataframe should include three columns: “Identification” (a unique identification number, this can be left blank); “Element” (these should be differentiated according to proximal and distal, when necessary); “Fusion” (“Fused” or “Unfused”). An example:
Identification | Element | Fusion |
---|---|---|
Specimen1 | Calcaneus | Fused |
Specimen2 | Px.Femur | Fused |
Specimen3 | Px.Femur | Unfused |
Specimen4 | Ds.Femur | Unfused |
Specimen5 | Px.Radius | Fused |
Specimen6 | Innominate | Unfused |
The fuse.func function is in the following format by default:
fuse.func(data, iter=100, ci = 95, plotci=TRUE, plot.title=NULL)
The function first requires the user to define the data. Then the function initiates a series of command prompts. First, for example,
> Enter number of fusion groups
> 5
Then
> Enter number of skeletal elements for fusion group A
> 2
.
.
.
> Enter number of skeletal elements for fusion group E
> 1
Then
> Enter the 2 names of skeletal elements for fusion group A then > press enter
> Px.Radius
> Ds.Humerus
.
.
.
> Enter the 1 names of skeletal elements for fusion group E then > press enter
> Phalanx1
The R code for this is as follows:
fuse.func<-function(data,iter=1000,plotci=TRUE,plot.title=NULL){
require(ggplot2)
cat(paste("Enter number of fusion groups"), "\n")
ans<-readLines(n = 1)
ans <- as.numeric(ans)
fu.grps<-LETTERS[1:ans]
ske.n<-numeric(length(fu.grps))
for(i in 1:length(ske.n)){
cat(paste("Enter number of skeletal elements for fusion group"), fu.grps[i],"\n")
ans<-readLines(n = 1)
ske.n[i] <- as.numeric(ans)
}
ele.list<-as.list(rep(NA,length(ske.n)))
names(ele.list)<-fu.grps
for(i in 1: length(ske.n)){
cat(paste("Enter the", ske.n[i],"names of skeletal elements for fusion group"), fu.grps[i],"then press enter","\n")
ele.list[[i]]<-readLines(n = ske.n[i])
}
Then the function calculates the %Fused for each fusion group:
pctfuse<-function(dat){
pct.ufu<-n<-numeric(length(ele.list))
names(pct.ufu)<-fu.grps
wh<-function(it){which(dat$Element==ele.list[[i]][it])}
for (i in 1:length(pct.ufu)){
tab<-table(dat$Fusion[unlist(lapply(1:ske.n[i],wh))])
fu<-tab[which(names(table(dat$Fusion[unlist(lapply(1:ske.n[i],wh))] ))=="Fused")]
ufu<-tab[which(names(table(dat$Fusion[unlist(lapply(1:ske.n[i],wh))] ))=="Unfused")]
if (is.nan(fu/(fu+ufu))){
pct.ufu[i]<-0} else {pct.ufu[i]<-fu/(fu+ufu)}
n[i]<-(fu+ufu)
}
return(list(pct.ufu,n))
}
It then bootstraps the %Fused values for each fusion group (an analogous process to part 1.4). It provides a table of the results, with confidence intervals (default = 95%).
boot <- matrix(NA, ncol = length(ele.list), nrow = iter)
boot[1,] <- pctfuse(data)[[1]]
for(i in 2:iter){
data.boot<-data[sample(1:dim(data)[1],dim(data)[1],replace=T),]
boot[i,] <- pctfuse(data.boot)[[1]]
}
### Provide a Table of the Bootstrap Results
quantilematrix <- matrix(NA, ncol = 2, nrow = length(fu.grps))
for(i in 1:ncol(boot)){
quantilematrix[i,] <- quantile(boot[,i], probs = c(0.025,0.975), na.rm = T)
}
outputtable <- data.frame(Fusion.groups = fu.grps,
Data = round(boot[1,],2),
LowerCI = round(quantilematrix[,1],2), UpperCI = round(quantilematrix[,2],2),
Count = pctfuse(data)[[2]])
Plotting the results is the final step. The fuse.func function uses ggplot2 to graph the results. The resulting graph shows the %Fused on the y-axis, with lines representing the confidence intervals. The x-axis shows the fusion groups (which are labeled “A” , “B” , “C”, etc.). Directly above the graphed %Fused values and confidence intervals are the numbers of specimens per fusion group.
### Plotting the %Fusion data
ciplot<-ggplot(outputtable, aes(x = Fusion.groups, y = Data))+
#now add the points
geom_point(size = 3)+
#add in the 95% confidence interval bars
geom_errorbar(aes(ymax = UpperCI, ymin = LowerCI), width = 0.2)+
#add in the sample size label for each fusion group
#this uses the previously-made function
geom_text(aes(x = Fusion.groups, y = rep(1.05, length(Fusion.groups)), label = Count))+
#add in the theme (all of the background plotting details)
#the size for element_text is font size, for element_line it is the thickness
#element_blank() makes it so there is no background color to the plot
theme(panel.background = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA, color = "black"),
axis.title = element_text(color = "black", size = 20),
axis.text = element_text(color = "black", size = 15),
axis.ticks = element_line(color = "black", size = 0.75),
plot.title = element_text(color = "black", size = 24))+
#add in the titles/labels
ylab("%Fused")+xlab("Fusion Group")+
ggtitle(plot.title)
if(plotci==TRUE){print(ciplot)}
list(Output = outputtable, Bootstrap.Data = boot)
}
The surv.func function is only intended for traditional survivorship analyses, while mort.func is the analog for analyses of mortality profiles.That is, the data must be assignable to discrete age classes. In zooarchaeology, this is traditionally dental eruption/wear data. Epiphyseal fusion data are treated in section 2.↩
Price, M.D., Buckley, M., Rowan, Y.M., Kersel, M., 2013. Animal Management Strategies during the Chalcolithic in the Lower Galilee: New Data from Marj Rabba, Paleorient 39, 183-200.↩
Payne, S., 1973. Kill-off Patterns in Sheep and Goats: The Mandibles from Asvan Kale, Anatolian Studies 23, 281-303; Marom, N., Bar-Oz, G., 2009. Culling Profiles: The Indeterminacy of Archaeozoological Data to Survivorship Curve Modelling of Sheep and Goat Herd Maintenance Strategies, Journal of Archaeological Science 36, 1184-1187.↩
Stiner, M.C., 1990. The Use of Mortality Patterns in Archaeological Studies of Homonid Predatory Adaptations, Journal of Anthropological Archaeology 9, 305-351. Speth, J.D., 1983. Bison Kills and Bone Counts: Decision Making by Ancient Hunters, University of Chicago Press, London.↩
Wickman, H., 2009. ggplot2: elegant graphics for data analysis, Springer, New York.↩