The networkreporting package has several tools for analyzing survey data that have been collected using the network scale-up method.
This introduction will assume that you already have the networkreporting package installed. If you don't, please refer to the introductory vignette (“getting started”) for instructions on how to do this.
For the purposes of this vignette, we'll assume that you have conducted a survey using network scale-up questions in order to estimate the size of a hidden population. Analytically, using the scale-up estimator involves two steps:
We'll quickly review each of these steps, and then we'll show how to use the package to carry the estimation out.
Here, we will use the known population estimator for respondents' degrees (Killworth et al., 1998; Feehan and Salganik, 2016). In order to estimate the degree of the \(i\) th survey respondent, we use
\[ \begin{align} \label{eqn:kpdegree} \hat{d_i} = \sum_{j=1}^{K} y_{ij} \times \frac{N}{\sum_{j=1}^{K} N_j}, \end{align} \]
where \(N\) is the total size of the population, \(N_j\) is the size of the \(j\) th population of known size, and \(y_{ij}\) is the number of connections that survey respondent \(i\) reports between herself and members of the \(j\) th population of known size.
Once we have the estimates of the respondents' degrees, we use them to produce an estimate for the size of the hidden population:
\[ \begin{align} \label{eqn:nsum} \hat{N}_h = \frac{ \sum_{i \in s} y_{ih} }{ \sum_{i \in s} \hat{d_i} }, \end{align} \]
where \(N_h\) is the size of the population of interest (which we want to estimate), \(s\) is the set of respondents in our sample, and \(\hat{d_i}\) is the estimate of the size of respondent \(i\)'s degree, obtained using the known population method.
In order to use the package, we will assume that you start with two datasets: the first is a survey containing information collected from respondents about their personal networks; the second is information about the sizes of several populations.
The example data for this vignette are provided with the networkreporting
package, and can be loaded by typing
library(networkreporting)
library(surveybootstrap)
## column names for connections to hidden population numbers
hidden.q <- c("sex.workers", "msm", "idu", "clients")
## column names for connections to groups of known size
hm.q <- c("widower", "nurse.or.doctor", "male.community.health", "teacher",
"woman.smoke", "priest", "civil.servant", "woman.gave.birth",
"muslim", "incarcerated", "judge", "man.divorced", "treatedfortb",
"nsengimana", "murekatete", "twahirwa", "mukandekezi", "nsabimana",
"mukamana", "ndayambaje", "nyiraneza", "bizimana", "nyirahabimana",
"ndagijimana", "mukandayisenga", "died")
## size of the entire population
tot.pop.size <- 10718378
The example data include two datasets: one has all of the responses from a network scale-up survey, and the other has the known population sizes for use with the known population estimator.
The demo known population data are in example.knownpop.dat
:
example.knownpop.dat
## known.popn size
## 1 widower 36147
## 2 nurse.or.doctor 7807
## 3 male.community.health 22000
## 4 teacher 47745
## 5 woman.smoke 119438
## 6 priest 1004
## 7 woman.gave.birth 256164
## 8 muslim 195449
## 9 incarcerated 68000
## 10 man.divorced 50698
## 11 nsengimana 32528
## 12 murekatete 30531
## 13 twahirwa 10420
## 14 mukandekezi 10520
## 15 nsabimana 48560
## 16 mukamana 51449
## 17 ndayambaje 22724
## 18 nyiraneza 21705
## 19 bizimana 38497
## 20 nyirahabimana 42727
## 21 ndagijimana 37375
## 22 mukandayisenga 35055
example.knownpop.dat
is very simple: one column has a name for each known population,
and the other has its toal size. We expect that users will typically start with
a small dataset like this one. When using the networkreporting
package, it is
more useful to have a vector whose entries are known population sizes and whose
names are the known population names. The df.to.kpvec
function makes it easy
for us to create it:
kp.vec <- df.to.kpvec(example.knownpop.dat, kp.var="known.popn", kp.value="size")
kp.vec
## widower nurse.or.doctor male.community.health
## 36147 7807 22000
## teacher woman.smoke priest
## 47745 119438 1004
## woman.gave.birth muslim incarcerated
## 256164 195449 68000
## man.divorced nsengimana murekatete
## 50698 32528 30531
## twahirwa mukandekezi nsabimana
## 10420 10520 48560
## mukamana ndayambaje nyiraneza
## 51449 22724 21705
## bizimana nyirahabimana ndagijimana
## 38497 42727 37375
## mukandayisenga
## 35055
Finally, we also need to know the total size of the population we are making estimates about. In this case, let's assume that we're working in a country of 10 million people:
# total size of the population
tot.pop.size <- 10e6
Now let's take a look at the demo survey dataset, which is called
example.survey
:
head(example.survey)
## id cluster region indweight sex age.cat widower nurse.or.doctor
## 1 1.1.1 1 Kigali City 0.330602 Male [25,35) 3 2
## 2 1.1.2 1 Kigali City 0.330602 Male [25,35) 0 2
## 3 1.1.3 1 Kigali City 0.330602 Male [25,35) 2 8
## 4 1.1.4 1 Kigali City 0.330602 Male [25,35) 0 1
## 5 1.1.5 1 Kigali City 0.330602 Male [25,35) 0 0
## 6 1.1.6 1 Kigali City 0.330602 Male [25,35) 7 4
## male.community.health teacher woman.smoke priest civil.servant
## 1 1 5 1 0 5
## 2 1 5 0 0 5
## 3 0 3 0 0 50
## 4 0 0 0 0 5
## 5 0 0 0 0 5
## 6 0 8 2 0 6
## woman.gave.birth muslim incarcerated judge man.divorced treatedfortb
## 1 3 4 2 3 2 0
## 2 3 0 2 3 1 0
## 3 4 3 0 0 2 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 1 0
## 6 3 4 3 0 1 0
## nsengimana murekatete twahirwa mukandekezi nsabimana mukamana ndayambaje
## 1 0 0 2 1 2 3 1
## 2 3 2 0 0 1 0 0
## 3 0 0 0 1 2 0 0
## 4 1 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 1 1 0 0 0 0 0
## nyiraneza bizimana nyirahabimana ndagijimana mukandayisenga died
## 1 0 2 0 1 0 0
## 2 0 0 0 0 0 1
## 3 0 0 0 0 0 2
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 4
## sex.workers msm idu clients
## 1 0 0 0 2
## 2 0 0 0 1
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 0 0 0
## 6 0 0 0 10
The columns fall into a few categories:
id
cluster
, region
, and indweight
. sex
and age.cat
widower
, …,
mukandayisenga
died
, …, clients
This is the general form that your survey dataset should have.
Many network scale-up studies have topcoded the responses to the aggregate relational data questions. This means that researchers considered any responses above a certain value, called the topcode, to be implausible. Before proceeding with the analysis, researchers substitute the maximum plausible value in for the implausible ones. For example, in many studies, researchers replaced responses with the value 31 or higher with the value 30 before conducting their analysis (see Zheng, Salganik, and Gelman 2006).
We won't discuss whether or not this is advisable here, but this is currently a
common practice in scale-up studies. If you wish to follow it, you can use the
topcode.data
function. For example, let's topcode the responses to
the questions about populations of known size to the value 30. First, we'll
examine the distribution of the responses before topcoding:
## make a vector with the list of known population names from
## our dataset of known population totals
known.popn.vars <- paste(example.knownpop.dat$known.popn)
## before topcoding: max. response for several popns is > 30
summary(example.survey[,known.popn.vars])
## widower nurse.or.doctor male.community.health
## Min. : 0.0000 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.0000 Median : 0.0000 Median : 0.000
## Mean : 0.6101 Mean : 0.5112 Mean : 0.724
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 1.000
## Max. :95.0000 Max. :40.0000 Max. :95.000
##
## teacher woman.smoke priest woman.gave.birth
## Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.000 Median : 0.000 Median : 0.0000 Median : 1.000
## Mean : 1.356 Mean : 1.022 Mean : 0.1484 Mean : 1.885
## 3rd Qu.: 1.000 3rd Qu.: 1.000 3rd Qu.: 0.0000 3rd Qu.: 3.000
## Max. :95.000 Max. :95.000 Max. :25.0000 Max. :30.000
##
## muslim incarcerated man.divorced nsengimana
## Min. : 0.000 Min. : 0.0000 Min. : 0.0000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.:0.0000
## Median : 0.000 Median : 0.0000 Median : 0.0000 Median :0.0000
## Mean : 2.094 Mean : 0.4348 Mean : 0.3367 Mean :0.3603
## 3rd Qu.: 1.000 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.:0.0000
## Max. :95.000 Max. :95.0000 Max. :20.0000 Max. :8.0000
## NA's :1
## murekatete twahirwa mukandekezi nsabimana
## Min. : 0.0000 Min. : 0.0000 Min. :0.000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.:0.000 1st Qu.: 0.0000
## Median : 0.0000 Median : 0.0000 Median :0.000 Median : 0.0000
## Mean : 0.3425 Mean : 0.2394 Mean :0.165 Mean : 0.4705
## 3rd Qu.: 1.0000 3rd Qu.: 0.0000 3rd Qu.:0.000 3rd Qu.: 1.0000
## Max. :12.0000 Max. :10.0000 Max. :7.000 Max. :20.0000
##
## mukamana ndayambaje nyiraneza bizimana
## Min. : 0.0000 Min. : 0.0000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 0.0000 Median : 0.0000 Median : 0.0000 Median : 0.0000
## Mean : 0.4144 Mean : 0.3296 Mean : 0.2685 Mean : 0.4331
## 3rd Qu.: 1.0000 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 1.0000
## Max. :15.0000 Max. :30.0000 Max. :10.0000 Max. :12.0000
##
## nyirahabimana ndagijimana mukandayisenga
## Min. : 0.000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 0.000 Median : 0.0000 Median : 0.0000
## Mean : 0.261 Mean : 0.3279 Mean : 0.2577
## 3rd Qu.: 0.000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :17.000 Max. :10.0000 Max. :20.0000
##
Several populations, including widower
, male.community.health
, teacher
,
woman.smoke
, muslim
, and incarcerated
have maximum values that are very
high. (It turns out that 95 is the highest value that could be recorded during
the interviews; if respondents said that they were connected to more than 95
people in the group, the interviewers wrote 95 down.)
Now we use the topcode.data
function to topcode all of the responses
at 30:
example.survey <- topcode.data(example.survey,
vars=known.popn.vars,
max=30)
## after topcoding: max. response for all popns is 30
summary(example.survey[,known.popn.vars])
## widower nurse.or.doctor male.community.health
## Min. : 0.0000 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.0000 Median : 0.0000 Median : 0.000
## Mean : 0.5831 Mean : 0.5062 Mean : 0.653
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 1.000
## Max. :30.0000 Max. :30.0000 Max. :30.000
##
## teacher woman.smoke priest woman.gave.birth
## Min. : 0.000 Min. : 0.0000 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.000 Median : 0.0000 Median : 0.0000 Median : 1.000
## Mean : 1.216 Mean : 0.9638 Mean : 0.1484 Mean : 1.885
## 3rd Qu.: 1.000 3rd Qu.: 1.0000 3rd Qu.: 0.0000 3rd Qu.: 3.000
## Max. :30.000 Max. :30.0000 Max. :25.0000 Max. :30.000
##
## muslim incarcerated man.divorced nsengimana
## Min. : 0.000 Min. : 0.0000 Min. : 0.0000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.:0.0000
## Median : 0.000 Median : 0.0000 Median : 0.0000 Median :0.0000
## Mean : 1.468 Mean : 0.3807 Mean : 0.3367 Mean :0.3603
## 3rd Qu.: 1.000 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.:0.0000
## Max. :30.000 Max. :30.0000 Max. :20.0000 Max. :8.0000
## NA's :1
## murekatete twahirwa mukandekezi nsabimana
## Min. : 0.0000 Min. : 0.0000 Min. :0.000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.:0.000 1st Qu.: 0.0000
## Median : 0.0000 Median : 0.0000 Median :0.000 Median : 0.0000
## Mean : 0.3425 Mean : 0.2394 Mean :0.165 Mean : 0.4705
## 3rd Qu.: 1.0000 3rd Qu.: 0.0000 3rd Qu.:0.000 3rd Qu.: 1.0000
## Max. :12.0000 Max. :10.0000 Max. :7.000 Max. :20.0000
##
## mukamana ndayambaje nyiraneza bizimana
## Min. : 0.0000 Min. : 0.0000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 0.0000 Median : 0.0000 Median : 0.0000 Median : 0.0000
## Mean : 0.4144 Mean : 0.3296 Mean : 0.2685 Mean : 0.4331
## 3rd Qu.: 1.0000 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 1.0000
## Max. :15.0000 Max. :30.0000 Max. :10.0000 Max. :12.0000
##
## nyirahabimana ndagijimana mukandayisenga
## Min. : 0.000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 0.000 Median : 0.0000 Median : 0.0000
## Mean : 0.261 Mean : 0.3279 Mean : 0.2577
## 3rd Qu.: 0.000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :17.000 Max. :10.0000 Max. :20.0000
##
If you look at the help page for topcode.data
, you'll see that it can also
handle situations where the variables can take on special codes for missing
values, refusals, and so forth.
Now that we have finished preparing the data, we turn to esimating the sizes of
each respondent's personal network. To do this using the known population
estimator, we use the kp.degree.estimator
function:
d.hat <- kp.degree.estimator(survey.data=example.survey,
known.popns=kp.vec,
total.popn.size=tot.pop.size,
missing="complete.obs")
## Warning in kp.degree.estimator(survey.data = example.survey, known.popns =
## kp.vec, : kp.degree.estimator will be deprecated soon!
summary(d.hat)
## V1
## Min. : 0.00
## 1st Qu.: 25.28
## Median : 58.99
## Mean : 101.22
## 3rd Qu.: 126.42
## Max. :1904.69
We can examine the results with a histogram
library(ggplot2) # we'll use qplot from ggplot2 for plots
theme_set(theme_minimal())
qplot(d.hat, binwidth=25)
Now let's append the degree estimates to the survey reports dataframe:
example.survey$d.hat <- d.hat
TODO
Now that you have estimated degrees, you can use them to produce estimates of the
size of the hidden population. Here, we'll take the example of injecting drug
users, idu
idu.est <- nsum.estimator(survey.data=example.survey,
d.hat.vals=d.hat,
total.popn.size=tot.pop.size,
y.vals="idu",
missing="complete.obs")
Note that we had to specify that we should use only rows in our dataset with no
missing values through the missing = "complete.obs"
option, and also that we
had to pass in the total population size using the total.popn.size
option.
The resulting estimate is
idu.est
## $estimate
## [1] 1067.656
##
## $tot.connections
## [1] 26
##
## $sum.d.hat
## [1] 243524.2
This returns the estimate, and also the numerator and denominator used to compute it.
In order to estimate the sampling uncertainty of our estimated totals, we can use the rescaled bootstrap technique; see Feehan and Salganik 2016 for more about the rescaled boostrap and how it can be applied to the network scale-up method. In order to use the rescaled boostrap, you need to be able to specify the sampling design of your study. In particular, you need to be able to describe the stratifcation (if any) and the primary sampling units used in the study.
idu.est <- bootstrap.estimates(## this describes the sampling design of the
## survey; here, the PSUs are given by the
## variable cluster, and the strata are given
## by the variable region
survey.design = ~ cluster + strata(region),
## the number of bootstrap resamples to obtain
## (NOTE: in practice, you should use more than 100.
## this keeps building the package relatively fast)
num.reps=100,
## this is the name of the function
## we want to use to produce an estimate
## from each bootstrapped dataset
estimator.fn="nsum.estimator",
## these are the sampling weights
weights="indweight",
## this is the name of the type of bootstrap
## we wish to use
bootstrap.fn="rescaled.bootstrap.sample",
## our dataset
survey.data=example.survey,
## other parameters we need to pass
## to the nsum.estimator function
d.hat.vals=d.hat,
total.popn.size=tot.pop.size,
y.vals="idu",
missing="complete.obs")
By default, bootstrap.estimates
produces a list with num.reps
entries; each
entry is the result of calling the estimator function on one bootstrap
resample.
Next, you can write a bit of code that will help us put all of these results together, for plotting and summarizing
library(plyr)
## combine the estimates together in one data frame
## (bootstrap.estimates gives us a list)
all.idu.estimates <- ldply(idu.est,
function(x) { data.frame(estimate=x$estimate) })
We can examine the summarized results with a histogram or with summarize
.
## look at a histogram of the results
qplot(all.idu.estimates$estimate, binwidth=50)
## summarize the results
summary(all.idu.estimates$estimate)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 155.4 395.5 542.5 593.5 715.8 1615.9
To produce 95% intervals using the percentile method you can do something like this
quantile(all.idu.estimates$estimate, probs=c(0.025, 0.975))
## 2.5% 97.5%
## 190.3353 1199.1265
If you want to run internal validation checks (see e.g. Salganik et al., 2011, Fig 3), you can use the
nsum.internal.validation
function. We specify that we wish to use only
complete observations (ie, we will remove rows that have any missing values
from our calculations).
iv.result <- nsum.internal.validation(survey.data=example.survey,
known.popns=kp.vec,
missing="complete.obs",
killworth.se=TRUE,
total.popn.size=tot.pop.size,
kp.method=TRUE,
return.plot=TRUE)
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
## Warning in kp.degree.estimator(survey.data = survey.data, known.popns =
## kp.minus, : kp.degree.estimator will be deprecated soon!
Now iv.result
is a list that has a summary of the results in the entry results
iv.result$results
## name nsum.holdout.est known.size d.hat.sum
## 1 widower 58707.75 36147 238980.4
## 2 nurse.or.doctor 51873.04 7807 234804.1
## 3 male.community.health 66955.19 22000 234634.5
## 4 teacher 128261.75 47745 228049.3
## 5 woman.smoke 93113.92 119438 249049.8
## 6 priest 14830.56 1004 240719.2
## 7 woman.gave.birth 173202.90 256164 261831.6
## 8 muslim 137934.15 195449 255929.3
## 9 incarcerated 36619.52 68000 250139.8
## 10 man.divorced 32758.68 50698 247262.7
## 11 nsengimana 35697.29 32528 242875.6
## 12 murekatete 33933.49 30531 242828.0
## 13 twahirwa 23921.82 10420 240784.3
## 14 mukandekezi 16382.83 10520 242326.9
## 15 nsabimana 46399.42 48560 243968.6
## 16 mukamana 40565.23 51449 245777.0
## 17 ndayambaje 32841.14 22724 241465.4
## 18 nyiraneza 26637.40 21705 242516.1
## 19 bizimana 42948.88 38497 242614.0
## 20 nyirahabimana 25411.65 42727 247130.7
## 21 ndagijimana 32259.55 37375 244578.7
## 22 mukandayisenga 25249.07 35055 245553.6
## killworth.se killworth.se.wgtdenom err abserr sqerr
## 1 1562.7448 1562.7448 22560.754 22560.754 508987620
## 2 1482.4796 1482.4796 44066.040 44066.040 1941815910
## 3 1683.5939 1683.5939 44955.188 44955.188 2020968903
## 4 2356.3024 2356.3024 80516.747 80516.747 6482946587
## 5 1924.5661 1924.5661 -26324.077 26324.077 692957033
## 6 784.3342 784.3342 13826.559 13826.559 191173736
## 7 2549.6027 2549.6027 -82961.098 82961.098 6882543711
## 8 2305.5174 2305.5174 -57514.848 57514.848 3307957792
## 9 1207.7257 1207.7257 -31380.477 31380.477 984734336
## 10 1149.1360 1149.1360 -17939.316 17939.316 321819057
## 11 1210.1778 1210.1778 3169.293 3169.293 10044418
## 12 1180.1217 1180.1217 3402.489 3402.489 11576935
## 13 995.5497 995.5497 13501.822 13501.822 182299200
## 14 821.5568 821.5568 5862.829 5862.829 34372766
## 15 1375.8765 1375.8765 -2160.579 2160.579 4668100
## 16 1282.1053 1282.1053 -10883.772 10883.772 118456503
## 17 1164.3067 1164.3067 10117.144 10117.144 102356610
## 18 1046.6378 1046.6378 4932.402 4932.402 24328592
## 19 1327.6492 1327.6492 4451.879 4451.879 19819224
## 20 1012.7461 1012.7461 -17315.346 17315.346 299821223
## 21 1146.6165 1146.6165 -5115.449 5115.449 26167818
## 22 1012.7460 1012.7460 -9805.932 9805.932 96156302
## relerr
## 1 0.62413904
## 2 5.64442684
## 3 2.04341762
## 4 1.68639119
## 5 0.22039951
## 6 13.77147320
## 7 0.32385932
## 8 0.29427036
## 9 0.46147760
## 10 0.35384662
## 11 0.09743276
## 12 0.11144376
## 13 1.29576028
## 14 0.55730315
## 15 0.04449297
## 16 0.21154488
## 17 0.44521846
## 18 0.22724728
## 19 0.11564223
## 20 0.40525538
## 21 0.13686820
## 22 0.27972991
Since we passed the argument return.plot=TRUE
to the function, we also get a plot:
print(iv.result$plot)
This plot is a ggplot2
object, so we can customize it if we want. As a very simple
example, we can change the title:
print(iv.result$plot + ggtitle("internal validation checks"))
The ggplot2 website has more information on modifying ggplot2 objects.
Several of the functions we demonstrated above required us to pass in
the vector containing the known population sizes and also the size of
the total population. We can avoid this step by attaching these two
pieces of information to the survey dataframe using the add.kp
function:
example.survey <- add.kp(example.survey, kp.vec, tot.pop.size)
d.hat.new <- kp.degree.estimator(survey.data=example.survey,
# we don't need this anymore, since we
# them to survey.data's attributes using add.kp
#known.popns=kp.vec,
#total.popn.size=tot.pop.size,
missing="complete.obs")
## Warning in kp.degree.estimator(survey.data = example.survey, missing =
## "complete.obs"): kp.degree.estimator will be deprecated soon!
summary(d.hat.new)
## V1
## Min. : 0.00
## 1st Qu.: 25.28
## Median : 58.99
## Mean : 101.22
## 3rd Qu.: 126.42
## Max. :1904.69
This is exactly the same result we obtained before.