Finite Design

Tom Kincaid

2020-06-15

Preliminaries

This document presents example GRTS survey designs for a finite resource. The finite resource used in the designs is lakes in the southern New England region of the U.S. Four survey designs are presented: (1) an unstratified, equal probability design; (2) a stratified, equal probability design; (3) an unstratified, unequal probability design with an oversample; and (4) an unstratified, unequal probability design with an oversample and a panel structure for survey over time. The sampling frame for a survey design can be supplied as an ESRI shapefile, a data frame, an sf package object, or an sp package object. The frame contains the coordinates for a set of points that define the finite resource, together with attribute data associated with the points. The coordinate system for the points in the sampling frame is an equal area projection rather than latitude and longitude, so that calculation of distance between points is valid. Use of all four sources for the sampling frame is illustrated in the example survey designs.
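Because the frame uses projected coordinates, the straight-line distance between two points can be computed directly from their x and y values. The following is a minimal sketch; the two coordinate pairs are illustrative values, and the units are assumed to be metres.

```r
# Two illustrative points in projected (planar) coordinates, assumed to be in metres
p1 <- c(x = 2012313, y = 2474271)
p2 <- c(x = 2013905, y = 2474343)

# Euclidean distance is meaningful for planar coordinates,
# but not for raw longitude/latitude values expressed in degrees
planar_dist <- sqrt(sum((p2 - p1)^2))
planar_dist
```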

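The initial step is to use the library function to load the spsurvey package. After the package is loaded, a message is printed to the R console indicating that the spsurvey package was loaded successfully. The sf package is also loaded, because its functions are used later to convert and write the sampling frame.

Load the spsurvey and sf packages: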

library(spsurvey)
library(sf)

Read the sf object

For creating a survey design using the spsurvey package, the standard form of input regarding the resource is a simple features (sf) object. An sf data set for creating the survey designs in this vignette is included in the data directory of the package. The data function is used to load the data set stored in the data directory into an object named NE_lakes. Note that sf objects loaded from the data sets in the data directory are stored in a format that is defined in the sf package. See documentation for the sf package for additional information regarding format of those objects.

Load the sf object in the data directory:

data(NE_lakes)

Attribute data

Two attributes, state name and lake area category, are examined. They will be used to define, respectively, stratum codes and unequal selection probability (multidensity) categories for the survey designs. State name is contained in a variable named “State”, and lake area category is contained in a variable named “Area_Cat”. For lake area category, lakes are classified by surface area measured in hectares. The lake area categories are coded using values such as “(5,10]”, which indicates that lake area is greater than five hectares but less than or equal to ten hectares. The table and addmargins functions are used to produce a table displaying the number of lakes for each combination of values for the stratum and multidensity category variables.
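Interval labels of this form are what the base R cut function produces with its default right-closed intervals. The following is a minimal sketch using hypothetical lake areas and break points chosen to match the category labels in this vignette; how the lake area category column in NE_lakes was actually built is not stated here.

```r
# Hypothetical lake areas in hectares
area_ha <- c(0.4, 5, 7.5, 10, 120, 2500)

# Break points chosen to match the category labels used in this vignette
breaks <- c(0, 1, 5, 10, 50, 500, 1e4)

# right = TRUE (the default) makes each interval open on the left
# and closed on the right, so 5 falls in (1,5] and 10 falls in (5,10]
area_cat <- cut(area_ha, breaks = breaks)
data.frame(area_ha, area_cat)
```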

Display the initial six features in the sf object:

head(NE_lakes)
#> Simple feature collection with 6 features and 4 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 2008789 ymin: 2468603 xmax: 2015009 ymax: 2474343
#> projected CRS:  NAD83 / Conus Albers
#>    xcoord  ycoord State Area_Cat                geometry
#> 1 2012313 2474271    MA  (10,50] POINT (2012313 2474271)
#> 2 2013905 2474343    MA    (1,5] POINT (2013905 2474343)
#> 3 2008789 2472920    MA  (10,50] POINT (2008789 2472920)
#> 4 2009814 2472036    MA (50,500] POINT (2009814 2472036)
#> 5 2014014 2471614    MA   (5,10] POINT (2014014 2471614)
#> 6 2015009 2468603    MA  (10,50] POINT (2015009 2468603)

Display number of lakes cross-classified by strata and multidensity category:

with(NE_lakes, addmargins(table("State"=State, "Lake Area Category"=Area_Cat)))
#>      Lake Area Category
#> State (0,1] (1,5] (10,50] (5,10] (50,500] (500,1e+04]  Sum
#>   CT    483  1181     284    270       90           4 2312
#>   MA    194  1658     693    545      209           6 3305
#>   RI     11   256     108     85       41           3  504
#>   Sum   688  3095    1085    900      340          13 6121
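The same table and addmargins pattern can be tried on a small data frame without loading the package data. In this sketch the data frame toy and its contents are hypothetical; only the column names mirror those in NE_lakes.

```r
# Hypothetical miniature frame with the same two attribute columns
toy <- data.frame(
  State    = c("CT", "CT", "MA", "MA", "MA", "RI"),
  Area_Cat = c("(0,1]", "(1,5]", "(1,5]", "(1,5]", "(5,10]", "(0,1]")
)

# table() counts each State x Area_Cat combination;
# addmargins() appends row and column totals labeled "Sum"
toy_tab <- with(toy, addmargins(table("State" = State,
                                      "Lake Area Category" = Area_Cat)))
toy_tab
```

The bottom-right "Sum" cell is the total number of rows in the frame, which is how the value 6121 in the table above should be read.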

Lakes in the southern New England region are displayed in the figure below:

Figure: Location of lakes in the southern New England region.

Unstratified, equal probability, GRTS survey design

The first survey design is an unstratified, equal probability design. The set.seed function is called so that, if necessary, the designs can be replicated.

The initial step is to create a list named Equaldsgn that contains information for specifying the survey design. Since the survey design is unstratified, the list contains a single item named “None” that also is a list. The “None” list includes two items: panel, which is used to specify the sample size for each panel, and seltype, which is used to input the type of random selection for the design. For this example, panel is assigned a single value named “PanelOne” that is set equal to 100, and seltype is assigned the value “Equal”, which indicates equal probability selection.

The grts function in the spsurvey package is called to select the survey design. The following arguments are included in the call to grts: (1) design: the named list of stratum design specifications, which is assigned the Equaldsgn list; (2) DesignID: name for the design, which is used to create a site ID for each site and is assigned the value “EQUAL”; (3) type.frame: the type of frame, which is assigned the value “finite” to indicate a finite resource; (4) src.frame: source of the frame, which is assigned the value “sf.object” to indicate an sf object frame; (5) sf.object: the sf object, which is assigned the value NE_lakes; and (6) shapefile: option to create a shapefile containing the survey design information, which is assigned FALSE.

During execution of the grts function, messages are printed that indicate the initial number of hierarchical levels used for the GRTS grid, the current number of levels, and the final number of levels. The set of messages is printed for each stratum, and is labeled with the stratum name. For this example, the set of messages is labeled “None”, i.e., the name used in the Equaldsgn list. Upon completion of the call to grts, the initial six sites for the survey design and a design summary are printed. The output object created by the grts function is assigned class “SpatialDesign”. The design summary is created using the summary method for that class. In addition to summary, a plot method is available for the SpatialDesign class. For assistance using the summary and plot methods, see documentation for “SpatialDesign-class” on the R help page for spsurvey.

Call the set.seed function so that the design can be replicated:

set.seed(4447864)

Create the design list:

Equaldsgn <- list(None=list(panel=c(PanelOne=100), seltype="Equal"))

Select the sample:

Equalsites <- grts(design=Equaldsgn,
                   DesignID="EQUAL",
                   type.frame="finite",
                   src.frame="sf.object",
                   sf.object=NE_lakes,
                   shapefile=FALSE)
#> 
#> Stratum: None 
#> Current number of levels: 4 
#> Current number of levels: 5 
#> Final number of levels: 5

Print the initial six lines of the survey design:

head(Equalsites)
#>          coordinates    siteID  xcoord  ycoord mdcaty   wgt stratum    panel
#> 1 (2122717, 2369405) EQUAL-001 2122717 2369405  Equal 61.21    None PanelOne
#> 2 (2016917, 2392127) EQUAL-002 2016917 2392127  Equal 61.21    None PanelOne
#> 3 (1924406, 2304683) EQUAL-003 1924406 2304683  Equal 61.21    None PanelOne
#> 4 (1866194, 2297691) EQUAL-004 1866194 2297691  Equal 61.21    None PanelOne
#> 5 (2095181, 2403351) EQUAL-005 2095181 2403351  Equal 61.21    None PanelOne
#> 6 (1968415, 2285874) EQUAL-006 1968415 2285874  Equal 61.21    None PanelOne
#>   EvalStatus EvalReason State Area_Cat
#> 1    NotEval               MA  (10,50]
#> 2    NotEval               MA    (1,5]
#> 3    NotEval               CT (50,500]
#> 4    NotEval               CT  (10,50]
#> 5    NotEval               MA    (1,5]
#> 6    NotEval               CT    (1,5]

Print the survey design summary:

summary(Equalsites)
#> 
#> 
#> Design Summary: Number of Sites
#> 
#> stratum
#> None  Sum 
#>  100  100
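The wgt value of 61.21 shown for every site is consistent with the usual equal probability design weight, the frame size divided by the sample size. A quick check, assuming that is how the weight is formed:

```r
# Frame size and sample size taken from the outputs in this vignette
n_frame  <- 6121  # total number of lakes in NE_lakes
n_sample <- 100   # PanelOne sample size

# Under equal probability selection each site represents
# n_frame / n_sample lakes in the frame
wgt <- n_frame / n_sample
wgt
```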

Stratified, equal probability, GRTS survey design

The second survey design is a stratified, equal probability design. The state attribute is used to identify strata. List Stratdsgn is assigned design specifications. Stratdsgn includes three lists, one for each stratum. The names for the lists match the levels of the stratum variable, i.e., the unique values of the state attribute. Each list in Stratdsgn contains two items: panel and seltype. The value for panel is 40 for CT, 40 for MA, and 20 for RI, which gives the same total sample size (100) as the equal probability design, and seltype is assigned “Equal”.

For this survey design, a data frame will be used as the sampling frame. A data frame named NE_lakes_df is created by dropping the geometry column from the NE_lakes object. Note that the NE_lakes object includes spatial coordinates among its attributes. The following arguments are included in the call to grts: (1) design: assigned the Stratdsgn list; (2) DesignID: assigned the value “STRATIFIED”; (3) type.frame: assigned the value “finite”; (4) src.frame: assigned the value “att.frame” to indicate that the sampling frame is provided by argument att.frame; (5) att.frame: assigned the NE_lakes_df data frame; (6) xcoord: name of the column in the attributes data frame that identifies x-coordinates, which is assigned the value “xcoord”; (7) ycoord: name of the column in the attributes data frame that identifies y-coordinates, which is assigned the value “ycoord”; (8) stratum: name of the column in the attributes data frame that identifies the stratum code for each element in the frame, which is assigned the value “State”; and (9) shapefile: assigned the value FALSE. Upon completion of the call to grts, the initial six sites for the survey design and a design summary are printed.

Create the data frame:

# Name of the geometry column in the sf object
geom_name <- attr(NE_lakes, "sf_column")
# Keep every column except the geometry column
NE_lakes_df <- subset(NE_lakes, select=names(NE_lakes) != geom_name, drop = TRUE)

Create the design list:

Stratdsgn <- list(CT=list(panel=c(PanelOne=40), seltype="Equal"),
                  MA=list(panel=c(PanelOne=40), seltype="Equal"),
                  RI=list(panel=c(PanelOne=20), seltype="Equal"))
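When there are many strata, writing one list per stratum by hand becomes tedious. The sketch below builds an equivalent list from a named vector of sample sizes; the names n_by_state and Stratdsgn_alt are hypothetical and used only for illustration.

```r
# Sample size per stratum; names must match the values of the State attribute
n_by_state <- c(CT = 40, MA = 40, RI = 20)

# Build one specification list per stratum; lapply() keeps the stratum names
Stratdsgn_alt <- lapply(n_by_state, function(n) {
  list(panel = c(PanelOne = n), seltype = "Equal")
})

str(Stratdsgn_alt)
```

The resulting list has the same structure as Stratdsgn and could be passed to the design argument in the same way.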

Select the sample:

Stratsites <- grts(design=Stratdsgn,
                   DesignID="STRATIFIED",
                   type.frame="finite",
                   src.frame="att.frame",
                   att.frame=NE_lakes_df,
                   xcoord="xcoord",
                   ycoord="ycoord",
                   stratum="State",
                   shapefile=FALSE)
#> 
#> Stratum: CT 
#> Current number of levels: 3 
#> Current number of levels: 4 
#> Final number of levels: 4 
#> 
#> Stratum: MA 
#> Current number of levels: 3 
#> Current number of levels: 4 
#> Current number of levels: 5 
#> Final number of levels: 5 
#> 
#> Stratum: RI 
#> Current number of levels: 3 
#> Current number of levels: 4 
#> Final number of levels: 4

Print the initial six lines of the survey design:

head(Stratsites)
#>          coordinates         siteID  xcoord  ycoord mdcaty  wgt stratum
#> 1 (1845107, 2225902) STRATIFIED-001 1845107 2225902  Equal 57.8      CT
#> 2 (1882596, 2261000) STRATIFIED-002 1882596 2261000  Equal 57.8      CT
#> 3 (1852837, 2252845) STRATIFIED-003 1852837 2252845  Equal 57.8      CT
#> 4 (1976930, 2289337) STRATIFIED-004 1976930 2289337  Equal 57.8      CT
#> 5 (1862983, 2230104) STRATIFIED-005 1862983 2230104  Equal 57.8      CT
#> 6 (1901969, 2269839) STRATIFIED-006 1901969 2269839  Equal 57.8      CT
#>      panel EvalStatus EvalReason Area_Cat
#> 1 PanelOne    NotEval               (1,5]
#> 2 PanelOne    NotEval               (1,5]
#> 3 PanelOne    NotEval               (1,5]
#> 4 PanelOne    NotEval             (10,50]
#> 5 PanelOne    NotEval               (1,5]
#> 6 PanelOne    NotEval               (1,5]

Print the survey design summary:

summary(Stratsites)
#> 
#> 
#> Design Summary: Number of Sites
#> 
#> stratum
#>  CT  MA  RI Sum 
#>  40  40  20 100
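In a stratified equal probability design the weight is constant within a stratum but differs between strata. The check below assumes the weight is the number of lakes in the stratum divided by the stratum sample size, which reproduces the value 57.8 printed for the CT sites.

```r
# Number of lakes per state, from the cross-classification table in this vignette
n_frame  <- c(CT = 2312, MA = 3305, RI = 504)
# Sample size per stratum, from Stratdsgn
n_sample <- c(CT = 40, MA = 40, RI = 20)

# Within each stratum, every site represents n_frame / n_sample lakes
wgt_by_stratum <- n_frame / n_sample
wgt_by_stratum
```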

Unstratified, unequal probability, GRTS survey design with an oversample

The third survey design is an unstratified, unequal probability design with an oversample. Lake area classes are used to identify multidensity categories. List Unequaldsgn is assigned design specifications. Since the survey design is unstratified, Unequaldsgn includes a single list named “None” that contains four items: panel, seltype, caty.n, and over. The value for panel is 90, and seltype is assigned “Unequal” to indicate unequal selection probabilities. The third item, caty.n, assigns sample sizes for each of the six multidensity categories. Note that the sum of sample sizes provided in caty.n must equal the value in panel. The fourth item, over, is assigned the value 10, which specifies an oversample of 10 sites. An oversample provides replacement sites for the survey design. The grts function attempts to distribute the oversample proportionately among sample sizes for the multidensity categories. If the oversample proportion for one or more categories is not a whole number, a warning message is printed and the proportion is rounded to the next higher integer. For this example, the oversample is not proportionate to the category sample sizes (for instance, 10 × 15/90 is not a whole number), so the warning message is printed.

For this survey design, an sp package object will be used as the sampling frame. The sf package function as_Spatial is used to create an sp object named NE_lakes_sp. The following arguments are included in the call to grts: (1) design: assigned the Unequaldsgn list; (2) DesignID: assigned the value “UNEQUAL”; (3) type.frame: assigned the value “finite”; (4) src.frame: assigned the value “sp.object” to indicate that the sampling frame is provided by an sp object; (5) sp.object: the sp object, which is assigned the NE_lakes_sp object; (6) mdcaty: name of the column in the attributes data frame that identifies the unequal probability category for each element in the frame, which is assigned the value “Area_Cat”; and (7) shapefile: assigned the value FALSE. Upon completion of the call to grts, the initial six sites for the survey design and a design summary are printed.

Create the sp object:

NE_lakes_sp <- as_Spatial(NE_lakes)
#> Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
#> that

Create the design list:

Unequaldsgn <- list(None=list(panel=c(PanelOne=90),
                              seltype="Unequal",
                              caty.n=c("(0,1]"=15, "(1,5]"=30, "(5,10]"=15,
                                       "(10,50]"=15, "(50,500]"=10,
                                       "(500,1e+04]"=5),
                              over=10))
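Before selecting the sample it can help to confirm that the category sample sizes add up to the panel size, and to see what weights they imply. This sketch copies the values from Unequaldsgn and the table of lake counts; the weight formula (lakes in a category divided by sites requested for that category) is an assumption, although it agrees with the wgt values in the printed design.

```r
# Values copied from the Unequaldsgn list
panel  <- c(PanelOne = 90)
caty_n <- c("(0,1]" = 15, "(1,5]" = 30, "(5,10]" = 15,
            "(10,50]" = 15, "(50,500]" = 10, "(500,1e+04]" = 5)

# The category sample sizes must add up to the panel total
stopifnot(sum(caty_n) == sum(panel))

# Lakes per category, from the "Sum" row of the cross-classification table
n_frame <- c("(0,1]" = 688, "(1,5]" = 3095, "(5,10]" = 900,
             "(10,50]" = 1085, "(50,500]" = 340, "(500,1e+04]" = 13)

# Assumed weight per category: lakes in the category / sites requested
wgt_by_caty <- n_frame[names(caty_n)] / caty_n
round(wgt_by_caty, 5)
```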

Select the sample:

Unequalsites <- grts(design=Unequaldsgn,
                     DesignID="UNEQUAL",
                     type.frame="finite",
                     src.frame="sp.object",
                     sp.object=NE_lakes_sp,
                     mdcaty="Area_Cat",
                     shapefile=FALSE)
#> 
#> Stratum: None
#> Warning in grts(design = Unequaldsgn, DesignID = "UNEQUAL", type.frame = "finite", : 
#> Oversample size is not proportional to category sample sizes for stratum
#> "None".
#> Current number of levels: 4 
#> Current number of levels: 5 
#> Current number of levels: 6 
#> Final number of levels: 6

Print the initial six lines of the survey design:

head(Unequalsites)
#>          coordinates      siteID  xcoord  ycoord      mdcaty       wgt stratum
#> 1 (2004228, 2463625) UNEQUAL-001 2004228 2463625     (10,50]  72.33333    None
#> 2 (2026832, 2388748) UNEQUAL-002 2026832 2388748       (1,5] 103.16667    None
#> 3 (1864531, 2272856) UNEQUAL-003 1864531 2272856 (500,1e+04]   2.60000    None
#> 4 (1937863, 2395837) UNEQUAL-004 1937863 2395837     (10,50]  72.33333    None
#> 5 (2112877, 2371740) UNEQUAL-005 2112877 2371740     (10,50]  72.33333    None
#> 6 (1994584, 2367712) UNEQUAL-006 1994584 2367712      (5,10]  60.00000    None
#>      panel EvalStatus EvalReason State
#> 1 PanelOne    NotEval               MA
#> 2 PanelOne    NotEval               MA
#> 3 PanelOne    NotEval               CT
#> 4 PanelOne    NotEval               MA
#> 5 PanelOne    NotEval               MA
#> 6 PanelOne    NotEval               RI

Print the survey design summary:

summary(Unequalsites)
#> 
#> 
#> Design Summary: Number of Sites Classified by mdcaty (Multidensity Category) 
#>  and panel
#> 
#>              panel
#> mdcaty        OverSamp PanelOne Sum
#>   (0,1]              1       13  14
#>   (1,5]              4       32  36
#>   (10,50]            1       25  26
#>   (5,10]             3       11  14
#>   (50,500]           4        4   8
#>   (500,1e+04]        0        5   5
#>   Sum               13       90 103
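The OverSamp total of 13, rather than the requested 10, is consistent with each category's proportional share being rounded up to the next integer. The following is only a sketch of that rounding rule as described in the text, not a reproduction of the grts internals; the per-category counts realised in the summary differ from these allocations.

```r
# Values copied from the Unequaldsgn list
caty_n <- c("(0,1]" = 15, "(1,5]" = 30, "(5,10]" = 15,
            "(10,50]" = 15, "(50,500]" = 10, "(500,1e+04]" = 5)
over <- 10

# Proportional share of the oversample for each category
over_share <- over * caty_n / sum(caty_n)

# Shares that are not whole numbers are rounded up to the next integer
over_n <- ceiling(over_share)

round(over_share, 3)
over_n
sum(over_n)
```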

Unstratified, unequal probability, GRTS survey design with an oversample and a panel structure for survey over time

The fourth survey design is an unstratified, unequal probability design with an oversample and a panel structure for survey over time. List Paneldsgn is assigned design specifications. Since the survey design is unstratified, Paneldsgn includes a single list named “None” that contains four items: panel, seltype, caty.n, and over. A vector identifying sample sizes for six panels (one annual panel and five yearly panels, each with 15 sites) is assigned to panel. The value “Unequal” is assigned to seltype, which indicates unequal selection probabilities. The third item, caty.n, assigns sample sizes for each of six multidensity categories, where lake area classes are used as the categories. The value 10 is assigned to over, which specifies an oversample of 10 sites. For this example, the oversample is not proportionate to the category sample sizes, so a warning is issued. The warning message is printed by calling the warnings function.

For this survey design, a shapefile will be used as the sampling frame. The sf package function st_write is used to create the shapefile. The following arguments are included in the call to grts: (1) design: assigned the Paneldsgn list; (2) DesignID: assigned the value “UNEQUAL”; (3) type.frame: assigned the value “finite”; (4) src.frame: assigned the value “shapefile”; (5) in.shape: assigned the value “NE_lakes.shp”; (6) mdcaty: assigned the value “Area_Cat”; and (7) shapefile: assigned the value FALSE. Upon completion of the call to grts, the initial six sites for the survey design and a design summary are printed.

Create the shapefile:

st_write(NE_lakes, "NE_lakes.shp", quiet = TRUE, delete_dsn = TRUE)
#> Warning in CPL_write_ogr(obj, dsn, layer, driver,
#> as.character(dataset_options), : GDAL Error 1: NE_lakes.shp does not appear to
#> be a file or directory.

Create the design list:

Paneldsgn <- list(None=list(panel=c(Annual=15, Year1=15, Year2=15, Year3=15,
                                    Year4=15, Year5=15),
                            seltype="Unequal",
                            caty.n=c("(0,1]"=15, "(1,5]"=30, "(5,10]"=15,
                                     "(10,50]"=15, "(50,500]"=10,
                                     "(500,1e+04]"=5),
                            over=10))
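With several panels, the total across panels plays the role that the single panel size played in the previous design. The sketch below checks that total against caty.n and lays out one panel label per site. It assumes sites are assigned to panels in sequence, in the order the panels are listed; that assumption agrees with the first sites of the printed design falling in the Annual panel, but it is not confirmed here.

```r
# Values copied from the Paneldsgn list
panel  <- c(Annual = 15, Year1 = 15, Year2 = 15, Year3 = 15,
            Year4 = 15, Year5 = 15)
caty_n <- c("(0,1]" = 15, "(1,5]" = 30, "(5,10]" = 15,
            "(10,50]" = 15, "(50,500]" = 10, "(500,1e+04]" = 5)

# Total base sample across panels must match the category sample sizes
stopifnot(sum(panel) == sum(caty_n))

# Assumption: sites 1-15 go to the first panel, 16-30 to the second, and so on
panel_label <- rep(names(panel), times = panel)
table(factor(panel_label, levels = names(panel)))
```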

Select the sample:

Panelsites <- grts(design=Paneldsgn,
                   DesignID="UNEQUAL",
                   type.frame="finite",
                   src.frame="shapefile",
                   in.shape="NE_lakes.shp",
                   mdcaty="Area_Cat",
                   shapefile=FALSE)
#> 
#> Stratum: None
#> Warning in grts(design = Paneldsgn, DesignID = "UNEQUAL", type.frame = "finite", : 
#> Oversample size is not proportional to category sample sizes for stratum
#> "None".
#> Current number of levels: 4 
#> Current number of levels: 5 
#> Current number of levels: 6 
#> Final number of levels: 6

Print the warning message:

warnings()

Print the initial six lines of the survey design:

head(Panelsites)
#>          coordinates      siteID  xcoord  ycoord mdcaty       wgt stratum
#> 1 (2005249, 2373261) UNEQUAL-001 2005249 2373261  (0,1]  45.86667    None
#> 2 (2123424, 2320893) UNEQUAL-002 2123424 2320893 (5,10]  60.00000    None
#> 3 (1850363, 2264998) UNEQUAL-003 1850363 2264998  (0,1]  45.86667    None
#> 4 (1960952, 2295800) UNEQUAL-004 1960952 2295800  (1,5] 103.16667    None
#> 5 (2007017, 2444237) UNEQUAL-005 2007017 2444237  (0,1]  45.86667    None
#> 6 (1966055, 2340658) UNEQUAL-006 1966055 2340658  (1,5] 103.16667    None
#>    panel EvalStatus EvalReason State
#> 1 Annual    NotEval               MA
#> 2 Annual    NotEval               MA
#> 3 Annual    NotEval               CT
#> 4 Annual    NotEval               CT
#> 5 Annual    NotEval               MA
#> 6 Annual    NotEval               CT

Print the survey design summary:

summary(Panelsites)
#> 
#> 
#> Design Summary: Number of Sites Classified by mdcaty (Multidensity Category) 
#>  and panel
#> 
#>              panel
#> mdcaty        Annual OverSamp Year1 Year2 Year3 Year4 Year5 Sum
#>   (0,1]            5        2     3     3     1     4     2  20
#>   (1,5]            7        8     6     3     6     2     7  39
#>   (10,50]          1        2     4     4     0     0     1  12
#>   (5,10]           2        1     2     1     5     4     2  17
#>   (50,500]         0        0     0     3     2     4     0   9
#>   (500,1e+04]      0        0     0     1     1     1     3   6
#>   Sum             15       13    15    15    15    15    15 103