In this vignette, I use the space-time permutation scan to show how the rsatscan package can simplify the cycle of generating data in R, running SaTScan on those data, and collecting the results back into R, which should make it quicker and easier to accumulate results over repeated runs.
I begin by making data on a 10*10 grid of locations, over 30 days. Each day, each location has a 0.1 probability of having a single case.
set.seed(42)
mygeo = expand.grid(1:10, 1:10)           # 10 x 10 grid of locations
daysbase = 30                             # length of the baseline period, in days
locid = rep(1:100, times = daysbase)      # location ID for each location-day
basecas = rbinom(100 * daysbase, 1, 0.1)  # each location-day has a 0.1 chance of a single case
day = rep(1:daysbase, each = 100)         # day for each location-day
mycas = data.frame(locid, basecas, day)
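As a quick sanity check (plain base R, not part of the rsatscan workflow), the overall case rate should be close to the intended 0.1:
mean(mycas$basecas)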
Here’s what the geography and case data look like. I’m using generic time units, for convenience.
head(mygeo)
## Var1 Var2
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 5 1
## 6 6 1
head(mycas)
## locid basecas day
## 1 1 1 1
## 2 2 1 1
## 3 3 0 1
## 4 4 0 1
## 5 5 0 1
## 6 6 0 1
Now I can write the data to disk. The row names of the mygeo data.frame are the location IDs for SaTScan, so I use the userownames option to keep, rather than ignore, R’s row names when writing the geography file; the case file already contains an explicit column with the same information.
library("rsatscan")
td = tempdir()
write.geo(mygeo, location = td, file = "mygeo", userownames=TRUE)
write.cas(mycas, location = td, file = "mycas")
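To confirm that the row names ended up as the location ID column, you can peek at the first lines of the files just written (a small base-R check; the .geo and .cas file names are the ones referenced in the parameter file below):
head(readLines(file.path(td, "mygeo.geo")))
head(readLines(file.path(td, "mycas.cas")))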
Now I’m ready to build the parameter file. This is adapted fairly closely from the NYCfever example in the main rsatscan vignette.
invisible(ss.options(reset=TRUE))                                   # start from the default parameter set
ss.options(list(CaseFile="mycas.cas", PrecisionCaseTimes=4))        # generic time precision
ss.options(list(StartDate="1", CoordinatesType=0, TimeAggregationUnits=4))   # Cartesian coordinates, generic time units
ss.options(list(EndDate="30", CoordinatesFile="mygeo.geo", AnalysisType=4, ModelType=2))  # prospective space-time analysis, space-time permutation model
ss.options(list(UseDistanceFromCenterOption="y", MaxSpatialSizeInDistanceFromCenter=3))   # maximum spatial cluster radius of 3
ss.options(list(NonCompactnessPenalty=0, MaxTemporalSizeInterpretation=1, MaxTemporalSize=7))  # maximum temporal window of 7 time units
ss.options(list(ProspectiveStartDate="30", ReportGiniClusters="n", LogRunToHistoryFile="n"))
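Before writing the parameter file, it can be reassuring to look at what has accumulated; as I understand ss.options(), calling it with no arguments returns the current parameter lines, so a quick peek might look like this:
head(ss.options(), 10)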
Then I write the parameter file to disk and run SaTScan with it. I’ll peek at the cluster table to see what we got.
write.ss.prm(td, "mybase")
# This step is omitted in compliance with CRAN policies.
# Please install SaTScan and re-run the vignette with this and the following code uncommented.
# SaTScan can be downloaded, free of charge, from www.satscan.org,
# where you will also find fully compiled versions of this vignette, with results.
# mybase = satscan(td, "mybase")
# mybase$col[3:10]
As one would hope, there’s no evidence of a meaningful cluster.
Now, let’s add a day just like the others. I’ll stick it onto the end of the previous data, then write out a new case file.
newday = data.frame(locid = 1:100, basecas = rbinom(100,1,.1), day = 31)
newcas = rbind(mycas,newday)
write.cas(newcas, location = td, file = "mycas")
I don’t need to re-assign any parameter values that don’t change between runs. In this case, since I used the same name for the data file, I only need to change the end date of the surveillance period.
ss.options(list(EndDate="31"))
write.ss.prm(td, "day1")
# day1 = satscan(td, "day1")
# day1$col[3:10]
Again, no clusters, as we would expect.
But now let’s make a cluster appear. I create an additional day as before, but then pick one location to get a heap of extra cases. I glue the new day onto the case data, write it to disk, change the end date, and re-run SaTScan.
newday = data.frame(locid = 1:100, basecas = rbinom(100,1,.1), day = 32)
newday$basecas[20] = 5   # location 20 gets 5 cases on the new day
newcas = rbind(mycas,newday)
write.cas(newcas, location = td, file = "mycas")
ss.options(list(EndDate="32"))
write.ss.prm(td, "day2")
# day2 = satscan(td,"day2")
# day2$col[3:10]
This shows that the scan detects the cluster I inserted. I can also extract the more detailed section of the main report that describes this cluster.
# summary(day2)
# cat(day2$main[20:31],fill=1)
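Finally, since the day-31 and day-32 runs differ only in the appended data and the end date, the whole per-day cycle could be wrapped in a small helper. A minimal sketch, assuming SaTScan is installed and the geography file and parameter settings above are still in place in td (the function name run_day and its arguments are my own, not part of rsatscan):
run_day = function(oldcas, newday, td) {
  # append the new day's cases and rewrite the case file under the same name
  write.cas(rbind(oldcas, newday), location = td, file = "mycas")
  # extend the surveillance period to the new day
  endday = as.character(max(newday$day))
  ss.options(list(EndDate = endday))
  # write a fresh parameter file and run SaTScan on it
  write.ss.prm(td, paste0("day", endday))
  satscan(td, paste0("day", endday))
}
# e.g.: day2 = run_day(mycas, newday, td); day2$col[3:10]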