We create some data and replace one column with NA
.
data <- matrix(rnorm(120), ncol = 10)
data[, 3] <- NA
print(data)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.52725690 -0.5650635 NA -0.8330864 -1.84076675 -0.70387009
## [2,] -0.32897935 -0.8141488 NA 2.8772045 0.50445371 0.35653669
## [3,] -0.03722701 -1.4251536 NA -0.9783538 0.08552368 -0.63458325
## [4,] 0.86656273 0.1338692 NA 0.7363414 -1.46301556 1.29947467
## [5,] -0.27848127 0.9784333 NA -1.8402411 0.36585998 0.06301041
## [6,] -0.72859091 -2.0063301 NA -0.1671825 2.91277834 1.08138601
## [7,] 0.14320714 -2.1895088 NA 0.6708241 0.45407746 1.00644550
## [8,] -0.02699480 0.3412810 NA 0.5598077 -0.53392468 1.29904163
## [9,] -0.92760608 0.2585080 NA 0.6081378 0.91698313 0.20770775
## [10,] 1.16113976 0.9334869 NA -0.3883073 0.24693741 1.93688179
## [11,] 0.32968508 -0.5004544 NA 0.1780987 -0.90708610 0.16230491
## [12,] -0.52280539 -1.3532271 NA -0.5578688 0.91938632 -0.60554273
## [,7] [,8] [,9] [,10]
## [1,] -0.66505318 2.0955300 -0.7208257 -1.39526062
## [2,] -0.68950095 0.9864506 -0.5673924 -1.37842012
## [3,] 0.76582097 2.1555987 1.7947341 0.04310134
## [4,] 2.28198335 -0.9557152 -0.5284728 -0.53421657
## [5,] -1.37036796 -0.9473371 0.9176235 1.01011782
## [6,] -0.79153297 -0.3549076 -0.4968285 -0.42689409
## [7,] -0.84289415 2.1293334 0.4371836 -0.36992825
## [8,] 0.18424855 -1.4000551 -0.3569141 -0.10540388
## [9,] -0.05918566 1.1119620 1.3792888 0.96269766
## [10,] 2.21864791 -1.0228976 -1.0114390 -0.52767763
## [11,] -0.99578761 -2.0396835 -2.6784326 0.88223119
## [12,] -1.34198710 0.6878626 0.4217538 -2.60461988
The covariance, with the implicit use = 'everything'
will give us a “cross” of NA
in the covariance matrix.
cov(data)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.39602529 0.2257976 NA -0.06122212 -0.53820619 0.2100530
## [2,] 0.22579757 1.1574737 NA -0.20122460 -0.52853820 0.2497122
## [3,] NA NA NA NA NA NA
## [4,] -0.06122212 -0.2012246 NA 1.40460774 0.03063429 0.3310987
## [5,] -0.53820619 -0.5285382 NA 0.03063429 1.56726935 0.1512848
## [6,] 0.21005302 0.2497122 NA 0.33109869 0.15128476 0.7530054
## [7,] 0.51649386 0.5455969 NA 0.13282563 -0.48553962 0.6073985
## [8,] -0.23184734 -0.8958199 NA 0.07973833 0.13344307 -0.6786274
## [9,] -0.37055100 -0.1355354 NA -0.36983937 0.47931412 -0.3403516
## [10,] -0.05042305 0.4487861 NA -0.23752735 -0.01404820 0.1623234
## [,7] [,8] [,9] [,10]
## [1,] 0.51649386 -0.23184734 -0.37055100 -0.05042305
## [2,] 0.54559691 -0.89581991 -0.13553538 0.44878607
## [3,] NA NA NA NA
## [4,] 0.13282563 0.07973833 -0.36983937 -0.23752735
## [5,] -0.48553962 0.13344307 0.47931412 -0.01404820
## [6,] 0.60739853 -0.67862744 -0.34035158 0.16232337
## [7,] 1.59426005 -0.36715377 -0.05793457 0.07672173
## [8,] -0.36715377 2.25995991 1.05910559 -0.56368506
## [9,] -0.05793457 1.05910559 1.44190733 0.09457831
## [10,] 0.07672173 -0.56368506 0.09457831 1.14601966
The jackknife covariance does the same thing.
jackknife_cov(data)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.9932550 2.276792 NA -0.6173230 -5.4269124 2.118035 5.2079797
## [2,] 2.2767922 11.671193 NA -2.0290147 -5.3294268 2.517932 5.5014355
## [3,] NA NA NA NA NA NA NA
## [4,] -0.6173230 -2.029015 NA 14.1631280 0.3088957 3.338579 1.3393251
## [5,] -5.4269124 -5.329427 NA 0.3088957 15.8032993 1.525455 -4.8958579
## [6,] 2.1180346 2.517932 NA 3.3385785 1.5254546 7.592805 6.1246019
## [7,] 5.2079797 5.501435 NA 1.3393251 -4.8958579 6.124602 16.0754555
## [8,] -2.3377940 -9.032851 NA 0.8040281 1.3455510 -6.842827 -3.7021338
## [9,] -3.7363893 -1.366648 NA -3.7292136 4.8330840 -3.431878 -0.5841735
## [10,] -0.5084325 4.525260 NA -2.3950674 -0.1416527 1.636761 0.7736108
## [,8] [,9] [,10]
## [1,] -2.3377940 -3.7363893 -0.5084325
## [2,] -9.0328507 -1.3666484 4.5252595
## [3,] NA NA NA
## [4,] 0.8040281 -3.7292136 -2.3950674
## [5,] 1.3455510 4.8330840 -0.1416527
## [6,] -6.8428267 -3.4318785 1.6367607
## [7,] -3.7021338 -0.5841735 0.7736108
## [8,] 22.7879291 10.6793147 -5.6838244
## [9,] 10.6793147 14.5392322 0.9536647
## [10,] -5.6838244 0.9536647 11.5556982
When we have some NA
values in a row, we have a conceptual problem with the jackknife as the width of the jackknife distribution is linked to the number of measurements.
data <- matrix(rnorm(120), ncol = 10)
data[2, ] <- NA
print(data)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.2319684 -1.0417411 0.7841038 -1.4569667 -0.51423719 -0.4738826
## [2,] NA NA NA NA NA NA
## [3,] 1.5753159 0.7606100 0.2877911 -1.5020441 -0.06586191 1.6701913
## [4,] -0.9682803 -0.7941394 0.2280203 1.7791729 -0.55453876 1.1581078
## [5,] -0.5653530 0.5563140 -0.1897482 -0.2250793 0.36231650 0.4517224
## [6,] -0.8466961 -1.8471158 0.1835276 2.0840631 -0.55140704 0.8695333
## [7,] -2.1527329 -0.2399547 1.6030800 1.7218317 -0.43925685 -0.2125655
## [8,] 0.2142895 1.3195920 -1.0153580 1.3608098 0.14743768 -0.3355199
## [9,] -2.3461886 2.0155852 0.1644110 0.4275195 0.99522225 0.6089167
## [10,] -0.5712919 -0.5154676 0.1823142 0.3993543 -2.03591803 0.7316732
## [11,] 0.1782927 -0.9164950 0.1212507 0.4301902 -0.17158141 -1.3095372
## [12,] 0.3616790 2.6653700 -0.2467147 -0.7238332 0.17495490 1.7830029
## [,7] [,8] [,9] [,10]
## [1,] -1.6976902 0.09728423 0.9038715 -0.68359889
## [2,] NA NA NA NA
## [3,] 1.7016404 0.57188894 1.3800839 -0.96380028
## [4,] -1.9856220 0.30292925 -0.2245294 -0.74710414
## [5,] 1.0290240 -0.53308489 -2.0868422 0.55156460
## [6,] -1.0102936 -2.86162295 0.5879080 0.07937102
## [7,] 1.7484263 1.76558417 1.1075010 -1.87207295
## [8,] -1.9183902 -0.41129021 1.4201490 0.47395157
## [9,] -0.1378898 -1.14469597 -0.3479164 0.30071595
## [10,] 1.1611336 -1.68471148 -0.5275343 0.51733299
## [11,] 0.7758523 -0.36545530 1.0559696 1.15103955
## [12,] -0.5175131 0.92250274 -0.5816746 -1.04687059
Also here we get the same behavior by default:
cov(data)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA NA NA NA NA
## [2,] NA NA NA NA NA NA NA NA NA NA
## [3,] NA NA NA NA NA NA NA NA NA NA
## [4,] NA NA NA NA NA NA NA NA NA NA
## [5,] NA NA NA NA NA NA NA NA NA NA
## [6,] NA NA NA NA NA NA NA NA NA NA
## [7,] NA NA NA NA NA NA NA NA NA NA
## [8,] NA NA NA NA NA NA NA NA NA NA
## [9,] NA NA NA NA NA NA NA NA NA NA
## [10,] NA NA NA NA NA NA NA NA NA NA
jackknife_cov(data)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA NA NA NA NA
## [2,] NA NA NA NA NA NA NA NA NA NA
## [3,] NA NA NA NA NA NA NA NA NA NA
## [4,] NA NA NA NA NA NA NA NA NA NA
## [5,] NA NA NA NA NA NA NA NA NA NA
## [6,] NA NA NA NA NA NA NA NA NA NA
## [7,] NA NA NA NA NA NA NA NA NA NA
## [8,] NA NA NA NA NA NA NA NA NA NA
## [9,] NA NA NA NA NA NA NA NA NA NA
## [10,] NA NA NA NA NA NA NA NA NA NA
When we use complete
, we get the same thing as just dropping the NA
rows.
cov(data, use = 'complete')
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.30103326 0.1012675 -0.29252024 -0.89866029 -0.076512587 0.155496110
## [2,] 0.10126753 1.9736935 -0.38545492 -0.60637630 0.659830950 0.501881191
## [3,] -0.29252024 -0.3854549 0.41540469 0.01875502 -0.133684585 -0.086282734
## [4,] -0.89866029 -0.6063763 0.01875502 1.61290704 -0.147707213 -0.210080850
## [5,] -0.07651259 0.6598309 -0.13368458 -0.14770721 0.578198468 0.006436648
## [6,] 0.15549611 0.5018812 -0.08628273 -0.21008085 0.006436648 0.905334387
## [7,] -0.04589359 0.1312257 0.36493410 -0.41069069 -0.118906980 0.041467917
## [8,] 0.18973429 0.6066817 0.30052077 -0.45079884 0.200204968 -0.001201813
## [9,] 0.34931254 -0.3779087 0.14965746 0.12118299 -0.069572557 -0.375162256
## [10,] 0.06293107 -0.1298983 -0.35864639 0.10397578 0.020286857 -0.340013673
## [,7] [,8] [,9] [,10]
## [1,] -0.04589359 0.189734288 0.34931254 0.06293107
## [2,] 0.13122573 0.606681709 -0.37790870 -0.12989828
## [3,] 0.36493410 0.300520774 0.14965746 -0.35864639
## [4,] -0.41069069 -0.450798836 0.12118299 0.10397578
## [5,] -0.11890698 0.200204968 -0.06957256 0.02028686
## [6,] 0.04146792 -0.001201813 -0.37516226 -0.34001367
## [7,] 2.07083507 0.360719472 -0.15059524 -0.08703427
## [8,] 0.36071947 1.630639705 0.27910828 -0.81477057
## [9,] -0.15059524 0.279108280 1.19123578 -0.22909654
## [10,] -0.08703427 -0.814770573 -0.22909654 0.83171570
all(cov(data, use = 'complete') == cov(data[complete.cases(data), ]))
## [1] TRUE
With our jackknife function we get a failure, which should not happen!
jackknife_cov(data, na.rm = TRUE)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 157.425025 12.25337 -35.394949 -108.737895 -9.2580230 18.8150293
## [2,] 12.253371 238.81691 -46.640046 -73.371532 79.8395449 60.7276241
## [3,] -35.394949 -46.64005 50.263967 2.269357 -16.1758347 -10.4402108
## [4,] -108.737895 -73.37153 2.269357 195.161752 -17.8725728 -25.4197828
## [5,] -9.258023 79.83954 -16.175835 -17.872573 69.9620147 0.7788344
## [6,] 18.815029 60.72762 -10.440211 -25.419783 0.7788344 109.5454609
## [7,] -5.553124 15.87831 44.157026 -49.693574 -14.3877445 5.0176179
## [8,] 22.957849 73.40849 36.363014 -54.546659 24.2248011 -0.1454194
## [9,] 42.266818 -45.72695 18.108553 14.663142 -8.4182795 -45.3946330
## [10,] 7.614660 -15.71769 -43.396213 12.581070 2.4547097 -41.1416545
## [,7] [,8] [,9] [,10]
## [1,] -5.553124 22.9578489 42.266818 7.61466
## [2,] 15.878313 73.4084868 -45.726953 -15.71769
## [3,] 44.157026 36.3630136 18.108553 -43.39621
## [4,] -49.693574 -54.5466591 14.663142 12.58107
## [5,] -14.387745 24.2248011 -8.418279 2.45471
## [6,] 5.017618 -0.1454194 -45.394633 -41.14165
## [7,] 250.571044 43.6470561 -18.222024 -10.53115
## [8,] 43.647056 197.3074044 33.772102 -98.58724
## [9,] -18.222024 33.7721019 144.139530 -27.72068
## [10,] -10.531146 -98.5872394 -27.720682 100.63760