Introduction

The icd package for R includes ICD-10-CM definitions, sample ICD-10-CM data, and very fast calculation of comorbidities from ICD-10 diagnostic and procedure codes (and from ICD-9 or other schemes). Codes are mapped to the standard comorbidities defined in the literature following Charlson [@charlson_new_1987], Quan, Deyo [@quan_coding_2005], Elixhauser [@elixhauser_comorbidity_1998], pediatric complex chronic conditions [@Feudtner_Pediatriccomplexchronic_2014] and the US AHRQ [@AgencyforHealthcareResearchandQuality_Elixhausercomorbiditysoftware_2018]. The package also includes the 2018 ICD-10-PCS procedure codes and a mapping to categorize them. The sample data are from the US Transuranium and Uranium Registries, which provide deidentified diagnoses for a few hundred pathology cases relating to uranium exposure.

The sample data are in the ‘long’ format, i.e., multiple rows per case.

uranium_pathology[1:10, ]
#>    case icd10
#> 1     1 F17.9
#> 2     1 I21.9
#> 3     1 K75.9
#> 4     1   R55
#> 5     2 I25.1
#> 6     2 I35.8
#> 7     2 I63.9
#> 8     2   I64
#> 9     2 J43.9
#> 10    2 J84.1

Pick a code, and see what it means.

explain_code("R55")
#> [1] "Syncope and collapse"

icd can work with ‘long’ or ‘wide’ format data without modification. If the source database is not normalized, i.e., has multiple diagnostic code columns, icd can detect and efficiently work on all the columns at once.

head(uranium_pathology)
#>   case icd10
#> 1    1 F17.9
#> 2    1 I21.9
#> 3    1 K75.9
#> 4    1   R55
#> 5    2 I25.1
#> 6    2 I35.8
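
As a rough illustration of the difference between the two layouts, the base R sketch below builds a small hypothetical ‘wide’ table and stacks it into the ‘long’ layout shown above. The column names dx1 and dx2 and the helper wide_to_long are invented for this example and are not part of icd. icd does not need this step; it is only meant to make the two shapes concrete.

```r
# Hypothetical 'wide' data: one row per case, one column per diagnosis slot.
wide <- data.frame(case = c(1, 2),
                   dx1 = c("F17.9", "I25.1"),
                   dx2 = c("I21.9", NA),
                   stringsAsFactors = FALSE)

# Illustrative helper (not an icd function): stack every diagnosis column
# into a single 'icd10' column and drop the empty slots.
wide_to_long <- function(x) {
  dx_cols <- setdiff(names(x), "case")
  long <- data.frame(case = rep(x$case, times = length(dx_cols)),
                     icd10 = unlist(x[dx_cols], use.names = FALSE),
                     stringsAsFactors = FALSE)
  long <- long[!is.na(long$icd10), ]
  long <- long[order(long$case), ]
  rownames(long) <- NULL
  long
}

long <- wide_to_long(wide)
print(long)
```

The result has one row per case and diagnosis, which is the same shape as uranium_pathology.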

Now map these diagnoses to disease groups as defined by Quan et al.:

quan_comorbidities <- comorbid(uranium_pathology, icd10_map_quan_elix)
# see the first few rows and columns:
quan_comorbidities[1:6, c(1, 3:10)]
#>     CHF Valvular  PHTN   PVD   HTN HTNcx Paralysis NeuroOther Pulmonary
#> 1 FALSE    FALSE FALSE FALSE FALSE FALSE     FALSE      FALSE     FALSE
#> 2 FALSE     TRUE FALSE FALSE FALSE FALSE     FALSE      FALSE      TRUE
#> 3 FALSE    FALSE FALSE FALSE FALSE FALSE     FALSE      FALSE     FALSE
#> 4 FALSE    FALSE FALSE FALSE FALSE FALSE     FALSE      FALSE     FALSE
#> 5 FALSE    FALSE FALSE FALSE FALSE FALSE     FALSE      FALSE     FALSE
#> 6 FALSE    FALSE FALSE  TRUE FALSE  TRUE     FALSE      FALSE      TRUE

Tidy results

The ‘tidyverse’ is oriented around tidy data. By default, icd returns matrices from comorbidity calculations: all the values are logical, so a matrix is the most efficient structure for memory and for subsequent manipulation. However, setting return_df = TRUE in a call to any of the comorbidity functions returns a ‘tidy’ data frame with an ‘id’ column and one column for each of the comorbidities.

comorbid_charlson(uranium_pathology, return_df = TRUE)[1:5, 1:5]
#>   case    MI   CHF   PVD Stroke
#> 1    1  TRUE FALSE FALSE  FALSE
#> 2    2 FALSE FALSE FALSE   TRUE
#> 3    3  TRUE FALSE FALSE  FALSE
#> 4    4 FALSE FALSE FALSE  FALSE
#> 5    5 FALSE FALSE FALSE  FALSE
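
To make the difference between the two return types concrete, here is a minimal base R sketch (not icd's own code) that turns a logical matrix with case identifiers as row names into a data frame with an ‘id’ column. The matrix values and the helper name matrix_to_tidy are made up for illustration.

```r
# Made-up logical matrix: row names are case ids, columns are comorbidities.
m <- matrix(c(TRUE, FALSE, FALSE, TRUE),
            nrow = 2,
            dimnames = list(c("1", "2"), c("MI", "Stroke")))

# Illustrative helper (not part of icd): move the row names into an id column.
matrix_to_tidy <- function(mat, id_name = "case") {
  df <- data.frame(rownames(mat), mat,
                   stringsAsFactors = FALSE, row.names = NULL)
  names(df)[1] <- id_name
  rownames(df) <- NULL
  df
}

tidy <- matrix_to_tidy(m)
print(tidy)
```

The matrix keeps the identifiers as row names, whereas the data frame carries them as an ordinary column, which is what most tidyverse tools expect.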

Working with big data

icd is carefully optimized to give accurate results as quickly as possible, and shines with huge data sets. For users working with millions of rows of data or more, some options can improve throughput.

In this example, we also return ‘binary’ numeric flags instead of logical values. For very large data sets, matrices are faster both for icd and for subsequent manipulation and analysis. The restore_id_order argument controls whether the ‘id’ field is returned in the order in which the data were presented. Note that setting restore_id_order to FALSE does not sort: it simply returns the results in the order in which they were calculated. Since parallel threads complete at different times, this is quicker, although the resulting order is not deterministic. Again, this is really for huge data sets, where the additional sorting and re-ordering may be very time consuming. Most users will not need to worry about this.

# shuffle the rows:
set.seed(1441)
u <- uranium_pathology[sample(seq_len(nrow(uranium_pathology))), ]
head(u)
#>      case  icd10
#> 339    55  J18.0
#> 371    60 S32.80
#> 1100  231  J43.9
#> 433    83  I26.9
#> 1255  257    C64
#> 929   205  C78.8
quan_comorbidities <- comorbid(u,
                               icd10_map_quan_elix,
                               return_df = TRUE,
                               return_binary = TRUE,
                               restore_id_order = FALSE)
# see the first few rows and columns:
quan_comorbidities[1:6, c(1, 3:9)]
#>   case Arrhythmia Valvular PHTN PVD HTN HTNcx Paralysis
#> 1 1041          0        0    0   0   0     0         0
#> 2   23          0        0    0   0   0     0         0
#> 3  255          0        0    0   1   0     0         0
#> 4   60          0        0    0   0   0     0         0
#> 5  153          0        0    0   0   0     1         0
#> 6  331          0        0    0   0   1     0         0
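
The two options can be pictured with a small base R sketch. This is only an illustration on a made-up matrix, under the assumption that the result carries the ids as row names; it is not how icd implements return_binary or restore_id_order internally.

```r
# Made-up logical result whose rows came back in arbitrary order.
res <- matrix(c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE),
              nrow = 3,
              dimnames = list(c("23", "1041", "255"), c("PVD", "HTNcx")))

# Logical to 0/1 integer flags; dimensions and dimnames are preserved.
res_binary <- res
storage.mode(res_binary) <- "integer"

# Reorder the rows to follow the order in which each id first appeared
# in the (hypothetical) input data.
input_ids <- c("1041", "23", "1041", "255")
first_seen <- unique(input_ids)
res_ordered <- res_binary[match(first_seen, rownames(res_binary)), ,
                          drop = FALSE]
print(res_ordered)
```

The reordering step is an extra pass over the results, which is why skipping it can save time on very large inputs.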

The ICD-10-CM mappings are recorded a bit differently from the ICD-9-CM mappings in this package. The ICD-9 mappings included all possible permutations of child codes. Since ICD-10 codes contain letters and can be up to seven characters long, this became impractical. Therefore, the current mappings include only codes from the most recent update of ICD-10-CM. The code which assigns comorbidities for ICD-10 does not rely on all the possible codes being listed in the mappings: when a code is not listed, it will (more slowly) search for each possible parent of the given code, up to the three-character ‘major’. For example, if cholera were in the comorbidity mapping, then a made-up code such as A0034212647 would eventually match A00.

# create trivial comorbidity map:
cholera_typhoid_map <- list(cholera = "A00", typhoid = "A01")
patients <- data.frame(patient = c("0001", "0001", "0002"),
                       code = c("A001234567", "A01", "A019"))
comorbid(patients, map = cholera_typhoid_map)
#>      cholera typhoid
#> 0001    TRUE    TRUE
#> 0002   FALSE    TRUE
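
The parent search described above can be sketched in a few lines of base R. This is an illustrative re-implementation under simple assumptions (codes are plain character strings, and a parent is obtained by dropping the last character); the helper match_with_parents is invented here and is much slower and simpler than what icd actually does.

```r
# Illustrative helper (not icd's implementation): return the names of the
# comorbidities whose code list contains the code or any of its parents.
match_with_parents <- function(code, map, min_chars = 3L) {
  code <- gsub(".", "", code, fixed = TRUE)  # drop any decimal point
  hits <- character(0)
  while (nchar(code) >= min_chars) {
    found <- vapply(map, function(codes) code %in% codes, logical(1))
    hits <- union(hits, names(map)[found])
    code <- substr(code, 1L, nchar(code) - 1L)  # step up to the parent
  }
  hits
}

cholera_typhoid_map <- list(cholera = "A00", typhoid = "A01")
match_with_parents("A001234567", cholera_typhoid_map)
match_with_parents("A019", cholera_typhoid_map)
match_with_parents("B20", cholera_typhoid_map)
```

The loop stops once the code has been shortened to the three-character major, so a code with no listed ancestor simply returns no match.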

Here are the codes for hypertension with complications from Quan et al. Note that the vector has class icd10 and has the attribute icd_short_diag indicating there are no decimal point delimiters in the codes.

icd10_map_quan_elix$HTNcx
#>  [1] "I11"   "I110"  "I119"  "I12"   "I120"  "I129"  "I13"   "I130"  "I131" 
#> [10] "I1310" "I1311" "I132"  "I15"   "I150"  "I151"  "I152"  "I158"  "I159"
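
For readers unfamiliar with the two notations, the base R sketch below converts between the ‘short’ form used in the mappings and the decimal form used in the sample data. The helpers to_short and to_decimal are names invented for this example; they ignore classes and attributes and only manipulate the strings.

```r
# Illustrative helpers (names invented here; not the icd functions).
# Short form: no decimal point, e.g. "I1310".
to_short <- function(x) gsub(".", "", x, fixed = TRUE)

# Decimal form: a point after the three-character major, when there is
# anything after it, e.g. "I13.10".
to_decimal <- function(x) {
  x <- to_short(x)
  ifelse(nchar(x) > 3L,
         paste0(substr(x, 1L, 3L), ".", substr(x, 4L, nchar(x))),
         x)
}

to_short(c("I13.10", "R55"))
to_decimal(c("I1310", "R55"))
```

A three-character code such as R55 is written the same way in both forms.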

Procedure codes

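The AHRQ publishes an annually updated categorization of ICD-10-PCS procedure codes into four classes, representing diagnostic and therapeutic procedures, each being either minor or major. In the output below, each row is one identifier and each column is one procedure class; the final two lines give the number of identifiers falling in each class.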

#>   Minor Diagnostic Minor Therapeutic Major Diagnostic Major Therapeutic
#> K                1                 0                0                 0
#> I                0                 0                0                 1
#> G                0                 1                0                 0
#> J                0                 0                0                 1
#> E                0                 1                0                 0
#> D                0                 0                0                 1
#> X                0                 0                0                 1
#> B                0                 1                0                 0
#> P                0                 0                0                 1
#> T                0                 0                0                 1
#>  Minor Diagnostic Minor Therapeutic  Major Diagnostic Major Therapeutic 
#>                 1                 3                 0                 6
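
The last two lines of the output above are consistent with simple column totals. As a sketch, the 0/1 matrix below is re-typed by hand from the displayed table rather than computed with icd, and base R's colSums gives the same counts.

```r
classes <- c("Minor Diagnostic", "Minor Therapeutic",
             "Major Diagnostic", "Major Therapeutic")
ids <- c("K", "I", "G", "J", "E", "D", "X", "B", "P", "T")

# Class index for each id, read off the table above
# (1 = Minor Diagnostic, 2 = Minor Therapeutic, 4 = Major Therapeutic).
class_of_id <- c(1, 4, 2, 4, 2, 4, 4, 2, 4, 4)

# Build the 0/1 flag matrix: one row per id, one column per class.
flags <- matrix(0L, nrow = length(ids), ncol = length(classes),
                dimnames = list(ids, classes))
flags[cbind(seq_along(ids), class_of_id)] <- 1L

# Number of ids in each procedure class.
totals <- colSums(flags)
print(totals)
```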

For more information on working with ICD-10 codes, see the introduction vignette and the function examples, e.g.:

?comorbid
?explain_code

ICD-10 comorbidities

Jack O. Wasey

2020-05-30
