Regression

The Regression() function performs the multiple facets of a complete regression analysis in a single call. Abbreviate with reg().

Default Analysis

Brief output

The brief version provides just the basic analysis, comparable to what Excel provides, plus a scatterplot with the regression line.

reg_brief(Salary ~ Years + Pre)

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(Salary ~ Years + Pre, Rmd="eg")  
## 
## 
##   BACKGROUND
## 
## Data Frame:  d 
##  
## Response Variable: Salary 
## Predictor Variable 1: Years 
## Predictor Variable 2: Pre 
##  
## Number of cases (rows) of data:  37 
## Number of cases retained for analysis:  36 
## 
## 
##   BASIC ANALYSIS
## 
##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
## (Intercept) 44140.971  13666.115    3.230    0.003   16337.052   71944.891 
##       Years  3251.408    347.529    9.356    0.000    2544.355    3958.462 
##         Pre   -18.265    167.652   -0.109    0.914    -359.355     322.825 
## 
## 
## Standard deviation of residuals:  11753.478 for 33 degrees of freedom 
##  
## R-squared:  0.726    Adjusted R-squared:  0.710    PRESS R-squared:  0.659 
## 
## Null hypothesis that all population slope coefficients are 0:
##   F-statistic: 43.827     df: 2 and 33     p-value:  0.000 
## 
## 
##             df           Sum Sq          Mean Sq   F-value   p-value 
##     Years    1  12107157290.292  12107157290.292    87.641     0.000 
##       Pre    1      1639658.444      1639658.444     0.012     0.914 
##  
## Model        2  12108796948.736   6054398474.368    43.827     0.000 
## Residuals   33   4558759843.773    138144237.690 
## Salary      35  16667556792.508    476215908.357 
## 
## 
##   K-FOLD CROSS-VALIDATION
## 
##   RELATIONS AMONG THE VARIABLES
## 
##   RESIDUALS AND INFLUENCE
## 
##   FORECASTING ERROR
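
The fit statistics in the BASIC ANALYSIS section follow directly from the ANOVA table. As a minimal sketch in base R, with the sums of squares and degrees of freedom copied by hand from the output above (the variable names are illustrative, not part of lessR):

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above
ss_model <- 12108796948.736
ss_resid <- 4558759843.773
ss_total <- 16667556792.508
df_model <- 2
df_resid <- 33
df_total <- 35

# Mean squares are sums of squares divided by their degrees of freedom
ms_model <- ss_model / df_model
ms_resid <- ss_resid / df_resid

# F-statistic for the null hypothesis that all slope coefficients are 0
f_stat  <- ms_model / ms_resid
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

# Fit indices
se      <- sqrt(ms_resid)                           # standard deviation of residuals
rsq     <- ss_model / ss_total                      # R-squared
rsq_adj <- 1 - ms_resid / (ss_total / df_total)     # adjusted R-squared

round(c(F = f_stat, se = se, Rsq = rsq, RsqAdj = rsq_adj), 3)
```

The rounded results should agree with the reported F-statistic, standard deviation of residuals, R-squared, and adjusted R-squared, up to rounding in the copied values.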

Full output

The full output is extensive: a summary of the analysis, the estimated model, fit indices, ANOVA, correlation matrix, collinearity analysis, best subset regression, residuals and influence statistics, and prediction intervals.

reg(Salary ~ Years + Pre)

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(Salary ~ Years + Pre, Rmd="eg")  
## 
## 
##   BACKGROUND
## 
## Data Frame:  d 
##  
## Response Variable: Salary 
## Predictor Variable 1: Years 
## Predictor Variable 2: Pre 
##  
## Number of cases (rows) of data:  37 
## Number of cases retained for analysis:  36 
## 
## 
##   BASIC ANALYSIS
## 
##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
## (Intercept) 44140.971  13666.115    3.230    0.003   16337.052   71944.891 
##       Years  3251.408    347.529    9.356    0.000    2544.355    3958.462 
##         Pre   -18.265    167.652   -0.109    0.914    -359.355     322.825 
## 
## 
## Standard deviation of residuals:  11753.478 for 33 degrees of freedom 
##  
## R-squared:  0.726    Adjusted R-squared:  0.710    PRESS R-squared:  0.659 
## 
## Null hypothesis that all population slope coefficients are 0:
##   F-statistic: 43.827     df: 2 and 33     p-value:  0.000 
## 
## 
##             df           Sum Sq          Mean Sq   F-value   p-value 
##     Years    1  12107157290.292  12107157290.292    87.641     0.000 
##       Pre    1      1639658.444      1639658.444     0.012     0.914 
##  
## Model        2  12108796948.736   6054398474.368    43.827     0.000 
## Residuals   33   4558759843.773    138144237.690 
## Salary      35  16667556792.508    476215908.357 
## 
## 
##   K-FOLD CROSS-VALIDATION
## 
##   RELATIONS AMONG THE VARIABLES
## 
##          Salary Years  Pre 
##   Salary   1.00  0.85 0.03 
##    Years   0.85  1.00 0.05 
##      Pre   0.03  0.05 1.00 
## 
## 
##         Tolerance       VIF 
##   Years     0.998     1.002 
##     Pre     0.998     1.002 
## 
## 
##  Years Pre    R2adj    X's 
##      1   0    0.718      1 
##      1   1    0.710      2 
##      0   1   -0.028      1 
##  
## [based on Thomas Lumley's leaps function from the leaps package] 
##  
## 
## 
##   RESIDUALS AND INFLUENCE
## 
## Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance 
##    [sorted by Cook's Distance] 
##    [res_rows = 20, out of 36 rows of data, or do res_rows="all"] 
## ----------------------------------------------------------------------------------------- 
##                        Years     Pre     Salary     fitted      resid rstdnt dffits cooks 
##       Correll, Trevon     21      97 134419.230 110648.843  23770.387  2.424  1.217 0.430 
##         James, Leslie     18      70 122563.380 101387.773  21175.607  1.998  0.714 0.156 
##         Capelle, Adam     24      83 108138.430 120658.778 -12520.348 -1.211 -0.634 0.132 
##           Hoang, Binh     15      96 111074.860  91158.659  19916.201  1.860  0.649 0.131 
##    Korhalkar, Jessica      2      74  72502.500  49292.181  23210.319  2.171  0.638 0.122 
##        Billing, Susan      4      91  72675.260  55484.493  17190.767  1.561  0.472 0.071 
##          Singh, Niral      2      59  61055.440  49566.155  11489.285  1.064  0.452 0.068 
##        Skrotzki, Sara     18      63  91352.330 101515.627 -10163.297 -0.937 -0.397 0.053 
##      Saechao, Suzanne      8      98  55545.250  68362.271 -12817.021 -1.157 -0.390 0.050 
##         Kralik, Laura     10      74  92681.190  75303.447  17377.743  1.535  0.287 0.026 
##   Anastasiou, Crystal      2      59  56508.320  49566.155   6942.165  0.636  0.270 0.025 
##     Langston, Matthew      5      94  49188.960  58681.106  -9492.146 -0.844 -0.268 0.024 
##        Afshari, Anbar      6     100  69441.930  61822.925   7619.005  0.689  0.264 0.024 
##   Cassinelli, Anastis     10      80  57562.360  75193.857 -17631.497 -1.554 -0.265 0.022 
##      Osterman, Pascal      5      69  49704.790  59137.730  -9432.940 -0.826 -0.216 0.016 
##   Bellingar, Samantha     10      67  66337.830  75431.301  -9093.471 -0.793 -0.198 0.013 
##          LaRoe, Maria     10      80  61961.290  75193.857 -13232.567 -1.148 -0.195 0.013 
##      Ritchie, Darnell      7      82  53788.260  65403.102 -11614.842 -1.006 -0.190 0.012 
##        Sheppard, Cory     14      66  95027.550  88455.199   6572.351  0.579  0.176 0.011 
##        Downs, Deborah      7      90  57139.900  65256.982  -8117.082 -0.706 -0.174 0.010 
## 
## 
##   FORECASTING ERROR
## 
## Data, Predicted, Standard Error of Forecast, 95% Prediction Intervals 
##    [sorted by lower bound of prediction interval] 
##    [to see all intervals do pred_rows="all"] 
## -------------------------------------------------------------------------------------------------- 
##                        Years    Pre     Salary       pred        sf    pi:lwr     pi:upr     width 
##          Hamide, Bita      1     83  51036.850  45876.388 12290.483 20871.211  70881.564 50010.352 
##          Singh, Niral      2     59  61055.440  49566.155 12619.291 23892.014  75240.296 51348.281 
##   Anastasiou, Crystal      2     59  56508.320  49566.155 12619.291 23892.014  75240.296 51348.281 
## ... 
##          Link, Thomas     10     83  66312.890  75139.062 11933.518 50860.137  99417.987 48557.849 
##          LaRoe, Maria     10     80  61961.290  75193.857 11918.048 50946.405  99441.308 48494.903 
##   Cassinelli, Anastis     10     80  57562.360  75193.857 11918.048 50946.405  99441.308 48494.903 
## ... 
##       Correll, Trevon     21     97 134419.230 110648.843 12881.876 84440.470 136857.217 52416.747 
##         Capelle, Adam     24     83 108138.430 120658.778 12955.608 94300.394 147017.161 52716.767 
## 
## 
## ---------------------------------- 
## Plot 1: Distribution of Residuals 
## Plot 2: Residuals vs Fitted Values 
## Plot 3: ScatterPlot Matrix 
## ----------------------------------
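
The prediction intervals in the FORECASTING ERROR section appear to follow the usual form: the predicted value plus or minus a t-distribution cutoff times the standard error of forecast. A minimal sketch in base R, with the values for the first listed row copied by hand from the output above and assuming the residual degrees of freedom of 33 determine the cutoff:

```r
# Values copied from the first row of the FORECASTING ERROR table above
pred <- 45876.388    # predicted Salary for Hamide, Bita
sf   <- 12290.483    # standard error of forecast
df   <- 33           # residual degrees of freedom from the basic analysis

# Two-sided 95% cutoff from the t-distribution
t_cut <- qt(0.975, df)

# Bounds and width of the 95% prediction interval
lwr   <- pred - t_cut * sf
upr   <- pred + t_cut * sf
width <- upr - lwr

round(c(lwr = lwr, upr = upr, width = width), 1)
```

The computed bounds should match the pi:lwr and pi:upr columns for that row, up to rounding in the copied inputs.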

k-fold cross-validation

The standard output includes $R^2_{press}$, an estimate of $R^2$ when the model is applied to new, previously unseen data. A k-fold cross-validation option is also offered with the kfold parameter. Here, specify three folds.

reg(Salary ~ Years, kfold=3)

##   K-FOLD CROSS-VALIDATION
## 
##        Model from Training Data              Applied to Testing Data 
##        ----------------------------------   ---------------------------------- 
## fold    n        se           MSE    Rsq     n        se           MSE    Rsq 
##   1 |  24 12273.934 150649453.294  0.731 |  12 11306.800 127843727.961  0.703 
##   2 |  24 10936.028 119596701.753  0.777 |  12 14446.144 208691069.124  0.571 
##   3 |  24 11646.282 135635890.275  0.676 |  12 12965.769 168111155.301  0.774 
##       ----------------------------------    ---------------------------------- 
## Mean      11618.748 135294015.107  0.728       12906.237 168215317.462  0.683
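
To make the two ideas concrete, here is a base R sketch of PRESS R-squared and 3-fold cross-validation. It uses the built-in mtcars data rather than the Employee data, and it is only an illustration of the general technique; the exact definitions lessR uses for the columns of its table may differ.

```r
# Illustration with built-in data (mtcars), not the data analyzed in the text
set.seed(123)                        # make the random fold assignment repeatable
fit <- lm(mpg ~ wt, data = mtcars)

# PRESS: sum of squared leave-one-out prediction errors, e_i / (1 - h_i),
# obtained from the full-sample residuals and hat values without refitting
press     <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)
sst       <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
rsq_press <- 1 - press / sst

# k-fold: estimate the model on k-1 folds, evaluate it on the held-out fold
k        <- 3
fold     <- sample(rep(1:k, length.out = nrow(mtcars)))
test_mse <- numeric(k)
for (i in 1:k) {
  train <- mtcars[fold != i, ]
  test  <- mtcars[fold == i, ]
  m     <- lm(mpg ~ wt, data = train)
  err   <- test$mpg - predict(m, newdata = test)
  test_mse[i] <- mean(err^2)         # mean squared prediction error on held-out rows
}

rsq_press
test_mse
mean(test_mse)
```

PRESS amounts to cross-validation with as many folds as rows of data, which is why both statistics speak to performance on unseen data.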

Output as a Stored Object

The output of Regression() can be stored into an R object, here named r. The output object consists of various components.

r <- reg(Salary ~ Years + Pre)

Entering the name of the object displays the full output.

r

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(Salary ~ Years + Pre, Rmd="eg")  
## 
## 
##   BACKGROUND
## 
## Data Frame:  d 
##  
## Response Variable: Salary 
## Predictor Variable 1: Years 
## Predictor Variable 2: Pre 
##  
## Number of cases (rows) of data:  37 
## Number of cases retained for analysis:  36 
## 
## 
##   BASIC ANALYSIS
## 
##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
## (Intercept) 44140.971  13666.115    3.230    0.003   16337.052   71944.891 
##       Years  3251.408    347.529    9.356    0.000    2544.355    3958.462 
##         Pre   -18.265    167.652   -0.109    0.914    -359.355     322.825 
## 
## 
## Standard deviation of residuals:  11753.478 for 33 degrees of freedom 
##  
## R-squared:  0.726    Adjusted R-squared:  0.710    PRESS R-squared:  0.659 
## 
## Null hypothesis that all population slope coefficients are 0:
##   F-statistic: 43.827     df: 2 and 33     p-value:  0.000 
## 
## 
##             df           Sum Sq          Mean Sq   F-value   p-value 
##     Years    1  12107157290.292  12107157290.292    87.641     0.000 
##       Pre    1      1639658.444      1639658.444     0.012     0.914 
##  
## Model        2  12108796948.736   6054398474.368    43.827     0.000 
## Residuals   33   4558759843.773    138144237.690 
## Salary      35  16667556792.508    476215908.357 
## 
## 
##   K-FOLD CROSS-VALIDATION
## 
##   RELATIONS AMONG THE VARIABLES
## 
##          Salary Years  Pre 
##   Salary   1.00  0.85 0.03 
##    Years   0.85  1.00 0.05 
##      Pre   0.03  0.05 1.00 
## 
## 
##         Tolerance       VIF 
##   Years     0.998     1.002 
##     Pre     0.998     1.002 
## 
## 
##  Years Pre    R2adj    X's 
##      1   0    0.718      1 
##      1   1    0.710      2 
##      0   1   -0.028      1 
##  
## [based on Thomas Lumley's leaps function from the leaps package] 
##  
## 
## 
##   RESIDUALS AND INFLUENCE
## 
## Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance 
##    [sorted by Cook's Distance] 
##    [res_rows = 20, out of 36 rows of data, or do res_rows="all"] 
## ----------------------------------------------------------------------------------------- 
##                        Years     Pre     Salary     fitted      resid rstdnt dffits cooks 
##       Correll, Trevon     21      97 134419.230 110648.843  23770.387  2.424  1.217 0.430 
##         James, Leslie     18      70 122563.380 101387.773  21175.607  1.998  0.714 0.156 
##         Capelle, Adam     24      83 108138.430 120658.778 -12520.348 -1.211 -0.634 0.132 
##           Hoang, Binh     15      96 111074.860  91158.659  19916.201  1.860  0.649 0.131 
##    Korhalkar, Jessica      2      74  72502.500  49292.181  23210.319  2.171  0.638 0.122 
##        Billing, Susan      4      91  72675.260  55484.493  17190.767  1.561  0.472 0.071 
##          Singh, Niral      2      59  61055.440  49566.155  11489.285  1.064  0.452 0.068 
##        Skrotzki, Sara     18      63  91352.330 101515.627 -10163.297 -0.937 -0.397 0.053 
##      Saechao, Suzanne      8      98  55545.250  68362.271 -12817.021 -1.157 -0.390 0.050 
##         Kralik, Laura     10      74  92681.190  75303.447  17377.743  1.535  0.287 0.026 
##   Anastasiou, Crystal      2      59  56508.320  49566.155   6942.165  0.636  0.270 0.025 
##     Langston, Matthew      5      94  49188.960  58681.106  -9492.146 -0.844 -0.268 0.024 
##        Afshari, Anbar      6     100  69441.930  61822.925   7619.005  0.689  0.264 0.024 
##   Cassinelli, Anastis     10      80  57562.360  75193.857 -17631.497 -1.554 -0.265 0.022 
##      Osterman, Pascal      5      69  49704.790  59137.730  -9432.940 -0.826 -0.216 0.016 
##   Bellingar, Samantha     10      67  66337.830  75431.301  -9093.471 -0.793 -0.198 0.013 
##          LaRoe, Maria     10      80  61961.290  75193.857 -13232.567 -1.148 -0.195 0.013 
##      Ritchie, Darnell      7      82  53788.260  65403.102 -11614.842 -1.006 -0.190 0.012 
##        Sheppard, Cory     14      66  95027.550  88455.199   6572.351  0.579  0.176 0.011 
##        Downs, Deborah      7      90  57139.900  65256.982  -8117.082 -0.706 -0.174 0.010 
## 
## 
##   FORECASTING ERROR
## 
## Data, Predicted, Standard Error of Forecast, 95% Prediction Intervals 
##    [sorted by lower bound of prediction interval] 
##    [to see all intervals do pred_rows="all"] 
## -------------------------------------------------------------------------------------------------- 
##                        Years    Pre     Salary       pred        sf    pi:lwr     pi:upr     width 
##          Hamide, Bita      1     83  51036.850  45876.388 12290.483 20871.211  70881.564 50010.352 
##          Singh, Niral      2     59  61055.440  49566.155 12619.291 23892.014  75240.296 51348.281 
##   Anastasiou, Crystal      2     59  56508.320  49566.155 12619.291 23892.014  75240.296 51348.281 
## ... 
##          Link, Thomas     10     83  66312.890  75139.062 11933.518 50860.137  99417.987 48557.849 
##          LaRoe, Maria     10     80  61961.290  75193.857 11918.048 50946.405  99441.308 48494.903 
##   Cassinelli, Anastis     10     80  57562.360  75193.857 11918.048 50946.405  99441.308 48494.903 
## ... 
##       Correll, Trevon     21     97 134419.230 110648.843 12881.876 84440.470 136857.217 52416.747 
##         Capelle, Adam     24     83 108138.430 120658.778 12955.608 94300.394 147017.161 52716.767 
## 
## 
## ---------------------------------- 
## Plot 1: Distribution of Residuals 
## Plot 2: Residuals vs Fitted Values 
## Plot 3: ScatterPlot Matrix 
## ----------------------------------

Or, work with the components individually. Use the base R names() function to identify all of the components. Component names that begin with out_ are part of the standard output. The other components contain just data and statistics, designed as input to additional procedures.

names(r)

##  [1] "out_suggest"     "call"            "formula"         "out_title_bck"   "out_background"  "out_title_basic"
##  [7] "out_estimates"   "out_fit"         "out_anova"       "out_title_kfold" "out_kfold"       "out_title_rel"  
## [13] "out_cor"         "out_collinear"   "out_subsets"     "out_title_res"   "out_residuals"   "out_title_pred" 
## [19] "out_predict"     "out_ref"         "out_Rmd"         "out_Word"        "out_pdf"         "out_odt"        
## [25] "out_rtf"         "out_plots"       "n.vars"          "n.obs"           "n.keep"          "coefficients"   
## [31] "sterrs"          "tvalues"         "pvalues"         "cilb"            "ciub"            "anova_model"    
## [37] "anova_residual"  "anova_total"     "se"              "resid_range"     "Rsq"             "Rsqadj"         
## [43] "PRESS"           "RsqPRESS"        "m_se"            "m_MSE"           "m_Rsq"           "cor"            
## [49] "tolerances"      "vif"             "resid.max"       "pred_min_max"    "residuals"       "fitted"         
## [55] "cooks.distance"  "model"           "terms"

Here, display just the estimates portion of the standard output.

r$out_estimates

##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95%
## (Intercept) 44140.971  13666.115    3.230    0.003   16337.052   71944.891
##       Years  3251.408    347.529    9.356    0.000    2544.355    3958.462
##         Pre   -18.265    167.652   -0.109    0.914    -359.355     322.825

Here, display the coefficients as a named numeric vector.

r$coefficients

## (Intercept)       Years         Pre 
## 44140.97140  3251.40825   -18.26496
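
As one example of using such a component in further computation, the sketch below reproduces a fitted value and residual from the coefficients. To stay self-contained, the coefficient values are typed in from the display above rather than read from the stored object, and the predictor values and observed salary are copied from the row for Correll, Trevon in the residuals table.

```r
# Coefficients copied from the display above
b <- c("(Intercept)" = 44140.97140, Years = 3251.40825, Pre = -18.26496)

# Predictor values for Correll, Trevon, with a leading 1 for the intercept
x <- c(1, 21, 97)

# Fitted value is the sum of each coefficient times its predictor value
fitted_value <- sum(b * x)

# Residual is the observed Salary minus the fitted value
residual <- 134419.230 - fitted_value

round(c(fitted = fitted_value, resid = residual), 3)
```

With the stored object available, the same calculation could presumably start from r$coefficients instead of the typed-in vector.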

Interpreted Output

The Rmd parameter automatically generates an R markdown file and, from knitting that file, an html document that weaves the various output components together with full interpretation. The result is a new, much more complete form of computer output.

reg(Salary ~ Years + Pre, Rmd="eg")

Logistic Regression

Logit(Gender ~ Salary)

## 
## Response Variable:   Gender
## Predictor Variable 1:  Salary
## 
## Number of cases (rows) of data:  37 
## Number of cases retained for analysis:  37 
## 
## 
## 
##    BASIC ANALYSIS 
## 
## Model Coefficients
## 
##              Estimate    Std Err  z-value  p-value   Lower 95%   Upper 95%
## (Intercept)   -2.6191     1.3715   -1.910    0.056     -5.3073      0.0691 
##      Salary    0.0000     0.0000    1.904    0.057     -0.0000      0.0001 
## 
## 
## Odds ratios and confidence intervals
## 
##              Odds Ratio   Lower 95%   Upper 95%
## (Intercept)      0.0729      0.0050      1.0715 
##      Salary      1.0000      1.0000      1.0001 
## 
## 
## Model Fit
## 
##     Null deviance: 51.266 on 36 degrees of freedom
## Residual deviance: 46.918 on 35 degrees of freedom
## 
## AIC: 50.91807 
## 
## Number of iterations to convergence: 4 
## 
## 
## 
## 
##    ANALYSIS OF RESIDUALS AND INFLUENCE 
## Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance
##    [sorted by Cook's Distance]
##    [res_rows = 20 out of 37 cases (rows) of data]
## --------------------------------------------------------------------
##                     Salary Gender fitted residual rstudent  dffits   cooks
## James, Leslie       122563      F 0.8424  -0.8424  -2.1213 -0.7143 0.46299
## Langston, Matthew    49189      M 0.2900   0.7100   1.6237  0.3646 0.08559
## Osterman, Pascal     49705      M 0.2938   0.7062   1.6139  0.3586 0.08225
## Kralik, Laura        92681      F 0.6522  -0.6522  -1.4942 -0.3313 0.06402
## Ritchie, Darnell     53788      M 0.3243   0.6757   1.5380  0.3136 0.05962
## Skrotzki, Sara       91352      F 0.6416  -0.6416  -1.4698 -0.3161 0.05736
## Cassinelli, Anastis  57562      M 0.3539   0.6461   1.4703  0.2761 0.04409
## Link, Thomas         66313      M 0.4267   0.5733   1.3223  0.2111 0.02335
## Anderson, David      69548      M 0.4547   0.5453   1.2706  0.1967 0.01962
## Stanley, Grayson     69625      M 0.4553   0.5447   1.2694  0.1965 0.01955
## Capelle, Adam       108138      M 0.7632   0.2368   0.7586  0.2236 0.01954
## Knox, Michael        99063      M 0.7011   0.2989   0.8637  0.2179 0.01935
## Hoang, Binh         111075      M 0.7813   0.2187   0.7265  0.2228 0.01919
## Sheppard, Cory       95028      M 0.6706   0.3294   0.9132  0.2119 0.01869
## Wu, James            94495      M 0.6665   0.3335   0.9199  0.2110 0.01859
## Campagna, Justin     72321      M 0.4788   0.5212   1.2275  0.1888 0.01759
## Fulton, Scott        87786      M 0.6124   0.3876   1.0066  0.1980 0.01706
## Adib, Hassan         83014      M 0.5720   0.4280   1.0715  0.1892 0.01613
## Pham, Scott          81871      M 0.5622   0.4378   1.0875  0.1875 0.01599
## Portlock, Ryan       77715      M 0.5261   0.4739   1.1469  0.1841 0.01593
## 
## 
##    FORECASTS 
## 
## Probability threshold for predicting M: 0.5
## 
##  0: F
##  1: M
## 
## Data, Fitted Values, Standard Errors
##    [sorted by fitted value]
## --------------------------------------------------------------------
##                     Salary Gender predict fitted std.err
## Stanley, Emma        46125      F       0 0.2684  0.1161
## Langston, Matthew    49189      M       0 0.2900  0.1126
## Osterman, Pascal     49705      M       0 0.2938  0.1119
## Gvakharia, Kimberly  49869      F       0 0.2949  0.1117
## 
## ... for the rows of data where fitted is close to 0.5 ...
## 
##                    Salary Gender predict fitted std.err
## Campagna, Justin    72321      M       0 0.4788 0.08710
## Korhalkar, Jessica  72502      F       0 0.4804 0.08713
## Billing, Susan      72675      F       0 0.4819 0.08718
## Portlock, Ryan      77715      M       1 0.5261 0.09079
## Pham, Scott         81871      M       1 0.5622 0.09670
## 
## ... for the last 4 rows of sorted data ...
## 
##                 Salary Gender predict fitted std.err
## Capelle, Adam   108138      M       1 0.7632  0.1355
## Hoang, Binh     111075      M       1 0.7813  0.1364
## James, Leslie   122563      F       1 0.8424  0.1318
## Correll, Trevon 134419      M       1 0.8901  0.1174
## --------------------------------------------------------------------
## 
## 
## Confusion Matrix for Gender 
## 
## Probability threshold for predicting M: 0.5
## 
##                Baseline         Predicted 
## ---------------------------------------------------
##               Total  %Tot        0      1  %Correct 
## ---------------------------------------------------
##          0       19  51.4       16      3     84.2 
## Gender   1       18  48.6        8     10     55.6 
## ---------------------------------------------------
##        Total     37                           70.3 
## 
## Accuracy: 70.27 
## Recall: 55.56 
## Precision: 76.92
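
Two parts of this output can be verified by hand. The odds ratios are the exponentiated coefficients and confidence bounds, and the classification statistics follow from the counts in the confusion matrix. A minimal sketch in base R with the values copied from the output above (the object names are illustrative only):

```r
# Intercept estimate and its 95% bounds, copied from Model Coefficients
intercept  <- c(estimate = -2.6191, lower = -5.3073, upper = 0.0691)
odds_ratio <- exp(intercept)       # exponentiate to move from log-odds to odds

# Confusion matrix counts: rows are actual values, columns are predicted values
cm <- matrix(c(16,  3,
                8, 10),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

accuracy  <- 100 * sum(diag(cm)) / sum(cm)          # correct out of all cases
recall    <- 100 * cm["1", "1"] / sum(cm["1", ])    # correct among actual 1s
precision <- 100 * cm["1", "1"] / sum(cm[, "1"])    # correct among predicted 1s

round(odds_ratio, 4)
round(c(accuracy = accuracy, recall = recall, precision = precision), 2)
```

The Salary odds ratio displays as 1.0000 because its coefficient is near zero on the scale of a single dollar of salary.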

Specify multiple logistic regression with the usual R formula syntax. Specify additional probability thresholds beyond just the default 0.5 with the prob_cut parameter.

Logit(Gender ~ Years + Salary, prob_cut=c(.3, .5, .7))
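
To see what a change of threshold does, the sketch below classifies a few fitted probabilities at each of the three cutoffs. The probabilities are copied from the FORECASTS table of the single-predictor model shown earlier, not from the two-predictor model, and the rule that a fitted value at or above the cutoff is classified as 1 is an assumption for illustration.

```r
# Fitted probabilities copied from the earlier FORECASTS table (Gender ~ Salary)
fitted_prob <- c(0.2684, 0.4788, 0.5261, 0.7632, 0.8901)
cases <- c("Stanley, Emma", "Campagna, Justin", "Portlock, Ryan",
           "Capelle, Adam", "Correll, Trevon")

# Candidate probability thresholds
cuts <- c(.3, .5, .7)

# One column of 0/1 classifications per threshold
# (assumes a fitted value at or above the cutoff is classified as 1)
pred <- sapply(cuts, function(p) as.integer(fitted_prob >= p))
dimnames(pred) <- list(cases, paste0("cut_", cuts))

pred
colSums(pred)    # number of cases classified as 1 at each threshold
```

Raising the threshold classifies fewer cases as 1, which generally trades recall for precision.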