Mean Inference

David Gerbing

library("lessR")

First read the Employee data included as part of lessR.

d <- Read("Employee")
## 
## >>> Suggestions
## Details about your data, Enter:  details()  for d, or  details(name)
## 
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
## 
##     Variable                  Missing  Unique 
##         Name     Type  Values  Values  Values   First and last values
## ------------------------------------------------------------------------------------------
##  1     Years   integer     36       1      16   7  NA  15 ... 1  2  10
##  2    Gender character     37       0       2   M  M  M ... F  F  M
##  3      Dept character     36       1       5   ADMN  SALE  SALE ... MKTG  SALE  FINC
##  4    Salary    double     37       0      37   53788.26  94494.58 ... 56508.32  57562.36
##  5    JobSat character     35       2       3   med  low  low ... high  low  high
##  6      Plan   integer     37       0       3   1  1  3 ... 2  2  1
##  7       Pre   integer     37       0      27   82  62  96 ... 83  59  80
##  8      Post   integer     37       0      22   92  74  97 ... 90  71  87
## ------------------------------------------------------------------------------------------

One-Sample t-test

Obtain the summary statistics and 95% confidence interval for a single variable by specifying that variable with ttest().

ttest(Salary)
## 
## 
## ------ Description ------
## 
## Salary:  n.miss = 0,  n = 37,   mean = 73795.557,  sd = 21799.533
## 
## 
## ------ Normality Assumption ------
## 
## Sample mean assumed normal because n>30, so no test needed.
## 
## 
## ------ Inference ------
## 
## t-cutoff for 95% range of variation: tcut =  2.028 
## Standard Error of Mean: SE =  3583.821 
## 
## Margin of Error for 95% Confidence Level:  7268.326
## 95% Confidence Interval for Mean:  66527.230 to 81063.883
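
The interval above can be checked by hand from the reported summary statistics. The following is a minimal base R sketch that does not use lessR. It assumes only the printed values of n, the mean, and the standard deviation, and its variable names are illustrative.

```r
# Summary statistics copied from the ttest(Salary) output above
n <- 37
m <- 73795.557
s <- 21799.533

se   <- s / sqrt(n)              # standard error of the mean
tcut <- qt(0.975, df = n - 1)    # two-tailed cutoff for 95% confidence
me   <- tcut * se                # margin of error

ci <- c(lower = m - me, upper = m + me)

print(round(c(SE = se, tcut = tcut, ME = me), 3))
print(round(ci, 2))
```

The results should agree with the lessR output up to rounding of the inputs.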

Add a hypothesis test to the above analysis by specifying the hypothesized value of the population mean with the mu parameter.

ttest(Salary, mu=52000)
## 
## 
## ------ Description ------
## 
## Salary:  n.miss = 0,  n = 37,   mean = 73795.557,  sd = 21799.533
## 
## 
## ------ Normality Assumption ------
## 
## Sample mean assumed normal because n>30, so no test needed.
## 
## 
## ------ Inference ------
## 
## t-cutoff for 95% range of variation: tcut =  2.028 
## Standard Error of Mean: SE =  3583.821 
## 
## Hypothesized Value H0: mu = 52000 
## Hypothesis Test of Mean:  t-value = 6.082,  df = 36,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  7268.326
## 95% Confidence Interval for Mean:  66527.230 to 81063.883
## 
## 
## ------ Effect Size ------
## 
## Distance of sample mean from hypothesized:  21795.557
## Standardized Distance, Cohen's d:  1.000
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for 12035.673
## --------------------------------------------------
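
The t-value and effect size reported above follow directly from the summary statistics. The sketch below recomputes them in base R under the assumption that Cohen's d for one sample is the mean difference divided by the sample standard deviation, which reproduces the printed value. The variable names are illustrative.

```r
# Summary statistics and hypothesized mean from the output above
n   <- 37
m   <- 73795.557
s   <- 21799.533
mu0 <- 52000

se      <- s / sqrt(n)                          # standard error of the mean
t_value <- (m - mu0) / se                       # test statistic
p_value <- 2 * pt(-abs(t_value), df = n - 1)    # two-tailed p-value
cohen_d <- (m - mu0) / s                        # standardized distance

print(round(c(t = t_value, d = cohen_d), 3))
print(p_value)
```

The p-value is far below 0.001, which is why the output displays it as 0.000 after rounding to three decimal digits.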

Obtain the same analysis from the summary statistics alone, without the raw data.

ttest(n=37, m=73795.557, s=21799.533, Ynm="Salary", mu=52000)
## 
## 
## ------ Description ------
## 
## Salary: n = 37,   mean = 73795.56,  sd = 21799.53
## 
## 
## ------ Inference ------
## 
## t-cutoff for 95% range of variation: tcut =  2.028 
## Standard Error of Mean: SE =  3583.821 
## 
## Hypothesized Value H0: mu = 52000 
## Hypothesis Test of Mean:  t-value = 6.082,  df = 36,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  7268.326
## 95% Confidence Interval for Mean:  66527.231 to 81063.883
## 
## 
## ------ Effect Size ------
## 
## Distance of sample mean from hypothesized:  21795.557
## Standardized Distance, Cohen's d:  1.000

Two-Sample t-test

Independent groups

Obtain the full analysis with the ttest() function, abbreviated tt(), by specifying the response variable and the grouping variable as a formula.

ttest(Salary ~ Gender)
## 
## Compare Salary across Gender levels M and F 
## 
## ------ Describe ------
## 
## Salary for Gender M:  n.miss = 0,  n = 18,  mean = 81147.458,  sd = 23128.436
## Salary for Gender F:  n.miss = 0,  n = 19,  mean = 66830.598,  sd = 18438.456
## 
## Mean Difference of Salary:  14316.860
## 
## Weighted Average Standard Deviation:   20848.636 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of Salary.
## Group M  Shapiro-Wilk normality test:  W = 0.962,  p-value = 0.647
## Group F  Shapiro-Wilk normality test:  W = 0.828,  p-value = 0.003
## 
## Null hypothesis is equal variances of Salary, i.e., homogeneous.
## Variance Ratio test:  F = 534924536.348/339976675.129 = 1.573,  df = 17;18,  p-value = 0.349
## Levene's test, Brown-Forsythe:  t = 1.302,  df = 35,  p-value = 0.201
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of Salary for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  2.030 
## Standard Error of Mean Difference: SE =  6857.494 
## 
## Hypothesis Test of 0 Mean Diff:  t = 2.088,  df = 35,  p-value = 0.044
## 
## Margin of Error for 95% Confidence Level:  13921.454
## 95% Confidence Interval for Mean Difference:  395.406 to 28238.314
## 
## 
## --- Do not assume equal population variances of Salary for each Gender 
## 
## t-cutoff: tcut =  2.036 
## Standard Error of Mean Difference: SE =  6900.112 
## 
## Hypothesis Test of 0 Mean Diff:  t = 2.075,  df = 32.505, p-value = 0.046
## 
## Margin of Error for 95% Confidence Level:  14046.505
## 95% Confidence Interval for Mean Difference:  270.355 to 28363.365
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of Salary for each Gender 
## 
## Standardized Mean Difference of Salary, Cohen's d:  0.687
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender M: 14777.329
## Density bandwidth for Gender F: 11630.959
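
Both versions of the inference above can be reproduced from the group summary statistics. The following base R sketch assumes the standard pooled-variance and Welch-Satterthwaite formulas, which match the printed values. The variable names are illustrative and not part of lessR.

```r
# Group summary statistics copied from the Describe section above
n1 <- 18; m1 <- 81147.458; s1 <- 23128.436   # Gender M
n2 <- 19; m2 <- 66830.598; s2 <- 18438.456   # Gender F
mean_diff <- m1 - m2

# --- Assume equal population variances (pooled analysis)
df_pooled <- n1 + n2 - 2
sp        <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df_pooled)
se_pooled <- sp * sqrt(1 / n1 + 1 / n2)
t_pooled  <- mean_diff / se_pooled
p_pooled  <- 2 * pt(-abs(t_pooled), df = df_pooled)
cohen_d   <- mean_diff / sp                   # standardized mean difference

# --- Do not assume equal variances (Welch analysis)
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se_welch <- sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
t_welch  <- mean_diff / se_welch
p_welch  <- 2 * pt(-abs(t_welch), df = df_welch)

print(round(c(sp = sp, SE = se_pooled, t = t_pooled, p = p_pooled, d = cohen_d), 3))
print(round(c(SE = se_welch, df = df_welch, t = t_welch, p = p_welch), 3))
```

The weighted average standard deviation in the output corresponds to sp here, and the fractional degrees of freedom of the second analysis correspond to df_welch.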

The brief version of the output, obtained with tt_brief(), contains just the basics.

tt_brief(Salary ~ Gender)
## 
## Compare Salary across Gender levels M and F 
## 
##  --- Describe ---
## 
## Salary for Gender M:  n.miss = 0,  n = 18,  mean = 81147.458,  sd = 23128.436
## Salary for Gender F:  n.miss = 0,  n = 19,  mean = 66830.598,  sd = 18438.456
## 
## Mean Difference of Salary:  14316.860
## Weighted Average Standard Deviation:   20848.636 
## Standardized Mean Difference of Salary: 0.687
## 
##  --- Infer ---
## 
## t-cutoff for 95% range of variation: tcut =  2.030 
## Standard Error of Mean Difference: SE =  6857.494 
## 
## Hypothesis Test of 0 Mean Diff:  t = 2.088,  df = 35,  p-value = 0.044
## 
## Margin of Error for 95% Confidence Level:  13921.454
## 95% Confidence Interval for Mean Difference:  395.406 to 28238.314

Dependent groups

Pass the two sets of measurements, here Pre and Post, as separate variables. Note that the output shown for this call reports df = 72 = 37 + 37 - 2, the degrees of freedom for two independent groups. A dependent-groups analysis of the 37 paired scores would instead have df = 36. The paired parameter of ttest() is presumably needed to request that analysis; see ?ttest.

tt_brief(Pre, Post)
## 
## Compare Y across X levels Group2 and Group1 
## 
##  --- Describe ---
## 
## Y for X Group2:  n.miss = 0,  n = 37,  mean = 81.000,  sd = 11.593
## Y for X Group1:  n.miss = 0,  n = 37,  mean = 78.784,  sd = 12.037
## 
## Mean Difference of Y:  2.216
## Weighted Average Standard Deviation:   11.817 
## Standardized Mean Difference of Y: 0.188
## 
##  --- Infer ---
## 
## t-cutoff for 95% range of variation: tcut =  1.993 
## Standard Error of Mean Difference: SE =  2.747 
## 
## Hypothesis Test of 0 Mean Diff:  t = 0.807,  df = 72,  p-value = 0.423
## 
## Margin of Error for 95% Confidence Level:  5.477
## 95% Confidence Interval for Mean Difference:  -3.261 to 7.693
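
A dependent-groups analysis works with the within-person differences rather than with two separate samples. The following base R sketch illustrates this with a small hypothetical data set (the scores are invented for illustration and are not the Employee data). It shows that the paired t-test is the same as a one-sample t-test on the differences, and that its degrees of freedom differ from those of the independent-groups test.

```r
# Hypothetical pre/post scores for five people (illustrative only)
pre  <- c(82, 62, 96, 70, 75)
post <- c(92, 74, 97, 78, 80)

# Paired (dependent-groups) t-test
paired_test <- t.test(post, pre, paired = TRUE)

# Equivalent one-sample t-test on the differences
diff_test <- t.test(post - pre, mu = 0)

# Independent-groups t-test on the same numbers, for comparison
indep_test <- t.test(post, pre, var.equal = TRUE)

# Degrees of freedom: n - 1 for paired, n1 + n2 - 2 for independent
print(c(paired      = unname(paired_test$parameter),
        independent = unname(indep_test$parameter)))

# The paired and one-sample t statistics coincide
print(c(paired = unname(paired_test$statistic),
        diffs  = unname(diff_test$statistic)))
```

Because the paired analysis removes between-person variability, its standard error is typically smaller than that of the independent-groups analysis when the two measurements are positively correlated.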

ANOVA

Analysis of variance applies to the inferential analysis of means across groups. The lessR function ANOVA(), abbreviated av(), provides this analysis, based on the base R function aov().

The data for these examples are from the warpbreaks data set included with the R datasets package. The data were collected from a weaving device called a loom, for a fixed length of yarn. The response variable is the number of times the yarn broke during the weaving. The independent variables are the type of wool (A or B) and the level of tension (L, M, or H).

Because warpbreaks is not the default data frame d, specify it with the data parameter (or set d equal to warpbreaks).

One-way independent groups

First, for illustrative purposes, ignore the type of wool and only examine the impact of tension on breaks.

The output includes descriptive statistics, ANOVA table, effect size indices, Tukey’s multiple comparisons of means, and residuals, as well as the scatterplot of the response variable with the levels of the independent variable, and a visualization of the mean comparisons.

ANOVA(breaks ~ tension, data=warpbreaks)

##   BACKGROUND
## 
## Response Variable: breaks 
##  
## Factor Variable: tension 
##   Levels: L M H 
##  
## Number of cases (rows) of data:  54 
## Number of cases retained for analysis:  54 
## 
## 
##   DESCRIPTIVE STATISTICS 
## 
##     n    mean      sd     min     max 
## L  18   36.39   16.45   14.00   70.00 
## M  18   26.39    9.12   12.00   42.00 
## H  18   21.67    8.35   10.00   43.00 
##  
## Grand Mean: 28.148 
## 
## 
##   BASIC ANALYSIS
## 
##              df    Sum Sq   Mean Sq   F-value   p-value 
## tension       2   2034.26   1017.13      7.21    0.0018 
## Residuals    51   7198.56    141.15 
## 
## 
## R Squared: 0.22 
## R Sq Adjusted: 0.19 
## Omega Squared: 0.19 
##  
## Cohen's f: 0.48 
## 
## 
##   TUKEY MULTIPLE COMPARISONS OF MEANS
## 
## Family-wise Confidence Level:  
## ------------------------------- 
##         diff    lwr   upr p adj 
##   M-L -10.00 -19.56 -0.44  0.04 
##   H-L -14.72 -24.28 -5.16  0.00 
##   H-M  -4.72 -14.28  4.84  0.46 
## 
## 
##   RESIDUALS
## 
## Fitted Values, Residuals, Standardized Residuals 
##    [sorted by Standardized Residuals, ignoring + or - sign] 
##    [res_rows = 20, out of 54 cases (rows) of data, or res_rows="all"] 
## ------------------------------------------- 
##      tension breaks fitted residual z-resid 
##    5       L  70.00  36.39    33.61    2.91 
##    9       L  67.00  36.39    30.61    2.65 
##   29       L  14.00  36.39   -22.39   -1.94 
##   24       H  43.00  21.67    21.33    1.85 
##    3       L  54.00  36.39    17.61    1.53 
##   31       L  19.00  36.39   -17.39   -1.51 
##   35       L  20.00  36.39   -16.39   -1.42 
##   37       M  42.00  26.39    15.61    1.35 
##    6       L  52.00  36.39    15.61    1.35 
##    7       L  51.00  36.39    14.61    1.27 
##   14       M  12.00  26.39   -14.39   -1.25 
##   19       H  36.00  21.67    14.33    1.24 
##   41       M  39.00  26.39    12.61    1.09 
##   44       M  39.00  26.39    12.61    1.09 
##   23       H  10.00  21.67   -11.67   -1.01 
##    4       L  25.00  36.39   -11.39   -0.99 
##    8       L  26.00  36.39   -10.39   -0.90 
##   40       M  16.00  26.39   -10.39   -0.90 
##    1       L  26.00  36.39   -10.39   -0.90 
##   18       M  36.00  26.39     9.61    0.83 
## 
## 
## ---------------------------------------- 
## Plot 1: Scatterplot with Cell Means 
## Plot 2: 95% family-wise confidence level 
## ----------------------------------------
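
Since ANOVA() is based on the base R function aov(), the core of this analysis can be reproduced directly. The following is a minimal sketch using only base R and the built-in warpbreaks data. The object names fit, anova_table, and tukey are illustrative.

```r
# One-way ANOVA of breaks across the three tension levels
fit <- aov(breaks ~ tension, data = warpbreaks)

# ANOVA table: df, sums of squares, mean squares, F-value, p-value
anova_table <- summary(fit)[[1]]
print(anova_table)

# Tukey multiple comparisons at the 95% family-wise confidence level
tukey <- TukeyHSD(fit, "tension", conf.level = 0.95)
print(tukey)
```

The F-value and the Tukey intervals should match the BASIC ANALYSIS and TUKEY MULTIPLE COMPARISONS OF MEANS sections of the lessR output up to rounding.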

The brief version forgoes the multiple comparisons and the residuals.

av_brief(breaks ~ tension, data=warpbreaks)

##   BACKGROUND
## 
## Response Variable: breaks 
##  
## Factor Variable: tension 
##   Levels: L M H 
##  
## Number of cases (rows) of data:  54 
## Number of cases retained for analysis:  54 
## 
## 
##   DESCRIPTIVE STATISTICS 
## 
##     n    mean      sd     min     max 
## L  18   36.39   16.45   14.00   70.00 
## M  18   26.39    9.12   12.00   42.00 
## H  18   21.67    8.35   10.00   43.00 
##  
## Grand Mean: 28.148 
## 
## 
##   BASIC ANALYSIS
## 
##              df    Sum Sq   Mean Sq   F-value   p-value 
## tension       2   2034.26   1017.13      7.21    0.0018 
## Residuals    51   7198.56    141.15 
## 
## 
## R Squared: 0.22 
## R Sq Adjusted: 0.19 
## Omega Squared: 0.19 
##  
## Cohen's f: 0.48 
## 
## 
##   TUKEY MULTIPLE COMPARISONS OF MEANS
## 
##   RESIDUALS
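
The effect size indices in the output can be recovered from the ANOVA table. The sketch below uses common textbook definitions. Whether lessR uses exactly these formulas internally is an assumption, although they reproduce the printed values, including a Cohen's f derived from omega squared. The variable names are illustrative.

```r
# Values copied from the BASIC ANALYSIS table above
ss_between <- 2034.26; df_between <- 2
ss_within  <- 7198.56; df_within  <- 51

ss_total  <- ss_between + ss_within
ms_within <- ss_within / df_within
n         <- df_between + df_within + 1      # total sample size, 54

r2      <- ss_between / ss_total                       # R squared
r2_adj  <- 1 - (1 - r2) * (n - 1) / df_within          # adjusted R squared
omega2  <- (ss_between - df_between * ms_within) /
           (ss_total + ms_within)                      # omega squared
cohen_f <- sqrt(omega2 / (1 - omega2))                 # Cohen's f (assumed form)

print(round(c(R2 = r2, R2_adj = r2_adj, omega2 = omega2, f = cohen_f), 2))
```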

Two-way independent groups

Specify the second independent variable preceded by a * sign, which requests both main effects and their interaction.

ANOVA(breaks ~ tension * wool, data=warpbreaks)

##   BACKGROUND
## 
## Response Variable: breaks 
##  
## Factor Variable 1: tension 
##   Levels: L M H 
##  
## Factor Variable 2: wool 
##   Levels: A B 
##  
## Number of cases (rows) of data:  54 
## Number of cases retained for analysis:  54 
##  
## The design is balanced 
##  
## Two-way Between Groups ANOVA 
## 
## 
##   DESCRIPTIVE STATISTICS 
## 
## Cell Sample Size: 9 
## 
## 
##       tension 
##  wool     L     M     H 
##     A 44.56 24.00 24.56 
##     B 28.22 28.78 18.78 
## 
## 
## tension 
## --------------------- 
##         L     M     H 
##   1 36.39 26.39 21.67 
##  
## wool 
## --------------- 
##         A     B 
##   1 31.04 25.26 
## 
## 
## 28.148 
## 
## 
##       tension 
##  wool     L    M     H 
##     A 18.10 8.66 10.27 
##     B  9.86 9.43  4.89 
## 
## 
##   BASIC ANALYSIS
## 
##              df    Sum Sq   Mean Sq   F-value   p-value 
##      tension  2   2034.26   1017.13      8.50    0.0007 
##         wool  1    450.67    450.67      3.77    0.0582 
## tension:wool  2   1002.78    501.39      4.19    0.0210 
##    Residuals 48   5745.11    119.69 
## 
## 
## Partial Omega Squared for tension: 0.22 
## Partial Omega Squared for wool: 0.05 
## Partial Omega Squared for tension & wool: 0.11 
##  
## Cohen's f for tension: 0.53 
## Cohen's f for wool: 0.23 
## Cohen's f for tension_&_wool: 0.34 
## 
## 
##   TUKEY MULTIPLE COMPARISONS OF MEANS
## 
## Family-wise Confidence Level:  
## 
## Factor: tension 
## ------------------------------- 
##         diff    lwr   upr p adj 
##   M-L -10.00 -18.82 -1.18  0.02 
##   H-L -14.72 -23.54 -5.90  0.00 
##   H-M  -4.72 -13.54  4.10  0.40 
## 
## Factor: wool 
## ----------------------------- 
##        diff    lwr  upr p adj 
##   B-A -5.78 -11.76 0.21  0.06 
## 
## Cell Means 
## ------------------------------------ 
##             diff    lwr    upr p adj 
##   M:A-L:A -20.56 -35.86  -5.25  0.00 
##   H:A-L:A -20.00 -35.31  -4.69  0.00 
##   L:B-L:A -16.33 -31.64  -1.03  0.03 
##   M:B-L:A -15.78 -31.08  -0.47  0.04 
##   H:B-L:A -25.78 -41.08 -10.47  0.00 
##   H:A-M:A   0.56 -14.75  15.86  1.00 
##   L:B-M:A   4.22 -11.08  19.53  0.96 
##   M:B-M:A   4.78 -10.53  20.08  0.94 
##   H:B-M:A  -5.22 -20.53  10.08  0.91 
##   L:B-H:A   3.67 -11.64  18.97  0.98 
##   M:B-H:A   4.22 -11.08  19.53  0.96 
##   H:B-H:A  -5.78 -21.08   9.53  0.87 
##   M:B-L:B   0.56 -14.75  15.86  1.00 
##   H:B-L:B  -9.44 -24.75   5.86  0.46 
##   H:B-M:B -10.00 -25.31   5.31  0.39 
## 
## 
##   RESIDUALS
## 
## Fitted Values, Residuals, Standardized Residuals 
##    [sorted by Standardized Residuals, ignoring + or - sign] 
##    [res_rows = 20, out of 54 cases (rows) of data, or res_rows="all"] 
## ------------------------------------------------ 
##      tension wool breaks fitted residual z-resid 
##    5       L    A  70.00  44.56    25.44    2.47 
##    9       L    A  67.00  44.56    22.44    2.18 
##    4       L    A  25.00  44.56   -19.56   -1.90 
##    8       L    A  26.00  44.56   -18.56   -1.80 
##    1       L    A  26.00  44.56   -18.56   -1.80 
##   24       H    A  43.00  24.56    18.44    1.79 
##   36       L    B  44.00  28.22    15.78    1.53 
##   23       H    A  10.00  24.56   -14.56   -1.41 
##    2       L    A  30.00  44.56   -14.56   -1.41 
##   29       L    B  14.00  28.22   -14.22   -1.38 
##   37       M    B  42.00  28.78    13.22    1.28 
##   34       L    B  41.00  28.22    12.78    1.24 
##   40       M    B  16.00  28.78   -12.78   -1.24 
##   14       M    A  12.00  24.00   -12.00   -1.16 
##   18       M    A  36.00  24.00    12.00    1.16 
##   19       H    A  36.00  24.56    11.44    1.11 
##   16       M    A  35.00  24.00    11.00    1.07 
##   41       M    B  39.00  28.78    10.22    0.99 
##   44       M    B  39.00  28.78    10.22    0.99 
##   39       M    B  19.00  28.78    -9.78   -0.95
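
In R formula notation, tension * wool is shorthand for the two main effects plus their interaction. The following base R sketch, using aov() and the built-in warpbreaks data, shows the equivalent expanded formula and recomputes the cell means. The object names are illustrative.

```r
# Two-way ANOVA with interaction, compact formula
fit2 <- aov(breaks ~ tension * wool, data = warpbreaks)
tab2 <- summary(fit2)[[1]]
print(tab2)

# Same model written out term by term
fit2_long <- aov(breaks ~ tension + wool + tension:wool, data = warpbreaks)

# Cell means of breaks for each wool by tension combination
cell_means <- with(warpbreaks,
                   tapply(breaks, list(wool = wool, tension = tension), mean))
print(round(cell_means, 2))
```

Both formulas yield the same fitted values, and the three F-values in tab2 should match the tension, wool, and tension:wool rows of the lessR table.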

Randomized block design

d <- read.csv(header=TRUE, text="
Person,sup1,sup2,sup3,sup4
p1,2,4,4,3
p2,2,5,4,6
p3,8,6,7,9
p4,4,3,5,7
p5,2,1,2,3
p6,5,5,6,8
p7,2,3,2,4")

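Reshape the data from wide form to long form with the base R function reshape(), using the parameters shown below. (The parameter names timevar and times reflect the orientation of reshape() toward measurements repeated over time, which is only one specific application.)

The row names are not needed, so remove them.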

d <- reshape(d, direction="long",
        idvar="Person", v.names="Reps",
        varying=list(2:5), timevar="Supplement", times=names(d)[2:5])
row.names(d) <- NULL
d[1:10,]
##    Person Supplement Reps
## 1      p1       sup1    2
## 2      p2       sup1    2
## 3      p3       sup1    8
## 4      p4       sup1    4
## 5      p5       sup1    2
## 6      p6       sup1    5
## 7      p7       sup1    2
## 8      p1       sup2    4
## 9      p2       sup2    5
## 10     p3       sup2    6

Specify the blocking variable preceded by a + sign, which adds it to the model without an interaction term.

ANOVA(Reps ~ Supplement + Person)
## 
## >>> Note: Converting Supplement to a factor for this analysis only.
## 
## >>> Note: Converting Person to a factor for this analysis only.

##   BACKGROUND
## 
## Response Variable: Reps 
##  
## Factor Variable 1: Supplement 
##   Levels: sup1 sup2 sup3 sup4 
##  
## Factor Variable 2: Person 
##   Levels: p1 p2 p3 p4 p5 p6 p7 
##  
## Number of cases (rows) of data:  28 
## Number of cases retained for analysis:  28 
##  
## The design is balanced 
##  
## Randomized Blocks ANOVA 
##   Factor of Interest:  Supplement 
##   Blocking Factor:     Person 
##  
## Note: For the resulting F statistic for Supplement to be distributed as F,
##       the population covariances of Reps must be spherical. 
## 
## 
##   DESCRIPTIVE STATISTICS 
## 
## Supplement 
## ----------------------- 
##     sup1 sup2 sup3 sup4 
##   1 3.57 3.86 4.29 5.71 
##  
## Person 
## -------------------------------------- 
##       p1   p2   p3   p4   p5   p6   p7 
##   1 3.25 4.25 7.50 4.75 2.00 6.00 2.75 
## 
## 
## 4.357 
## 
## 
##   BASIC ANALYSIS
## 
##            df    Sum Sq   Mean Sq   F-value   p-value 
## Supplement  3     19.00      6.33      6.71    0.0031 
##     Person  6     88.43     14.74     15.61    0.0000 
##  Residuals 18     17.00      0.94 
## 
## 
## Partial Omega Squared for Supplement: 0.38 
## Partial Intraclass Correlation for Person: 0.79 
##  
## Cohen's f for Supplement: 0.78 
## Cohen's f for Person: 1.91 
## 
## 
##   TUKEY MULTIPLE COMPARISONS OF MEANS
## 
## Family-wise Confidence Level:  
## 
## Factor: Supplement 
## --------------------------------- 
##             diff   lwr  upr p adj 
##   sup2-sup1 0.29 -1.18 1.75  0.95 
##   sup3-sup1 0.71 -0.75 2.18  0.53 
##   sup4-sup1 2.14  0.67 3.61  0.00 
##   sup3-sup2 0.43 -1.04 1.90  0.84 
##   sup4-sup2 1.86  0.39 3.33  0.01 
##   sup4-sup3 1.43 -0.04 2.90  0.06 
## 
## 
##   RESIDUALS
## 
## Fitted Values, Residuals, Standardized Residuals 
##    [sorted by Standardized Residuals, ignoring + or - sign] 
##    [res_rows = 20, out of 28 cases (rows) of data, or res_rows="all"] 
## --------------------------------------------------- 
##      Supplement Person Reps fitted residual z-resid 
##   22       sup4     p1    3   4.61    -1.61   -2.06 
##    2       sup1     p2    2   3.46    -1.46   -1.88 
##    3       sup1     p3    8   6.71     1.29    1.65 
##    9       sup2     p2    5   3.75     1.25    1.60 
##    8       sup2     p1    4   2.75     1.25    1.60 
##   11       sup2     p4    3   4.25    -1.25   -1.60 
##   10       sup2     p3    6   7.00    -1.00   -1.28 
##   25       sup4     p4    7   6.11     0.89    1.15 
##   15       sup3     p1    4   3.18     0.82    1.05 
##    5       sup1     p5    2   1.21     0.79    1.01 
##   14       sup2     p7    3   2.25     0.75    0.96 
##   21       sup3     p7    2   2.68    -0.68   -0.87 
##   27       sup4     p6    8   7.36     0.64    0.83 
##   13       sup2     p6    5   5.50    -0.50   -0.64 
##   12       sup2     p5    1   1.50    -0.50   -0.64 
##    1       sup1     p1    2   2.46    -0.46   -0.60 
##   17       sup3     p3    7   7.43    -0.43   -0.55 
##   23       sup4     p2    6   5.61     0.39    0.50 
##   26       sup4     p5    3   3.36    -0.36   -0.46 
##   18       sup3     p4    5   4.68     0.32    0.41 
## 
## 
## ------------------------ 
## Plot 1: Interaction Plot 
## Plot 2: Fitted Values 
## ------------------------
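
The randomized blocks table can also be reproduced with base R alone. The sketch below rebuilds the same data, reshapes it as above, and fits the additive model with aov(). The object names wide, long, fit_rb, and tab_rb are illustrative, and the explicit factor() conversions are a precaution rather than something the source requires.

```r
# Same data as above, in wide form: one row per person
wide <- read.csv(header = TRUE, text = "Person,sup1,sup2,sup3,sup4
p1,2,4,4,3
p2,2,5,4,6
p3,8,6,7,9
p4,4,3,5,7
p5,2,1,2,3
p6,5,5,6,8
p7,2,3,2,4")

# Wide to long: one row per person-supplement combination
long <- reshape(wide, direction = "long",
                idvar = "Person", v.names = "Reps",
                varying = list(2:5), timevar = "Supplement",
                times = names(wide)[2:5])
row.names(long) <- NULL

# Treat both classification variables as factors
long$Supplement <- factor(long$Supplement)
long$Person     <- factor(long$Person)

# Additive model: treatment effect plus blocking effect, no interaction
fit_rb <- aov(Reps ~ Supplement + Person, data = long)
tab_rb <- summary(fit_rb)[[1]]
print(tab_rb)
```

With one observation per cell, the residual term here is the Supplement by Person interaction, which is why the model cannot include that interaction separately.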

Full Manual

Use the base R help system to view the full manual for ttest() or ANOVA(). Simply enter a question mark, the shortcut for help(), followed by the name of the function.

?ttest
?ANOVA