First, read the Employee data included with lessR.
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
##     Variable                    Missing  Unique
##         Name       Type Values   Values  Values   First and last values
## ------------------------------------------------------------------------------------------
##  1     Years    integer     36        1      16   7 NA 15 ... 1 2 10
##  2    Gender  character     37        0       2   M M M ... F F M
##  3      Dept  character     36        1       5   ADMN SALE SALE ... MKTG SALE FINC
##  4    Salary     double     37        0      37   53788.26 94494.58 ... 56508.32 57562.36
##  5    JobSat  character     35        2       3   med low low ... high low high
##  6      Plan    integer     37        0       3   1 1 3 ... 2 2 1
##  7       Pre    integer     37        0      27   82 62 96 ... 83 59 80
##  8      Post    integer     37        0      22   92 74 97 ... 90 71 87
## ------------------------------------------------------------------------------------------
Obtain the summary statistics and 95% confidence interval for a single variable by specifying that variable with ttest().
##
##
## ------ Description ------
##
## Salary: n.miss = 0, n = 37, mean = 73795.557, sd = 21799.533
##
##
## ------ Normality Assumption ------
##
## Sample mean assumed normal because n>30, so no test needed.
##
##
## ------ Inference ------
##
## t-cutoff for 95% range of variation: tcut = 2.028
## Standard Error of Mean: SE = 3583.821
##
## Margin of Error for 95% Confidence Level: 7268.326
## 95% Confidence Interval for Mean: 66527.230 to 81063.883
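As a cross-check, the printed standard error, t-cutoff, margin of error, and confidence interval can be reproduced in base R from the three summary statistics above. This is a minimal sketch of the textbook formulas, not lessR's internal code. The variable names are illustrative.

```r
# Summary statistics for Salary, copied from the output above
n <- 37
m <- 73795.557
s <- 21799.533

se   <- s / sqrt(n)              # standard error of the mean
tcut <- qt(0.975, df = n - 1)    # t-cutoff for a 95% interval
moe  <- tcut * se                # margin of error
ci   <- m + c(-1, 1) * moe       # 95% confidence interval for the mean

round(c(SE = se, tcut = tcut, MoE = moe, lower = ci[1], upper = ci[2]), 3)
```

The lower limit can differ from the printed value in the last decimal place, because the mean entered here is already rounded.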
Add a hypothesis test to the above analysis, here of a hypothesized population mean of 52000.
##
##
## ------ Description ------
##
## Salary: n.miss = 0, n = 37, mean = 73795.557, sd = 21799.533
##
##
## ------ Normality Assumption ------
##
## Sample mean assumed normal because n>30, so no test needed.
##
##
## ------ Inference ------
##
## t-cutoff for 95% range of variation: tcut = 2.028
## Standard Error of Mean: SE = 3583.821
##
## Hypothesized Value H0: mu = 52000
## Hypothesis Test of Mean: t-value = 6.082, df = 36, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 7268.326
## 95% Confidence Interval for Mean: 66527.230 to 81063.883
##
##
## ------ Effect Size ------
##
## Distance of sample mean from hypothesized: 21795.557
## Standardized Distance, Cohen's d: 1.000
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for 12035.673
## --------------------------------------------------
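The t-value and Cohen's d in this output follow from the same summary statistics. The sketch below uses the usual one-sample formulas and takes d as the distance of the sample mean from the hypothesized value, divided by the sample standard deviation. That definition is an assumption, but it matches the printed 1.000. Base R's t.test() needs the raw data, so the two-tailed p-value is computed directly with pt().

```r
n   <- 37
m   <- 73795.557
s   <- 21799.533
mu0 <- 52000                           # hypothesized mean, H0: mu = 52000

se      <- s / sqrt(n)                 # standard error of the mean
t_value <- (m - mu0) / se              # one-sample t statistic
dof     <- n - 1                       # degrees of freedom
p_value <- 2 * pt(-abs(t_value), dof)  # two-tailed p-value
cohen_d <- (m - mu0) / s               # standardized distance from mu0

round(c(t = t_value, df = dof, p = p_value, d = cohen_d), 3)
```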
The same analysis can be run from the summary statistics alone: the sample size, mean, and standard deviation.
##
##
## ------ Description ------
##
## Salary: n = 37, mean = 73795.56, sd = 21799.53
##
##
## ------ Inference ------
##
## t-cutoff for 95% range of variation: tcut = 2.028
## Standard Error of Mean: SE = 3583.821
##
## Hypothesized Value H0: mu = 52000
## Hypothesis Test of Mean: t-value = 6.082, df = 36, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 7268.326
## 95% Confidence Interval for Mean: 66527.231 to 81063.883
##
##
## ------ Effect Size ------
##
## Distance of sample mean from hypothesized: 21795.557
## Standardized Distance, Cohen's d: 1.000
Full analysis with the ttest() function, abbreviated as tt(), in formula mode, here comparing Salary across the two levels of Gender.
##
## Compare Salary across Gender levels M and F
##
## ------ Describe ------
##
## Salary for Gender M: n.miss = 0, n = 18, mean = 81147.458, sd = 23128.436
## Salary for Gender F: n.miss = 0, n = 19, mean = 66830.598, sd = 18438.456
##
## Mean Difference of Salary: 14316.860
##
## Weighted Average Standard Deviation: 20848.636
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of Salary.
## Group M Shapiro-Wilk normality test: W = 0.962, p-value = 0.647
## Group F Shapiro-Wilk normality test: W = 0.828, p-value = 0.003
##
## Null hypothesis is equal variances of Salary, i.e., homogeneous.
## Variance Ratio test: F = 534924536.348/339976675.129 = 1.573, df = 17;18, p-value = 0.349
## Levene's test, Brown-Forsythe: t = 1.302, df = 35, p-value = 0.201
##
##
## ------ Infer ------
##
## --- Assume equal population variances of Salary for each Gender
##
## t-cutoff for 95% range of variation: tcut = 2.030
## Standard Error of Mean Difference: SE = 6857.494
##
## Hypothesis Test of 0 Mean Diff: t = 2.088, df = 35, p-value = 0.044
##
## Margin of Error for 95% Confidence Level: 13921.454
## 95% Confidence Interval for Mean Difference: 395.406 to 28238.314
##
##
## --- Do not assume equal population variances of Salary for each Gender
##
## t-cutoff: tcut = 2.036
## Standard Error of Mean Difference: SE = 6900.112
##
## Hypothesis Test of 0 Mean Diff: t = 2.075, df = 32.505, p-value = 0.046
##
## Margin of Error for 95% Confidence Level: 14046.505
## 95% Confidence Interval for Mean Difference: 270.355 to 28363.365
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of Salary for each Gender
##
## Standardized Mean Difference of Salary, Cohen's d: 0.687
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for Gender M: 14777.329
## Density bandwidth for Gender F: 11630.959
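The core quantities of the two-group output can also be reproduced from the group summaries. This sketch uses the standard pooled-variance (Student) and Welch formulas and takes the weighted average standard deviation to be the pooled standard deviation. That choice is an assumption, but it reproduces the printed 20848.636. This is not lessR's source code.

```r
# Group summaries for Salary, copied from the output above
n1 <- 18; m1 <- 81147.458; s1 <- 23128.436   # Gender M
n2 <- 19; m2 <- 66830.598; s2 <- 18438.456   # Gender F
mean_diff <- m1 - m2

# Equal population variances assumed: pooled SD and Student t
df_pooled <- n1 + n2 - 2
sp        <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df_pooled)
se_pooled <- sp * sqrt(1 / n1 + 1 / n2)
t_pooled  <- mean_diff / se_pooled
p_pooled  <- 2 * pt(-abs(t_pooled), df_pooled)
ci_pooled <- mean_diff + c(-1, 1) * qt(0.975, df_pooled) * se_pooled
cohen_d   <- mean_diff / sp

# Equal population variances not assumed: Welch SE and df
v1       <- s1^2 / n1
v2       <- s2^2 / n2
se_welch <- sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
t_welch  <- mean_diff / se_welch
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

# Variance ratio (group M has the larger variance here)
f_ratio <- s1^2 / s2^2

round(c(sp = sp, se = se_pooled, t = t_pooled, p = p_pooled, d = cohen_d), 3)
round(c(se = se_welch, df = df_welch, t = t_welch, p = p_welch), 3)
```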
The brief version of the output contains just the basics.
##
## Compare Salary across Gender levels M and F
##
## --- Describe ---
##
## Salary for Gender M: n.miss = 0, n = 18, mean = 81147.458, sd = 23128.436
## Salary for Gender F: n.miss = 0, n = 19, mean = 66830.598, sd = 18438.456
##
## Mean Difference of Salary: 14316.860
## Weighted Average Standard Deviation: 20848.636
## Standardized Mean Difference of Salary: 0.687
##
## --- Infer ---
##
## t-cutoff for 95% range of variation: tcut = 2.030
## Standard Error of Mean Difference: SE = 6857.494
##
## Hypothesis Test of 0 Mean Diff: t = 2.088, df = 35, p-value = 0.044
##
## Margin of Error for 95% Confidence Level: 13921.454
## 95% Confidence Interval for Mean Difference: 395.406 to 28238.314
##
## Compare Y across X levels Group2 and Group1
##
## --- Describe ---
##
## Y for X Group2: n.miss = 0, n = 37, mean = 81.000, sd = 11.593
## Y for X Group1: n.miss = 0, n = 37, mean = 78.784, sd = 12.037
##
## Mean Difference of Y: 2.216
## Weighted Average Standard Deviation: 11.817
## Standardized Mean Difference of Y: 0.188
##
## --- Infer ---
##
## t-cutoff for 95% range of variation: tcut = 1.993
## Standard Error of Mean Difference: SE = 2.747
##
## Hypothesis Test of 0 Mean Diff: t = 0.807, df = 72, p-value = 0.423
##
## Margin of Error for 95% Confidence Level: 5.477
## 95% Confidence Interval for Mean Difference: -3.261 to 7.693
Analysis of variance (ANOVA) applies to inference about means across groups. The lessR function ANOVA(), abbreviated av(), provides this analysis, based on the base R function aov().
The data for these examples are the warpbreaks data set included with the R datasets package. The data are from a weaving device called a loom, with each observation for a fixed length of yarn. The response variable is the number of times the yarn broke during the weaving. The independent variables are the type of wool (A or B) and the level of tension (L, M, or H).
Because warpbreaks is not the default data frame, specify it with the data parameter (or set d equal to warpbreaks).
First, for illustrative purposes, ignore the type of wool and examine only the effect of tension on breaks.
The output includes descriptive statistics, the ANOVA table, effect size indices, Tukey’s multiple comparisons of means, and residuals. It also includes a scatterplot of the response variable across the levels of the independent variable and a visualization of the mean comparisons.
## BACKGROUND
##
## Response Variable: breaks
##
## Factor Variable: tension
## Levels: L M H
##
## Number of cases (rows) of data: 54
## Number of cases retained for analysis: 54
##
##
## DESCRIPTIVE STATISTICS
##
##     n   mean     sd    min    max
## L  18  36.39  16.45  14.00  70.00
## M  18  26.39   9.12  12.00  42.00
## H  18  21.67   8.35  10.00  43.00
##
## Grand Mean: 28.148
##
##
## BASIC ANALYSIS
##
##             df  Sum Sq  Mean Sq  F-value  p-value
## tension      2 2034.26  1017.13     7.21   0.0018
## Residuals   51 7198.56   141.15
##
##
## R Squared: 0.22
## R Sq Adjusted: 0.19
## Omega Squared: 0.19
##
## Cohen's f: 0.48
##
##
## TUKEY MULTIPLE COMPARISONS OF MEANS
##
## Family-wise Confidence Level:
## -------------------------------
##         diff     lwr    upr  p adj
## M-L   -10.00  -19.56  -0.44   0.04
## H-L   -14.72  -24.28  -5.16   0.00
## H-M    -4.72  -14.28   4.84   0.46
##
##
## RESIDUALS
##
## Fitted Values, Residuals, Standardized Residuals
## [sorted by Standardized Residuals, ignoring + or - sign]
## [res_rows = 20, out of 54 cases (rows) of data, or res_rows="all"]
## -------------------------------------------
##    tension  breaks  fitted  residual  z-resid
##  5       L   70.00   36.39     33.61     2.91
##  9       L   67.00   36.39     30.61     2.65
## 29       L   14.00   36.39    -22.39    -1.94
## 24       H   43.00   21.67     21.33     1.85
##  3       L   54.00   36.39     17.61     1.53
## 31       L   19.00   36.39    -17.39    -1.51
## 35       L   20.00   36.39    -16.39    -1.42
## 37       M   42.00   26.39     15.61     1.35
##  6       L   52.00   36.39     15.61     1.35
##  7       L   51.00   36.39     14.61     1.27
## 14       M   12.00   26.39    -14.39    -1.25
## 19       H   36.00   21.67     14.33     1.24
## 41       M   39.00   26.39     12.61     1.09
## 44       M   39.00   26.39     12.61     1.09
## 23       H   10.00   21.67    -11.67    -1.01
##  4       L   25.00   36.39    -11.39    -0.99
##  8       L   26.00   36.39    -10.39    -0.90
## 40       M   16.00   26.39    -10.39    -0.90
##  1       L   26.00   36.39    -10.39    -0.90
## 18       M   36.00   26.39      9.61     0.83
##
##
## ----------------------------------------
## Plot 1: Scatterplot with Cell Means
## Plot 2: 95% family-wise confidence level
## ----------------------------------------
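ANOVA() is based on aov(), so the core results can be checked directly in base R. The sketch below fits the one-way model and computes R squared and omega squared with the standard textbook formulas. It then computes Cohen's f from omega squared. That last step is an assumption, but it reproduces the printed 0.48; computing f from R squared instead gives about 0.53.

```r
# warpbreaks ships with R's datasets package
fit <- aov(breaks ~ tension, data = warpbreaks)
tab <- anova(lm(breaks ~ tension, data = warpbreaks))

ss_b <- tab["tension", "Sum Sq"]     # between-groups sum of squares
ss_w <- tab["Residuals", "Sum Sq"]   # within-groups (residual) sum of squares
df_b <- tab["tension", "Df"]
ms_w <- tab["Residuals", "Mean Sq"]

r_sq     <- ss_b / (ss_b + ss_w)
omega_sq <- (ss_b - df_b * ms_w) / (ss_b + ss_w + ms_w)
cohen_f  <- sqrt(omega_sq / (1 - omega_sq))
round(c(R2 = r_sq, omega2 = omega_sq, f = cohen_f), 2)

# Tukey HSD comparisons of the tension means at a 95% family-wise level
tk <- TukeyHSD(fit, "tension", conf.level = 0.95)
print(tk)
```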
The brief version forgoes the multiple comparisons and the residuals.
## BACKGROUND
##
## Response Variable: breaks
##
## Factor Variable: tension
## Levels: L M H
##
## Number of cases (rows) of data: 54
## Number of cases retained for analysis: 54
##
##
## DESCRIPTIVE STATISTICS
##
##     n   mean     sd    min    max
## L  18  36.39  16.45  14.00  70.00
## M  18  26.39   9.12  12.00  42.00
## H  18  21.67   8.35  10.00  43.00
##
## Grand Mean: 28.148
##
##
## BASIC ANALYSIS
##
##             df  Sum Sq  Mean Sq  F-value  p-value
## tension      2 2034.26  1017.13     7.21   0.0018
## Residuals   51 7198.56   141.15
##
##
## R Squared: 0.22
## R Sq Adjusted: 0.19
## Omega Squared: 0.19
##
## Cohen's f: 0.48
##
##
## TUKEY MULTIPLE COMPARISONS OF MEANS
##
## RESIDUALS
For a two-way analysis, specify the second independent variable preceded by a * sign, which includes both main effects and their interaction.
## BACKGROUND
##
## Response Variable: breaks
##
## Factor Variable 1: tension
## Levels: L M H
##
## Factor Variable 2: wool
## Levels: A B
##
## Number of cases (rows) of data: 54
## Number of cases retained for analysis: 54
##
## The design is balanced
##
## Two-way Between Groups ANOVA
##
##
## DESCRIPTIVE STATISTICS
##
## Cell Sample Size: 9
##
##
##       tension
## wool      L      M      H
##    A  44.56  24.00  24.56
##    B  28.22  28.78  18.78
##
##
## tension
## ---------------------
##        L      M      H
## 1  36.39  26.39  21.67
##
## wool
## ---------------
##        A      B
## 1  31.04  25.26
##
##
## 28.148
##
##
##       tension
## wool      L      M      H
##    A  18.10   8.66  10.27
##    B   9.86   9.43   4.89
##
##
## BASIC ANALYSIS
##
##               df  Sum Sq  Mean Sq  F-value  p-value
## tension         2 2034.26  1017.13     8.50   0.0007
## wool            1  450.67   450.67     3.77   0.0582
## tension:wool    2 1002.78   501.39     4.19   0.0210
## Residuals      48 5745.11   119.69
##
##
## Partial Omega Squared for tension: 0.22
## Partial Omega Squared for wool: 0.05
## Partial Omega Squared for tension & wool: 0.11
##
## Cohen's f for tension: 0.53
## Cohen's f for wool: 0.23
## Cohen's f for tension_&_wool: 0.34
##
##
## TUKEY MULTIPLE COMPARISONS OF MEANS
##
## Family-wise Confidence Level:
##
## Factor: tension
## -------------------------------
##         diff     lwr    upr  p adj
## M-L   -10.00  -18.82  -1.18   0.02
## H-L   -14.72  -23.54  -5.90   0.00
## H-M    -4.72  -13.54   4.10   0.40
##
## Factor: wool
## -----------------------------
##         diff     lwr    upr  p adj
## B-A    -5.78  -11.76   0.21   0.06
##
## Cell Means
## ------------------------------------
##             diff     lwr    upr  p adj
## M:A-L:A   -20.56  -35.86  -5.25   0.00
## H:A-L:A   -20.00  -35.31  -4.69   0.00
## L:B-L:A   -16.33  -31.64  -1.03   0.03
## M:B-L:A   -15.78  -31.08  -0.47   0.04
## H:B-L:A   -25.78  -41.08 -10.47   0.00
## H:A-M:A     0.56  -14.75  15.86   1.00
## L:B-M:A     4.22  -11.08  19.53   0.96
## M:B-M:A     4.78  -10.53  20.08   0.94
## H:B-M:A    -5.22  -20.53  10.08   0.91
## L:B-H:A     3.67  -11.64  18.97   0.98
## M:B-H:A     4.22  -11.08  19.53   0.96
## H:B-H:A    -5.78  -21.08   9.53   0.87
## M:B-L:B     0.56  -14.75  15.86   1.00
## H:B-L:B    -9.44  -24.75   5.86   0.46
## H:B-M:B   -10.00  -25.31   5.31   0.39
##
##
## RESIDUALS
##
## Fitted Values, Residuals, Standardized Residuals
## [sorted by Standardized Residuals, ignoring + or - sign]
## [res_rows = 20, out of 54 cases (rows) of data, or res_rows="all"]
## ------------------------------------------------
##    tension wool  breaks  fitted  residual  z-resid
##  5       L    A   70.00   44.56     25.44     2.47
##  9       L    A   67.00   44.56     22.44     2.18
##  4       L    A   25.00   44.56    -19.56    -1.90
##  8       L    A   26.00   44.56    -18.56    -1.80
##  1       L    A   26.00   44.56    -18.56    -1.80
## 24       H    A   43.00   24.56     18.44     1.79
## 36       L    B   44.00   28.22     15.78     1.53
## 23       H    A   10.00   24.56    -14.56    -1.41
##  2       L    A   30.00   44.56    -14.56    -1.41
## 29       L    B   14.00   28.22    -14.22    -1.38
## 37       M    B   42.00   28.78     13.22     1.28
## 34       L    B   41.00   28.22     12.78     1.24
## 40       M    B   16.00   28.78    -12.78    -1.24
## 14       M    A   12.00   24.00    -12.00    -1.16
## 18       M    A   36.00   24.00     12.00     1.16
## 19       H    A   36.00   24.56     11.44     1.11
## 16       M    A   35.00   24.00     11.00     1.07
## 41       M    B   39.00   28.78     10.22     0.99
## 44       M    B   39.00   28.78     10.22     0.99
## 39       M    B   19.00   28.78     -9.78    -0.95
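A base R cross-check of the two-way results follows. The design is balanced, so the sequential sums of squares from anova() do not depend on the order of the factors. The partial omega squared formula below is one standard definition. It is assumed here because it reproduces the printed 0.22, 0.05, and 0.11, and Cohen's f is computed from it.

```r
# Main effects plus interaction (the * in the formula)
fit2 <- lm(breaks ~ tension * wool, data = warpbreaks)
tab2 <- anova(fit2)
print(tab2)

n            <- nrow(warpbreaks)
ms_e         <- tab2["Residuals", "Mean Sq"]
effect_names <- c("tension", "wool", "tension:wool")

# Partial omega squared for each effect
p_omega <- sapply(effect_names, function(term) {
  df_t <- tab2[term, "Df"]
  ms_t <- tab2[term, "Mean Sq"]
  df_t * (ms_t - ms_e) / (df_t * ms_t + (n - df_t) * ms_e)
})
cohen_f <- sqrt(p_omega / (1 - p_omega))
round(rbind(p_omega, cohen_f), 2)

# Tukey HSD for each factor and for the cell means
TukeyHSD(aov(breaks ~ tension * wool, data = warpbreaks))
```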
# Wide-form data: one row per person, one column per supplement
d <- read.csv(header=TRUE, text="
Person,sup1,sup2,sup3,sup4
p1,2,4,4,3
p2,2,5,4,6
p3,8,6,7,9
p4,4,3,5,7
p5,2,1,2,3
p6,5,5,6,8
p7,2,3,2,4")
Reshape the data from wide form to long form with base R reshape() according to the following parameters. (R's parameter names refer to time, although time is only one specific application.)
idvar: Identify the blocking variable in the wide-form data
varying: Identify the variables in wide format gathered into a single variable in long format
v.names: Name the response variable in the long form
timevar: Name the corresponding long-form variable
times: Name the values of the corresponding long-form variable, otherwise numbered consecutively
The row names are not needed, so remove them.
d <- reshape(d, direction="long",
idvar="Person", v.names="Reps",
varying=list(2:5), timevar="Supplement", times=names(d)[2:5])
row.names(d) <- NULL
##    Person Supplement Reps
##  1     p1       sup1    2
##  2     p2       sup1    2
##  3     p3       sup1    8
##  4     p4       sup1    4
##  5     p5       sup1    2
##  6     p6       sup1    5
##  7     p7       sup1    2
##  8     p1       sup2    4
##  9     p2       sup2    5
## 10     p3       sup2    6
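For a simple layout like this one, base R's stack() is an alternative to reshape(). This is a minimal sketch, assuming the same wide-form data. stack() gathers the four supplement columns into one column named values and adds a factor named ind that identifies the source column. The person identifiers are repeated once per supplement to match. The object names d_wide and long are illustrative. Unlike the reshape() result, Supplement comes out as a factor.

```r
d_wide <- read.csv(header = TRUE, text = "
Person,sup1,sup2,sup3,sup4
p1,2,4,4,3
p2,2,5,4,6
p3,8,6,7,9
p4,4,3,5,7
p5,2,1,2,3
p6,5,5,6,8
p7,2,3,2,4")

# stack() places all of sup1 first, then sup2, and so on,
# so repeat the Person column once per stacked column
long <- data.frame(Person = rep(d_wide$Person, times = 4),
                   stack(d_wide[, 2:5]))
names(long) <- c("Person", "Reps", "Supplement")
long <- long[, c("Person", "Supplement", "Reps")]   # same column order as above
head(long, 10)
```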
For a randomized block design, specify the blocking variable preceded by a + sign.
##
## >>> Note: Converting Supplement to a factor for this analysis only.
##
## >>> Note: Converting Person to a factor for this analysis only.
## BACKGROUND
##
## Response Variable: Reps
##
## Factor Variable 1: Supplement
## Levels: sup1 sup2 sup3 sup4
##
## Factor Variable 2: Person
## Levels: p1 p2 p3 p4 p5 p6 p7
##
## Number of cases (rows) of data: 28
## Number of cases retained for analysis: 28
##
## The design is balanced
##
## Randomized Blocks ANOVA
## Factor of Interest: Supplement
## Blocking Factor: Person
##
## Note: For the resulting F statistic for Supplement to be distributed as F,
## the population covariances of Reps must be spherical.
##
##
## DESCRIPTIVE STATISTICS
##
## Supplement
## -----------------------
##    sup1  sup2  sup3  sup4
## 1  3.57  3.86  4.29  5.71
##
## Person
## --------------------------------------
##      p1    p2    p3    p4    p5    p6    p7
## 1  3.25  4.25  7.50  4.75  2.00  6.00  2.75
##
##
## 4.357
##
##
## BASIC ANALYSIS
##
##             df  Sum Sq  Mean Sq  F-value  p-value
## Supplement   3   19.00     6.33     6.71   0.0031
## Person       6   88.43    14.74    15.61   0.0000
## Residuals   18   17.00     0.94
##
##
## Partial Omega Squared for Supplement: 0.38
## Partial Intraclass Correlation for Person: 0.79
##
## Cohen's f for Supplement: 0.78
## Cohen's f for Person: 1.91
##
##
## TUKEY MULTIPLE COMPARISONS OF MEANS
##
## Family-wise Confidence Level:
##
## Factor: Supplement
## ---------------------------------
##             diff    lwr   upr  p adj
## sup2-sup1   0.29  -1.18  1.75   0.95
## sup3-sup1   0.71  -0.75  2.18   0.53
## sup4-sup1   2.14   0.67  3.61   0.00
## sup3-sup2   0.43  -1.04  1.90   0.84
## sup4-sup2   1.86   0.39  3.33   0.01
## sup4-sup3   1.43  -0.04  2.90   0.06
##
##
## RESIDUALS
##
## Fitted Values, Residuals, Standardized Residuals
## [sorted by Standardized Residuals, ignoring + or - sign]
## [res_rows = 20, out of 28 cases (rows) of data, or res_rows="all"]
## ---------------------------------------------------
##    Supplement Person Reps  fitted  residual  z-resid
## 22       sup4     p1    3    4.61     -1.61    -2.06
##  2       sup1     p2    2    3.46     -1.46    -1.88
##  3       sup1     p3    8    6.71      1.29     1.65
##  9       sup2     p2    5    3.75      1.25     1.60
##  8       sup2     p1    4    2.75      1.25     1.60
## 11       sup2     p4    3    4.25     -1.25    -1.60
## 10       sup2     p3    6    7.00     -1.00    -1.28
## 25       sup4     p4    7    6.11      0.89     1.15
## 15       sup3     p1    4    3.18      0.82     1.05
##  5       sup1     p5    2    1.21      0.79     1.01
## 14       sup2     p7    3    2.25      0.75     0.96
## 21       sup3     p7    2    2.68     -0.68    -0.87
## 27       sup4     p6    8    7.36      0.64     0.83
## 13       sup2     p6    5    5.50     -0.50    -0.64
## 12       sup2     p5    1    1.50     -0.50    -0.64
##  1       sup1     p1    2    2.46     -0.46    -0.60
## 17       sup3     p3    7    7.43     -0.43    -0.55
## 23       sup4     p2    6    5.61      0.39     0.50
## 26       sup4     p5    3    3.36     -0.36    -0.46
## 18       sup3     p4    5    4.68      0.32     0.41
##
##
## ------------------------
## Plot 1: Interaction Plot
## Plot 2: Fitted Values
## ------------------------
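The randomized block analysis also has a direct base R counterpart: an additive model with the blocking factor as a second term and no interaction. The sketch below rebuilds the long-form data with the same reshape() call. It converts both grouping variables to factors, as the notes in the output describe. It then applies a standard partial omega squared formula, assumed here because it reproduces the printed 0.38 for Supplement. This is not lessR's internal code.

```r
d <- read.csv(header = TRUE, text = "
Person,sup1,sup2,sup3,sup4
p1,2,4,4,3
p2,2,5,4,6
p3,8,6,7,9
p4,4,3,5,7
p5,2,1,2,3
p6,5,5,6,8
p7,2,3,2,4")
d <- reshape(d, direction = "long", idvar = "Person", v.names = "Reps",
             varying = list(2:5), timevar = "Supplement", times = names(d)[2:5])
d$Supplement <- factor(d$Supplement)
d$Person     <- factor(d$Person)

# Additive model: factor of interest plus blocking factor, no interaction
fit_rb <- aov(Reps ~ Supplement + Person, data = d)
tab_rb <- anova(lm(Reps ~ Supplement + Person, data = d))
print(tab_rb)

# Partial omega squared and Cohen's f for Supplement
n    <- nrow(d)
df_s <- tab_rb["Supplement", "Df"]
ms_s <- tab_rb["Supplement", "Mean Sq"]
ms_e <- tab_rb["Residuals", "Mean Sq"]
p_omega_s <- df_s * (ms_s - ms_e) / (df_s * ms_s + (n - df_s) * ms_e)
round(c(p_omega = p_omega_s, f = sqrt(p_omega_s / (1 - p_omega_s))), 2)

# Tukey HSD comparisons of the Supplement means
tk_rb <- TukeyHSD(fit_rb, "Supplement")
print(tk_rb)
```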
Use the base R help() function to view the full manual for ttest() or ANOVA(). Simply enter a question mark followed by the name of the function.
?ttest
?ANOVA