To illustrate, first read the Employee data included as part of lessR.
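The call that reads the data is not shown with the output that follows. A minimal sketch, assuming lessR is installed and that the built-in data set is requested by the name Employee; the result is stored in d, the data frame name the output refers to:

```r
library("lessR")

# Read the Employee data set that ships with lessR into the data frame d,
# the default data frame name for the lessR analysis functions
d <- Read("Employee")
```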
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 Years integer 36 1 16 7 NA 15 ... 1 2 10
## 2 Gender character 37 0 2 M M M ... F F M
## 3 Dept character 36 1 5 ADMN SALE SALE ... MKTG SALE FINC
## 4 Salary double 37 0 37 53788.26 94494.58 ... 56508.32 57562.36
## 5 JobSat character 35 2 3 med low low ... high low high
## 6 Plan integer 37 0 3 1 1 3 ... 2 2 1
## 7 Pre integer 37 0 27 82 62 96 ... 83 59 80
## 8 Post integer 37 0 22 92 74 97 ... 90 71 87
## ------------------------------------------------------------------------------------------
lessR provides many versions of a scatterplot with its Plot() function.
The regular scatterplot.
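A sketch of the call, inferred from the variable names in the output that follows (Years and Salary) rather than copied from the original chunk:

```r
library("lessR")
d <- Read("Employee")

# Two continuous variables: Years on the x-axis, Salary on the y-axis
Plot(Years, Salary)
```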
## >>> Suggestions
## Plot(Years, Salary, fit="lm", fit_se=c(.90,.99)) # fit line, standard errors
## Plot(Years, Salary, out_cut=.10) # label top 10% potential outliers
## Plot(Years, Salary, enhance=TRUE) # many options
##
##
## >>> Pearson's product-moment correlation
##
## Number of paired values with neither missing, n = 36
##
##
## Sample Correlation of Years and Salary: r = 0.852
##
##
## Hypothesis Test of 0 Correlation: t = 9.501, df = 34, p-value = 0.000
## 95% Confidence Interval for Correlation: 0.727 to 0.923
The enhanced scatterplot with parameter enhance.
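This form of the call appears in the suggestions that lessR prints for the regular scatterplot:

```r
library("lessR")
d <- Read("Employee")

# enhance=TRUE requests the enhanced version of the scatterplot,
# which also reports the outlier analysis listed in the output
Plot(Years, Salary, enhance=TRUE)
```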
## [Ellipse with Murdoch and Chow's function ellipse from the ellipse package]
## >>> Suggestions
## Plot(Years, Salary, fit="lm", fit_se=c(.90,.99)) # fit line, standard errors
## Plot(Years, Salary, out_cut=.10) # label top 10% potential outliers
##
##
## >>> Pearson's product-moment correlation
##
## Number of paired values with neither missing, n = 36
##
##
## Sample Correlation of Years and Salary: r = 0.852
##
##
## Hypothesis Test of 0 Correlation: t = 9.501, df = 34, p-value = 0.000
## 95% Confidence Interval for Correlation: 0.727 to 0.923
## >>> Outlier analysis with Mahalanobis Distance
##
## MD ID
## ----- -----
## 8.14 Correll, Trevon
## 7.84 Capelle, Adam
##
## 5.63 Korhalkar, Jessica
## 5.58 James, Leslie
## 3.75 Hoang, Binh
## ... ...
Map the variable Pre to the size of the points with the size parameter, which yields a bubble plot.
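A sketch of such a call, assuming size accepts a variable name as the sentence above describes:

```r
library("lessR")
d <- Read("Employee")

# Each point is drawn as a bubble scaled by that employee's Pre score
Plot(Years, Salary, size=Pre)
```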
## >>> Suggestions
## Plot(Years, Salary, fit="lm", fit_se=c(.90,.99)) # fit line, standard errors
## Plot(Years, Salary, out_cut=.10) # label top 10% potential outliers
## Plot(Years, Salary, enhance=TRUE) # many options
##
##
## >>> Pearson's product-moment correlation
##
## Number of paired values with neither missing, n = 36
##
##
## Sample Correlation of Years and Salary: r = 0.852
##
##
## Hypothesis Test of 0 Correlation: t = 9.501, df = 34, p-value = 0.000
## 95% Confidence Interval for Correlation: 0.727 to 0.923
Plot against the levels of the categorical variable Gender with the by parameter.
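A sketch of the grouped scatterplot on a single panel, with the arguments inferred from the sentence above:

```r
library("lessR")
d <- Read("Employee")

# One panel, with the points distinguished by the level of Gender
Plot(Years, Salary, by=Gender)
```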
## >>> Suggestions
## Plot(Years, Salary, fit="lm", fit_se=c(.90,.99)) # fit line, standard errors
## Plot(Years, Salary, out_cut=.10) # label top 10% potential outliers
## Plot(Years, Salary, enhance=TRUE) # many options
##
##
## >>> Pearson's product-moment correlation
##
## Number of paired values with neither missing, n = 36
##
##
## Sample Correlation of Years and Salary: r = 0.852
##
##
## Hypothesis Test of 0 Correlation: t = 9.501, df = 34, p-value = 0.000
## 95% Confidence Interval for Correlation: 0.727 to 0.923
The categorical variable can also generate Trellis plots, one panel per level, with the by1 parameter.
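Trellis panels are requested with by1, the parameter this document also uses for the stock price panels. A sketch for the Employee data:

```r
library("lessR")
d <- Read("Employee")

# A separate panel of Salary against Years for each level of Gender
Plot(Years, Salary, by1=Gender)
```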
## [Trellis graphics from Deepayan Sarkar's lattice package]
Two categorical variables result in a bubble plot of their joint frequencies.
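The variable names come from the suggestions and the frequency table in the output that follows:

```r
library("lessR")
d <- Read("Employee")

# Two categorical variables: bubble size reflects the joint frequency
Plot(Dept, Gender)
```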
## >>> Suggestions
## Plot(Dept, Gender, size_cut=FALSE)
## Plot(Dept, Gender, trans=.8, bg="off", grid="off")
## SummaryStats(Dept, Gender) # or ss
##
##
## Joint and Marginal Frequencies
## ------------------------------
##
## Dept
## Gender ACCT ADMN FINC MKTG SALE Sum
## F 3 4 1 5 5 18
## M 2 2 3 1 10 18
## Sum 5 6 4 6 15 36
##
##
## Cramer's V: 0.415
##
## Chi-square Test: Chisq = 6.200, df = 4, p-value = 0.185
## >>> Low cell expected frequencies, chi-squared approximation may not be accurate
The default plot for a single continuous variable includes not only the scatterplot, but also the violin plot and box plot, with outliers identified. Call this plot the VBS plot.
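With a single continuous variable, the call reduces to the variable name alone, as in the suggestions printed with the output:

```r
library("lessR")
d <- Read("Employee")

# One continuous variable: integrated violin, box, and scatter (VBS) plot
Plot(Salary)
```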
## [Violin/Box/Scatterplot graphics from Deepayan Sarkar's lattice package]
## >>> Suggestions
## Plot(Salary, out_cut=2, fences=TRUE, vbs_mean=TRUE) # Label two outliers ...
## Plot(Salary, box_adj=TRUE) # Adjust boxplot whiskers for asymmetry
##
## --- Salary ---
## Present: 37
## Missing: 0
## Total : 37
##
## Mean : 73795.557
## Stnd Dev : 21799.533
## IQR : 31012.560
## Skew : 0.190 [medcouple, -1 to 1]
##
## Minimum : 46124.970
## Lower Whisker: 46124.970
## 1st Quartile : 56772.950
## Median : 69547.600
## 3rd Quartile : 87785.510
## Upper Whisker: 122563.380
## Maximum : 134419.230
##
##
## (Box plot) Outliers: 1
##
## Small Large
## ----- -----
## Correll, Trevon 134419.23
##
##
## Number of duplicated values: 0
##
##
## Parameter values (can be manually set)
## -------------------------------------------------------
## size: 0.61 size of plotted points
## jitter_y: 0.45 random vertical movement of points
## jitter_x: 0.00 random horizontal movement of points
## bw: 9529.04 set bandwidth higher for smoother edges
For a single categorical variable, get the corresponding bubble plot of frequencies.
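The same pattern applies to a categorical variable; this sketch uses Dept, the variable summarized in the output that follows:

```r
library("lessR")
d <- Read("Employee")

# One categorical variable: bubble plot of the frequency of each level
Plot(Dept)
```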
## >>> Suggestions
## Plot(Dept, color_low="lemonchiffon2", color_hi="maroon3")
## Plot(Dept, values="count") # scatter plot of counts
##
##
## --- Dept ---
##
##
## ACCT ADMN FINC MKTG SALE Total
## Frequencies: 5 6 4 6 15 36
## Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
##
##
## Chi-squared test of null hypothesis of equal probabilities
## Chisq = 10.944, df = 4, p-value = 0.027
The Cleveland dot plot, here for a single variable, has row names on the y-axis. The default plot sorts the rows by the value plotted.
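A sketch of the default Cleveland dot plot, using the y=row_names form shown in the printed suggestions:

```r
library("lessR")
d <- Read("Employee")

# row_names is the lessR keyword for the row names of the data frame,
# here the employee names, placed on the y-axis
Plot(Salary, y=row_names)
```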
## >>> Suggestions
## Plot(Salary, y=row_names, sort_yx=FALSE, segments_y=FALSE)
##
##
##
## --- Salary ---
##
## n miss mean sd min mdn max
## 37 0 73795.6 21799.5 46125.0 69547.6 134419.2
##
##
## (Box plot) Outliers: 1
##
## Small Large
## ----- -----
## 134419.2
The standard scatterplot version of a Cleveland dot plot.
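The arguments below are copied from the suggestion lessR printed for the previous plot. They are an assumption about the original call, and other lessR releases may expect different values for sort_yx:

```r
library("lessR")
d <- Read("Employee")

# Turn off the sorting of rows and the line segments drawn to each point
Plot(Salary, y=row_names, sort_yx=FALSE, segments_y=FALSE)
```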
## >>> Suggestions
##
##
##
## --- Salary ---
##
## n miss mean sd min mdn max
## 37 0 73795.6 21799.5 46125.0 69547.6 134419.2
##
##
## (Box plot) Outliers: 1
##
## Small Large
## ----- -----
## 134419.2
This Cleveland dot plot has two x-variables, indicated as a standard R vector with the c() function. In this situation, the two points on each row are connected with a line segment. By default, the rows are sorted by the distance between the successive points.
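A sketch of the two-variable call, based on the suggestion printed in the output that follows, with the options that disable sorting and segments left out:

```r
library("lessR")
d <- Read("Employee")

# Two x-variables, Pre and Post, plotted on the same row for each employee
Plot(c(Pre, Post), y=row_names)
```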
## >>> Suggestions
## Plot(c(Pre, Post), y=row_names, sort_yx=FALSE, segments_y=FALSE)
##
##
##
## --- Pre ---
##
## n miss mean sd min mdn max
## 37 0 78.8 12.0 59.0 80.0 100.0
##
##
## --- Post ---
##
## n miss mean sd min mdn max
## 37 0 81.0 11.6 59.0 84.0 100.0
##
##
## No (Box plot) outliers
##
##
## n diff Row
## ---------------------------
## 1 13.0 Korhalkar, Jessica
## 2 13.0 Cooper, Lindsay
## 3 12.0 Anastasiou, Crystal
## 4 12.0 Wu, James
## 5 10.0 Ritchie, Darnell
## 6 8.0 Campagna, Justin
## 7 7.0 Cassinelli, Anastis
## 8 7.0 Hamide, Bita
## 9 7.0 Sheppard, Cory
## 10 6.0 LaRoe, Maria
## 27 -1.0 Kimball, Claire
## 28 -2.0 Capelle, Adam
## 29 -2.0 Stanley, Emma
## 30 -2.0 Adib, Hassan
## 31 -2.0 Skrotzki, Sara
## 32 -3.0 Anderson, David
## 33 -3.0 Correll, Trevon
## 34 -3.0 Kralik, Laura
## 35 -3.0 Jones, Alissa
## 36 -4.0 Gvakharia, Kimberly
## 37 -4.0 Downs, Deborah
Read time series data of the stock Price for three companies: Apple, IBM, and Intel. The data table, in long form, is included with lessR.
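A sketch of the read, assuming the built-in data set is named StockPrice (the name is not given in the text). The second statement lists the first five rows with base R indexing:

```r
library("lessR")

# Assumed name of the built-in data set: StockPrice
d <- Read("StockPrice")

# First five rows of the long-form table: date, Company, Price
d[1:5, ]
```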
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## Date: Date with year, month and day
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 date Date 1374 0 458 1980-12-01 ... 2019-01-01
## 2 Company character 1374 0 3 Apple Apple ... Intel Intel
## 3 Price double 1374 0 1259 0.027 0.023 ... 46.634 46.823
## ------------------------------------------------------------------------------------------
## date Company Price
## 1 1980-12-01 Apple 0.027
## 2 1981-01-01 Apple 0.023
## 3 1981-02-01 Apple 0.021
## 4 1981-03-01 Apple 0.020
## 5 1981-04-01 Apple 0.023
Activate a time series plot by setting the \(x\)-variable to a variable of R type Date, which is true of the variable date in this data set. Here, plot only Apple.
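One way to restrict the plot to Apple is to subset the rows with base R and pass the result through the data parameter. This is a sketch under that assumption, not necessarily how the original call selected the rows; the data set name StockPrice and the data frame name apple are also assumptions:

```r
library("lessR")
d <- Read("StockPrice")  # assumed name of the built-in data set

# Keep only the Apple rows, then plot Price over time
apple <- d[d$Company == "Apple", ]
Plot(date, Price, data=apple)
```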
## >>> Suggestions
## Plot(date, Price, fit="lm", fit_se=c(.90,.99)) # fit line, standard errors
## Plot(date, Price, out_cut=.10) # label top 10% potential outliers
## Plot(date, Price, enhance=TRUE) # many options
##
##
## >>> Pearson's product-moment correlation
##
## Number of paired values with neither missing, n = 458
##
##
## Sample Correlation of date and Price: r = 0.706
##
##
## Hypothesis Test of 0 Correlation: t = 21.280, df = 456, p-value = 0.000
## 95% Confidence Interval for Correlation: 0.6570 to 0.7490
With the by parameter, plot all three companies on the same panel.
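A sketch of the call for one panel with a separate series per company; the data set name StockPrice is an assumption:

```r
library("lessR")
d <- Read("StockPrice")  # assumed name of the built-in data set

# One panel with a separate time series for each level of Company
Plot(date, Price, by=Company)
```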
## >>> Suggestions
## Plot(date, Price, fit="lm", fit_se=c(.90,.99)) # fit line, standard errors
## Plot(date, Price, out_cut=.10) # label top 10% potential outliers
## Plot(date, Price, enhance=TRUE) # many options
##
##
## >>> Pearson's product-moment correlation
##
## Number of paired values with neither missing, n = 1374
##
##
## Sample Correlation of date and Price: r = 0.677
##
##
## Hypothesis Test of 0 Correlation: t = 34.036, df = 1372, p-value = 0.000
## 95% Confidence Interval for Correlation: 0.6470 to 0.7040
With the by1 parameter, plot the three companies on different panels, a Trellis plot.
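The basic form of the Trellis call is the styled call shown later in this document without its extra arguments; the data set name StockPrice is an assumption:

```r
library("lessR")
d <- Read("StockPrice")  # assumed name of the built-in data set

# A separate panel for each level of Company
Plot(date, Price, by1=Company)
```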
## [Trellis graphics from Deepayan Sarkar's lattice package]
Now do the Trellis plot with some color.
style(sub_theme="black", trans=.55,
window_fill="gray10", grid_color="gray25")
Plot(date, Price, by1=Company, n.col=1, fill="darkred", color="red")
## [Trellis graphics from Deepayan Sarkar's lattice package]
Use the base R help() function to view the full manual for Plot(). Simply enter a question mark followed by the name of the function.
?Plot