All examples in this document use default R datasets.
x <- 0:9
x[6] <- 15
x <- c(0:4, 15, 6:9) # do it in one line
x
## [1]  0  1  2  3  4 15  6  7  8  9
45:57 * .42
##  [1] 18.90 19.32 19.74 20.16 20.58 21.00 21.42 21.84 22.26 22.68 23.10
## [12] 23.52 23.94
1:10 %*% matrix(rbeta(40, 2, 1), nrow = 10, ncol = 4) # 40 draws fill the 10 x 4 matrix exactly
##       [,1]  [,2]  [,3] [,4]
## [1,] 33.02 34.13 39.78 34.4
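As a small sketch of why the result above has one row and four columns: `%*%` treats a plain vector on the left as a 1 x 10 row vector, so multiplying by a 10 x 4 matrix gives a 1 x 4 matrix. The matrix `m` below is an illustrative stand-in with fixed values, not the random draws used above.

```r
# Illustrative matrix with known values (an assumption made for this sketch)
m <- matrix(1, nrow = 10, ncol = 4)

# A plain vector on the left of %*% is treated as a 1 x 10 row vector
res <- 1:10 %*% m

dim(res)   # one row, four columns
res[1, 1]  # equals sum(1:10), because every entry of m is 1
```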
Subset the Seatbelts data to include only the drivers, rear, PetrolPrice, and law columns.
data.frame(Seatbelts[, c('drivers', 'rear', 'PetrolPrice', 'law')])

| drivers <dbl> | rear <dbl> | PetrolPrice <dbl> | law <dbl> |
|---|---|---|---|
| 1687 | 269 | 0.10297 | 0 |
| 1508 | 265 | 0.10236 | 0 |
| 1507 | 319 | 0.10206 | 0 |
| 1385 | 407 | 0.10087 | 0 |
| 1632 | 454 | 0.10102 | 0 |
| 1511 | 427 | 0.10058 | 0 |
| 1559 | 522 | 0.10377 | 0 |
| 1630 | 536 | 0.10408 | 0 |
| 1579 | 405 | 0.10377 | 0 |
| 1653 | 437 | 0.10303 | 0 |
Subset the CO2 data to include only observations where the plant’s CO2 uptake rate is less than or equal to 15.
CO2[which(CO2$uptake <= 15), ]

| Plant <ord> | Type <fctr> | Treatment <fctr> | conc <dbl> | uptake <dbl> |
|---|---|---|---|---|
| Qn2 | Quebec | nonchilled | 95 | 13.6 |
| Qc1 | Quebec | chilled | 95 | 14.2 |
| Qc2 | Quebec | chilled | 95 | 9.3 |
| Mn1 | Mississippi | nonchilled | 95 | 10.6 |
| Mn2 | Mississippi | nonchilled | 95 | 12.0 |
| Mn3 | Mississippi | nonchilled | 95 | 11.3 |
| Mc1 | Mississippi | chilled | 95 | 10.5 |
| Mc1 | Mississippi | chilled | 175 | 14.9 |
| Mc2 | Mississippi | chilled | 95 | 7.7 |
| Mc2 | Mississippi | chilled | 175 | 11.4 |
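
The `which()` wrapper in the CO2 subset matters mainly when the filtered column contains missing values. The sketch below uses a hypothetical copy of the data, `co2_na`, with one value deliberately set to `NA` to show the difference from plain logical indexing. The real `CO2` data has no missing uptake values, so this is for illustration only.

```r
# Hypothetical copy of CO2 with one missing uptake value (illustration only)
co2_na <- as.data.frame(CO2)
co2_na$uptake[1] <- NA

# Plain logical indexing returns an extra all-NA row for the NA comparison
n_logical <- nrow(co2_na[co2_na$uptake <= 15, ])

# which() keeps only positions where the comparison is TRUE, dropping the NA
n_which <- nrow(co2_na[which(co2_na$uptake <= 15), ])

c(logical = n_logical, which = n_which)
```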
Sort the mtcars data in ascending order by cylinders and miles per gallon.
mtcars[order(mtcars$cyl, mtcars$mpg), ]

| mpg <dbl> | cyl <dbl> | disp <dbl> | hp <dbl> | drat <dbl> | wt <dbl> | qsec <dbl> | vs <dbl> | am <dbl> | gear <dbl> |
|---|---|---|---|---|---|---|---|---|---|
| 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 |
| 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 |
| 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 |
| 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 |
| 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 |
| 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 |
| 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 |
| 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 |
| 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 |
| 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 |
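
A variation on the ordering above, as a minimal sketch: negating a numeric key inside `order()` reverses the direction for that key only. The name `sorted` is chosen here for illustration.

```r
# Ascending by cylinders, then descending by miles per gallon within each group
sorted <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]

# The first rows are the most fuel-efficient four-cylinder cars
head(sorted[, c("mpg", "cyl")], 3)
```

For non-numeric keys, negation does not apply; `order()` also accepts a `decreasing` argument for that case.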
plot(density(rnorm(1e4, 2, .89)))
Use the invlogit() function from arm without loading the package.
arm::invlogit(.034)
## [1] 0.5085
Using the mtcars data, fit a linear model that explains variation in miles per gallon as a function of number of cylinders, displacement, and horsepower. Extract the coefficients, standard errors, and R² from the model.
m1 <- lm(mpg ~ cyl + disp + hp, data = mtcars)
coef(m1)
## (Intercept)         cyl        disp          hp
##    34.18492    -1.22742    -0.01884    -0.01468
sqrt(diag(vcov(m1)))
## (Intercept)         cyl        disp          hp
##     2.59078     0.79728     0.01040     0.01465
summary(m1)$r.squared
## [1] 0.7679
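The same quantities can also be read from the model summary. This sketch refits the model so it runs on its own, and checks that the "Std. Error" column of the coefficient table matches the square root of the diagonal of the variance-covariance matrix. The names `ctab` and `se` are illustrative.

```r
m1 <- lm(mpg ~ cyl + disp + hp, data = mtcars)

# Coefficient table: estimates, standard errors, t values, p values
ctab <- coef(summary(m1))
se <- ctab[, "Std. Error"]

# Same standard errors as sqrt(diag(vcov(m1)))
all.equal(se, sqrt(diag(vcov(m1))))

# Adjusted R-squared is stored alongside r.squared
summary(m1)$adj.r.squared
```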
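Use the Titanic data to fit a model that explains whether a passenger survived the ship’s sinking as a function of their sex, age, and passenger class, but use a probit link function. What is the difference in coefficient estimates between this model and one using the canonical logit link function?
coef(glm(Survived ~ Class + Sex + Age, data = Titanic, family = binomial(link = 'probit'))) -
  coef(glm(Survived ~ Class + Sex + Age, data = Titanic, family = binomial(link = 'logit')))
## (Intercept)    Class2nd    Class3rd   ClassCrew   SexFemale    AgeAdult
##  -2.902e-16   3.942e-16   4.920e-16   6.943e-16  -2.156e-16   2.214e-16
Note that Titanic is a contingency table, so glm() coerces it to a data frame with one row per cell and, without weights = Freq, ignores the passenger counts. Every combination of class, sex, and age then contributes exactly one 'No' row and one 'Yes' row, both fits estimate coefficients of essentially zero, and the differences shown are floating-point noise rather than a substantive comparison of the two links.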
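Because `Titanic` is a four-way contingency table, the passenger counts live in the `Freq` column once it is converted to a data frame. The sketch below assumes that weighting each cell by `Freq` is the intended use of those counts. The names `titanic_df`, `fit_probit`, and `fit_logit` are illustrative, and no output is shown because the values were not part of the original.

```r
# One row per Class x Sex x Age x Survived cell, with counts in Freq
titanic_df <- as.data.frame(Titanic)

# Assumption: weights = Freq makes each cell count once per passenger
fit_probit <- glm(Survived ~ Class + Sex + Age, data = titanic_df,
                  weights = Freq, family = binomial(link = "probit"))
fit_logit  <- glm(Survived ~ Class + Sex + Age, data = titanic_df,
                  weights = Freq, family = binomial(link = "logit"))

# Difference in coefficient estimates between the two link functions
coef(fit_probit) - coef(fit_logit)
```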
x <- numeric(1e4) # preallocate the result vector
for (i in seq_along(x)) {
  x[i] <- mean(rnorm(1e3, -2.5, 4))
}
mean(x)
## [1] -2.501
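The loop above appears to simulate the sampling distribution of a mean. A more compact way to write the same idea is `replicate()`, sketched here with fewer repetitions and a hypothetical seed so the example runs quickly and reproducibly. The name `means` is illustrative.

```r
set.seed(1)  # hypothetical seed, only for reproducibility of this sketch

# Draw 1,000 samples of size 1,000 and keep each sample mean
means <- replicate(1e3, mean(rnorm(1e3, mean = -2.5, sd = 4)))

mean(means)  # close to the population mean of -2.5
sd(means)    # close to the standard error, 4 / sqrt(1e3)
```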
my.mean <- function(x) {
  sum(x) / length(x)
}
my.mean(1:7)
## [1] 4
my.mean.NA <- function(x) {
  x <- na.omit(x)
  sum(x) / length(x)
}
my.mean.NA(c(NA, 1:7, NA))
## [1] 4
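One possible variation, sketched here rather than taken from the original: fold both functions into one with an `na.rm` argument, mirroring the interface of base `mean()`. The name `my_mean` is hypothetical.

```r
# Mean with optional removal of missing values, mirroring mean(x, na.rm = ...)
my_mean <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sum(x) / length(x)
}

my_mean(c(NA, 1:7, NA))                # NA, because missing values propagate
my_mean(c(NA, 1:7, NA), na.rm = TRUE)  # 4
```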
myfunc <- function(x) {
  for (i in seq_along(x)) {
    if (x[i] %% 2 == 0) {
      x[i] <- x[i]^2 # square even values
    } else {
      x[i] <- sqrt(x[i]) # square root of everything else
    }
  }
  x
}
myfunc(seq(1, 6, by = .5))
##  [1]  1.000  1.225  4.000  1.581  1.732  1.871 16.000  2.121  2.236  2.345
## [11] 36.000
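The same element-wise rule can be written without a loop. This is a sketch using `ifelse()`, with `myfunc_vec` as a hypothetical name. Note that `ifelse()` evaluates both branches for the whole vector, so negative inputs would trigger a `sqrt()` warning even at positions where the squared value is the one kept.

```r
# Vectorized version: square even values, take the square root of the rest
myfunc_vec <- function(x) {
  ifelse(x %% 2 == 0, x^2, sqrt(x))
}

myfunc_vec(seq(1, 6, by = .5))
```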
Use the airquality data to plot wind speed against temperature. Use separate colors for observations in each month, and include a linear fit line for each month.
library(ggplot2)
ggplot(data = airquality, aes(x = Wind, y = Temp, color = as.factor(Month))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  labs(color = 'Month') +
  scale_color_discrete(labels = c('May', 'Jun', 'Jul', 'Aug', 'Sep')) +
  theme_bw() +
  theme(legend.position = 'right',
        plot.background = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank(),
        panel.border = element_blank())