Example R Analysis: Data Visualization
Demonstrating R Markdown with executable code, plots, and cross-references
By Benjamin Peeters
Introduction
This is an example R Markdown document that demonstrates the integration of:
- Executable R code
- Data visualization
- Mathematical equations with cross-references
- Interactive analysis
Mathematical Foundation
The linear regression model is defined as:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i \tag{1}$$
where $\epsilon_i \sim N(0, \sigma^2)$ represents the error term.
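For a model with a single predictor such as Equation 1, the least-squares estimates have a standard closed form: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept is the mean of y minus the slope times the mean of x. The sketch below is an illustrative check rather than part of the original analysis. It regenerates the simulated data used in the Data Analysis section and compares the closed-form values with `lm()`. The names `beta0_hat`, `beta1_hat`, and `fit` are introduced here for illustration only.

```r
# Simulated data, same recipe as in the Data Analysis section
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)

# Closed-form least-squares estimates for a single predictor
beta1_hat <- cov(x, y) / var(x)              # slope
beta0_hat <- mean(y) - beta1_hat * mean(x)   # intercept

# Reference fit with lm() for comparison
fit <- lm(y ~ x)

# The two approaches should agree up to floating-point error
print(c(intercept = beta0_hat, slope = beta1_hat))
print(coef(fit))
```

Both `cov()` and `var()` use the same n - 1 denominator, so it cancels in the ratio and the result matches the least-squares slope.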
The coefficient of determination is:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{2}$$
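Equation 2 can be evaluated directly: the numerator is the residual sum of squares (observed minus fitted values) and the denominator is the total sum of squares (observed values minus their mean). The following is a minimal sketch, assuming the same simulated data as in the Data Analysis section. The names `ss_res`, `ss_tot`, `r2_manual`, and `fit` are illustrative and do not appear in the original document.

```r
# Simulated data, same recipe as in the Data Analysis section
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

# Equation 2, term by term
ss_res <- sum((y - fitted(fit))^2)   # residual sum of squares
ss_tot <- sum((y - mean(y))^2)       # total sum of squares
r2_manual <- 1 - ss_res / ss_tot

# Value reported by summary() for comparison
r2_lm <- summary(fit)$r.squared
print(c(manual = r2_manual, lm = r2_lm))
```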
Data Analysis
Let’s demonstrate with a simple analysis using R:
# Generate sample data
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
data <- data.frame(x = x, y = y)
# Summary statistics
summary(data)
       x              y
 Min.   : 0.0   Min.   : 1.174
 1st Qu.: 2.5   1st Qu.: 9.316
 Median : 5.0   Median :17.149
 Mean   : 5.0   Mean   :17.065
 3rd Qu.: 7.5   3rd Qu.:24.397
 Max.   :10.0   Max.   :33.306
Regression Model
We fit the linear model described in Equation 1:
# Fit the linear model and print its summary
model <- lm(y ~ x, data = data)
summary(model)
Call:
lm(formula = y ~ x, data = data)
Residuals:
    Min      1Q  Median      3Q     Max
-6.0389 -1.3237  0.1618  1.3054  4.4527
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.13643    0.41546   5.142 1.39e-06 ***
x            2.98572    0.07178  41.596  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.093 on 98 degrees of freedom
Multiple R-squared:  0.9464,  Adjusted R-squared:  0.9459
F-statistic:  1730 on 1 and 98 DF,  p-value: < 2.2e-16
The $R^2$ value from Equation 2 is 0.946.
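The residual standard error in the output above is the estimate of σ from Equation 1. Since the data were simulated with `sd = 2`, a value near 2 is expected, and the reported 2.093 is consistent with that. As a rough sketch of how that number arises, the code below recomputes it as the square root of the residual sum of squares divided by the residual degrees of freedom. The names `rss`, `df_resid`, `sigma_manual`, and `fit` are assumptions made for this example.

```r
# Simulated data, same recipe as in the Data Analysis section
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

# Residual standard error: sqrt(RSS / residual degrees of freedom)
rss <- sum(residuals(fit)^2)
df_resid <- df.residual(fit)   # n - 2: two estimated coefficients
sigma_manual <- sqrt(rss / df_resid)

# Value stored in the summary object for comparison
sigma_lm <- summary(fit)$sigma
print(c(manual = sigma_manual, lm = sigma_lm, df = df_resid))
```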
Visualization
# Create scatter plot with regression line
plot(data$x, data$y,
     pch = 16, col = "steelblue",
     xlab = "X", ylab = "Y",
     main = "Linear Regression Example")
abline(model, col = "red", lwd = 2)
legend("topleft",
       legend = c("Data", "Fitted line"),
       col = c("steelblue", "red"),
       pch = c(16, NA),
       lty = c(NA, 1),
       lwd = c(NA, 2))
Results
As shown in Section 3, the linear relationship between x and y is well captured by the model in Equation 1.
Key findings:
- The estimated slope is 2.986, close to the value of 3 used to simulate the data
- The estimated intercept is 2.136, close to the simulated value of 2
- The model explains about 94.6% of the variance in the data ($R^2$ = 0.946)
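One way to make "approximately 3" and "close to 2" more precise is to check whether the values used to simulate the data fall inside the 95% confidence intervals of the fitted coefficients. The sketch below is an illustrative addition, assuming the same simulated data as in the Data Analysis section. The names `fit`, `ci`, `true_values`, and `covered` are hypothetical and not part of the original analysis.

```r
# Simulated data, same recipe as in the Data Analysis section
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

# 95% confidence intervals for the intercept and slope
ci <- confint(fit, level = 0.95)
print(ci)

# Values used to simulate the data: intercept 2, slope 3
true_values <- c("(Intercept)" = 2, x = 3)

# TRUE where the simulation value lies inside the interval
covered <- ci[, 1] <= true_values[rownames(ci)] &
  true_values[rownames(ci)] <= ci[, 2]
print(covered)
```

Taking the reported estimates plus or minus roughly two standard errors (2.136 ± 0.82 and 2.986 ± 0.14), both intervals should contain the simulation values.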
Conclusion
This document demonstrates the capability to:
- Execute R code and display results
- Generate plots and visualizations
- Use equation cross-references (Equation 1, Equation 2)
- Reference sections (Section 1, Section 2, Section 3)