Example R Analysis: Data Visualization

Demonstrating R Markdown with executable code, plots, and cross-references

Author

Benjamin Peeters

Introduction

This is an example R Markdown document that demonstrates the integration of:

  • Executable R code
  • Data visualization
  • Mathematical equations with cross-references
  • Interactive analysis

Mathematical Foundation

The linear regression model is defined as:

$$ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \tag{1} $$

where $\epsilon_i \sim N(0, \sigma^2)$ represents the error term.
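With a single predictor, the least-squares estimates of the parameters in Equation 1 have a closed form. The slope is the sample covariance of $x$ and $y$ divided by the sample variance of $x$. The intercept is chosen so that the line passes through the point of means $(\bar{x}, \bar{y})$. The sketch below checks these formulas against `lm()`, assuming the same simulated data used later in this document. The names `beta0_hat` and `beta1_hat` are illustrative only.

```r
# Simulated data, as in the Data Analysis section (assumed here so the block runs alone)
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)

# Slope: sample covariance of x and y over sample variance of x
beta1_hat <- cov(x, y) / var(x)
# Intercept: the least-squares line passes through (mean(x), mean(y))
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare the closed-form estimates with lm(); they should agree up to rounding error
fit <- lm(y ~ x)
print(c(intercept = beta0_hat, slope = beta1_hat))
print(coef(fit))
```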

The coefficient of determination is:

$$ R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \tag{2} $$
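Equation 2 can be evaluated directly from the residual sum of squares (numerator) and the total sum of squares (denominator). This sketch recomputes it by hand for the simulated data, which are reproduced here so the block runs on its own. It then compares the result with the `r.squared` element returned by `summary()`.

```r
# Same simulated data and model as in the analysis below
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

y_hat  <- fitted(fit)
ss_res <- sum((y - y_hat)^2)      # numerator of Equation 2
ss_tot <- sum((y - mean(y))^2)    # denominator of Equation 2
r2_manual <- 1 - ss_res / ss_tot

# Should match the value reported by summary()
print(all.equal(r2_manual, summary(fit)$r.squared))
```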

Data Analysis

We simulate data from Equation 1 with $\beta_0 = 2$, $\beta_1 = 3$, and $\sigma = 2$, then summarize it in R:

# Generate sample data
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
data <- data.frame(x = x, y = y)

# Summary statistics
summary(data)
       x              y         
 Min.   : 0.0   Min.   : 1.174  
 1st Qu.: 2.5   1st Qu.: 9.316  
 Median : 5.0   Median :17.149  
 Mean   : 5.0   Mean   :17.065  
 3rd Qu.: 7.5   3rd Qu.:24.397  
 Max.   :10.0   Max.   :33.306  

Regression Model

We fit the linear model described in Equation 1:

# Fit linear regression
model <- lm(y ~ x, data = data)
summary(model)

Call:
lm(formula = y ~ x, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.0389 -1.3237  0.1618  1.3054  4.4527 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.13643    0.41546   5.142 1.39e-06 ***
x            2.98572    0.07178  41.596  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.093 on 98 degrees of freedom
Multiple R-squared:  0.9464,    Adjusted R-squared:  0.9459 
F-statistic:  1730 on 1 and 98 DF,  p-value: < 2.2e-16
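Each row of the coefficient table can be reproduced from the estimate and its standard error. The t value is Estimate divided by Std. Error. The p-value is the two-sided tail probability of a t distribution with the residual degrees of freedom, which is 98 here. The minimal sketch below assumes the same data and model as above. It extracts the table with `coef(summary(model))` and checks both columns.

```r
# Same data and model as above
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)

# Matrix with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coef_table <- coef(summary(model))

# t statistic: estimate divided by its standard error
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]
# Two-sided p-value using the residual degrees of freedom (n - 2)
p_manual <- 2 * pt(abs(t_manual), df = df.residual(model), lower.tail = FALSE)

print(cbind(t_manual, p_manual))
print(sigma(model))  # residual standard error
```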

The $R^2$ value from Equation 2 is 0.946.
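An R Markdown source can compute the value above with inline code, for example `` `r round(summary(model)$r.squared, 3)` ``, so the text stays in sync with the fitted model. The summary also reports an adjusted $R^2$, which penalizes Equation 2 for the number of predictors $p$: $R^2_{\text{adj}} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$. The sketch below checks that formula against `summary()` for this model, where $p = 1$.

```r
# Same data and model as above
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
model <- lm(y ~ x)

r2 <- summary(model)$r.squared
p  <- 1  # number of predictors (x only), excluding the intercept
adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(r2, 3))  # the number quoted in the text
print(all.equal(adj_r2_manual, summary(model)$adj.r.squared))
```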

Visualization

# Create scatter plot with regression line
plot(data$x, data$y,
     pch = 16, col = "steelblue",
     xlab = "X", ylab = "Y",
     main = "Linear Regression Example")
abline(model, col = "red", lwd = 2)
legend("topleft",
       legend = c("Data", "Fitted line"),
       col = c("steelblue", "red"),
       pch = c(16, NA),
       lty = c(NA, 1),
       lwd = c(NA, 2))
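The plot above shows only the fitted line. One way to also convey uncertainty in the fitted mean is a 95% confidence band from `predict()` with `interval = "confidence"`. This is a sketch layered on the same data and model, not part of the original figure. The grid size of 200 points is an arbitrary choice.

```r
# Same data and model as above
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)

# Predict on a regular grid; interval = "confidence" adds lwr/upr columns
grid <- data.frame(x = seq(min(data$x), max(data$x), length.out = 200))
band <- predict(model, newdata = grid, interval = "confidence", level = 0.95)

plot(data$x, data$y, pch = 16, col = "steelblue",
     xlab = "X", ylab = "Y", main = "Fitted line with 95% confidence band")
# Shade the region between the lower and upper confidence limits
polygon(c(grid$x, rev(grid$x)), c(band[, "lwr"], rev(band[, "upr"])),
        col = adjustcolor("red", alpha.f = 0.2), border = NA)
lines(grid$x, band[, "fit"], col = "red", lwd = 2)
```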

Results

As shown in Section 3, the model in Equation 1 captures the linear relationship between $x$ and $y$ well.

Key findings:

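  1. The estimated slope is 2.986 (SE 0.072), close to the value of 3 used to simulate the data
  2. The estimated intercept is 2.136 (SE 0.415), close to the value of 2 used to simulate the data
  3. The model explains about 95% of the variance in $y$ ($R^2 = 0.946$)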
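The first two findings can be stated more precisely by checking whether the values used to simulate the data ($\beta_0 = 2$, $\beta_1 = 3$) fall inside the 95% confidence intervals from `confint()`. The minimal sketch below assumes the data are regenerated with the same seed. The name `true_beta` is illustrative.

```r
# Regenerate the data with the same seed and refit
set.seed(42)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 2 + 3 * x + rnorm(n, sd = 2)
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)

# Rows: (Intercept), x; columns: "2.5 %", "97.5 %"
ci <- confint(model, level = 0.95)

# Parameter values used in the simulation (hypothetical name)
true_beta <- c("(Intercept)" = 2, x = 3)
covered <- ci[, 1] <= true_beta & true_beta <= ci[, 2]
print(ci)
print(covered)
```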

Conclusion

This document demonstrates the capability to: