Statistics in R

R was designed for statistical computing and graphics. Its base installation includes functions for descriptive statistics, probability distributions, hypothesis testing, and model fitting.

1. Descriptive Statistics

Use these functions to get summary statistics of your data:

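data <- c(10, 20, 15, 25, 30, 20, 40)

mean(data)        # Arithmetic mean
median(data)      # Middle value of the sorted data

# Base R has no function for the statistical mode (mode() returns an
# object's storage type), so define one that returns the most frequent value
get_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
get_mode(data)    # Most frequent value

range(data)       # Minimum and maximum
var(data)         # Sample variance (denominator n - 1)
sd(data)          # Sample standard deviation
summary(data)     # Minimum, quartiles, median, mean, maximum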
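var() and sd() use the sample formulas, dividing by n - 1 rather than n. A short sketch that recomputes both by hand to confirm this; the variable names (manual_var, manual_sd, pop_var) are only illustrative:

```r
data <- c(10, 20, 15, 25, 30, 20, 40)

n <- length(data)
deviations <- data - mean(data)

# Sample variance: sum of squared deviations divided by n - 1
manual_var <- sum(deviations^2) / (n - 1)
manual_sd  <- sqrt(manual_var)

all.equal(manual_var, var(data))  # TRUE
all.equal(manual_sd, sd(data))    # TRUE

# Population variance (denominator n), if that is what a problem asks for
pop_var <- manual_var * (n - 1) / n
```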
  

2. Probability Distributions

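R provides four functions for each common distribution, named by prefix: d for the density (for discrete distributions, the probability mass), p for the cumulative probability, q for the quantile, and r for random draws. For example:

  • dnorm(), pnorm(), qnorm(), rnorm() – Normal distribution
  • dbinom(), pbinom(), qbinom(), rbinom() – Binomial distribution
  • dpois(), ppois(), qpois(), rpois() – Poisson distribution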
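
# Standard normal density at x = 0 (a density value, not a probability)
dnorm(0, mean = 0, sd = 1)

# Binomial: P(X = 3) for 5 trials with success probability 0.5
dbinom(3, size = 5, prob = 0.5)

# Poisson: P(X = 2) when the mean rate lambda is 3
dpois(2, lambda = 3)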
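The p, q, and r functions complete the family. A small sketch of how they relate; the cut-off values (3, 1.96, 2) and the seed are arbitrary choices for illustration:

```r
# p*: cumulative probability P(X <= 3) for a Binomial(5, 0.5) variable
p_le_3 <- pbinom(3, size = 5, prob = 0.5)          # 0.8125

# The CDF is the running sum of the probability mass function
p_sum <- sum(dbinom(0:3, size = 5, prob = 0.5))     # also 0.8125

# Normal: P(Z <= 1.96) is about 0.975
p_z <- pnorm(1.96)

# q*: the quantile function inverts the CDF
z <- qnorm(p_z)                                     # back to 1.96

# Upper tail: P(X > 2) for a Poisson(3) variable
p_tail <- ppois(2, lambda = 3, lower.tail = FALSE)

# r*: random draws (set a seed so the results are reproducible)
set.seed(42)
draws <- rnorm(1000, mean = 0, sd = 1)
```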
  

3. Correlation and Covariance

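Correlation and covariance measure how two variables vary together. cor() returns the Pearson correlation coefficient by default, a unitless value between -1 and 1; cov() returns the sample covariance, whose size depends on the units of the data:

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

cor(x, y)   # Pearson correlation (1 here, because y is exactly 2 * x)
cov(x, y)   # Sample covariance (5 here)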
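Pearson correlation is just covariance rescaled by both standard deviations, which is why it is unit-free. A sketch checking this, plus Spearman's rank correlation via the method argument; the transformations y^2 and y * 100 are only for illustration:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# Pearson correlation = cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(x, y) / (sd(x) * sd(y))
all.equal(r_manual, cor(x, y))        # TRUE

# Spearman's rank correlation suits monotonic but non-linear relationships
cor(x, y^2, method = "spearman")      # 1, since y^2 still increases with x

# Rescaling y changes the covariance but not the correlation
cov(x, y * 100)                       # 500
cor(x, y * 100)                       # 1
```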
  

4. Hypothesis Testing

Base R includes functions for common hypothesis tests. Each call prints the test statistic, degrees of freedom, and p-value:

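# One-sample t-test: does the mean of data (from section 1) differ from 20?
t.test(data, mu = 20)

# Two-sample t-test: do group1 and group2 have different means?
# By default R runs Welch's test, which does not assume equal variances
group1 <- c(10, 12, 14)
group2 <- c(15, 18, 20)
t.test(group1, group2)

# Chi-square goodness-of-fit test: do the observed counts match the expected proportions?
observed <- c(25, 30, 45)
expected <- c(30, 30, 40)
chisq.test(x = observed, p = expected / sum(expected))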
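The printed output is convenient to read, but each test also returns an object of class "htest" whose parts can be pulled out by name. A sketch of that, with a simple decision rule; the significance level alpha = 0.05 is an assumption you should choose before looking at the data:

```r
group1 <- c(10, 12, 14)
group2 <- c(15, 18, 20)

result <- t.test(group1, group2)   # Welch two-sample t-test (the default)

result$statistic   # t statistic
result$parameter   # degrees of freedom (usually fractional for Welch)
result$p.value     # p-value
result$conf.int    # 95% confidence interval for the difference in means

alpha <- 0.05
if (result$p.value < alpha) {
  cat("Reject H0: the group means differ at the 5% level\n")
} else {
  cat("Fail to reject H0 at the 5% level\n")
}

# Classic Student's t-test, assuming equal variances in both groups
student <- t.test(group1, group2, var.equal = TRUE)
```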
  

5. Linear Regression

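lm() fits a linear model by least squares. The formula y ~ x reads "y modelled as a function of x", and summary() reports the estimated coefficients, their p-values, and R-squared: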

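# Roughly y = 2x with some noise; with an exact fit such as y <- 2 * x,
# summary() warns that the fit is essentially perfect and may be unreliable
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

model <- lm(y ~ x)
summary(model)

# Scatter plot with the fitted regression line
plot(x, y)
abline(model, col = "red")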
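Once a model is fitted, the usual next steps are extracting the coefficients and predicting new values. A minimal sketch; the data are illustrative values (roughly y = 2x plus noise), not from a real study, and the new x values 6 and 7 are arbitrary:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
model <- lm(y ~ x)

coef(model)                # intercept 0.05, slope 1.99
residuals(model)           # observed minus fitted values
summary(model)$r.squared   # share of the variance in y explained by x

# newdata must contain a column with the same name as the predictor (x)
new_points <- data.frame(x = c(6, 7))
predict(model, newdata = new_points)
predict(model, newdata = new_points, interval = "confidence")
```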
  

Tips for Students

  • Use summary() on datasets to explore distributions.
  • Know the difference between descriptive and inferential statistics.
  • Use help(function_name) to explore usage.
  • Practice interpreting statistical outputs like p-values and coefficients.
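The first tip can be tried on mtcars, an example dataset that ships with R. A small sketch; the choice of dataset and columns is only for illustration:

```r
# Per-column summaries for a few variables of a data frame
summary(mtcars[, c("mpg", "hp", "wt")])

# Summarise one variable separately for each group (here, number of cylinders)
by_cyl <- tapply(mtcars$mpg, mtcars$cyl, summary)
by_cyl

# Documentation for any function: help(t.test), or the shortcut ?t.test
```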

Practice Questions

  • Calculate mean, median, and standard deviation for a sample dataset.
  • Use cor() and cov() to analyze relationships.
  • Perform a t-test to compare two groups.
  • Fit a linear regression model and interpret the summary.
