Topic overview

Logistic Regression

Predict yes or no. Approve or deny. Stay or churn. Master the art of classification.

Learning objectives

  • Explain why linear regression fails for classification problems.
  • Interpret the sigmoid function and its role in logistic regression.
  • Calculate and interpret odds and log-odds.
  • Evaluate classification models using confusion matrices and key metrics.
  • Build and interpret a logistic regression model in R.

🎯 Key Learning Summary

Walk away from this week knowing these core ideas cold.

1. LPM Fails for Binary Outcomes

Using regular linear regression (lm()) on a 0/1 outcome can produce predictions above 1 or below 0. That's why we need logistic regression.
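
A minimal sketch of the failure, using made-up loan data (every name and number here is hypothetical):

    # Hypothetical data: income in $1,000s, default = 1 if the loan defaulted
    income  <- c(20, 35, 50, 65, 80, 95, 110, 125)
    default <- c(1, 1, 1, 0, 1, 0, 0, 0)

    lpm <- lm(default ~ income)                   # the linear probability model
    predict(lpm, data.frame(income = c(5, 150)))  # about 1.25 and -0.36: not valid probabilities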

2. The Sigmoid Bounds Predictions Between 0 and 1

Logistic regression passes the linear equation through a sigmoid function, guaranteeing that outputs are valid probabilities. Large positive inputs → probability near 1. Large negative inputs → probability near 0.
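
A quick sketch of the squashing behavior; plogis() is base R's built-in sigmoid:

    sigmoid <- function(z) 1 / (1 + exp(-z))   # the logistic function
    sigmoid(c(-10, -2, 0, 2, 10))
    # approximately: 0.0000454  0.1192  0.5  0.8808  0.9999546
    plogis(c(-10, 0, 10))                      # same results, built into base R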

3. Coefficients Are in Log-Odds, Not Probability

The numbers in R's summary() output are log-odds. To get a human-readable percentage change in odds, use (exp(coef(model)) - 1) * 100.
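
A sketch of the conversion, using a made-up coefficient of 0.513:

    b <- 0.513            # a log-odds coefficient from summary()
    exp(b)                # 1.67: the odds ratio
    (exp(b) - 1) * 100    # +67.0: percent change in odds per 1-unit increase in X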

4. Always Use type = "response"

Without it, predict() returns log-odds (numbers like -3.8 or 1.2). With it, you get actual probabilities (0.02 or 0.77).
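
A minimal runnable sketch with toy data (the variable names are made up):

    df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                     y = c(0, 0, 1, 0, 1, 1))
    m  <- glm(y ~ x, data = df, family = binomial)
    predict(m, data.frame(x = 2))                     # log-odds, e.g. a negative number
    predict(m, data.frame(x = 2), type = "response")  # a probability between 0 and 1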

5. Validate on Unseen Data

Training accuracy is misleading — the model may have memorized patterns. Split your data into training (75%) and validation (25%) sets. Only validation accuracy tells you how the model will perform in the real world.
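
One common base-R way to do the split (a sketch; the data frame df is hypothetical):

    set.seed(123)                                 # make the random split reproducible
    n     <- nrow(df)
    idx   <- sample(1:n, size = floor(0.75 * n))  # 75% of the row indices
    train <- df[idx, ]                            # training set: fit the model here
    valid <- df[-idx, ]                           # validation set: score the model here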

6. Accuracy Isn't the Whole Story

A model can be "accurate" overall but terrible at catching the thing you care about. Sensitivity measures how well you catch positives (spam, defaults). Specificity measures how well you identify negatives. The threshold (default 0.5) is a business decision, not a math rule.
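
A sketch of what moving the threshold does, assuming a vector of predicted probabilities pHat (hypothetical):

    yHat_05 <- ifelse(pHat >= 0.5, 1, 0)   # default cutoff
    yHat_03 <- ifelse(pHat >= 0.3, 1, 0)   # lower cutoff: flags more cases as positive
    # Lowering the threshold raises sensitivity (fewer misses)
    # but lowers specificity (more false alarms)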

The one sentence to remember: Logistic regression is linear regression on a transformed (log-odds) scale, squeezed through the sigmoid to produce valid probabilities — and you must always validate on data the model hasn't seen.

📖 Key Vocabulary

Terms you should be able to define and use confidently.

Core Concepts

Classification

Predicting which category an observation belongs to (e.g., spam vs. not spam, approved vs. denied).

Binary Outcome

A dependent variable with only two values: 0 (no/failure) or 1 (yes/success).

Linear Probability Model (LPM)

Using regular linear regression on a binary outcome. Simple but flawed — predictions can go below 0 or above 1.

Logistic Regression

A regression model for binary outcomes that uses the sigmoid function to keep predictions between 0 and 1. Predicts the probability of Y = 1.

The Math Language

Sigmoid Function

The S-shaped "squashing function" that takes any real number and maps it to a value between 0 and 1. Also called the logistic function.

Odds

The ratio of the probability of an event happening to it not happening. P/(1-P). Odds of 4 means "4 to 1 in favor." Range: 0 to ∞.

Log-Odds (Logit)

The natural logarithm of the odds: ln(P/(1-P)). This is what logistic regression actually models. Range: -∞ to +∞. Log-odds of 0 = 50% probability.

Odds Ratio

The exponentiated coefficient: e^β. Tells you how many times the odds are multiplied for each 1-unit increase in X. An odds ratio of 1.67 means +67% change in odds.

Dummy Variable

A variable coded as 0 or 1 to represent a categorical predictor. E.g., Single = 1 if plan type is "Single", 0 if "Family." The coefficient compares the coded group to the reference group.

R Functions & Syntax

glm()

Generalized Linear Model function. Used instead of lm() for logistic regression. Requires family = binomial.

family = binomial

Tells R that the outcome is binary (0/1) and to fit a logistic regression. Without it, glm() defaults to family = gaussian and simply fits ordinary linear regression.

type = "response"

Argument for predict() that converts output from log-odds to probability. Forgetting this is the #1 student mistake.

exp(coef(model))

Exponentiates the coefficients to get odds ratios. Use (exp(coef(model)) - 1) * 100 for percentage change in odds.

ifelse()

Used both for creating dummy variables (ifelse(Plan == "Single", 1, 0)) and converting probabilities to 0/1 predictions (ifelse(pHat >= 0.5, 1, 0)).
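
Putting these functions together in one sketch; the data frame churn and its columns are made up for illustration:

    # Hypothetical data: churn has columns Plan ("Single"/"Family"), Months, Churn (0/1)
    churn$Single <- ifelse(churn$Plan == "Single", 1, 0)   # dummy variable

    model <- glm(Churn ~ Single + Months,
                 data = churn, family = binomial)          # logistic regression
    (exp(coef(model)) - 1) * 100                           # % change in odds

    pHat <- predict(model, churn, type = "response")       # predicted probabilities
    yHat <- ifelse(pHat >= 0.5, 1, 0)                      # 0/1 classifications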

Model Evaluation

Holdout Method (Cross-Validation)

Splitting data into a training set (to build the model) and a validation set (to test it). Exposes overfitting; the validation accuracy is the honest score.

Overfitting

When a model memorizes the training data instead of learning real patterns. Shows as high training accuracy but low validation accuracy.

Confusion Matrix

A 2×2 table showing True Positives, True Negatives, False Positives, and False Negatives. The foundation for all classification metrics.

Accuracy

Percentage of all predictions that were correct. (TP + TN) / Total. Simple but can be misleading with imbalanced data.

Sensitivity (True Positive Rate)

Of all actual positives, how many did the model catch? TP / (TP + FN). "How good is it at finding spam?"

Specificity (True Negative Rate)

Of all actual negatives, how many were correctly identified? TN / (TN + FP). "How good is it at leaving real emails alone?"

Threshold (Cutoff)

The probability boundary for classification. Default is 0.5, but can be adjusted. Lowering it catches more positives (higher sensitivity) but creates more false alarms (lower specificity).

📐 Key Formulas

The essential formulas for this week. You don't need to memorize derivations — just know what each one does and when to use it.

The Translation Chain

Probability → Odds

Odds = P / (1 - P)

Converts probability (0 to 1) to odds (0 to ∞). Example: P = 0.80 → Odds = 0.80/0.20 = 4.

Odds → Log-Odds

Log-odds = ln(Odds) = ln(P / (1 - P))

Converts odds (0 to ∞) to log-odds (-∞ to +∞). This is the "logit" transformation. Example: Odds = 4 → ln(4) = 1.39.

Log-Odds → Probability (Sigmoid)

P = 1 / (1 + e^(-z))

Where z = β₀ + β₁X₁ + ... + βₖXₖ (the model's linear part). Converts any real number back to a probability between 0 and 1. This is what type = "response" does.

Probability → P/(1-P) → Odds → ln() → Log-Odds → sigmoid → Probability
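
The whole chain in R, using the worked numbers above (plogis() and qlogis() are base R's sigmoid and logit functions):

    p       <- 0.80
    odds    <- p / (1 - p)    # 4
    logodds <- log(odds)      # 1.386, same as qlogis(0.80)
    plogis(logodds)           # 0.80 again: the sigmoid undoes the logit
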
The Model

Logistic Regression Equation

ln(Odds) = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ

The model predicts log-odds as a linear function of the predictors. Looks just like linear regression, but the left side is log-odds, not Y.

Percentage Change in Odds

% Change = (e^β - 1) × 100

Converts a log-odds coefficient to a meaningful percentage. Example: β = 0.513 → (e^0.513 - 1) × 100 ≈ +67.0% increase in odds.

In R: (exp(coef(model)) - 1) * 100

Evaluation Metrics

Classification Rule

ŷ = 1 if P(Y=1) ≥ 0.5, else ŷ = 0

Convert predicted probability to a 0/1 classification using a threshold. 0.5 is the default but can be adjusted based on business needs.

In R: yHat <- ifelse(pHat >= 0.5, 1, 0)

Accuracy

Accuracy = (TP + TN) / N

Proportion of all predictions that were correct. Simple but can be misleading when classes are imbalanced.

In R: 100 * mean(actual == predicted)

Sensitivity (True Positive Rate)

Sensitivity = TP / (TP + FN)

Of those who are actually positive, how many did we correctly predict?

"Of all real spam, how much did we catch?"

Specificity (True Negative Rate)

Specificity = TN / (TN + FP)

Of those who are actually negative, how many were correctly identified?

"Of all real emails, how many did we leave alone?"

Confusion Matrix Quick Reference

                 Predicted
                 ŷ = 0     ŷ = 1
Actual   Y = 0   TN ✓      FP ✗
         Y = 1   FN ✗      TP ✓
TP = True Positive (correctly caught)
TN = True Negative (correctly cleared)
FP = False Positive (false alarm)
FN = False Negative (missed)
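
A sketch that computes all three metrics from 0/1 vectors actual and predicted (both hypothetical):

    cm <- table(Actual = actual, Predicted = predicted)   # 2x2 confusion matrix
    TN <- cm["0", "0"]; FP <- cm["0", "1"]
    FN <- cm["1", "0"]; TP <- cm["1", "1"]

    accuracy    <- (TP + TN) / sum(cm)   # overall hit rate
    sensitivity <- TP / (TP + FN)        # true positive rate
    specificity <- TN / (TN + FP)        # true negative rate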