Topic overview
Logistic Regression
Predict yes or no. Approve or deny. Stay or churn. Master the art of classification.
Learning objectives
- Explain why linear regression fails for classification problems.
- Interpret the sigmoid function and its role in logistic regression.
- Calculate and interpret odds and log-odds.
- Evaluate classification models using confusion matrices and key metrics.
- Build and interpret a logistic regression model in R.
Key Learning Summary
Walk away from this week knowing these core ideas cold.
LPM Fails for Binary Outcomes
Using regular linear regression (lm()) on a 0/1 outcome can produce predictions above 1 or below 0. That's why we need logistic regression.
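A minimal sketch of the problem, using simulated data (the variable names here are illustrative, not from the course): fit `lm()` and `glm()` to the same 0/1 outcome and compare the range of fitted values.

```r
# Simulated 0/1 outcome with one strong predictor
set.seed(1)
x <- seq(-3, 3, length.out = 50)
y <- as.numeric(x + rnorm(50, sd = 0.5) > 0)

lpm <- lm(y ~ x)       # linear probability model
range(fitted(lpm))     # fitted values spill below 0 and above 1

logit <- glm(y ~ x, family = binomial)
range(fitted(logit))   # always strictly inside (0, 1)
```

The LPM's straight line has no floor or ceiling, so extreme x values push "probabilities" out of range; the sigmoid in `glm()` cannot do that.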
The Sigmoid Bounds Predictions Between 0 and 1
Logistic regression wraps the linear equation through a sigmoid function, guaranteeing outputs are valid probabilities. Large positive inputs → probability near 1. Large negative inputs → probability near 0.
Coefficients Are in Log-Odds, Not Probability
The numbers in R's summary() output are log-odds. To get a human-readable percentage change in odds, use (exp(coef(model)) - 1) * 100.
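A quick illustration on simulated churn data (variable names `tenure` and `churn` are hypothetical):

```r
# Simulated churn data: longer tenure -> lower churn probability
set.seed(42)
d <- data.frame(tenure = rpois(200, 24))
d$churn <- rbinom(200, 1, plogis(1 - 0.08 * d$tenure))

m <- glm(churn ~ tenure, data = d, family = binomial)
coef(m)                    # log-odds scale: hard to read directly
exp(coef(m))               # odds ratios
(exp(coef(m)) - 1) * 100   # percentage change in odds per 1-unit increase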
Always Use type = "response"
Without it, predict() returns log-odds (numbers like -3.8 or 1.2). With it, you get actual probabilities (0.02 or 0.77).
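A small sketch of the difference, with simulated data:

```r
# Fit a simple logistic model on simulated data
set.seed(7)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(2 * d$x))
m <- glm(y ~ x, data = d, family = binomial)

newdat <- data.frame(x = c(-2, 0, 2))
predict(m, newdat)                      # log-odds: can be any real number
predict(m, newdat, type = "response")   # probabilities: always between 0 and 1
```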
Validate on Unseen Data
Training accuracy is misleading — the model may have memorized patterns. Split your data into training (75%) and validation (25%) sets. Only validation accuracy tells you how the model will perform in the real world.
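One way to sketch the 75/25 split in base R, using the built-in `mtcars` data (where `am` happens to be a 0/1 variable) purely as a stand-in:

```r
# Holdout split: 75% training, 25% validation
set.seed(123)
n <- nrow(mtcars)
train_idx <- sample(n, size = round(0.75 * n))
train <- mtcars[train_idx, ]
valid <- mtcars[-train_idx, ]

m <- glm(am ~ mpg, data = train, family = binomial)
pHat <- predict(m, valid, type = "response")
yHat <- ifelse(pHat >= 0.5, 1, 0)
mean(yHat == valid$am)   # validation accuracy: the honest score
```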
Accuracy Isn't the Whole Story
A model can be "accurate" overall but terrible at catching the thing you care about. Sensitivity measures how well you catch positives (spam, defaults). Specificity measures how well you identify negatives. The threshold (default 0.5) is a business decision, not a math rule.
The one sentence to remember: Logistic regression is linear regression on a transformed (log-odds) scale, squeezed through the sigmoid to produce valid probabilities — and you must always validate on data the model hasn't seen.
Key Vocabulary
Terms you should be able to define and use confidently.
Core Concepts
Classification
Predicting which category an observation belongs to (e.g., spam vs. not spam, approved vs. denied).
Binary Outcome
A dependent variable with only two values: 0 (no/failure) or 1 (yes/success).
Linear Probability Model (LPM)
Using regular linear regression on a binary outcome. Simple but flawed — predictions can go below 0 or above 1.
Logistic Regression
A regression model for binary outcomes that uses the sigmoid function to keep predictions between 0 and 1. Predicts the probability of Y = 1.
The Math Language
Sigmoid Function
The S-shaped "squashing function" that takes any real number and maps it to a value between 0 and 1. Also called the logistic function.
Odds
The ratio of the probability of an event happening to it not happening. P/(1-P). Odds of 4 means "4 to 1 in favor." Range: 0 to ∞.
Log-Odds (Logit)
The natural logarithm of the odds: ln(P/(1-P)). This is what logistic regression actually models. Range: -∞ to +∞. Log-odds of 0 = 50% probability.
Odds Ratio
The exponentiated coefficient: e^β. Tells you how many times the odds are multiplied for each 1-unit increase in X. An odds ratio of 1.67 means a +67% change in odds.
Dummy Variable
A variable coded as 0 or 1 to represent a categorical predictor. E.g., Single = 1 if plan type is "Single", 0 if "Family." The coefficient compares the coded group to the reference group.
R Functions & Syntax
glm()
Generalized Linear Model function. Used instead of lm() for logistic regression. Requires family = binomial.
family = binomial
Tells R that the outcome is binary (0/1) and to use logistic regression. Without this, glm() doesn't know what type of model to fit.
type = "response"
Argument for predict() that converts output from log-odds to probability. Forgetting this is the #1 student mistake.
exp(coef(model))
Exponentiates the coefficients to get odds ratios. Use (exp(coef(model)) - 1) * 100 for percentage change in odds.
ifelse()
Used both for creating dummy variables (ifelse(Plan == "Single", 1, 0)) and converting probabilities to 0/1 predictions (ifelse(pHat >= 0.5, 1, 0)).
Model Evaluation
Holdout Method (Cross-Validation)
Splitting data into training set (build the model) and validation set (test it). Prevents overfitting. The validation accuracy is the honest score.
Overfitting
When a model memorizes the training data instead of learning real patterns. Shows as high training accuracy but low validation accuracy.
Confusion Matrix
A 2×2 table showing True Positives, True Negatives, False Positives, and False Negatives. The foundation for all classification metrics.
Accuracy
Percentage of all predictions that were correct. (TP + TN) / Total. Simple but can be misleading with imbalanced data.
Sensitivity (True Positive Rate)
Of all actual positives, how many did the model catch? TP / (TP + FN). "How good is it at finding spam?"
Specificity (True Negative Rate)
Of all actual negatives, how many were correctly identified? TN / (TN + FP). "How good is it at leaving real emails alone?"
Threshold (Cutoff)
The probability boundary for classification. Default is 0.5, but can be adjusted. Lowering it catches more positives (higher sensitivity) but creates more false alarms (lower specificity).
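The trade-off can be seen directly with simulated scores (everything here is made-up data for illustration):

```r
# How the threshold trades sensitivity for specificity
set.seed(9)
actual <- rbinom(500, 1, 0.3)
pHat <- plogis(rnorm(500, mean = ifelse(actual == 1, 1, -1)))

metrics <- function(thresh) {
  yHat <- ifelse(pHat >= thresh, 1, 0)
  c(sensitivity = mean(yHat[actual == 1] == 1),
    specificity = mean(yHat[actual == 0] == 0))
}
metrics(0.5)   # default cutoff
metrics(0.3)   # lower cutoff: more positives caught, more false alarms
```

Lowering the threshold can never decrease sensitivity or increase specificity; which cutoff is right depends on the relative cost of a missed positive versus a false alarm.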
Key Formulas
The essential formulas for this week. You don't need to memorize derivations — just know what each one does and when to use it.
The Translation Chain
Probability → Odds
Odds = P / (1 − P). Converts probability (0 to 1) to odds (0 to ∞). Example: P = 0.80 → Odds = 0.80/0.20 = 4.
Odds → Log-Odds
Log-Odds = ln(Odds) = ln(P / (1 − P)). Converts odds (0 to ∞) to log-odds (-∞ to +∞). This is the "logit" transformation. Example: Odds = 4 → ln(4) = 1.39.
Log-Odds → Probability (Sigmoid)
P = 1 / (1 + e^(−z)), where z = β₀ + β₁X₁ + β₂X₂. Converts any real number back to a probability between 0 and 1. This is what type = "response" does.
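The whole chain round-trips in a few lines of base R (note `plogis()` is R's built-in sigmoid):

```r
p <- 0.80
odds <- p / (1 - p)         # 4: probability -> odds
logodds <- log(odds)        # 1.386...: odds -> log-odds (logit)
1 / (1 + exp(-logodds))     # back to 0.80: the sigmoid
plogis(logodds)             # same thing, using R's built-in logistic function
```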
The Model
Logistic Regression Equation
ln(P / (1 − P)) = β₀ + β₁X₁ + β₂X₂ + … The model predicts log-odds as a linear function of the predictors. Looks just like linear regression, but the left side is log-odds, not Y.
Percentage Change in Odds
Converts a log-odds coefficient to a meaningful percentage. Example: β = 0.513 → (e^0.513 − 1) × 100 = +67.1% increase in odds.
In R: (exp(coef(model)) - 1) * 100
Evaluation Metrics
Classification Rule
Convert predicted probability to a 0/1 classification using a threshold. 0.5 is the default but can be adjusted based on business needs.
In R: yHat <- ifelse(pHat >= 0.5, 1, 0)
Accuracy
(TP + TN) / Total. Proportion of all predictions that were correct. Simple but can be misleading when classes are imbalanced.
In R: 100 * mean(actual == predicted)
Sensitivity (True Positive Rate)
TP / (TP + FN). Of those who are actually positive, how many did we correctly predict?
"Of all real spam, how much did we catch?"
Specificity (True Negative Rate)
TN / (TN + FP). Of those who are actually negative, how many were correctly identified?
"Of all real emails, how many did we leave alone?"
Confusion Matrix Quick Reference
| | ŷ = 0 (Predicted) | ŷ = 1 (Predicted) |
|---|---|---|
| **Actual Y = 0** | TN ✓ | FP ✗ |
| **Actual Y = 1** | FN ✗ | TP ✓ |
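The table and all three metrics can be computed by hand in base R. The vectors below are small made-up examples, not course data:

```r
# Hypothetical actual outcomes and 0/1 predictions
actual <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0)
yHat   <- c(0, 1, 0, 0, 1, 1, 0, 1, 1, 0)

table(Actual = actual, Predicted = yHat)  # the confusion matrix

TP <- sum(actual == 1 & yHat == 1)
TN <- sum(actual == 0 & yHat == 0)
FP <- sum(actual == 0 & yHat == 1)
FN <- sum(actual == 1 & yHat == 0)

(TP + TN) / length(actual)   # accuracy
TP / (TP + FN)               # sensitivity
TN / (TN + FP)               # specificity
```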