Statistics for Data Science 2026: 28 Interview Questions with Answers

Q: What statistical mistakes do DS candidates make most often in interviews?

Three mistakes come up most often: (1) saying "the p-value is the probability that the null is true" (it is the probability of data at least this extreme, assuming the null is true); (2) running an A/B test without pre-specifying the sample size and stopping as soon as significance appears (the peeking problem); (3) confusing statistical significance with practical significance. Public preparation resources consistently list these three among the top elimination questions.

Q: Do I need to know ARIMA deeply for DS interviews?

For time series roles (demand forecasting, finance, supply chain), yes -- stationarity, differencing, ACF/PACF interpretation, and model selection are tested. For general DS roles, understanding stationarity and knowing when to apply time-series-specific CV is sufficient.

Q: How much probability theory is expected in DS interviews vs applied statistics?

Most product company DS interviews focus on applied statistics (hypothesis testing, regression, A/B design) over pure probability theory. Combinatorics and conditional probability appear mainly in quant-leaning roles. Confirm the expected depth on the official company careers portal before your round.

28 statistics interview questions for data science with full answers and Python code covering probability, hypothesis testing, distributions, Bayesian statistics, regression, and A/B testing design for 2026.

By Aditya Sharma | Published 8 Jun 2026

Last revised 8 Jun 2026

Strong statistics foundations separate senior data scientists from junior analysts. Every product company DS interview includes statistics questions on hypothesis testing, experimental design, probability, and regression. This guide covers 28 statistics interview questions with full answers and Python code, organized from fundamentals to advanced experimental design.

PapersAdda's take: Candidates report that p-value interpretation errors and A/B test design mistakes are the two most common reasons for DS interview rejections at product companies. Understanding what a p-value actually measures (not "the probability the null is true") and correctly designing sample sizes before running experiments are tested at every level. Confirm the specific statistical depth expected on the official company careers portal before you prepare.

Core Areas Tested by Company Type

Company Type	Statistical Focus
FAANG India (Google, Meta, Amazon)	Experimental design, causal inference, regression
Fintech (Razorpay, PhonePe, Groww)	Time series, risk modeling, ARIMA
E-commerce (Flipkart, Meesho)	A/B testing, funnel analysis, survival analysis
Healthtech (1mg, Practo)	Clinical trial design, sensitivity/specificity
Quant/Trading	Probability distributions, stochastic processes

EASY: Probability and Distributions (Questions 1-8)

Q1. What is the difference between probability and statistics?

Probability: given a known model (distribution), compute the likelihood of outcomes.

"If a coin is fair, what is P(HHH)?" = (0.5)^3 = 0.125

Statistics: given observed data, infer the underlying model or population parameters.

"You observe HHH. Is the coin fair?" = hypothesis test

Relationship: probability provides the theoretical framework; statistics applies it to real data through estimation, hypothesis testing, and prediction.
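
The statistics half of the coin example can be made concrete with an exact two-sided binomial test. This is a minimal sketch using only the standard library; the helper name `binom_two_sided_p` is an illustrative assumption, not a library function.

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k under H0."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Probability question: known model, compute the chance of an outcome
p_hhh = 0.5 ** 3

# Statistics question: observed 3 heads in 3 flips, is the coin fair?
p_value = binom_two_sided_p(k=3, n=3)

print(f"P(HHH | fair coin) = {p_hhh}")
print(f"Two-sided p-value  = {p_value}")
```

With only three flips the p-value is 0.25, so the data cannot reject fairness at alpha = 0.05. The same observation that looks suspicious informally carries very little statistical evidence.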

Q2. What are the key probability distributions? When do you use each in data science?

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

# Discrete distributions
# Bernoulli: single binary trial
rv_bern = stats.bernoulli(p=0.3)
print(rv_bern.pmf(1))   # P(X=1) = 0.3

# Binomial: number of successes in N Bernoulli trials
rv_bin = stats.binom(n=100, p=0.05)
print(rv_bin.pmf(5))    # P(X=5) in 100 trials, 5% success rate

# Poisson: count of events in fixed time interval
rv_poi = stats.poisson(mu=10)   # expected 10 events/hour
print(rv_poi.pmf(8))    # P(exactly 8 events)

# Continuous distributions
# Normal: sum of many independent effects (CLT)
rv_norm = stats.norm(loc=0, scale=1)
print(rv_norm.pdf(1.96))  # density at 1.96
print(rv_norm.cdf(1.96))  # P(X <= 1.96) = 0.975 (used for 95% CI)

# Exponential: time between events in a Poisson process
rv_exp = stats.expon(scale=1/10)  # rate=10 events/hour
print(rv_exp.mean())    # mean time = 0.1 hours

# Log-normal: products of positive random variables (income, prices)
rv_lnorm = stats.lognorm(s=0.5, scale=np.exp(3))  # log mean=3, log std=0.5

# Beta: probability over [0,1] (click rates, conversion rates)
rv_beta = stats.beta(a=10, b=90)   # prior for 10% click rate
print(rv_beta.mean())  # 0.1

# Chi-squared: sum of squared standard normals (goodness of fit tests)
rv_chi2 = stats.chi2(df=5)

Distribution	Use case in DS
Normal	Model errors, CLT-based inference, Z-tests
Binomial	Conversion rates, click-through (fixed N)
Poisson	Event counts (arrivals, failures, fraud alerts)
Exponential	Time-to-event (survival, session duration)
Beta	Bayesian prior for rates/probabilities
Log-normal	Prices, income, latency (right-skewed positive)

Q3. What is the difference between PMF, PDF, and CDF?

from scipy import stats
import numpy as np

# PMF (Probability Mass Function): discrete distributions
# P(X = k) -- exact probability at each integer
rv = stats.binom(n=10, p=0.3)
print(f"P(X=3) = {rv.pmf(3):.4f}")
print(f"P(X<=3) = {rv.cdf(3):.4f}")

# PDF (Probability Density Function): continuous distributions
# P(a <= X <= b) = integral of pdf from a to b
# pdf(x) is NOT a probability -- it's a density (can exceed 1)
rv_norm = stats.norm(loc=0, scale=1)
print(f"pdf at 0 = {rv_norm.pdf(0):.4f}")          # 0.3989 (peak of standard normal)
print(f"P(-1 <= X <= 1) = {rv_norm.cdf(1) - rv_norm.cdf(-1):.4f}")  # 0.6827

# CDF (Cumulative Distribution Function): P(X <= x)
# Works for both discrete and continuous
print(f"P(X <= 1.96) = {rv_norm.cdf(1.96):.4f}")   # 0.975

# Inverse CDF (ppf / quantile function)
print(f"90th percentile: {rv_norm.ppf(0.90):.4f}")  # 1.2816
print(f"95th percentile: {rv_norm.ppf(0.95):.4f}")  # 1.6449

# Practical: use CDF to compute p-values
z_score = 2.1
p_value_two_tailed = 2 * (1 - rv_norm.cdf(abs(z_score)))
print(f"p-value for z=2.1: {p_value_two_tailed:.4f}")  # 0.0357

Q4. Explain the Law of Large Numbers and its relevance to data science.

Weak LLN: as n increases, the sample mean converges in probability to the population mean mu. lim P(|X_bar - mu| > epsilon) = 0 as n -> infinity.

Strong LLN: sample mean converges almost surely (with probability 1).

DS applications:

Monte Carlo estimation converges as sample size increases.
A/B test results become stable only after sufficient sample size.
Mini-batch gradient descent: the mini-batch gradient estimate converges to the full-data gradient as batch size increases.
Online learning: single-sample gradient updates are noisy but converge.

import numpy as np
import matplotlib.pyplot as plt

# Demonstrate LLN: die roll experiment
np.random.seed(42)
rolls = np.random.randint(1, 7, size=10_000)  # fair die, expected value = 3.5
running_means = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)

# After 10 rolls: mean might be 3.2 or 4.1
# After 10,000 rolls: mean converges to ~3.5
print(f"After 100 rolls: {running_means[99]:.3f}")
print(f"After 1000 rolls: {running_means[999]:.3f}")
print(f"After 10000 rolls: {running_means[9999]:.3f}")
# Expected: progressively closer to 3.5

Q5. What is conditional probability? Give a data science example.

P(A|B) = P(A AND B) / P(B) -- probability of A given B has occurred.

Example -- spam classifier:

# P(spam) = 0.20
# P("urgent" | spam) = 0.70
# P("urgent" | not spam) = 0.05

p_spam = 0.20
p_not_spam = 0.80
p_urgent_given_spam = 0.70
p_urgent_given_not_spam = 0.05

# P("urgent") = P("urgent"|spam)*P(spam) + P("urgent"|not spam)*P(not spam)
p_urgent = p_urgent_given_spam * p_spam + p_urgent_given_not_spam * p_not_spam
# = 0.70 * 0.20 + 0.05 * 0.80 = 0.14 + 0.04 = 0.18

# P(spam | "urgent") via Bayes
p_spam_given_urgent = (p_urgent_given_spam * p_spam) / p_urgent
print(f"P(spam | 'urgent') = {p_spam_given_urgent:.3f}")  # 0.778

Independence: A and B are independent if P(A|B) = P(A). In practice, features are rarely truly independent (Naive Bayes assumes independence as a simplification).
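
The independence simplification in Naive Bayes can be sketched by adding a second word to the spam example. The prior and the "urgent" likelihoods are the ones used above; the likelihoods for "free" and the function name `naive_bayes_posterior` are hypothetical values chosen only for illustration.

```python
# Prior and "urgent" likelihoods come from the spam example above.
p_spam = 0.20
likelihoods = {
    "urgent": {"spam": 0.70, "ham": 0.05},
    "free":   {"spam": 0.60, "ham": 0.10},  # hypothetical values for illustration
}

def naive_bayes_posterior(words, p_spam, likelihoods):
    """P(spam | words), assuming words are conditionally independent given the class."""
    score_spam = p_spam
    score_ham = 1 - p_spam
    for w in words:
        # Independence assumption: per-word likelihoods simply multiply
        score_spam *= likelihoods[w]["spam"]
        score_ham *= likelihoods[w]["ham"]
    return score_spam / (score_spam + score_ham)

post_one = naive_bayes_posterior(["urgent"], p_spam, likelihoods)
post_two = naive_bayes_posterior(["urgent", "free"], p_spam, likelihoods)
print(f"P(spam | urgent)       = {post_one:.3f}")
print(f"P(spam | urgent, free) = {post_two:.3f}")
```

If the two words tend to co-occur in real spam, multiplying their likelihoods double-counts the evidence, which is why Naive Bayes posteriors are often overconfident even when the ranking of classes is still useful.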

Q6. What are expected value and variance? How are they computed for common distributions?

import numpy as np
from scipy import stats

# Expected value E[X]: probability-weighted average
# Variance Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2

# Properties:
# E[aX + b] = a*E[X] + b
# Var(aX + b) = a^2 * Var(X)
# E[X+Y] = E[X] + E[Y] always (linearity of expectation; no independence needed)
# For INDEPENDENT X, Y: Var(X+Y) = Var(X) + Var(Y)

# Common distributions
print("Binomial(n=100, p=0.3):")
print(f"  E[X] = np = {100 * 0.3}")            # 30
print(f"  Var(X) = np(1-p) = {100 * 0.3 * 0.7}")  # 21

print("\nPoisson(mu=10):")
print(f"  E[X] = mu = 10")
print(f"  Var(X) = mu = 10")    # Poisson: mean equals variance (useful overdispersion check)

print("\nNormal(mu=5, sigma=2):")
print(f"  E[X] = 5")
print(f"  Var(X) = 4, Std = 2")

# Simulation verification
rng = np.random.default_rng(42)
samples = rng.binomial(n=100, p=0.3, size=100_000)
print(f"\nSimulated Binomial E[X]: {samples.mean():.3f}")   # ~30
print(f"Simulated Binomial Var: {samples.var():.3f}")       # ~21

Q7. What is the birthday problem? Why is it relevant to data engineering?

Birthday problem: in a group of N people, what is P(at least two share a birthday)?

P(no collision) = 365/365 * 364/365 * ... * (365-N+1)/365

import numpy as np
from functools import reduce
from operator import mul

def p_birthday_collision(n_people, n_days=365):
    """Probability of at least one shared birthday in a group of n_people."""
    if n_people > n_days:
        return 1.0
    p_no_collision = reduce(mul, [(n_days - i) / n_days for i in range(n_people)])
    return 1 - p_no_collision

for n in [10, 23, 50, 70]:
    print(f"n={n}: P(collision) = {p_birthday_collision(n):.3f}")
# n=23: 0.507 -- already >50% with just 23 people

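# Data engineering relevance: hash collision probability
# Birthday approximation: P(collision) ~ 1 - exp(-N^2 / (2 * num_buckets))
# With a 64-bit hash, P(collision) reaches ~50% at about 5 billion records
# (N ~ 1.18 * sqrt(num_buckets))
# Rule of thumb: consider 128-bit hashes once N exceeds ~10M records
# (collision odds are then ~3e-6 and grow with N^2)

n_records = 1_000_000
n_buckets = 2**64  # 64-bit hash
# The exact product above is slow and loses precision for huge n_buckets,
# so use the approximation; expm1 stays accurate for tiny probabilities
p_collision = -np.expm1(-n_records * (n_records - 1) / (2 * n_buckets))
print(f"\n1M records, 64-bit hash collision prob: {p_collision:.2e}")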

DS relevance: hash collisions in feature hashing (the hashing trick, as in Vowpal Wabbit), UUID deduplication, partition key selection in distributed systems.

Q8. Explain Chebyshev's inequality and when to use it.

For any distribution with mean mu and variance sigma^2: P(|X - mu| >= k*sigma) <= 1/k^2

This holds regardless of distribution shape, provided the variance is finite.

import numpy as np
from scipy import stats

# Chebyshev: at least (1 - 1/k^2) of data lies within k standard deviations
# k=2: at least 75% within 2 std (vs 95.4% for normal)
# k=3: at least 88.9% within 3 std (vs 99.7% for normal)

for k in [2, 3, 4, 5]:
    chebyshev_bound = 1 - 1/k**2
    normal_actual = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"k={k}: Chebyshev >= {chebyshev_bound*100:.1f}% | Normal = {normal_actual*100:.1f}%")

# When to use:
# 1. Distribution unknown or non-normal
# 2. Small sample sizes where CLT hasn't kicked in
# 3. Worst-case bounds in system design (SLA guarantees)
# 4. Anomaly detection threshold-setting with unknown distribution

DS use case: setting anomaly detection thresholds without assuming normality. "At most 6.25% of any finite-variance distribution lies 4 or more standard deviations from the mean."
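
The threshold idea can be checked empirically on skewed data. The sketch below uses only the standard library; the lognormal parameters and sample size are arbitrary illustration values, not taken from any real metric.

```python
import random
import statistics

random.seed(0)
# Right-skewed sample standing in for a metric such as latency (illustrative only)
data = [random.lognormvariate(0.0, 1.0) for _ in range(20_000)]

mu = statistics.fmean(data)
sigma = statistics.pstdev(data)  # population std of the sample itself

tail_fractions = {}
for k in (2, 3, 4):
    # Fraction of points at least k standard deviations from the mean
    beyond = sum(abs(x - mu) >= k * sigma for x in data) / len(data)
    tail_fractions[k] = beyond
    print(f"k={k}: observed tail = {beyond:.4f}, Chebyshev bound = {1 / k**2:.4f}")
```

The observed tails sit well below the bound, which is expected: Chebyshev is a worst-case guarantee, so thresholds derived from it are conservative and will flag fewer points than the bound suggests.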

MEDIUM: Hypothesis Testing and Regression (Questions 9-20)

Q9. Walk through the full hypothesis testing framework.

from scipy import stats
import numpy as np

# Full hypothesis test procedure:

# Step 1: Formulate hypotheses
# H0: mu_treatment = mu_control (no effect)
# H1: mu_treatment != mu_control (two-tailed) or > or < (one-tailed)

# Step 2: Choose significance level alpha = 0.05

# Step 3: Compute test statistic
control = np.array([12.3, 11.8, 12.1, 11.5, 12.0, 12.4, 11.9, 12.2])
treatment = np.array([13.1, 12.8, 13.5, 12.9, 13.3, 13.0, 13.2, 12.7])

# Step 4: Two-sample t-test (Welch's -- doesn't assume equal variance)
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Step 5: Decision
alpha = 0.05
print(f"\nSignificant (p < {alpha}): {p_value < alpha}")

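# Step 6: Effect size -- ALWAYS report alongside p-value
# ddof=1 gives the sample standard deviation (NumPy defaults to ddof=0)
cohen_d = (treatment.mean() - control.mean()) / np.sqrt(
    (treatment.std(ddof=1)**2 + control.std(ddof=1)**2) / 2
)
print(f"Cohen's d (effect size): {cohen_d:.4f}")
# d=0.2 small, d=0.5 medium, d=0.8 large

# Step 7: Confidence interval for the difference (Welch, t critical value)
diff_mean = treatment.mean() - control.mean()
v_t = treatment.var(ddof=1) / len(treatment)
v_c = control.var(ddof=1) / len(control)
se_diff = np.sqrt(v_t + v_c)
# Welch-Satterthwaite degrees of freedom; with n=8 per group, z=1.96 is too narrow
dof = (v_t + v_c)**2 / (v_t**2 / (len(treatment) - 1) + v_c**2 / (len(control) - 1))
t_crit = stats.t.ppf(0.975, dof)
ci_95 = (diff_mean - t_crit*se_diff, diff_mean + t_crit*se_diff)
print(f"95% CI for difference: ({ci_95[0]:.3f}, {ci_95[1]:.3f})")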

Q10. What is statistical power? How do you compute required sample size?

from statsmodels.stats.power import TTestIndPower, NormalIndPower, TTestPower
import numpy as np

# Power = P(reject H0 | H0 is false) = 1 - P(Type II error)
# Target: power >= 0.80 (common convention)

# For A/B test on conversion rate
# Control: 10% conversion, Treatment: 12% conversion (2% absolute lift)
baseline = 0.10
treatment_rate = 0.12
relative_lift = (treatment_rate - baseline) / baseline  # 20% relative

# Compute Cohen's h for proportions
from statsmodels.stats.proportion import proportion_effectsize
effect_size_h = proportion_effectsize(treatment_rate, baseline)
print(f"Effect size h: {effect_size_h:.4f}")

power_analysis = NormalIndPower()
n = power_analysis.solve_power(
    effect_size=effect_size_h,
    alpha=0.05,
    power=0.80,
    ratio=1.0,     # equal group sizes
    alternative='two-sided',
)
print(f"Required n per group: {int(np.ceil(n))}")
# Run the experiment until you have n per group, then analyze ONCE

# Impact of different power/alpha choices
for power in [0.70, 0.80, 0.90]:
    for alpha in [0.05, 0.01]:
        n = power_analysis.solve_power(effect_size=effect_size_h, alpha=alpha, power=power, ratio=1.0)
        print(f"power={power}, alpha={alpha}: n={int(np.ceil(n))}")
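
Why "analyze ONCE" matters can be shown with a small simulation of the peeking problem. This is a sketch under simplifying assumptions: an A/A setup with no true effect, known unit variance, and a z-test at each look. The function name, look schedule, and simulation sizes are illustrative choices.

```python
import random
from statistics import NormalDist

def run_aa_experiment(rng, n_max=500, peek_every=50, alpha=0.05):
    """Simulate one A/A test (no true effect) with known unit variance.
    Returns (significant_at_any_peek, significant_at_final_look)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    sum_a = sum_b = 0.0
    rejected_any = False
    z = 0.0
    for i in range(1, n_max + 1):
        sum_a += rng.gauss(0.0, 1.0)
        sum_b += rng.gauss(0.0, 1.0)
        if i % peek_every == 0:
            # z-statistic for the difference in means, sigma = 1 in both groups
            z = (sum_a / i - sum_b / i) / (2.0 / i) ** 0.5
            if abs(z) > z_crit:
                rejected_any = True
    # n_max is a multiple of peek_every, so z now holds the final-look statistic
    return rejected_any, abs(z) > z_crit

rng = random.Random(42)
n_sims = 1000
results = [run_aa_experiment(rng) for _ in range(n_sims)]
fpr_peeking = sum(r[0] for r in results) / n_sims
fpr_single = sum(r[1] for r in results) / n_sims
print(f"False positive rate, stop at first significant peek: {fpr_peeking:.3f}")
print(f"False positive rate, single analysis at n_max:       {fpr_single:.3f}")
```

With ten looks, the chance that at least one of them crosses the 5% threshold is several times the nominal rate, while the single planned analysis stays near 5%. Sequential designs (alpha spending, group-sequential boundaries) exist for teams that need interim looks.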

Q11. What is the difference between one-tailed and two-tailed tests?

from scipy import stats
import numpy as np

x = np.random.normal(5.5, 2.0, 50)  # sample from unknown distribution

# One-tailed test: directional hypothesis
# H0: mu <= 5; H1: mu > 5 (we expect HIGHER, not just different)
t_stat, p_two = stats.ttest_1samp(x, popmean=5)
p_one_right = p_two / 2 if t_stat > 0 else 1 - p_two / 2  # right tail
p_one_left  = p_two / 2 if t_stat < 0 else 1 - p_two / 2   # left tail

print(f"Two-tailed p: {p_two:.4f}")
print(f"One-tailed p (right): {p_one_right:.4f}")

# When to use each:
# Two-tailed: "Does treatment affect the metric?" (could go either way)
# One-tailed: "Does treatment INCREASE the metric?" (strong directional prior)
# Rule: default to two-tailed. Use one-tailed only if an effect in the opposite
# direction would lead to the same decision as no effect (e.g., "if it's flat or negative, we won't ship either way")

# Important gotcha: pre-register the direction BEFORE seeing data.
# Switching from two-tailed to one-tailed after seeing results inflates Type I error.

Q12. Explain multiple testing correction. What is Bonferroni and FDR?

from statsmodels.stats.multitest import multipletests
from scipy import stats
import numpy as np

# Problem: testing 100 features at alpha=0.05 expects 5 false positives by chance
np.random.seed(42)
# Simulate 100 tests: 5 truly significant, 95 null
true_effects = np.array([1.0] * 5 + [0.0] * 95)
p_values = np.array([
    stats.ttest_1samp(np.random.normal(eff, 1.0, 50), popmean=0)[1]
    for eff in true_effects
])

# Without correction: how many discoveries at alpha=0.05?
raw_rejections = (p_values < 0.05).sum()
print(f"Raw rejections: {raw_rejections}")  # ~5 true + ~5 false positives

# Bonferroni correction: alpha' = alpha / m (m = number of tests)
# Strict -- controls FWER (Family-Wise Error Rate)
bonferroni_alpha = 0.05 / len(p_values)
bonferroni_rejections = (p_values < bonferroni_alpha).sum()
print(f"Bonferroni rejections: {bonferroni_rejections}")  # fewer false positives

# FDR (Benjamini-Hochberg): controls False Discovery Rate
# Allows some false positives but more powerful than Bonferroni
reject_bh, p_corrected, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
print(f"BH FDR rejections: {reject_bh.sum()}")  # better balance

# Use cases:
# Bonferroni: clinical trials, confirmatory analysis (strict FWER control)
# FDR/BH: genomics, feature screening, exploratory analysis (more power)
print(f"\nSummary:")
print(f"  True significant: 5")
print(f"  Raw (no correction): {raw_rejections}")
print(f"  Bonferroni: {bonferroni_rejections}")
print(f"  FDR (BH): {reject_bh.sum()}")
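
To see what the Benjamini-Hochberg step-up rule does internally, here is a minimal pure-Python sketch. The function name `benjamini_hochberg` and the p-values are hypothetical; in practice `multipletests(..., method='fdr_bh')` as used above is the safer choice.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where H0 is rejected under BH at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    reject = [False] * m
    for idx in order[:max_k]:  # reject every hypothesis up to that rank
        reject[idx] = True
    return reject

# Hypothetical p-values chosen to show the step-up behaviour
p_vals = [0.001, 0.012, 0.014, 0.019, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
bh = benjamini_hochberg(p_vals, alpha=0.05)
bonf = [p <= 0.05 / len(p_vals) for p in p_vals]
print("BH rejections:        ", sum(bh))
print("Bonferroni rejections:", sum(bonf))
```

Note the second p-value (0.012): it fails its own rank threshold (0.010) but is still rejected because a larger p-value further down the list passes its threshold. That "rescue" is what makes BH a step-up procedure and more powerful than Bonferroni.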

Q13. What is OLS regression? What are its assumptions?

import statsmodels.api as sm
import numpy as np
from scipy import stats

# Generate data with heteroscedasticity
np.random.seed(42)
X = np.random.uniform(0, 10, 200)
y = 2.5 + 1.8 * X + np.random.normal(0, X * 0.5)  # variance grows with X

# OLS regression
X_with_const = sm.add_constant(X)
model = sm.OLS(y, X_with_const).fit()
print(model.summary())

# ASSUMPTIONS (LINE):
# 1. Linearity: E[y|X] is linear in parameters
# 2. Independence: residuals are independent (no autocorrelation)
# 3. Normality: residuals are normally distributed (for valid t/F tests)
# 4. Equal variance (Homoscedasticity): Var(e) constant across X
#    Violation = heteroscedasticity

# Diagnostic tests
residuals = model.resid

# Test 1: Normality of residuals (Shapiro-Wilk)
stat, p_norm = stats.shapiro(residuals)
print(f"\nShapiro-Wilk normality test: p={p_norm:.4f}")

# Test 2: Heteroscedasticity (Breusch-Pagan)
from statsmodels.stats.diagnostic import het_breuschpagan
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(residuals, model.model.exog)
print(f"Breusch-Pagan test: p={lm_p:.4f}")  # small p = heteroscedastic

# Test 3: Autocorrelation (Durbin-Watson)
from statsmodels.stats.stattools import durbin_watson
dw = durbin_watson(residuals)
print(f"Durbin-Watson: {dw:.4f}")  # ~2.0 = no autocorrelation

# Fix for heteroscedasticity: robust standard errors
model_robust = sm.OLS(y, X_with_const).fit(cov_type='HC3')
print(f"\nRobust std error for X: {model_robust.bse[1]:.4f}")
print(f"Standard std error for X: {model.bse[1]:.4f}")

Q14. What is multicollinearity? How does it affect regression?

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
import pandas as pd

# Simulate collinear predictors
np.random.seed(42)
n = 200
x1 = np.random.randn(n)
x2 = 0.95 * x1 + 0.05 * np.random.randn(n)  # nearly collinear with x1
x3 = np.random.randn(n)   # independent
y = 2*x1 + 3*x2 + x3 + np.random.randn(n)

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
X = sm.add_constant(df)
model = sm.OLS(y, X).fit()

print("Coefficients:")
print(model.params)
# NOTE: individual coefficients of x1, x2 are unstable/noisy
# Combined effect (2+3=5 total) is still estimable

# VIF detection
vif_data = pd.DataFrame()
vif_data["feature"] = df.columns
# Include the constant in the design matrix; skip index 0 (the constant itself)
vif_data["VIF"] = [variance_inflation_factor(X.values, i + 1) for i in range(df.shape[1])]
print("\nVIF:")
print(vif_data)
# x1 and x2: VIF >> 10 (severe collinearity)

# Effects of multicollinearity:
# 1. Large standard errors on individual coefficients
# 2. Coefficients are sensitive to small data changes
# 3. R-squared is unaffected -- prediction is fine
# 4. Interpretation of individual coefficients is unreliable

# Solutions: Ridge regression (L2), drop one predictor, PCA
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=1.0)
ridge.fit(df, y)
print("\nRidge coefficients:", ridge.coef_)  # more stable
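
The VIF has a simple definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. As a sketch, the two-predictor case below ignores x3 so that R^2 reduces to the squared correlation; the data are regenerated with the standard library, so the numbers will not match the NumPy run above.

```python
import math
import random

random.seed(42)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
# Same construction as above: x2 is almost a rescaled copy of x1
x2 = [0.95 * a + 0.05 * random.gauss(0, 1) for a in x1]

def pearson_r(u, v):
    """Pearson correlation computed from centered sums."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

# With a single other predictor (plus intercept), R^2 equals r^2
r = pearson_r(x1, x2)
vif = 1.0 / (1.0 - r ** 2)
print(f"corr(x1, x2) = {r:.4f}")
print(f"VIF = {vif:.1f}")
```

The square root of the VIF is the factor by which the coefficient's standard error is inflated relative to uncorrelated predictors, which is why a VIF above 10 is a common warning level.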

Q15. What is logistic regression? Derive the log-odds interpretation.

import numpy as np
from scipy.special import expit  # sigmoid function

# Logistic regression: model log(p/(1-p)) = beta_0 + beta_1*x1 + ...
# p = sigmoid(beta_0 + beta_1*x1 + ...) = 1 / (1 + exp(-z))

def log_odds_to_prob(log_odds):
    return expit(log_odds)  # equivalent to 1 / (1 + exp(-log_odds))

# Interpretation example
# Trained model: log_odds = -3.2 + 0.4 * age + 0.8 * income_k
# age coefficient = 0.4: holding income constant, one unit increase in age
# multiplies the ODDS of default by exp(0.4) = 1.49 (49% higher odds)

age_coef = 0.4
odds_ratio_age = np.exp(age_coef)
print(f"Odds ratio for age: {odds_ratio_age:.3f}")  # 1.49

# For a customer: age=35, income_k=50
z = -3.2 + 0.4*35 + 0.8*50   # z = 50.8
prob = expit(z)
print(f"Default probability: {prob:.4f}")  # ~1.0000: these illustrative coefficients saturate the sigmoid

# Decision boundary: p=0.5 where z=0
# -3.2 + 0.4*age + 0.8*income = 0 defines the boundary hyperplane

# MLE training (scikit-learn adds an L2 penalty by default; C is the inverse regularization strength)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

model = LogisticRegression(C=1.0, max_iter=1000)
# model.fit(X_train, y_train)

# Log-loss (cross-entropy) = -sum(y_i * log(p_i) + (1-y_i) * log(1-p_i))
# Binary CE is the negative log-likelihood of Bernoulli model -- what LR minimizes
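
The odds-ratio reading of a coefficient can be verified numerically. The one-feature model below is hypothetical, with an intercept chosen so probabilities stay away from 0 and 1; the labels and predictions in the log-loss part are also made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical one-feature model: log_odds = beta_0 + beta_age * age
beta_0, beta_age = -14.0, 0.4

p_35 = sigmoid(beta_0 + beta_age * 35)  # z is about 0.0
p_36 = sigmoid(beta_0 + beta_age * 36)  # z is about 0.4

# One extra year of age multiplies the odds by exp(beta_age)
odds_ratio = odds(p_36) / odds(p_35)
print(f"P(default | age=35) = {p_35:.4f}")
print(f"P(default | age=36) = {p_36:.4f}")
print(f"Odds ratio = {odds_ratio:.4f}, exp(beta_age) = {math.exp(beta_age):.4f}")

# Mean binary cross-entropy for a few hypothetical labels and predictions
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.2, 0.6, 0.4]
log_loss = -sum(
    y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(y_true, p_pred)
) / len(y_true)
print(f"Mean log-loss = {log_loss:.4f}")
```

The odds ratio is constant across ages, but the change in probability is not: the same coefficient moves the probability most near p = 0.5 and barely at all near 0 or 1.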

Q16. What is regularization in regression? Compare L1, L2, and Elastic Net.

from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
import numpy as np

X, y, coef = make_regression(n_samples=200, n_features=100, n_informative=20,
                               noise=10, coef=True, random_state=42)

# L2 (Ridge): penalize sum of squared coefficients
# Loss = MSE + alpha * sum(beta^2)
# Effect: shrinks all coefficients toward 0, never exactly 0
# Best for: multicollinearity, all features contribute
ridge = Ridge(alpha=1.0)
ridge_cv = cross_val_score(ridge, X, y, cv=5, scoring='r2')
print(f"Ridge R^2: {ridge_cv.mean():.4f}")

# L1 (Lasso): penalize sum of absolute coefficients
# Loss = MSE + alpha * sum(|beta|)
# Effect: drives some coefficients exactly to 0 (automatic feature selection)
# Best for: sparse solutions, feature selection
lasso = Lasso(alpha=0.1, max_iter=5000)
lasso.fit(X, y)
print(f"Lasso non-zero coefs: {(lasso.coef_ != 0).sum()} / {len(lasso.coef_)}")

# Elastic Net: combination of L1 and L2
# Loss = MSE + alpha * l1_ratio * sum(|beta|) + alpha * (1-l1_ratio) * sum(beta^2)
# Best for: correlated features where you still want sparsity
en = ElasticNet(alpha=0.1, l1_ratio=0.5)
en_cv = cross_val_score(en, X, y, cv=5, scoring='r2')
print(f"ElasticNet R^2: {en_cv.mean():.4f}")

# Alpha selection via cross-validation
from sklearn.linear_model import RidgeCV, LassoCV
ridge_cv_model = RidgeCV(alphas=[0.1, 1.0, 10.0, 100.0], cv=5)
ridge_cv_model.fit(X, y)
print(f"Best Ridge alpha: {ridge_cv_model.alpha_}")

Q17. What is the Pearson vs Spearman correlation? When does Spearman outperform Pearson?

from scipy import stats
import numpy as np

np.random.seed(42)
n = 100

# Linear relationship: Pearson and Spearman agree
x_lin = np.random.randn(n)
y_lin = 2 * x_lin + 0.5 * np.random.randn(n)

r_pearson, _ = stats.pearsonr(x_lin, y_lin)
r_spearman, _ = stats.spearmanr(x_lin, y_lin)
print(f"Linear: Pearson={r_pearson:.3f}, Spearman={r_spearman:.3f}")

# Monotonic but non-linear: Spearman captures it fully, Pearson understates it
x_mono = np.random.uniform(0.1, 5, n)
y_mono = np.log(x_mono) + 0.2 * np.random.randn(n)

r_pearson2, _ = stats.pearsonr(x_mono, y_mono)
r_spearman2, _ = stats.spearmanr(x_mono, y_mono)
print(f"Log relationship: Pearson={r_pearson2:.3f}, Spearman={r_spearman2:.3f}")
# Spearman is typically higher than Pearson for non-linear monotonic relationships

# With outliers: Spearman is robust, Pearson is not
x_out = np.random.randn(n)
y_out = x_out + 0.2 * np.random.randn(n)
x_out[0], y_out[0] = 10, -10  # outlier

r_pearson3, _ = stats.pearsonr(x_out, y_out)
r_spearman3, _ = stats.spearmanr(x_out, y_out)
print(f"With outlier: Pearson={r_pearson3:.3f}, Spearman={r_spearman3:.3f}")
# Spearman much more robust

# Summary:
# Pearson: linear relationships, no outliers, continuous variables
# Spearman: ordinal data, non-linear monotonic, presence of outliers
# Kendall's tau: small samples, many ties
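
Spearman's coefficient is Pearson's correlation computed on ranks, which explains both behaviours above. A minimal standard-library sketch, assuming no ties (the helper names and the exponential relationship are illustrative):

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; assumes no ties (reasonable for continuous random draws)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman(x, y):
    # Spearman = Pearson applied to the ranks of each variable
    return pearson(ranks(x), ranks(y))

random.seed(1)
x = [random.uniform(0.1, 5.0) for _ in range(200)]
y_exact = [math.exp(v) for v in x]  # strictly increasing, far from linear, no noise

r_p = pearson(x, y_exact)
r_s = spearman(x, y_exact)
print(f"Pearson:  {r_p:.3f}")
print(f"Spearman: {r_s:.3f}")
```

Because ranking discards magnitudes, any strictly increasing transformation leaves Spearman unchanged, and a single extreme point can shift a rank by at most n - 1 positions. With tied values, average ranks are needed, which `scipy.stats.spearmanr` handles.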

Q18. What is ANOVA? When do you use it instead of a t-test?

from scipy import stats
import numpy as np

# t-test: compare means of TWO groups
# ANOVA: compare means of THREE OR MORE groups simultaneously
# Running multiple t-tests inflates Type I error (multiple testing problem)

# One-way ANOVA example: compare load times across 4 CDN providers
np.random.seed(42)
cdn_a = np.random.normal(150, 20, 50)  # ms
cdn_b = np.random.normal(145, 25, 50)
cdn_c = np.random.normal(165, 22, 50)
cdn_d = np.random.normal(155, 18, 50)

f_stat, p_value = stats.f_oneway(cdn_a, cdn_b, cdn_c, cdn_d)
print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.4f}")
# If p < 0.05: at least one CDN is different (but doesn't say WHICH)

# Post-hoc test: Tukey HSD (find which pairs differ)
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import pandas as pd

all_data = np.concatenate([cdn_a, cdn_b, cdn_c, cdn_d])
groups = np.repeat(["A", "B", "C", "D"], 50)

tukey = pairwise_tukeyhsd(all_data, groups, alpha=0.05)
print(tukey.summary())

# ANOVA assumptions (same as linear regression):
# 1. Independence of observations
# 2. Normal distribution within each group (or large N via CLT)
# 3. Homogeneity of variances (Levene's test)

stat, p_levene = stats.levene(cdn_a, cdn_b, cdn_c, cdn_d)
print(f"\nLevene's test (equal variances): p={p_levene:.4f}")
# If p < 0.05: use Welch's ANOVA instead

Q19. What is Bayesian A/B testing? How does it differ from frequentist?

import numpy as np
from scipy import stats

# Frequentist A/B test: test H0, compute p-value, reject or fail to reject H0
# Bayesian A/B: compute posterior distribution of treatment effect

# Bayesian approach for conversion rate comparison
# Prior: Beta(1, 1) = Uniform [0, 1] (no prior knowledge)
# Data:
control_conversions, control_n = 100, 1000    # 10% conversion
treatment_conversions, treatment_n = 120, 1000  # 12% conversion

# Posterior: Beta(alpha + conversions, beta + non-conversions)
posterior_control = stats.beta(
    1 + control_conversions,
    1 + control_n - control_conversions
)
posterior_treatment = stats.beta(
    1 + treatment_conversions,
    1 + treatment_n - treatment_conversions
)

# Monte Carlo to compute P(treatment > control)
n_samples = 100_000
rng = np.random.default_rng(42)
samples_c = posterior_control.rvs(n_samples, random_state=rng)  # pass rng so draws are reproducible
samples_t = posterior_treatment.rvs(n_samples, random_state=rng)

p_treatment_better = (samples_t > samples_c).mean()
print(f"P(treatment > control) = {p_treatment_better:.3f}")

# Expected lift
lift_samples = (samples_t - samples_c) / samples_c
print(f"Expected lift: {lift_samples.mean():.3f}")
print(f"95% credible interval for lift: ({np.quantile(lift_samples, 0.025):.3f}, {np.quantile(lift_samples, 0.975):.3f})")

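# Bayesian advantages:
# - More tolerant of interim looks, though stopping on a posterior threshold
#   still inflates false positives unless the stopping rule is planned
# - Provides P(treatment > control) directly (intuitive)
# - Credible intervals have natural "probability of being in range" interpretation
# - A fixed sample size is not strictly required, but a planned stopping rule still is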

# Frequentist advantages:
# - Widely understood, auditable
# - Hard guarantee on Type I error
# - Preferred for regulatory/compliance contexts

Q20. What is the bootstrap? How do you use it to compute confidence intervals?

import numpy as np
from scipy import stats

def bootstrap_ci(data, statistic_fn, n_bootstrap=10_000, ci=0.95, seed=42):
    """
    Compute bootstrap confidence interval for any statistic.
    data: array-like of observations
    statistic_fn: function that takes an array and returns a scalar
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    bootstrap_stats = np.array([
        statistic_fn(rng.choice(data, size=n, replace=True))
        for _ in range(n_bootstrap)
    ])
    alpha = (1 - ci) / 2
    ci_low, ci_high = np.quantile(bootstrap_stats, [alpha, 1 - alpha])
    return {
        "estimate": statistic_fn(data),
        "ci_low": ci_low,
        "ci_high": ci_high,
        "bootstrap_std": bootstrap_stats.std(),
    }

# Example: CI for median (no closed-form formula)
np.random.seed(42)
data = np.random.lognormal(mean=3, sigma=0.5, size=100)  # right-skewed

result = bootstrap_ci(data, np.median)
print(f"Median: {result['estimate']:.2f}")
print(f"95% CI: ({result['ci_low']:.2f}, {result['ci_high']:.2f})")

# Example: CI for Pearson correlation
x = np.random.randn(50)
y = 0.6 * x + np.random.randn(50) * 0.8

def pearson_r(data):
    x, y = data[:, 0], data[:, 1]
    return stats.pearsonr(x, y)[0]

xy_data = np.column_stack([x, y])
result_corr = bootstrap_ci(xy_data, pearson_r)
print(f"\nCorrelation: {result_corr['estimate']:.4f}")
print(f"95% CI: ({result_corr['ci_low']:.4f}, {result_corr['ci_high']:.4f})")

# When to use bootstrap:
# - Unknown or complex distribution of the statistic
# - Moderate sample sizes where normal approximations are doubtful (very small samples stay unreliable)
# - Complex statistics (median, percentile, correlation, AUROC)
# - Difference-in-means for non-normal data

HARD: Experimental Design and Causal Inference (Questions 21-28)

Q21. What are the assumptions required for a valid A/B test?

import pandas as pd
import numpy as np
from scipy import stats

# Assumption 1: SUTVA (Stable Unit Treatment Value Assumption)
# User A's outcome is not affected by user B's treatment assignment
# VIOLATED by: social products (network effects), shared resources, spillover

# Assumption 2: Random Assignment (no confounding)
def check_srm(control_n, treatment_n, expected_ratio=1.0):
    """Sample Ratio Mismatch (SRM) check via chi-squared test."""
    total = control_n + treatment_n
    expected_control = total / (1 + expected_ratio)
    expected_treatment = total * expected_ratio / (1 + expected_ratio)

    chi2, p = stats.chisquare(
        f_obs=[control_n, treatment_n],
        f_exp=[expected_control, expected_treatment]
    )
    return {"chi2": chi2, "p_value": p, "srm": p < 0.001}

result = check_srm(4950, 5050)  # slight imbalance
print(f"SRM check: {result}")

# Assumption 3: Consistent measurement
# Same event tracking, same attribution window, same population filter

# Assumption 4: Single exposure per user
# No user should be in both control and treatment

# Assumption 5: No interaction between experiments
# Multiple concurrent experiments must be on orthogonal user segments

# Guardrail metrics: metrics that should NOT change
# A/B test is invalid if guardrails fail
def validate_experiment(exp_data: pd.DataFrame) -> dict:
    """Run A/B test validity checks."""
    checks = {}

    # SRM
    n_control = (exp_data["variant"] == "control").sum()
    n_treatment = (exp_data["variant"] == "treatment").sum()
    checks["srm"] = check_srm(n_control, n_treatment)

    # User overlap
    control_users = set(exp_data[exp_data["variant"] == "control"]["user_id"])
    treatment_users = set(exp_data[exp_data["variant"] == "treatment"]["user_id"])
    overlap = control_users & treatment_users
    checks["user_overlap"] = len(overlap)

    return checks

Q22. What are A/A tests? When and why do you run them?

An A/A test splits users into two groups and shows them IDENTICAL experiences. Expected result: no statistically significant difference.

Purposes:

Validate the testing infrastructure: confirm randomization is truly random (SRM check).
Calibrate Type I error rate: if 5% of A/A tests show p < 0.05, the system is well-calibrated.
Establish baseline variance: measure natural day-to-day variation in your metrics.

import numpy as np
from scipy import stats

# Simulate A/A test: both groups from same distribution
np.random.seed(42)
n = 5000
group_a = np.random.normal(10.0, 2.0, n)  # same distribution
group_b = np.random.normal(10.0, 2.0, n)  # same distribution

t_stat, p_val = stats.ttest_ind(group_a, group_b)
print(f"A/A test: t={t_stat:.4f}, p={p_val:.4f}")
# Expect p > 0.05 most of the time (when system is correct)

# Run 1000 A/A tests to calibrate false positive rate
p_values = [
    stats.ttest_ind(
        np.random.normal(10, 2, n),
        np.random.normal(10, 2, n)
    )[1]
    for _ in range(1000)
]
false_positive_rate = (np.array(p_values) < 0.05).mean()
print(f"False positive rate from 1000 A/A tests: {false_positive_rate:.3f}")
# Should be ~0.05 if calibrated correctly
# If much higher (0.15+): there's a randomization or metric bug
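
One way an inflated A/A false positive rate can arise is a mismatch between the randomization unit and the analysis unit. The sketch below is a simulation under assumed numbers (200 users per group, 10 sessions each, a per-user random effect): users are randomized, but the first test wrongly treats every session as independent.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_users, sessions_per_user, n_sims = 200, 10, 500  # assumed sizes for illustration

def one_aa_test():
    """Return (session-level p, user-level p) for one simulated A/A test."""
    groups = []
    for _ in range(2):
        user_effect = rng.normal(0.0, 1.0, size=(n_users, 1))  # shared within a user
        noise = rng.normal(0.0, 1.0, size=(n_users, sessions_per_user))
        groups.append(10.0 + user_effect + noise)  # rows = users, cols = sessions
    a, b = groups
    # Wrong: treats correlated sessions as independent observations
    p_session = stats.ttest_ind(a.ravel(), b.ravel()).pvalue
    # Right: one aggregated value per randomized user
    p_user = stats.ttest_ind(a.mean(axis=1), b.mean(axis=1)).pvalue
    return p_session, p_user

p_vals = np.array([one_aa_test() for _ in range(n_sims)])
fpr_session, fpr_user = (p_vals < 0.05).mean(axis=0)
print(f"Session-level false positive rate: {fpr_session:.3f}")  # well above 0.05
print(f"User-level false positive rate:    {fpr_user:.3f}")     # close to 0.05
```

Both groups come from the same distribution, so any excess rejections at the session level come purely from the analysis treating correlated rows as independent.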

Q23. What is the difference-in-differences (DiD) method?

DiD estimates causal effects from observational data when randomized experiments are infeasible.

DiD estimate = (Y_treatment_post - Y_treatment_pre) - (Y_control_post - Y_control_pre)

import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

# Example: did a new payment feature reduce cart abandonment?
# Treatment group: users who adopted the feature
# Control group: users who did not adopt the feature
# Pre-period: before feature launch, Post-period: after launch

np.random.seed(42)
n = 1000

# Simulate balanced panel data: each user is observed once pre and once post
data = pd.DataFrame({
    "user_id": np.tile(np.arange(n//2), 2),  # same 500 users in both periods
    "post": np.repeat([0, 1], n//2),  # 0=pre, 1=post
    "treated": np.tile(np.repeat([0, 1], n//4), 2),  # 0=control, 1=treatment
})

# Outcomes: treatment group improved more post-launch
data["abandonment_rate"] = (
    0.30                               # baseline
    - 0.05 * data["treated"]           # pre-existing difference (ok)
    - 0.08 * data["post"]             # time trend
    - 0.06 * (data["treated"] * data["post"])  # TRUE treatment effect: -6 percentage points
    + np.random.normal(0, 0.05, n)
)

# DiD regression: outcome ~ treated + post + treated*post
model = smf.ols("abandonment_rate ~ treated + post + treated:post", data=data).fit()
print(model.summary())
# Coefficient on treated:post = DiD estimate = ~-0.06

# Parallel trends assumption: treatment and control must have SAME TREND pre-treatment
# Violated if treated group was already changing faster before the intervention
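
The DiD formula can also be evaluated directly from the four group means. The sketch below re-simulates a small panel (column name `y` and the random seed are illustrative choices) and checks that the hand-computed estimate matches the interaction coefficient from the regression, which it should because the model is saturated in `treated` and `post`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_users = 500
panel = pd.DataFrame({
    "user_id": np.tile(np.arange(n_users), 2),               # same users pre and post
    "post": np.repeat([0, 1], n_users),
    "treated": np.tile(np.repeat([0, 1], n_users // 2), 2),
})
panel["y"] = (0.30 - 0.05 * panel["treated"] - 0.08 * panel["post"]
              - 0.06 * panel["treated"] * panel["post"]      # simulated true effect
              + rng.normal(0, 0.05, len(panel)))

# Four cell means, indexed by (treated, post)
means = panel.groupby(["treated", "post"])["y"].mean()
did_manual = ((means.loc[(1, 1)] - means.loc[(1, 0)])
              - (means.loc[(0, 1)] - means.loc[(0, 0)]))

fit = smf.ols("y ~ treated + post + treated:post", data=panel).fit()
did_regression = fit.params["treated:post"]

print(f"DiD from group means: {did_manual:.4f}")
print(f"DiD from regression:  {did_regression:.4f}")
```

The regression form is usually preferred in practice because it gives standard errors and lets you add covariates.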

Q24. What is propensity score matching? Give a DS use case.

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def propensity_score_match(df, treatment_col, outcome_col, covariate_cols, caliper=0.05):
    """
    Estimate the ATT (average treatment effect on the treated) using
    1:1 nearest-neighbor propensity score matching without replacement.
    """
    # Estimate propensity score P(treatment=1 | covariates)
    X = StandardScaler().fit_transform(df[covariate_cols])
    y = df[treatment_col]

    lr = LogisticRegression(max_iter=1000)
    lr.fit(X, y)
    df["propensity_score"] = lr.predict_proba(X)[:, 1]

    # Match treated units to nearest control unit
    treated = df[df[treatment_col] == 1].copy()
    control = df[df[treatment_col] == 0].copy()

    matched_pairs = []
    used_controls = set()

    for _, t_row in treated.iterrows():
        distances = abs(control["propensity_score"] - t_row["propensity_score"])
        distances = distances[~distances.index.isin(used_controls)]

        if len(distances) == 0 or distances.min() > caliper:
            continue

        best_control = distances.idxmin()
        matched_pairs.append({
            "treated_outcome": t_row[outcome_col],
            "control_outcome": control.loc[best_control, outcome_col],
        })
        used_controls.add(best_control)

    pairs_df = pd.DataFrame(matched_pairs)
    att = (pairs_df["treated_outcome"] - pairs_df["control_outcome"]).mean()
    return {"ATT": att, "n_matched": len(pairs_df)}

# Use case: estimate effect of loyalty program on purchase frequency
# Can't randomize (customers self-select into program)
# Use propensity matching to control for selection bias

np.random.seed(42)
n = 2000
age = np.random.randint(18, 65, n)
purchase_hist = np.random.poisson(lam=5, size=n)
# Treatment assignment depends on covariates (selection bias)
propensity = 1 / (1 + np.exp(-(0.02*age + 0.1*purchase_hist - 2)))
treated = np.random.binomial(1, propensity)
# Outcome: purchases in next 30 days
outcome = 8 + 3*treated + 0.05*age + 0.5*purchase_hist + np.random.randn(n)

df = pd.DataFrame({"treated": treated, "outcome": outcome,
                   "age": age, "purchase_hist": purchase_hist})
result = propensity_score_match(df, "treated", "outcome",
                                 ["age", "purchase_hist"])
print(f"ATT (loyalty program effect): {result['ATT']:.3f}")
print(f"Matched pairs: {result['n_matched']}")
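
To see the selection bias that matching is meant to remove, it helps to compare the matched estimate with a naive difference in means. The sketch below re-simulates the same data-generating process with a larger sample (true effect 3) and, for brevity, swaps in nearest-neighbor matching with replacement and no caliper, which differs from the function above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
age = rng.integers(18, 65, n)
purchase_hist = rng.poisson(5, n)
logit = 0.02 * age + 0.1 * purchase_hist - 2
treated = rng.binomial(1, 1 / (1 + np.exp(-logit)))        # self-selection
outcome = 8 + 3 * treated + 0.05 * age + 0.5 * purchase_hist + rng.standard_normal(n)

# Naive estimate: ignores that treated users are older and buy more
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Propensity scores from a logistic regression on the covariates
X = np.column_stack([age, purchase_hist])
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

# Nearest control for each treated unit (with replacement) via a sorted lookup
ctrl_idx = np.flatnonzero(treated == 0)
trt_idx = np.flatnonzero(treated == 1)
order = np.argsort(ps[ctrl_idx])
ctrl_sorted = ps[ctrl_idx][order]
pos = np.searchsorted(ctrl_sorted, ps[trt_idx])
left = np.clip(pos - 1, 0, len(ctrl_sorted) - 1)
right = np.clip(pos, 0, len(ctrl_sorted) - 1)
use_left = (np.abs(ctrl_sorted[left] - ps[trt_idx])
            <= np.abs(ctrl_sorted[right] - ps[trt_idx]))
matched_ctrl = ctrl_idx[order][np.where(use_left, left, right)]

att = (outcome[trt_idx] - outcome[matched_ctrl]).mean()
print(f"Naive difference: {naive:.3f}")   # biased upward by selection
print(f"Matched ATT:      {att:.3f}")     # closer to the simulated effect of 3
```

Matching only removes bias from covariates included in the propensity model; unobserved confounders remain a threat.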

Q25. What is time series stationarity? How do you test for it?

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.seasonal import seasonal_decompose

# Stationarity: statistical properties (mean, variance, autocorrelation)
# do not change over time. ARMA models assume it; ARIMA's d term differences
# the series to reach it.

np.random.seed(42)
t = np.arange(200)

# Non-stationary series (linear trend + random walk, so variance grows over time)
non_stationary = 0.1 * t + np.cumsum(np.random.randn(200))

# Stationary series (no trend, constant variance)
stationary = np.random.randn(200)

def stationarity_tests(series, name=""):
    print(f"\n--- {name} ---")

    # Augmented Dickey-Fuller test: H0 = unit root (non-stationary)
    adf_result = adfuller(series, autolag='AIC')
    print(f"ADF statistic: {adf_result[0]:.4f}")
    print(f"ADF p-value: {adf_result[1]:.4f}")
    print(f"ADF conclusion: {'stationary' if adf_result[1] < 0.05 else 'non-stationary'}")

    # KPSS test: H0 = stationary (opposite null!)
    kpss_result = kpss(series, regression='c', nlags='auto')
    print(f"KPSS statistic: {kpss_result[0]:.4f}")
    print(f"KPSS p-value: {kpss_result[1]:.4f}")
    print(f"KPSS conclusion: {'non-stationary' if kpss_result[1] < 0.05 else 'stationary'}")

stationarity_tests(non_stationary, "Non-stationary (trending)")
stationarity_tests(stationary, "Stationary")

# Make non-stationary series stationary:
# 1. Differencing (most common)
differenced = np.diff(non_stationary)
stationarity_tests(differenced, "After first differencing")

# 2. Log transformation (for exponential growth / heteroscedasticity)
prices = np.exp(0.1 * t + np.random.randn(200))
log_returns = np.diff(np.log(prices))

# 3. Seasonal differencing: subtract lag-s (s=12 for monthly data)
# y_t - y_{t-12}
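
Seasonal differencing can be sketched on a noise-free toy series, assuming monthly data with a 12-month cycle. With a linear trend of 0.5 per month plus a pure sinusoidal season, subtracting lag 12 removes the seasonal term and leaves the constant 12 * 0.5 = 6.

```python
import numpy as np

months = np.arange(48)                             # 4 years of monthly data
seasonal = 10 * np.sin(2 * np.pi * months / 12)    # repeats every 12 months
y = 0.5 * months + seasonal                        # linear trend + seasonality

s = 12
seasonal_diff = y[s:] - y[:-s]                     # y_t - y_{t-12}
print(seasonal_diff[:3].round(6))                  # [6. 6. 6.]
print(len(y), len(seasonal_diff))                  # 48 36
```

Note that the differenced series is 12 observations shorter, which matters for short histories.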

Q26. What is autocorrelation? How does it affect time series modeling?

import numpy as np
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.stattools import durbin_watson
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(42)
n = 300

# AR(1) process: y_t = 0.7 * y_{t-1} + epsilon_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t-1] + np.random.randn()

# Autocorrelation: correlation of series with its own lags
# ACF: autocorrelation at each lag
# PACF: partial autocorrelation (controlling for intermediate lags)

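# Durbin-Watson statistic: built for regression residuals; applied here to the
# raw zero-mean series to show its positive lag-1 autocorrelation
dw = durbin_watson(y)
print(f"Durbin-Watson: {dw:.4f}")  # roughly 2 * (1 - 0.7) = 0.6 for this AR(1)
# ~2.0 = no autocorrelation
# toward 0 = positive autocorrelation
# toward 4 = negative autocorrelation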

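# ARIMA model selection heuristic (for a stationary series):
# ACF cuts off after lag q, PACF tails off -> MA(q) process
# PACF cuts off after lag p, ACF tails off -> AR(p) process
# Both tail off -> mixed ARMA(p, q); compare candidates by AIC
# ACF decays very slowly (stays high for many lags) -> difference first (d=1)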

# Fit ARIMA model
model = ARIMA(y, order=(1, 0, 0))  # AR(1)
result = model.fit()
print(f"\nAR(1) coefficient: {result.params[1]:.4f}")  # should be ~0.7
print(f"AIC: {result.aic:.2f}")

# Forecast 10 steps ahead and show the first 5
forecast = result.forecast(steps=10)
print(f"First 5 of 10 forecast steps: {forecast[:5]}")

Q27. What is survival analysis? How does the Kaplan-Meier estimator work?

import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

# Survival analysis: time until an event (churn, failure, death)
# Handles censored data: users who haven't churned yet at observation end

np.random.seed(42)
n = 500

# Simulate subscription data
# Groups: premium vs basic plan
tenure_days = np.random.exponential(scale=180, size=n)  # mean 6 months
plan = np.random.choice(["basic", "premium"], size=n, p=[0.6, 0.4])

# Premium users have lower churn rate
churn_prob_basic = 1 - np.exp(-tenure_days / 150)
churn_prob_premium = 1 - np.exp(-tenure_days / 250)
churned = np.where(
    plan == "basic",
    np.random.binomial(1, churn_prob_basic.clip(0, 1)),
    np.random.binomial(1, churn_prob_premium.clip(0, 1))
)

df = pd.DataFrame({"tenure": tenure_days.astype(int) + 1,
                   "churned": churned, "plan": plan})

# Kaplan-Meier: non-parametric survival curve estimator
# Fit one estimator per plan; refitting a single object overwrites the first curve
basic = df[df["plan"] == "basic"]
premium = df[df["plan"] == "premium"]

kmf_basic = KaplanMeierFitter()
kmf_basic.fit(basic["tenure"], basic["churned"], label="Basic")
kmf_premium = KaplanMeierFitter()
kmf_premium.fit(premium["tenure"], premium["churned"], label="Premium")

# Median survival time (when 50% have churned)
print(f"Median tenure (basic): {kmf_basic.median_survival_time_:.0f} days")
print(f"Median tenure (premium): {kmf_premium.median_survival_time_:.0f} days")

# Log-rank test: are survival curves different?
from lifelines.statistics import logrank_test
result = logrank_test(
    basic["tenure"],
    premium["tenure"],
    event_observed_A=basic["churned"],
    event_observed_B=premium["churned"],
)
print(f"Log-rank test p={result.p_value:.4f}")
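
To show how the Kaplan-Meier estimator itself works, here is a hand-rolled sketch on five made-up users, without lifelines. At each observed churn time t the survival estimate is multiplied by (1 - d_t / n_t), where d_t is the number of churn events at t and n_t is the number of users still at risk; censored users leave the risk set without pulling the curve down.

```python
import numpy as np

durations = np.array([2, 3, 3, 5, 8])   # days until churn or end of observation
events = np.array([1, 1, 0, 1, 0])      # 1 = churned, 0 = censored (still active)

survival = 1.0
km_curve = {}
for t in np.unique(durations[events == 1]):        # only event times change the curve
    at_risk = np.sum(durations >= t)               # users still observed just before t
    churned_at_t = np.sum((durations == t) & (events == 1))
    survival *= 1 - churned_at_t / at_risk
    km_curve[int(t)] = round(float(survival), 3)

print(km_curve)  # {2: 0.8, 3: 0.6, 5: 0.3}
```

At day 3 one user churns and one is censored: both count in the risk set of 4, but only the churn reduces survival (0.8 * 3/4 = 0.6).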

Q28. Design a complete experiment for measuring impact of a new onboarding flow.

Product: E-commerce app
Change: New onboarding flow (5 steps -> 3 steps)
Primary metric: Day-7 retention
Guardrail metrics: D1 retention, checkout completion rate, error rate

Step 1: Define hypotheses
  H0: Day-7 retention is the same for both flows
  H1: New flow has higher Day-7 retention (one-tailed, we designed for improvement)

Step 2: Power analysis
  Baseline D7 retention: 35%
  Minimum detectable effect: 3% absolute (35% -> 38%)
  Alpha: 0.05, Power: 0.80

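  from statsmodels.stats.proportion import proportion_effectsize
  from statsmodels.stats.power import NormalIndPower
  effect_size = proportion_effectsize(0.38, 0.35)  # Cohen's h ~0.062
  n = NormalIndPower().solve_power(effect_size, alpha=0.05, power=0.80, ratio=1.0,
                                   alternative="larger")  # one-tailed, matching H1
  # n ~= 3,200 per group (a two-sided test would need ~4,000)
  # At 1,000 new users/day: enroll for about 7 days (~6,400 users), then wait 7 more
  # days so the last cohort's D7 retention is observed; analyze once at day 14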

Step 3: Randomization
  - New users only (fresh installs)
  - 50/50 split at first app open
  - Hash(user_id + experiment_salt) % 100 < 50 -> control
  - Exclude: users who bypassed onboarding, corporate accounts

Step 4: Duration
  - Enroll new users for about 7 days to reach the required sample size
  - Add 7 days so the last enrolled cohort reaches day 7 = 14-day experiment
  - Check for novelty effect: compare first week vs second week for early adopters

Step 5: Analysis
  - Primary: two-proportion z-test on D7 retention
  - Check SRM: n_control vs n_treatment should be 50/50 (chi-square test)
  - Segment analysis: by platform (iOS/Android), user country, device type
  - Guardrail checks: D1 retention, checkout completion, error rate (no regressions)

Step 6: Decision criteria
  - Ship if: p < 0.05 AND guardrails clean AND effect > MDE
  - Flag if: guardrail failure even if primary is positive
  - No ship if: SRM detected (investigate randomization first)
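
The randomization rule in Step 3 and the primary analysis in Step 5 can be sketched end to end. Everything here is illustrative: the salt value, the `user_N` ID format, and the simulated retention rates (35% vs 38%) are assumptions, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

EXPERIMENT_SALT = "onboarding_v2"  # hypothetical salt, unique per experiment

def assign_variant(user_id: str, salt: str = EXPERIMENT_SALT) -> str:
    """Deterministic 50/50 bucket: the same user always gets the same variant."""
    digest = hashlib.sha256(f"{user_id}{salt}".encode()).hexdigest()
    return "control" if int(digest, 16) % 100 < 50 else "treatment"

user_ids = [f"user_{i}" for i in range(6400)]
variants = np.array([assign_variant(u) for u in user_ids])

# Simulated D7 retention (assumed true rates: 35% control, 38% treatment)
rng = np.random.default_rng(3)
true_rate = np.where(variants == "treatment", 0.38, 0.35)
retained = rng.random(len(user_ids)) < true_rate

treat, ctrl = variants == "treatment", variants == "control"
counts = np.array([retained[treat].sum(), retained[ctrl].sum()])
nobs = np.array([treat.sum(), ctrl.sum()])

# One-sided test of H1: treatment retention > control retention
z_stat, p_value = proportions_ztest(counts, nobs, alternative="larger")
lift = counts[0] / nobs[0] - counts[1] / nobs[1]
print(f"split: {nobs}, lift: {lift:.4f}, z: {z_stat:.2f}, p: {p_value:.4f}")
```

Hashing on user ID plus a per-experiment salt keeps assignment stable across sessions and independent across concurrent experiments, without storing an assignment table.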

FAQ

Q: What statistical mistakes do DS candidates make most often in interviews?

A: Three most common: (1) saying "the p-value is the probability that the null is true" -- it is not; it is the probability of seeing data at least as extreme as observed, assuming the null is true; (2) running an A/B test without pre-specifying sample size and stopping as soon as significance appears (the peeking problem); (3) confusing statistical significance with practical significance. Public preparation resources consistently flag these three as the top elimination questions.

Q: Do I need to know ARIMA deeply for DS interviews?

A: For time series roles (demand forecasting, finance, supply chain), yes -- stationarity, differencing, ACF/PACF interpretation, and model selection are tested. For general DS roles, understanding stationarity and knowing when to apply time-series-specific CV is sufficient.

Q: How much probability theory is expected in DS interviews vs applied statistics?

A: Most product company DS interviews focus on applied statistics (hypothesis testing, regression, A/B design) over pure probability theory. Combinatorics and conditional probability appear mainly in quant-leaning roles. Confirm the expected depth on the official company careers portal before your round.

Sources and review notes (reviewed 8 Jun 2026)

Verification window

Page last edited 8 Jun 2026 by Aditya Sharma. A review date records an editorial edit, not a guarantee that every external fact is still current.

