est. 2017
Sun, 27 Apr 2026
vol. IX · no. 117
PapersAdda

NLP Interview Questions 2026: 30 Answers with Code

Updated: 8 Jun 2026
Aditya Sharma
Aditya's Edit

PapersAdda 2026 Placement Cycle

By Aditya Sharma·Founder & Editor, PapersAdda

What changed in 2026 drives

Mass-recruiter offer letters are flatter for the 2026 batch: the 4-5 LPA ASE band has barely budged in three years while inflation eats real wages. Premium tracks (Digital, Pro, Elite, Specialist) are still where the differential lives, and they are entirely test-driven. If you are aiming higher than the default offer, the coding round is not optional pageantry; it is the entire interview.

What I'd actually study for this

  • 01 Two solid coding-round answers (one medium-hard DSA problem each, with edge-case discussion) beat five half-baked ones
  • 02 One real project you can defend end-to-end: file paths, design decisions, and what you would change
  • 03 One DBMS schema you actually built (not a textbook ER diagram), with at least 3 join-heavy queries written from memory
  • 04 Three behavioural STAR stories: failure recovered, conflict handled, ownership taken

Where most candidates trip up

The single biggest mistake is treating company-specific guides as primary prep and DSA as secondary. It is the opposite. Mass recruiters use the test as a filter, but premium tracks at every IT services company use coding to allocate offer band. Spend 70% of prep time on DSA + system fundamentals, 20% on company-specific patterns, 10% on HR rehearsal. Reverse that ratio and you collect the default offer.

Editorial commentary by Aditya Sharma · written for PapersAdda · not generated, not aggregated.

Natural language processing is one of the highest-demand specializations in 2026. Every company that processes text, which is nearly every company, needs NLP engineers. The interview arc goes from classical text preprocessing all the way through fine-tuning large language models. This guide covers 30 NLP interview questions with full answers, Python code, and comparison tables for 2026.

PapersAdda's take: NLP interviews in 2026 have a sharp divide. Companies hiring for LLM-adjacent roles will ask about transformers, fine-tuning, RAG, and evaluation. Companies building internal NLP tools will ask classical NLP plus modern BERT fine-tuning. Know both tracks. Candidates report that tokenization internals and BERT vs GPT architecture tradeoffs appear in virtually every NLP interview. According to candidate accounts from public preparation resources, RAG system design questions have become standard at AI-focused startups. Confirm the specific NLP interview format on the official careers portal before your round.



Which Companies Ask These Questions?

| Topic | Companies |
|---|---|
| Text preprocessing and classical NLP | All companies with NLP roles |
| Word embeddings (Word2Vec, GloVe) | All ML teams |
| BERT fine-tuning | Google, Microsoft, Flipkart, Meesho |
| LLM + RAG systems | OpenAI, Cohere, Anthropic-adjacent startups |
| NER and Information Extraction | Freshworks, Sprinklr, Sarvam AI |
| Machine Translation | Google, Microsoft, Koo, Bhashini |
| Conversational AI | Amazon Alexa, Google, Jio AI |

EASY: Foundations (Questions 1-10)

Q1. What is tokenization? Compare word, subword, and character tokenization.

| Type | Unit | Vocab Size | OOV Handling | Examples |
|---|---|---|---|---|
| Word | Whole words | 50K-500K | Bad (unseen words) | Early NLP, basic bag-of-words |
| Subword (BPE) | Frequent substrings | 30K-100K+ | Good (breaks into subwords) | GPT-2, GPT-4, LLaMA |
| Subword (WordPiece) | Learned subwords | 30K | Good | BERT, DistilBERT |
| Character | Individual chars | ~256 | Perfect | Rare; very long sequences |
| SentencePiece | Language-agnostic BPE/unigram | 32K | Good | T5, LLaMA, multilingual models |

from transformers import AutoTokenizer

# BERT WordPiece: continuation pieces carry a '##' prefix
bert_tok = AutoTokenizer.from_pretrained('bert-base-uncased')
tokens = bert_tok.tokenize("unbelievable transformers")
print(tokens)  # exact split depends on the vocabulary; frequent words stay whole

# GPT-2 BPE (Byte-Pair Encoding): pieces that begin a new word start with 'Ġ'
gpt_tok = AutoTokenizer.from_pretrained('gpt2')
tokens = gpt_tok.encode("unbelievable transformers")
print(gpt_tok.convert_ids_to_tokens(tokens))  # exact split depends on the learned merges

# LLaMA 2 SentencePiece (gated repo: accept the license on the Hub first).
# Llama 3 moved to a tiktoken-style BPE vocabulary instead of SentencePiece.
llama_tok = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
encoded = llama_tok("Hello world", return_tensors='pt')
print(encoded.input_ids, encoded.attention_mask)
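
To make the BPE row of the table concrete, here is a minimal sketch of how merges are learned on a toy word-frequency table. The corpus, the function names, and the tie-breaking rule are assumptions for illustration; production tokenizers add byte-level handling and many optimizations.

```python
from collections import Counter

def get_pair_counts(corpus):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    """Greedily merge the most frequent adjacent pair, num_merges times."""
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(corpus)
        if not pairs:
            break
        # Tie-break on the pair itself so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = merge_pair(corpus, best)
    return merges, corpus

# Hypothetical toy corpus: word -> frequency
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, corpus = learn_bpe(word_freqs, num_merges=3)
print(merges)
print(list(corpus))
```

Frequent suffixes such as "est" end up as single symbols, which is why an unseen word like "lowest" can still be encoded from known pieces.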

Q2. What is TF-IDF and when is it still useful in 2026?

TF(t, d)  = count(t in d) / total_words(d)
IDF(t)    = log(N / df(t))           # N = total docs, df(t) = docs containing t
TF-IDF    = TF * IDF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# TF-IDF is still effective for short-text classification at low compute cost
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(
        ngram_range=(1,2),      # unigrams and bigrams
        max_features=50000,
        sublinear_tf=True,      # log(1 + tf) instead of raw tf
        min_df=2,               # ignore very rare terms
        strip_accents='unicode',
        analyzer='word'
    )),
    ('clf', LogisticRegression(C=5, max_iter=1000))
])

pipeline.fit(X_train_text, y_train)
print(pipeline.score(X_test_text, y_test))

Still useful in 2026 when:

  • Extremely low-latency requirement where BERT is too slow
  • Short text classification baseline
  • High-volume, resource-constrained environments
  • Interpretability required (TF-IDF features are human-readable)
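
The formulas above can be checked by hand on a tiny corpus. This is a sketch using only the standard library; the three documents are made up for illustration. Note that scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its numbers will not match these raw values.

```python
import math

# Hypothetical three-document corpus, already tokenized
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are pets".split(),
]

def tf(term, doc):
    """Term frequency: share of the document's words that are `term`."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency: log(N / df); 0 if the term never occurs."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "the" is frequent in doc 0 but appears in 2 of 3 docs, so its IDF is low.
print(round(tf_idf("the", docs[0], docs), 4))
# "cat" appears once, but only in doc 0, so it scores higher.
print(round(tf_idf("cat", docs[0], docs), 4))
```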

Q3. Explain Word2Vec. What is the difference between CBOW and Skip-gram?

| Variant | Task | Better for |
|---|---|---|
| CBOW (Continuous Bag of Words) | Predict center word from context | Frequent words; faster to train |
| Skip-gram | Predict context words from center word | Rare words; works well even with smaller datasets |

from gensim.models import Word2Vec
import numpy as np

sentences = [["the", "cat", "sat", "on", "mat"],
             ["the", "dog", "ran", "in", "park"],
             ["cat", "and", "dog", "are", "pets"]]

# Skip-gram (sg=1): better for rare words, recommended
model = Word2Vec(sentences, vector_size=100, window=5,
                 min_count=1, workers=4, sg=1, epochs=100)

# Semantic relationships
print(model.wv.most_similar('cat'))
print(model.wv.similarity('cat', 'dog'))

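# Famous analogy: king - man + woman = queen.
# This needs vectors trained on a large corpus; the toy vocabulary above has
# no 'king', so running it on `model` would raise a KeyError.
# result = model.wv.most_similar(positive=['king', 'woman'],
#                                negative=['man'], topn=1)

# Negative sampling: faster than hierarchical softmax for large vocab.
# min_count=1 is required here, otherwise the toy corpus yields an empty vocabulary.
model_neg = Word2Vec(sentences, vector_size=100, window=5, min_count=1,
                      sg=1, negative=5, ns_exponent=0.75)  # 5 negative samples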
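
The difference between the two variants is easiest to see in the training examples each one generates. The sketch below builds them for one sentence with a fixed window; the helper names are hypothetical, and real implementations also subsample frequent words and vary the window size.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: the center word predicts each neighbour."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context_words, center) examples: the neighbours jointly predict the center."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples

tokens = ["the", "cat", "sat", "on", "mat"]
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

Skip-gram produces one training example per (center, neighbour) pair, so a rare word gets several updates per occurrence; CBOW averages the context into a single example, which is cheaper but smooths over rare words.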

Q4. What are GloVe embeddings? How do they differ from Word2Vec?

| Property | Word2Vec | GloVe |
|---|---|---|
| Method | Predictive (local context windows) | Count-based (global co-occurrence matrix) |
| Speed | Faster training | Requires full corpus co-occurrence computation |
| Quality | Better on syntactic analogies | Better on semantic analogies |
| Download-ready | Yes (pre-trained) | Yes (pre-trained on Common Crawl) |

GloVe objective: Minimize the difference between the dot product of word vectors and the log of their co-occurrence count:

J = Σ f(X_ij) (w_i^T w_j + b_i + b_j - log X_ij)^2
import numpy as np

def load_glove(filepath, max_words=None):
    """Load GloVe pretrained vectors."""
    embeddings = {}
    with open(filepath, 'r', encoding='utf-8') as f:
        for i, line in enumerate(f):
            if max_words and i >= max_words:
                break
            values = line.split()
            word = values[0]
            vector = np.array(values[1:], dtype='float32')
            embeddings[word] = vector
    return embeddings

# glove = load_glove('glove.6B.300d.txt')
# print(glove['king'].shape)  # (300,)

# Use with Keras/PyTorch via embedding matrix
# embedding_matrix = build_from_glove(vocab, glove, dim=300)

2026 status: GloVe and Word2Vec are largely replaced by contextual embeddings (BERT, sentence-transformers). Still used as fast, lightweight baselines and in resource-constrained settings.
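
One term of the GloVe objective J can be evaluated by hand, which helps when an interviewer asks what the weighting function f does. This is a sketch with made-up vectors; `x_max=100` and `alpha=0.75` are the defaults reported in the GloVe paper, and the function names are hypothetical.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(X_ij): damps rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_pair_loss(w_i, w_j, b_i, b_j, x_ij):
    """One term of J: f(X_ij) * (w_i . w_j + b_i + b_j - log X_ij)^2."""
    dot = sum(a * b for a, b in zip(w_i, w_j))
    err = dot + b_i + b_j - math.log(x_ij)
    return glove_weight(x_ij) * err ** 2

# Made-up 2-d vectors and biases for one word pair that co-occurs 50 times
w_ice, w_solid = [0.5, 1.0], [1.0, 1.2]
print(glove_weight(10), glove_weight(500))
print(glove_pair_loss(w_ice, w_solid, 0.1, 0.2, x_ij=50.0))
```

The loss for a pair is zero exactly when the dot product plus biases equals the log co-occurrence count, and the weight keeps very frequent pairs from dominating the sum.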


Q5. What is a language model? What is perplexity?

P(w_1, ..., w_n) = P(w_1) * P(w_2|w_1) * ... * P(w_n|w_1,...,w_{n-1})

Perplexity: Measures how surprised the model is by the test text. Lower = better.

Perplexity = exp(-1/N * Σ log P(w_i | w_{<i}))

A perplexity of 20 means the model is as uncertain as if it were choosing uniformly from 20 options at each step.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def compute_perplexity(text, model_name='gpt2'):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    encodings = tokenizer(text, return_tensors='pt')
    input_ids = encodings.input_ids

    with torch.no_grad():
        outputs = model(input_ids, labels=input_ids)
        neg_log_likelihood = outputs.loss    # mean NLL per token

    perplexity = torch.exp(neg_log_likelihood).item()
    return perplexity

# Lower perplexity = model assigns higher probability to this text
# Illustrative orders of magnitude only; actual values depend on the corpus and tokenizer:
# GPT-2 on general English text: ~50
# Fine-tuned GPT-2 on domain text: ~20-30
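
The "choosing uniformly from 20 options" reading of perplexity can be verified directly from the formula, without a model. The probability lists below are invented for illustration.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability of the observed tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that always spreads probability evenly over 20 options
uniform = [1 / 20] * 8
# A model that is fairly sure of every token
confident = [0.9, 0.8, 0.95, 0.7]

print(perplexity(uniform))    # approximately 20
print(perplexity(confident))  # close to 1
```

Perplexity is the inverse of the geometric mean of the per-token probabilities, which is why a single near-zero probability can inflate it sharply.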

Q6. What is Named Entity Recognition (NER)? How do you build an NER system?

Approaches in 2026:

| Approach | Tool | Accuracy | Speed |
|---|---|---|---|
| Rule-based | spaCy EntityRuler | Low | Very fast |
| CRF | sklearn-crfsuite | Medium | Fast |
| BiLSTM-CRF | Custom PyTorch | Good | Medium |
| BERT fine-tuned | HuggingFace | Best | Slower |
| Zero-shot (LLM) | GPT-4, Claude | Near-best | Slowest |

# Modern approach: fine-tune BERT for NER
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import TrainingArguments, Trainer
import numpy as np

model_name = 'bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(model_name)

label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG',
              'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
id2label = {i: l for i, l in enumerate(label_list)}
label2id = {l: i for i, l in id2label.items()}

model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(label_list),
    id2label=id2label,
    label2id=label2id
)

# Quick inference with pipeline
from transformers import pipeline
ner = pipeline('ner', model='dslim/bert-base-NER', aggregation_strategy='simple')
entities = ner("Tata Consultancy Services was founded in Mumbai.")
for ent in entities:
    print(f"{ent['word']}: {ent['entity_group']} ({ent['score']:.2f})")
# Expected entities: Tata Consultancy Services (ORG), Mumbai (LOC); scores vary by model version
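
The label list above uses the BIO scheme (B- begins an entity, I- continues it, O is outside). A sketch of the decoding step that turns per-token tags into entity spans follows; it is roughly what `aggregation_strategy='simple'` spares you from writing, though the function here is a hypothetical simplification that ignores subword merging.

```python
def bio_to_spans(tokens, tags):
    """Group B-/I- tags into (entity_text, label) spans."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            # 'O', or an I- tag that does not continue the open entity
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

tokens = ["Tata", "Consultancy", "Services", "was", "founded", "in", "Mumbai", "."]
tags = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
```

Entity-level F1 is computed over spans like these, so a prediction that gets "Tata Consultancy" but misses "Services" counts as a miss.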

Q7. What is sequence-to-sequence (seq2seq) and what are its applications?

Applications: Machine translation, summarization, question answering, code generation, speech recognition.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# T5 for summarization
tokenizer = AutoTokenizer.from_pretrained('t5-base')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-base')

text = """Researchers have developed a new deep learning model that achieves
          state-of-the-art results on multiple NLP benchmarks with 10x less
          compute than previous approaches."""

inputs = tokenizer("summarize: " + text, return_tensors='pt',
                    max_length=512, truncation=True)

outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=100,
    num_beams=4,            # beam search
    early_stopping=True,
    no_repeat_ngram_size=3  # prevent repetition
)

summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)

Q8. What is beam search? How does it differ from greedy decoding?

| Decoding | How | Speed | Quality |
|---|---|---|---|
| Greedy | Pick highest-probability token at each step | Fastest | Often suboptimal |
| Beam search | Maintain top-k sequences at each step | Slower | Better for translation/summarization |
| Top-k sampling | Sample from top-k tokens | Fast | Creative text generation |
| Top-p (nucleus) | Sample from the smallest token set whose cumulative probability reaches p | Fast | More natural than top-k |
| Temperature | Scale logits before softmax | Negligible cost | Higher T = more diverse |

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForCausalLM.from_pretrained('gpt2')

inputs = tokenizer("Machine learning is", return_tensors='pt')

# Greedy decoding
greedy_out = model.generate(inputs.input_ids, max_new_tokens=30)

# Beam search (good for factual tasks)
beam_out = model.generate(inputs.input_ids, max_new_tokens=30,
                           num_beams=5, early_stopping=True)

# Sampling with temperature + top-p (good for creative tasks)
sampled_out = model.generate(
    inputs.input_ids, max_new_tokens=50,
    do_sample=True, temperature=0.8,
    top_p=0.9, top_k=50
)

# Beam search with repetition penalty (avoids repeated phrases)
beam_no_rep = model.generate(inputs.input_ids, max_new_tokens=30,
                              num_beams=5, repetition_penalty=1.3)
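
Why greedy decoding can be suboptimal is clearer on a toy model small enough to trace by hand. The bigram probability table below is entirely made up, and the sketch omits length normalization, which real beam search implementations usually apply.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.35, "end": 0.25},
    "a": {"cat": 0.9, "end": 0.1},
    "cat": {"end": 1.0},
    "dog": {"end": 1.0},
}

def greedy(max_len=3):
    """Always take the single most likely next token."""
    seq, logp = ["<s>"], 0.0
    for _ in range(max_len):
        if seq[-1] == "end":
            break
        tok, p = max(MODEL[seq[-1]].items(), key=lambda kv: kv[1])
        seq.append(tok)
        logp += math.log(p)
    return seq, logp

def beam_search(beam_width=2, max_len=3):
    """Keep the beam_width best partial sequences at every step."""
    beams = [(["<s>"], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            if seq[-1] == "end":
                candidates.append((seq, logp))  # finished; carry forward
                continue
            for tok, p in MODEL[seq[-1]].items():
                candidates.append((seq + [tok], logp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]

g_seq, g_logp = greedy()
b_seq, b_logp = beam_search()
print(g_seq, round(math.exp(g_logp), 2))
print(b_seq, round(math.exp(b_logp), 2))
```

Greedy commits to "the" because it is the best first token, while the beam keeps "a" alive long enough to find that "a cat" is the more probable sequence overall.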

Q9. What is sentiment analysis? Compare rule-based and ML approaches.

| Approach | Accuracy | Speed | Maintenance |
|---|---|---|---|
| VADER (rule-based) | ~70% on reviews | Very fast | No training |
| TF-IDF + LR | ~85-88% | Fast | Labeled data needed |
| BERT fine-tuned | ~93-95% | Slower | Fine-tuning data needed |
| LLM zero-shot | ~90-92% | Slow | No labeled data needed |

# VADER for quick baseline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("The product is absolutely amazing!")
print(scores)  # dict with 'neg', 'neu', 'pos', 'compound'; compound >= 0.05 is conventionally positive

# Fine-tuned transformer (a RoBERTa model here; typically the most accurate option)
from transformers import pipeline

sentiment = pipeline('sentiment-analysis',
                      model='cardiffnlp/twitter-roberta-base-sentiment-latest')
result = sentiment("PapersAdda's content is genuinely helpful for interviews.")
print(result)  # a list with one dict holding 'label' and 'score'

# Zero-shot with an NLI model (no task-specific labeled data)
zero_shot = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
result = zero_shot("The service was terrible and I want a refund.",
                    candidate_labels=['positive', 'negative', 'neutral'])

Q10. What is the difference between extractive and abstractive summarization?

| Type | Method | Output | Model |
|---|---|---|---|
| Extractive | Select and copy sentences from source | Verbatim sentences | TextRank, BERT-extractive |
| Abstractive | Generate new text capturing key ideas | Paraphrased summary | T5, BART, PEGASUS |

from transformers import pipeline
from summarizer import Summarizer  # BERT-extractive (pip package: bert-extractive-summarizer)

text = """The Indian government launched the PM-KISAN scheme to provide
income support to farmers. Under the scheme, eligible farmer families
receive a benefit of Rs 6,000 per year in three equal installments
of Rs 2,000 each. The scheme covers all landholding farmer families
subject to certain exclusion criteria."""

# Abstractive summarization (BART)
abstractive_summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = abstractive_summarizer(text, max_length=60, min_length=20)
print("Abstractive:", summary[0]['summary_text'])

# Extractive (BERT-based)
extractive_model = Summarizer()
extractive_summary = extractive_model(text, num_sentences=2)
print("Extractive:", extractive_summary)

MEDIUM: Transformers and BERT (Questions 11-22)

Q11. Explain the BERT architecture. How is it different from GPT?

| Aspect | BERT | GPT |
|---|---|---|
| Architecture | Transformer encoder only | Transformer decoder only |
| Attention | Bidirectional (all tokens see all tokens) | Causal (left-to-right only) |
| Pre-training | Masked Language Model (MLM) + NSP | Causal Language Modeling (CLM) |
| Best for | Classification, NER, QA, embeddings | Text generation |
| 2026 usage | Embeddings and retrieval | LLM generation (GPT-4, LLaMA) |

from transformers import BertTokenizer, BertModel, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased',
                                                        num_labels=3)

# Tokenization
inputs = tokenizer(
    "PapersAdda helps candidates crack placement interviews.",
    return_tensors='pt',
    padding=True, truncation=True, max_length=128
)
# input_ids: token IDs including [CLS] and [SEP]
# attention_mask: 1 for real tokens, 0 for padding
# token_type_ids: 0 for sentence A, 1 for sentence B

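# Classification: the head sits on top of the pooled [CLS] representation.
# The head is randomly initialized here, so predictions are meaningless until fine-tuned.
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits     # [1, 3]
    pred = logits.argmax(-1)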
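
The "Attention" row of the table comes down to which positions each token is allowed to see. A minimal sketch of the two mask shapes, using plain lists so the pattern is visible (the function names are hypothetical; frameworks build these as tensors):

```python
def bidirectional_mask(n):
    """BERT-style: every position may attend to every position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """GPT-style: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

print("bidirectional:")
for row in bidirectional_mask(4):
    print(row)

print("causal:")
for row in causal_mask(4):
    print(row)
```

The lower-triangular causal mask is what makes left-to-right generation possible, and it is also why a decoder-only model cannot use right-hand context when encoding a token.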

Q12. How do you fine-tune BERT for text classification?

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                           TrainingArguments, Trainer)
from datasets import Dataset
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Prepare data
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

def tokenize_function(examples):
    return tokenizer(examples['text'], padding='max_length',
                      truncation=True, max_length=128)

train_dataset = Dataset.from_dict({'text': X_train, 'label': y_train})
eval_dataset  = Dataset.from_dict({'text': X_test, 'label': y_test})
train_dataset = train_dataset.map(tokenize_function, batched=True)
eval_dataset  = eval_dataset.map(tokenize_function, batched=True)

# Model
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=num_classes
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1_score(labels, preds, average='weighted')
    }

training_args = TrainingArguments(
    output_dir='./bert-classifier',
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    warmup_steps=100,
    weight_decay=0.01,
    learning_rate=2e-5,        # low LR for fine-tuning pre-trained weights
    eval_strategy='epoch',     # named evaluation_strategy in transformers < 4.41
    save_strategy='epoch',
    load_best_model_at_end=True,
    fp16=True                  # faster training on GPU
)

trainer = Trainer(
    model=model, args=training_args,
    train_dataset=train_dataset, eval_dataset=eval_dataset,
    compute_metrics=compute_metrics
)
trainer.train()

Q13. What are sentence embeddings? How do sentence-transformers work?

Sentence-BERT (SBERT): Fine-tunes BERT with a siamese network on natural language inference (NLI) data and semantic textual similarity (STS) tasks, producing embeddings whose cosine similarity tracks semantic similarity.

from sentence_transformers import SentenceTransformer, util
import torch

# Load a sentence encoder
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Encode sentences
sentences = [
    "How do I crack a machine learning interview?",
    "Tips for ML interview preparation",
    "What is gradient descent?",
    "Best recipe for biryani"
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Semantic similarity search
query = "ML interview tips"
query_emb = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_emb, embeddings)[0]

# Top matches
for idx in scores.topk(3).indices:
    print(f"{scores[idx]:.3f}: {sentences[idx]}")
# Prints the three highest-scoring sentences with their cosine scores.
# The two interview-related sentences should rank first; exact values
# depend on the model version.

Q14. What is zero-shot and few-shot classification in NLP?

| Method | Labeled Examples | How |
|---|---|---|
| Zero-shot | 0 | Use LLM or NLI model; provide label descriptions |
| Few-shot | 3-10 | Include examples in prompt; in-context learning |
| Full fine-tuning | Many | Traditional supervised training |
| Parameter-efficient fine-tuning | Moderate | LoRA, prompt tuning |

from transformers import pipeline

# Zero-shot via NLI entailment
classifier = pipeline('zero-shot-classification',
                        model='facebook/bart-large-mnli')

text = "The new iPhone camera system produces stunning portrait photos."
labels = ['technology', 'photography', 'sports', 'politics', 'food']

result = classifier(text, candidate_labels=labels)
print({label: f"{score:.3f}" for label, score in
        zip(result['labels'], result['scores'])})

# Few-shot with LLM (in-context learning)
from openai import OpenAI
client = OpenAI()

few_shot_prompt = """Classify the sentiment as positive, negative, or neutral.

Text: The pizza was cold and tasteless.
Sentiment: negative

Text: The service was quick but the food was average.
Sentiment: neutral

Text: Absolutely loved the ambience and the staff was wonderful!
Sentiment: positive

Text: The product arrived late and the packaging was damaged.
Sentiment:"""

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': few_shot_prompt}]
)

Q15. What is RAG (Retrieval Augmented Generation)? Build a simple RAG pipeline.

RAG pipeline:
1. Offline: chunk documents -> embed -> store in vector DB
2. Online: embed query -> ANN search -> retrieve top-k chunks -> LLM generates answer
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from openai import OpenAI

# Build index
encoder = SentenceTransformer('all-MiniLM-L6-v2')
documents = [
    "Python is a general-purpose programming language created by Guido van Rossum.",
    "Machine learning is a subset of artificial intelligence focused on learning from data.",
    "Neural networks are computational models inspired by the human brain.",
    "Transformers use self-attention to process sequences in parallel."
]

doc_embeddings = encoder.encode(documents)
index = faiss.IndexFlatIP(doc_embeddings.shape[1])  # inner product = cosine on normalized vecs
faiss.normalize_L2(doc_embeddings)
index.add(doc_embeddings)

def rag_answer(query, top_k=3):
    # Retrieve
    query_emb = encoder.encode([query])
    faiss.normalize_L2(query_emb)
    scores, indices = index.search(query_emb, top_k)
    context = "\n".join([documents[i] for i in indices[0]])

    # Generate
    client = OpenAI()
    prompt = f"""Answer the question using the context below.

Context:
{context}

Question: {query}
Answer:"""

    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0
    )
    return response.choices[0].message.content

print(rag_answer("What are transformers in machine learning?"))
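
The pipeline above indexes whole sentences, but step 1 of the offline phase (chunking) is where many real systems go wrong. Below is a sketch of fixed-size word chunking with overlap; the function name and default sizes are assumptions, and production systems often chunk by tokens, sentences, or document structure instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word windows of chunk_size that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Twelve placeholder words so the window boundaries are easy to read
text = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(text, chunk_size=5, overlap=2):
    print(chunk)
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of a larger index.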

Q16. What is masked language modeling (MLM)? How does BERT use it for pre-training?

BERT selects 15% of the input tokens for prediction. Of those selected tokens:

  • 80%: replaced with [MASK]
  • 10%: replaced with a random token
  • 10%: kept unchanged (forces the model to learn contextual representations even for unmasked tokens)

from transformers import BertTokenizer, BertForMaskedLM
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

text = "PapersAdda is the best platform for [MASK] preparation."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Find [MASK] position and get predictions
mask_idx = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero().item()
top_tokens = logits[0, mask_idx].topk(5).indices
print([tokenizer.convert_ids_to_tokens([t.item()])[0] for t in top_tokens])
# Illustrative only; the actual top-5 list depends on the checkpoint:
# ['interview', 'exam', 'job', 'placement', 'career']
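
The 80/10/10 rule applies during pre-training data preparation, not at inference. Here is a sketch of that corruption step on a list of string tokens; the function name, the 15% selection rate as a parameter, and the tiny vocabulary are assumptions for illustration, and real implementations also skip special tokens such as [CLS] and [SEP].

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted_inputs, labels); labels are None where no loss is computed."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                   # token not selected for prediction
        labels[i] = tok                # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = "[MASK]"       # 80%: mask
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: random replacement
        # remaining 10%: leave the token unchanged
    return inputs, labels

sentence = "the model learns to predict missing words from context".split()
corrupted, targets = mask_tokens(sentence, vocab=sentence, mask_prob=0.3, seed=1)
print(corrupted)
print(targets)
```

Because the loss is computed only where `labels` is set, the model sees mostly intact text, and the unchanged 10% prevents it from assuming that every prediction target looks like [MASK].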

Q17. How do you evaluate NLP models beyond accuracy?

| Task | Primary Metrics | Secondary |
|---|---|---|
| Text classification | F1 (macro/weighted), AUC | Accuracy, precision/recall |
| NER | Entity-level F1 (exact match) | Token-level F1 |
| Machine translation | BLEU, ChrF | BERTScore, COMET |
| Summarization | ROUGE-1, ROUGE-2, ROUGE-L | BERTScore, human eval |
| Language modeling | Perplexity | Downstream task accuracy |
| QA | Exact Match, F1 | Human eval |
| LLM quality | MT-Bench, AlpacaEval | LLM-as-judge |

from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from bert_score import score as bert_score_fn

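# Example strings (placeholders; substitute your own data)
reference_summary = "The scheme pays eligible farmers Rs 6,000 a year in three installments."
generated_summary = "Eligible farmers receive Rs 6,000 per year in three installments."

# ROUGE for summarization; score() takes (target, prediction)
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
scores = scorer.score(reference_summary, generated_summary)
print(f"ROUGE-1 F: {scores['rouge1'].fmeasure:.3f}")
print(f"ROUGE-L F: {scores['rougeL'].fmeasure:.3f}")

# BERTScore (better semantic similarity); arguments are (candidates, references)
P, R, F = bert_score_fn([generated_summary], [reference_summary], lang='en',
                          model_type='microsoft/deberta-xlarge-mnli')
print(f"BERTScore F1: {F.mean().item():.3f}")

# BLEU for translation (separate variable names so they do not shadow the strings above)
ref_tokens = [["the", "cat", "sat", "on", "the", "mat"]]   # list of tokenized references
hyp_tokens = ["the", "cat", "is", "on", "the", "mat"]
smooth = SmoothingFunction().method1
bleu = sentence_bleu(ref_tokens, hyp_tokens, smoothing_function=smooth)
print(f"BLEU: {bleu:.3f}")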
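
Interviewers often ask what ROUGE-1 actually counts. The sketch below computes it from scratch as clipped unigram overlap; it skips the stemming and tokenization that the `rouge_score` package applies, so values can differ slightly from the library.

```python
from collections import Counter

def rouge_1(reference, candidate):
    """Unigram precision, recall and F1 with clipped counts."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())      # min count per shared word
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

p, r, f = rouge_1("the cat sat on the mat", "the cat is on the mat")
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Five of the six words match in each direction here, so precision and recall are both 5/6. The metric has no notion of meaning: swapping "sat" for "slept" or for "exploded" costs exactly the same.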

Q18. What is question answering in NLP? What are extractive vs generative QA?

| Type | Output | Model | Example |
|---|---|---|---|
| Extractive QA | Span from context | BERT (start/end token prediction) | SQuAD |
| Generative QA | Generated answer | T5, GPT, LLM + RAG | Open-domain QA |
| Knowledge-based | From structured KB | SPARQL, entity linking | Wikidata QA |

from transformers import pipeline

# Extractive QA
qa = pipeline('question-answering',
               model='deepset/roberta-base-squad2')

context = """PapersAdda is an online platform that helps freshers prepare
for placement interviews. It covers aptitude tests, coding rounds,
HR interviews, and technical rounds for companies like TCS, Infosys, Wipro,
Accenture, Capgemini, and more."""

result = qa(question="What does PapersAdda help with?", context=context)
print(f"Answer: {result['answer']}")
print(f"Score: {result['score']:.3f}")

# Generative QA with LLM + RAG
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use retrieved context as input; model generates free-form answer
# (see RAG question above for full implementation)

Q19. What is attention in the context of NLP before transformers? Explain Bahdanau attention.

e_ij   = score(s_{i-1}, h_j)      # alignment energy (learned)
alpha_ij = softmax(e_ij)           # attention weights
c_i    = sum(alpha_ij * h_j)       # context vector (soft attention over source)

This allows the decoder to "focus on" relevant parts of the input at each generation step.

import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.enc_proj  = nn.Linear(enc_dim, attn_dim)
        self.dec_proj  = nn.Linear(dec_dim, attn_dim)
        self.score_fc  = nn.Linear(attn_dim, 1)

    def forward(self, encoder_states, decoder_hidden):
        """
        encoder_states: [B, T_src, enc_dim]
        decoder_hidden: [B, dec_dim]
        """
        # Project and compute alignment
        enc_proj = self.enc_proj(encoder_states)                 # [B, T, attn]
        dec_proj = self.dec_proj(decoder_hidden).unsqueeze(1)    # [B, 1, attn]
        energy   = torch.tanh(enc_proj + dec_proj)               # broadcast add
        scores   = self.score_fc(energy).squeeze(-1)             # [B, T]
        weights  = torch.softmax(scores, dim=-1)                 # [B, T]
        context  = (weights.unsqueeze(-1) * encoder_states).sum(dim=1)  # [B, enc_dim]
        return context, weights

Q20. How does T5 (Text-to-Text Transfer Transformer) work?

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')

# Translation
input_text = "translate English to French: How are you doing today?"
input_ids = tokenizer(input_text, return_tensors='pt').input_ids
outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

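# Summarization (long_text is a placeholder; substitute your own document)
long_text = ("The Indian government launched the PM-KISAN scheme to provide "
             "income support to farmers, paid in three equal installments.")
summary_input = "summarize: " + long_text
input_ids = tokenizer(summary_input, return_tensors='pt',
                       max_length=512, truncation=True).input_ids
outputs = model.generate(input_ids, max_new_tokens=100, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Classification (framed as generation: the label is emitted as text)
clf_input = "mnli hypothesis: The cat is outside. premise: The cat is inside."
input_ids = tokenizer(clf_input, return_tensors='pt').input_ids
outputs = model.generate(input_ids, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "contradiction" or "entailment" or "neutral"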

Q21. What is coreference resolution? Why is it hard?

"Priya went to the store. She bought milk. Her friend came too." -> Priya, She, Her all refer to the same entity.

Why it is hard:

  • Requires world knowledge ("The trophy couldn't fit in the suitcase because it was too big" - what is "it"?)
  • Requires contextual understanding across sentences
  • Ambiguous pronouns
import spacy

# Neural coreference with spaCy (requires coreferee or experimental component)
nlp = spacy.load("en_core_web_trf")

# Modern approach: LLM for coreference
prompt = """Find all coreference chains in the text.
Text: Rahul is a software engineer at Infosys. He joined in 2023.
His team works on banking software. They deliver quarterly.

Chains:"""

# Expected: [Rahul, He, His], [team, They]

Q22. What is text augmentation and when is it useful for NLP?

| Technique | Method | Preserves Label? |
|---|---|---|
| Synonym replacement | Replace n words with synonyms (WordNet) | Usually yes |
| Back-translation | Translate to another language and back | Yes |
| Easy Data Augmentation (EDA) | Swap, insert, delete, replace | Yes |
| Contextual insertion (BERT) | Use MLM to insert plausible words | Yes |
| LLM paraphrase | Ask GPT-4 to rephrase | Yes |

import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas

text = "The machine learning model achieved excellent performance."

# Synonym replacement
syn_aug = naw.SynonymAug(aug_src='wordnet', aug_p=0.3)
print(syn_aug.augment(text))

# Contextual word insertion with BERT
bert_aug = naw.ContextualWordEmbsAug(model_path='bert-base-uncased',
                                      action='insert', aug_p=0.2)
print(bert_aug.augment(text))

# Back-translation (requires translation model or API)
# English -> French -> English
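
Two of the EDA operations from the table (random swap and random deletion) need no external resources and can be sketched in a few lines. The function names and parameters are hypothetical. Deletion in particular can change the label when it removes a negation, so augmented samples are worth spot-checking.

```python
import random

def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    words = list(words)
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_delete(words, p, rng):
    """Drop each word with probability p, but never return an empty sentence."""
    kept = [w for w in words if rng.random() > p]
    return kept or [rng.choice(words)]

rng = random.Random(42)
words = "the machine learning model achieved excellent performance".split()
print(" ".join(random_swap(words, n_swaps=2, rng=rng)))
print(" ".join(random_delete(words, p=0.2, rng=rng)))
```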

HARD: Advanced NLP (Questions 23-30)

Q23. How do you fine-tune an LLM with LoRA for NLP tasks?

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
from datasets import Dataset

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    'mistralai/Mistral-7B-v0.1',
    torch_dtype=torch.bfloat16,
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-v0.1')
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration
lora_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, lora_config)

# Training
training_args = TrainingArguments(
    output_dir='./mistral-finetune',
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
    save_strategy='epoch'
)

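# train_dataset: a datasets.Dataset with a 'text' column (not defined in this snippet).
# Written against older TRL releases; recent versions take processing_class= instead
# of tokenizer= and move dataset_text_field / max_seq_length into SFTConfig.
trainer = SFTTrainer(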
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=512,
    args=training_args
)
trainer.train()
model.save_pretrained('./mistral-lora-adapter')  # save only LoRA weights
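
A quick calculation shows why the adapter saved above is so small. LoRA replaces the update to a d_out x d_in weight with two low-rank factors, so the trainable parameter count per adapted matrix is r * (d_in + d_out). The sketch assumes a square 4096 x 4096 projection for illustration; grouped-query attention makes some projections smaller in practice.

```python
def lora_param_counts(d_in, d_out, r):
    """Return (full_weight_params, lora_params) for one adapted matrix."""
    full = d_in * d_out
    lora = r * (d_in + d_out)   # A is r x d_in, B is d_out x r
    return full, lora

# Assumed dimensions for illustration: one 4096 x 4096 projection, rank 16
full, lora = lora_param_counts(4096, 4096, r=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {100 * lora / full:.2f}%")

# The update is scaled by lora_alpha / r; with the config above that is 32 / 16
print("scaling:", 32 / 16)
```

At rank 16 the adapter trains under 1% of the parameters of each matrix it touches, which is what makes fine-tuning a 7B model feasible on a single GPU.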

Q24. What is machine translation? How do modern neural MT systems work?

Key components:

  1. Subword tokenization (shared BPE vocabulary across language pairs in multilingual models)
  2. Encoder processes source tokens bidirectionally
  3. Decoder generates target tokens autoregressively, attending to encoder states
  4. Beam search at inference

from transformers import MarianMTModel, MarianTokenizer

# Helsinki-NLP models cover 1,000+ language pairs
model_name = 'Helsinki-NLP/opus-mt-en-hi'   # English to Hindi
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

texts = ["Machine learning is transforming every industry.",
         "PapersAdda helps you prepare for placements."]

inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
translated = model.generate(**inputs, num_beams=5)
results = tokenizer.batch_decode(translated, skip_special_tokens=True)
for src, tgt in zip(texts, results):
    print(f"EN: {src}")
    print(f"HI: {tgt}")
    print()
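
Step 4 of the component list, beam search, is what `num_beams=5` switches on. The sketch below shows the idea on a toy next-token table: the table, token names, and the `toy_next_token_probs` helper are hypothetical stand-ins for a real decoder, and production implementations add length normalization and batching.

```python
import math

def toy_next_token_probs(prefix):
    """Hypothetical 'model': next-token probabilities given the prefix so far."""
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.5, "dog": 0.5},
        ("a",): {"cat": 0.9, "dog": 0.1},
    }
    # Any prefix not in the table ends the sentence
    return table.get(prefix, {"</s>": 1.0})

def beam_search(next_probs, beam_width=2, max_len=5, eos="</s>"):
    """Keep the beam_width best partial sequences by total log-probability."""
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, prob in next_probs(prefix).items():
                extended = (prefix + (token,), score + math.log(prob))
                if token == eos:
                    finished.append(extended)
                else:
                    candidates.append(extended)
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    pool = finished or beams
    return max(pool, key=lambda c: c[1])

greedy_tokens, greedy_score = beam_search(toy_next_token_probs, beam_width=1)
beam_tokens, beam_score = beam_search(toy_next_token_probs, beam_width=2)
print("greedy:", " ".join(greedy_tokens), round(math.exp(greedy_score), 2))
print("beam=2:", " ".join(beam_tokens), round(math.exp(beam_score), 2))
```

With a beam of 1 (greedy decoding) the search commits to "the" because it is the likeliest first token; with a beam of 2 it keeps "a" alive and finds the higher-probability sequence overall.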

Q25. What is cross-lingual transfer learning? How does mBERT work?

mBERT (Multilingual BERT):

  • BERT architecture trained on 104 languages jointly
  • Shared WordPiece vocabulary across all languages
  • MLM objective on all languages (no cross-lingual signal; transfer is emergent)
  • Works surprisingly well for NER, classification across languages

XLM-RoBERTa (better than mBERT in 2026):

  • Trained on 100 languages, 2.5TB of text
  • Uses SentencePiece tokenization (language-agnostic)
  • No NSP, larger batch, more data = significantly better

from transformers import pipeline

# Cross-lingual NER: fine-tuned on English CoNLL-03, then applied zero-shot to the ~100 languages XLM-R was pretrained on
nlp_ner = pipeline('ner', model='xlm-roberta-large-finetuned-conll03-english',
                    aggregation_strategy='simple')

# Works on English
print(nlp_ner("Apple was founded by Steve Jobs in Cupertino."))

# Also works on other languages (zero-shot transfer), without language-specific fine-tuning
print(nlp_ner("Angela Merkel wurde in Hamburg geboren."))  # German input

Q26. How do you build a text classification system that handles 1 billion documents?

Production architecture:

1. Embedding phase (offline):
   - Use a fast encoder (MiniLM-L6: 6 layers, 384-dim, 5x faster than BERT-base)
   - Embed all documents, store in vector store (Qdrant, Weaviate, pgvector)
   - Batch embed on GPU: throughput ~5,000 docs/sec with MiniLM

2. Classification phase (online, three tiers):
   Tier 1 (fast): TF-IDF + LogReg or hashed features -> rules out 90% cases in <1ms
   Tier 2 (accurate): ANN retrieval -> find k-nearest labeled examples -> majority vote
   Tier 3 (hard cases only): BERT fine-tuned, run asynchronously

3. Active learning loop:
   - Flag low-confidence predictions for human review
   - Retrain classifier weekly with new labels
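
A quick back-of-the-envelope check shows why the embedding phase has to run offline. The sketch below uses the throughput and dimension figures quoted above; the GPU count is a hypothetical value chosen only for illustration.

```python
NUM_DOCS = 1_000_000_000
DOCS_PER_SEC_PER_GPU = 5_000  # throughput figure quoted in the architecture notes
EMBED_DIM = 384               # MiniLM-L6 embedding size
BYTES_PER_FLOAT32 = 4
NUM_GPUS = 8                  # hypothetical cluster size

single_gpu_hours = NUM_DOCS / DOCS_PER_SEC_PER_GPU / 3600
cluster_hours = single_gpu_hours / NUM_GPUS
raw_vector_tb = NUM_DOCS * EMBED_DIM * BYTES_PER_FLOAT32 / 1e12

print(f"One GPU: {single_gpu_hours:.1f} hours")
print(f"{NUM_GPUS} GPUs: {cluster_hours:.1f} hours")
print(f"Raw float32 vectors: {raw_vector_tb:.2f} TB (before index overhead)")
```

At these rates a single GPU needs more than two days for one pass, and the raw vectors alone take about 1.5 TB, so sharding and vector compression are worth raising in an interview answer.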
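from sentence_transformers import SentenceTransformer
from sklearn.linear_model import SGDClassifier
import numpy as np

# Fast production encoder
encoder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# partial_fit needs the full label set up front (placeholder labels shown here)
ALL_CLASSES = np.array(['billing', 'support', 'spam'])

# Streaming classification with online learning
clf = SGDClassifier(loss='hinge', random_state=42, n_jobs=-1)

def process_batch(texts, labels=None):
    """Embed a batch, update the classifier if labels are given, then predict."""
    embeddings = encoder.encode(texts, batch_size=128, show_progress_bar=False)
    if labels is not None:
        clf.partial_fit(embeddings, labels, classes=ALL_CLASSES)
    # predict() raises NotFittedError until at least one labeled batch has been seen
    return clf.predict(embeddings)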
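
The tiered routing in the architecture notes can be sketched as a confidence cascade. This is a minimal illustration, not a production router: the three tier functions are hypothetical stubs standing in for the TF-IDF rules, the ANN vote, and the fine-tuned BERT model, and the 0.9 threshold is an assumed value.

```python
def cascade_classify(text, tiers, threshold=0.9):
    """Run tiers cheapest-first; stop at the first prediction that clears the threshold."""
    label, confidence, name = None, 0.0, None
    for name, predict in tiers:
        label, confidence = predict(text)
        if confidence >= threshold:
            break
    # If no tier is confident, the last (most accurate) tier's answer is returned
    return {"label": label, "confidence": confidence, "tier": name}

# Hypothetical stand-ins for the real models
def tier1_keyword_rules(text):
    if "unsubscribe" in text.lower():
        return "spam", 0.99
    return "unknown", 0.0

def tier2_nearest_neighbours(text):
    # Placeholder for ANN retrieval plus majority vote over labeled neighbours
    if "invoice" in text.lower():
        return "billing", 0.93
    return "support", 0.55

def tier3_fine_tuned_model(text):
    # Placeholder for the slow, accurate model
    return "support", 0.97

TIERS = [
    ("tier1", tier1_keyword_rules),
    ("tier2", tier2_nearest_neighbours),
    ("tier3", tier3_fine_tuned_model),
]

for doc in ["Click to unsubscribe now", "Your invoice is attached", "App crashes on login"]:
    print(doc, "->", cascade_classify(doc, TIERS))
```

The design choice is that most documents exit at the cheap tiers, so the expensive model only sees the residual hard cases.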

Q27. What is the difference between semantic search and keyword search? How do you combine them?

| Approach | How | Strength | Weakness |
|---|---|---|---|
| Keyword (BM25) | TF-IDF-based term matching | Exact terms, rare entities | Synonyms, paraphrases |
| Semantic (dense) | Cosine similarity of embeddings | Paraphrases, semantic meaning | Exact keyword miss |
| Hybrid | BM25 + dense, fuse scores | Best of both | More complex |

Hybrid search with Reciprocal Rank Fusion (RRF):

from rank_bm25 import BM25Okapi
import faiss
import numpy as np

def reciprocal_rank_fusion(bm25_results, dense_results, k=60):
    """
    bm25_results: [(doc_id, score), ...] sorted by BM25 rank
    dense_results: [(doc_id, score), ...] sorted by semantic rank
    """
    rrf_scores = {}
    for rank, (doc_id, _) in enumerate(bm25_results):
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)
    for rank, (doc_id, _) in enumerate(dense_results):
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

# Most production search engines (Elasticsearch, OpenSearch, Qdrant)
# support hybrid search natively in 2026
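
To see the fusion at work, here is a small self-contained run on made-up rankings. The `rrf_fuse` helper is a compact variant written for this example (it takes plain lists of document ids rather than `(doc_id, score)` pairs), and the document ids are hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids (best first) with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical retrieval results for one query
bm25_ranking = ["d1", "d2", "d3"]   # keyword search order
dense_ranking = ["d3", "d1", "d4"]  # embedding search order

fused = rrf_fuse([bm25_ranking, dense_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists (d1 and d3) rise to the top, and because only ranks are used, the BM25 and cosine scores never need to be put on a common scale.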

Q28. What is relation extraction and how is it approached in 2026?

"Microsoft was founded by Bill Gates in 1975." -> (Microsoft, founded_by, Bill Gates), (Microsoft, founded_in, 1975)

Approaches:

# Approach 1: BERT for relation classification (supervised)
from transformers import AutoModelForSequenceClassification

# Mark entities in text: "[E1] Microsoft [/E1] was founded by [E2] Bill Gates [/E2]"
# Fine-tune BERT to classify relationship type

# Approach 2: LLM-based (zero-shot in 2026)
from openai import OpenAI
client = OpenAI()

def extract_relations(text):
    prompt = f"""Extract all (subject, relation, object) triples from the text.
Format: subject | relation | object
One triple per line.

Text: {text}
Triples:"""

    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0
    )
    return response.choices[0].message.content

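# Approach 3: Information extraction with spaCy (dependency-based heuristic)
import spacy
nlp = spacy.load("en_core_web_trf")
doc = nlp("Apple acquired a DeepMind-like startup for $1 billion.")

# Pair each verb's subject with its direct object: (subject, verb lemma, object)
for token in doc:
    if token.pos_ == "VERB":
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
        for subj in subjects:
            for obj in objects:
                print((subj.text, token.lemma_, obj.text))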
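
Approaches 1 and 2 both need a little glue code that the snippets above leave out. The sketch below shows one way to insert the entity markers before feeding text to a classifier, and one way to parse the `subject | relation | object` lines an LLM returns. The helper names `mark_entities` and `parse_triples` are illustrative, not part of any library.

```python
def mark_entities(text, span1, span2):
    """Wrap two character spans in [E1]/[E2] markers for relation classification."""
    tagged = [(span1, "E1"), (span2, "E2")]
    # Insert from the rightmost span first so earlier offsets stay valid
    for (start, end), tag in sorted(tagged, key=lambda t: t[0][0], reverse=True):
        text = f"{text[:start]}[{tag}] {text[start:end]} [/{tag}]{text[end:]}"
    return text

def parse_triples(raw_output):
    """Turn 'subject | relation | object' lines into tuples, skipping malformed lines."""
    triples = []
    for line in raw_output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples

sentence = "Microsoft was founded by Bill Gates in 1975."
e1_start = sentence.index("Microsoft")
e2_start = sentence.index("Bill Gates")
marked = mark_entities(
    sentence,
    (e1_start, e1_start + len("Microsoft")),
    (e2_start, e2_start + len("Bill Gates")),
)
print(marked)

# Example of the text an LLM might return for the prompt above (made up here)
llm_reply = "Microsoft | founded_by | Bill Gates\nMicrosoft | founded_in | 1975\nSure, here you go!"
print(parse_triples(llm_reply))
```

Skipping malformed lines matters in practice because even at `temperature=0` a model can add a stray sentence around the triples.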

Q29. What is text generation evaluation in the LLM era? How do you go beyond BLEU?

BLEU and ROUGE measure n-gram overlap with a reference. They fail for open-ended generation where many valid outputs exist.
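
The failure mode is easy to reproduce. The sketch below computes clipped n-gram precision, the core quantity inside BLEU, in plain Python; it deliberately leaves out BLEU's brevity penalty and geometric averaging, and the sentences are made-up examples.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: share of candidate n-grams found in the reference."""
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total

reference = "the movie was great"
copy = "the movie was great"
paraphrase = "i really enjoyed the film"

for name, candidate in [("copy", copy), ("paraphrase", paraphrase)]:
    p1 = ngram_precision(candidate, reference, 1)
    p2 = ngram_precision(candidate, reference, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

The paraphrase means the same thing as the reference but shares only the word "the" with it, so its overlap score collapses. Embedding-based and judge-based metrics are meant to close that gap.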

Modern evaluation approaches:

| Method | Description | Tool |
|---|---|---|
| BERTScore | Contextual embedding similarity | bert-score library |
| LLM-as-judge | Ask GPT-4/Claude to rate response | OpenAI API |
| MT-Bench | GPT-4 evaluates multi-turn chat | LMSYS benchmark |
| AlpacaEval | Win rate vs reference model | AlpacaEval framework |
| Human evaluation | Gold standard, expensive | Amazon Mechanical Turk |
| Task-specific metrics | Pass@k for code, exact match for QA | Custom |

# LLM-as-judge (increasingly standard in 2026)
from openai import OpenAI

def evaluate_with_llm(question, generated_answer, reference_answer):
    client = OpenAI()
    prompt = f"""Rate the quality of the generated answer on a scale of 1-10.

Question: {question}
Reference Answer: {reference_answer}
Generated Answer: {generated_answer}

Score (1-10) and brief justification:"""

    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0
    )
    return response.choices[0].message.content

# For production LLM evaluation at scale, use Prometheus (open-source judge model)
# instead of paying for GPT-4 per call

Q30. How do you handle multilingual text in a production NLP system for Indian languages?

Stack for Indian language NLP in 2026:

1. Tokenization: IndicNLP library (language-aware) or SentencePiece
2. Base model: IndicBERT (AI4Bharat) or MuRIL (Google, trained on 17 Indian languages)
3. Translation: AI4Bharat IndicTrans2 (strong open-source option for Indic translation)
4. Speech: Whisper multilingual
5. LLM: Sarvam-1, Namaste GPT, OpenHathi (Llama fine-tuned on Hindi)

from transformers import AutoTokenizer, AutoModel
import torch

# MuRIL: Multilingual Representations for Indian Languages
# Google's model trained on 17 Indian languages
muril_tokenizer = AutoTokenizer.from_pretrained('google/muril-base-cased')
muril_model = AutoModel.from_pretrained('google/muril-base-cased')

# Handle code-switched text (Hindi + English)
text = "yeh model bahut achha hai for sentiment analysis"  # Hindi-English mix
inputs = muril_tokenizer(text, return_tensors='pt')
with torch.no_grad():
    embeddings = muril_model(**inputs).last_hidden_state[:, 0, :]  # CLS

# IndicBERT (AI4Bharat) for Indic languages
indic_tokenizer = AutoTokenizer.from_pretrained('ai4bharat/indic-bert')
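
Before text reaches any of these models, a production pipeline usually needs to know which script each token is written in, so it can route or transliterate it. The sketch below is a minimal, assumption-laden version using only the Devanagari Unicode block (U+0900 to U+097F); the function names are illustrative, and other Indic scripts would need their own ranges.

```python
def detect_script(token):
    """Label a token as 'devanagari', 'latin', or 'other' by counting characters."""
    devanagari = sum(1 for ch in token if "\u0900" <= ch <= "\u097F")
    latin = sum(1 for ch in token if ch.isascii() and ch.isalpha())
    if devanagari == 0 and latin == 0:
        return "other"
    return "devanagari" if devanagari >= latin else "latin"

def tag_scripts(text):
    """Tag every whitespace-separated token with its script."""
    return [(token, detect_script(token)) for token in text.split()]

mixed = "यह model बहुत अच्छा है"
print(tag_scripts(mixed))

# Romanized Hindi is all Latin script, so script detection cannot separate it from English
romanized = "yeh model bahut achha hai"
print(tag_scripts(romanized))
```

The second example shows the limit of this approach: romanized Hindi, like the code-switched sentence in the snippet above, looks identical to English at the script level, so it needs a language-identification model or a model such as MuRIL that was trained on transliterated text.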

NLP Tools Comparison Table 2026

| Task | Tool/Model | Notes |
|---|---|---|
| Tokenization | HuggingFace Tokenizers, SentencePiece | Fast, BPE and WordPiece |
| Classical NLP pipeline | spaCy | POS, NER, dependency, fast |
| Sentence embeddings | sentence-transformers | Semantic search |
| BERT fine-tuning | HuggingFace Transformers + Trainer | Standard in 2026 |
| LLM inference | vLLM, TGI | Production serving |
| LLM fine-tuning | TRL + PEFT (LoRA) | Memory-efficient |
| Vector search | FAISS, Qdrant, pgvector | ANN for RAG |
| Evaluation | evaluate (HuggingFace) | ROUGE, BLEU, BERTScore |

FAQ

Q: What is the best starting point for learning NLP in 2026? A: Start with the HuggingFace NLP course (free). Then implement a BERT fine-tuning project on a classification task. Then build a small RAG system with FAISS and OpenAI.

Q: Is spaCy still relevant in 2026? A: Yes for classical NLP pipelines: POS tagging, dependency parsing, rule-based NER, text preprocessing. For deep learning NLP tasks, HuggingFace is the standard.

Q: What is the difference between BERT and RoBERTa? A: RoBERTa (Liu et al., 2019) keeps BERT's architecture but removes Next Sentence Prediction (NSP), uses dynamic masking, and trains longer on much more data with larger batches. In the original paper it outperforms BERT on GLUE, SQuAD, and RACE.

Q: What is instruction tuning? A: Fine-tuning a base LLM on instruction-response pairs so that it follows user instructions (e.g., "Summarize this text", "Write a poem about..."). InstructGPT, Alpaca, and FLAN-T5 use this technique.



Methodology applied to this article (last verified 8 Jun 2026)
Sources used: Public exam-pattern documents, official recruiter pages, and verified candidate reports on r/developersIndia and LinkedIn.
Verification window: Page last edited 8 Jun 2026 by Aditya Sharma. Numbers and patterns sanity-checked against the most recent 2026 cycle drives we tracked.
What we did NOT do:
  • No fabricated salary numbers or success rates. If we quote a range, it's sourced.
  • No noun-substituted templates. This article was not generated by swapping company names in a stock prompt.
  • No paid placements, sponsored coaching links, or affiliate-shilled course pushes.
Verification policy: /editorial-standards/. Found something incorrect? Submit a correction - we respond within 48 hours.
