
NLP Interview Questions 2026: 30 Answers with Code

30 NLP interview questions with full answers and Python code covering tokenization, word embeddings, BERT, GPT, NER, text classification, and LLM fine-tuning for 2026 interviews.

By Aditya Sharma. Published and last revised 8 Jun 2026.

Natural language processing is one of the highest-demand specializations in 2026. Every company that processes text, which is nearly every company, needs NLP engineers. The interview arc goes from classical text preprocessing all the way through fine-tuning large language models. This guide covers 30 NLP interview questions with full answers, Python code, and comparison tables for 2026.

PapersAdda's take: NLP interviews in 2026 have a sharp divide. Companies hiring for LLM-adjacent roles will ask about transformers, fine-tuning, RAG, and evaluation. Companies building internal NLP tools will ask classical NLP plus modern BERT fine-tuning. Know both tracks. Candidates report that tokenization internals and BERT vs GPT architecture tradeoffs appear in virtually every NLP interview. According to candidate accounts from public preparation resources, RAG system design questions have become standard at AI-focused startups. Confirm the specific NLP interview format on the official careers portal before your round.


Which Companies Ask These Questions?

Topic	Companies
Text preprocessing and classical NLP	All companies with NLP roles
Word embeddings (Word2Vec, GloVe)	All ML teams
BERT fine-tuning	Google, Microsoft, Flipkart, Meesho
LLM + RAG systems	OpenAI, Cohere, Anthropic-adjacent startups
NER and Information Extraction	Freshworks, Sprinklr, Sarvam AI
Machine Translation	Google, Microsoft, Koo, Bhashini
Conversational AI	Amazon Alexa, Google, Jio AI

EASY: Foundations (Questions 1-10)

Q1. What is tokenization? Compare word, subword, and character tokenization.

Type	Unit	Vocab Size	OOV Handling	Examples
Word	Whole words	50K-500K	Poor (unseen words map to an unknown token)	Early NLP, basic bag-of-words
Subword (BPE)	Frequent substrings	30K-50K (larger in recent LLMs)	Good (breaks into subwords)	GPT-2, GPT-4, Llama 3
Subword (WordPiece)	Learned subwords	30K	Good	BERT, DistilBERT
Character / byte	Individual chars or bytes	~256 for bytes	Perfect	Rare; very long sequences
SentencePiece	Language-agnostic BPE/unigram	32K	Good	T5, LLaMA 1/2, multilingual models
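To make the BPE row concrete, here is a minimal sketch of the merge loop that builds a subword vocabulary. It is a toy illustration, not the tokenizer any of the models above ships with: the word counts and the helper names (`most_frequent_pair`, `merge_pair`) are made up for this example.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Hypothetical corpus: each word starts as a tuple of characters with a count
words = {tuple("low"): 5, tuple("lower"): 2, tuple("lowest"): 3}
merges = []
for _ in range(2):
    pair = most_frequent_pair(words)
    merges.append(pair)
    words = merge_pair(words, pair)

print(merges)
print(words)
```

After two merges the frequent stem "low" has become a single symbol, while the rarer suffixes are still split into characters. That is the behaviour the OOV column describes: unseen words fall back to smaller known pieces instead of an unknown token.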

from transformers import AutoTokenizer

# BERT WordPiece: continuation pieces carry a '##' prefix
bert_tok = AutoTokenizer.from_pretrained('bert-base-uncased')
tokens = bert_tok.tokenize("unbelievable transformers")
print(tokens)  # the exact split depends on the vocabulary; rare words break into '##' pieces

# GPT-2 byte-level BPE (Byte-Pair Encoding): 'Ġ' marks a token that begins with a space
gpt_tok = AutoTokenizer.from_pretrained('gpt2')
ids = gpt_tok.encode("unbelievable transformers")
print(gpt_tok.convert_ids_to_tokens(ids))

# LLaMA 2 SentencePiece (gated repo: accept the license and log in to Hugging Face first)
llama_tok = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
encoded = llama_tok("Hello world", return_tensors='pt')
print(encoded.input_ids, encoded.attention_mask)

Q2. What is TF-IDF and when is it still useful in 2026?

TF(t, d)  = count(t in d) / total_words(d)
IDF(t)    = log(N / df(t))           # N = total docs, df(t) = docs containing t
TF-IDF    = TF * IDF
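As a sanity check on the formulas above, here is a small worked example in plain Python. The three-document corpus and the helper names are made up for illustration. It follows the textbook definitions exactly; scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ.

```python
import math

# Hypothetical toy corpus, already tokenized
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran", "home"],
]

def tf(term, doc):
    """Term frequency: share of the document's tokens equal to `term`."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency: log(N / df). Returns 0.0 for unseen terms."""
    df = sum(term in doc for doc in docs)
    return math.log(len(docs) / df) if df else 0.0

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

for term in ["the", "cat", "sat"]:
    print(term, round(tfidf(term, docs[0], docs), 4))
```

"the" scores 0.0 because it appears in every document (log(3/3) = 0), while "sat", which appears in only one document, gets the highest weight in the first document.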

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# TF-IDF is still effective for short-text classification at low compute cost
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(
        ngram_range=(1,2),      # unigrams and bigrams
        max_features=50000,
        sublinear_tf=True,      # 1 + log(tf) instead of raw tf
        min_df=2,               # ignore very rare terms
        strip_accents='unicode',
        analyzer='word'
    )),
    ('clf', LogisticRegression(C=5, max_iter=1000))
])

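# X_train_text / X_test_text are lists of raw strings and y_train / y_test their labels (placeholders)
pipeline.fit(X_train_text, y_train)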
print(pipeline.score(X_test_text, y_test))

Still useful in 2026 when:

Latency budgets are too tight for BERT-class models
You need a quick baseline for short-text classification
Volume is high and compute is constrained
Interpretability is required (TF-IDF features are human-readable)

Q3. Explain Word2Vec. What is the difference between CBOW and Skip-gram?

Variant	Task	Works best for
CBOW (Continuous Bag of Words)	Predict center word from context	Frequent words; faster to train
Skip-gram	Predict context words from center word	Rare words, larger datasets
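The difference between the two objectives is easiest to see in the training examples each one generates. The sketch below builds them for a single sentence; the function names and the window size are illustrative choices, not gensim internals.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center, context) pair per neighbour inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: one (context_words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        examples.append((context, center))
    return examples

sentence = ["the", "cat", "sat"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

Skip-gram produces one training pair per neighbour, so every occurrence of a rare word yields several updates to its vector. CBOW pools the whole context into a single prediction per position, which is cheaper but smooths over rare words.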

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "mat"],
             ["the", "dog", "ran", "in", "park"],
             ["cat", "and", "dog", "are", "pets"]]

# Skip-gram (sg=1): better for rare words, recommended
model = Word2Vec(sentences, vector_size=100, window=5,
                 min_count=1, workers=4, sg=1, epochs=100)

# Semantic relationships (not meaningful on a toy corpus this small)
print(model.wv.most_similar('cat'))
print(model.wv.similarity('cat', 'dog'))

# Famous analogy: king - man + woman = queen
# This needs vectors trained on a large corpus. The toy vocabulary above has no
# 'king', so guard the lookup instead of letting it raise KeyError.
if all(w in model.wv for w in ('king', 'woman', 'man')):
    result = model.wv.most_similar(positive=['king', 'woman'],
                                    negative=['man'], topn=1)

# Negative sampling: faster than hierarchical softmax for large vocab
# min_count=1 is needed here; the default (5) would leave this tiny vocabulary empty
model_neg = Word2Vec(sentences, vector_size=100, window=5, min_count=1,
                      negative=5, ns_exponent=0.75)  # 5 negative samples

Q4. What are GloVe embeddings? How do they differ from Word2Vec?

Property	Word2Vec	GloVe
Method	Predictive (local context windows)	Count-based (global co-occurrence matrix)
Speed	Faster training	Requires full corpus co-occurrence computation
Quality	Comparable to GloVe when tuned	Comparable to Word2Vec when tuned; reported gaps depend on corpus and hyperparameters
Pre-trained vectors	Yes (Google News)	Yes (Wikipedia + Gigaword, Common Crawl, Twitter)

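GloVe objective: Minimize the weighted squared difference between the dot product of a word vector and a context vector (plus their biases) and the log of the pair's co-occurrence count:

J = Σ_{i,j} f(X_ij) (w_i^T w~_j + b_i + b~_j - log X_ij)^2

Here w_i and b_i are the word vector and bias, w~_j and b~_j are the separate context vector and bias, and f is a weighting function that down-weights rare pairs and caps the influence of very frequent ones.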
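The weighting term f(X_ij) in the objective is worth being able to write down in an interview. The sketch below uses the form and defaults from the original GloVe paper (x_max = 100, alpha = 0.75) and evaluates the loss for a single word pair; the function names and the sample numbers are illustrative.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """f(x): grows as (x / x_max)^alpha for rare pairs, capped at 1 for frequent ones."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def pair_loss(dot, b_i, b_j, x):
    """One term of J: f(x) * (dot + b_i + b_j - log x)^2."""
    return glove_weight(x) * (dot + b_i + b_j - math.log(x)) ** 2

# A pair seen once contributes far less than a pair seen 100+ times
print(glove_weight(1), glove_weight(100), glove_weight(10_000))

# The loss is zero when the model's score already equals log(co-occurrence count)
print(pair_loss(dot=math.log(50), b_i=0.0, b_j=0.0, x=50))
```

The cap matters because without it extremely frequent pairs such as ("the", "of") would dominate the sum.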

import numpy as np

def load_glove(filepath, max_words=None):
    """Load GloVe pretrained vectors."""
    embeddings = {}
    with open(filepath, 'r', encoding='utf-8') as f:
        for i, line in enumerate(f):
            if max_words and i >= max_words:
                break
            values = line.split()
            word = values[0]
            vector = np.array(values[1:], dtype='float32')
            embeddings[word] = vector
    return embeddings

# glove = load_glove('glove.6B.300d.txt')
# print(glove['king'].shape)  # (300,)

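# Use with Keras/PyTorch via an embedding matrix
# (build_from_glove is a hypothetical helper, not a library function: it would
# return a [len(vocab), dim] array whose row i is the GloVe vector of word i)
# embedding_matrix = build_from_glove(vocab, glove, dim=300)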

2026 status: GloVe and Word2Vec are largely replaced by contextual embeddings (BERT, sentence-transformers). Still used as fast, lightweight baselines and in resource-constrained settings.

Q5. What is a language model? What is perplexity?

P(w_1, ..., w_n) = P(w_1) * P(w_2|w_1) * ... * P(w_n|w_1,...,w_{n-1})

Perplexity: Measures how surprised the model is by the test text. Lower = better.

Perplexity = exp(-1/N * Σ log P(w_i | w_{<i}))

A perplexity of 20 means the model is as uncertain as if it were choosing uniformly from 20 options at each step.
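The "choosing uniformly from 20 options" interpretation can be checked directly from the formula. This is a minimal sketch in plain Python; the probability lists are invented inputs standing in for what a model would assign to each token.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability assigned to each token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that always spreads probability evenly over 20 tokens
uniform = perplexity([1 / 20] * 10)

# A model that is more confident on most tokens but badly wrong on one
mixed = perplexity([0.9, 0.8, 0.9, 0.01])

print(round(uniform, 3), round(mixed, 3))
```

Because perplexity averages log-probabilities, a single token with very low probability pulls the score up sharply, which is why one bad prediction hurts more than several good ones help.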

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def compute_perplexity(text, model_name='gpt2'):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    encodings = tokenizer(text, return_tensors='pt')
    input_ids = encodings.input_ids

    with torch.no_grad():
        outputs = model(input_ids, labels=input_ids)
        neg_log_likelihood = outputs.loss    # mean NLL per token

    perplexity = torch.exp(neg_log_likelihood).item()
    return perplexity

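# Lower perplexity = model assigns higher probability to this text
# Absolute values depend on model size, tokenizer, and text domain, so compare
# perplexities only between models that share a tokenizer and a test set
# Fine-tuning on in-domain text typically lowers perplexity on that domain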

Q6. What is Named Entity Recognition (NER)? How do you build an NER system?

Approaches in 2026:

Approach	Tool	Accuracy	Speed
Rule-based	spaCy EntityRuler	Low	Very fast
CRF	sklearn-crfsuite	Medium	Fast
BiLSTM-CRF	Custom PyTorch	Good	Medium
BERT fine-tuned	HuggingFace	Best	Slower
Zero-shot (LLM)	GPT-4, Claude	Near-best	Slowest

# Modern approach: fine-tune BERT for NER
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import TrainingArguments, Trainer
import numpy as np

model_name = 'bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(model_name)

label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG',
              'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
id2label = {i: l for i, l in enumerate(label_list)}
label2id = {l: i for i, l in id2label.items()}

model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(label_list),
    id2label=id2label,
    label2id=label2id
)

# Quick inference with pipeline
from transformers import pipeline
ner = pipeline('ner', model='dslim/bert-base-NER', aggregation_strategy='simple')
entities = ner("Tata Consultancy Services was founded in Mumbai.")
for ent in entities:
    print(f"{ent['word']}: {ent['entity_group']} ({ent['score']:.2f})")
# Example output (scores will vary): Tata Consultancy Services: ORG, Mumbai: LOC

Q7. What is sequence-to-sequence (seq2seq) and what are its applications?

Applications: Machine translation, summarization, question answering, code generation, speech recognition.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# T5 for summarization
tokenizer = AutoTokenizer.from_pretrained('t5-base')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-base')

text = """Researchers have developed a new deep learning model that achieves
          state-of-the-art results on multiple NLP benchmarks with 10x less
          compute than previous approaches."""

inputs = tokenizer("summarize: " + text, return_tensors='pt',
                    max_length=512, truncation=True)

outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=100,
    num_beams=4,            # beam search
    early_stopping=True,
    no_repeat_ngram_size=3  # prevent repetition
)

summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)

Q8. What is beam search? How does it differ from greedy decoding?

Decoding	How	Speed	Quality
Greedy	Pick highest-probability token at each step	Fastest	Often suboptimal
Beam search	Maintain top-k sequences at each step	Slower	Better for translation/summarization
Top-k sampling	Sample from top-k tokens	Fast	Creative text generation
Top-p (nucleus)	Sample from minimum tokens covering probability p	Fast	More natural than top-k
Temperature	Scale logits before softmax (used together with sampling)	No extra cost	Higher T = more diverse

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForCausalLM.from_pretrained('gpt2')

inputs = tokenizer("Machine learning is", return_tensors='pt')

# Greedy decoding
greedy_out = model.generate(inputs.input_ids, max_new_tokens=30)

# Beam search (good for factual tasks)
beam_out = model.generate(inputs.input_ids, max_new_tokens=30,
                           num_beams=5, early_stopping=True)

# Sampling with temperature + top-p (good for creative tasks)
sampled_out = model.generate(
    inputs.input_ids, max_new_tokens=50,
    do_sample=True, temperature=0.8,
    top_p=0.9, top_k=50
)

# Beam search with repetition penalty (avoids repeated phrases)
beam_no_rep = model.generate(inputs.input_ids, max_new_tokens=30,
                              num_beams=5, repetition_penalty=1.3)
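To see why beam search can beat greedy decoding, it helps to run both on a model small enough to check by hand. The sketch below uses a made-up two-step "language model" (the `TOY_MODEL` table and its probabilities are invented for illustration) in which the most likely first token leads to a worse overall sequence.

```python
import math

# Hypothetical next-token distributions, keyed by the prefix generated so far
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}

def next_log_probs(prefix):
    return {tok: math.log(p) for tok, p in TOY_MODEL[tuple(prefix)].items()}

def greedy_decode(steps):
    """Commit to the single most likely token at every step."""
    seq, score = [], 0.0
    for _ in range(steps):
        tok, lp = max(next_log_probs(seq).items(), key=lambda kv: kv[1])
        seq.append(tok)
        score += lp
    return seq, score

def beam_search(steps, beam_width):
    """Keep the `beam_width` best partial sequences by total log-probability."""
    beams = [([], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

greedy_seq, greedy_score = greedy_decode(2)
beam_seq, beam_score = beam_search(2, beam_width=2)
print(greedy_seq, round(math.exp(greedy_score), 2))
print(beam_seq, round(math.exp(beam_score), 2))
```

Greedy commits to "a" (0.6) and ends with a sequence probability of 0.33. A beam of width 2 keeps "b" alive and finds "b x" at 0.36. With `beam_width=1` the two procedures coincide.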

Q9. What is sentiment analysis? Compare rule-based and ML approaches.

Approach	Accuracy (indicative)	Speed	Data needs
VADER (rule-based)	~70% on reviews	Very fast	No training
TF-IDF + LR	~85-88%	Fast	Labeled data needed
BERT fine-tuned	~93-95%	Slower	Fine-tuning data needed
LLM zero-shot	~90-92%	Slow	No labeled data needed

# VADER for quick baseline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("The product is absolutely amazing!")
print(scores)  # dict with 'neg', 'neu', 'pos', 'compound'; compound lies in [-1, 1]

# Fine-tuned transformer (RoBERTa here): usually the most accurate option
from transformers import pipeline

sentiment = pipeline('sentiment-analysis',
                      model='cardiffnlp/twitter-roberta-base-sentiment-latest')
result = sentiment("PapersAdda's content is genuinely helpful for interviews.")
print(result)  # list with one dict holding 'label' and 'score'

# Zero-shot via an NLI model (no task-specific labeled data needed)
zero_shot = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
result = zero_shot("The service was terrible and I want a refund.",
                    candidate_labels=['positive', 'negative', 'neutral'])
print(result['labels'][0], result['scores'][0])  # top label and its score

Q10. What is the difference between extractive and abstractive summarization?

Type	Method	Output	Model
Extractive	Select and copy sentences from source	Verbatim sentences	TextRank, BERT-extractive
Abstractive	Generate new text capturing key ideas	Paraphrased summary	T5, BART, PEGASUS

from transformers import pipeline
from summarizer import Summarizer  # BERT-extractive (pip install bert-extractive-summarizer)

text = """The Indian government launched the PM-KISAN scheme to provide
income support to farmers. Under the scheme, eligible farmer families
receive a benefit of Rs 6,000 per year in three equal installments
of Rs 2,000 each. The scheme covers all landholding farmer families
subject to certain exclusion criteria."""

# Abstractive summarization (BART)
abstractive_summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = abstractive_summarizer(text, max_length=60, min_length=20)
print("Abstractive:", summary[0]['summary_text'])

# Extractive (BERT-based)
extractive_model = Summarizer()
extractive_summary = extractive_model(text, num_sentences=2)
print("Extractive:", extractive_summary)

MEDIUM: Transformers and BERT (Questions 11-22)

Q11. Explain the BERT architecture. How is it different from GPT?

Aspect	BERT	GPT
Architecture	Transformer encoder only	Transformer decoder only
Attention	Bidirectional (all tokens see all tokens)	Causal (left-to-right only)
Pre-training	Masked Language Model (MLM) + NSP	Causal Language Modeling (CLM)
Best for	Classification, NER, QA, embeddings	Text generation
2026 usage	Embeddings and retrieval	LLM generation (GPT-4, LLaMA)

from transformers import BertTokenizer, BertModel, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased',
                                                        num_labels=3)

# Tokenization
inputs = tokenizer(
    "PapersAdda helps candidates crack placement interviews.",
    return_tensors='pt',
    padding=True, truncation=True, max_length=128
)
# input_ids: token IDs including [CLS] and [SEP]
# attention_mask: 1 for real tokens, 0 for padding
# token_type_ids: 0 for sentence A, 1 for sentence B

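# Classification: a linear head on top of the pooled [CLS] representation
# The head is randomly initialized here, so predictions are meaningless until fine-tuned
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits     # [1, 3]
    pred = logits.argmax(-1)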

Q12. How do you fine-tune BERT for text classification?

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                           TrainingArguments, Trainer)
from datasets import Dataset
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Prepare data (X_train, y_train, X_test, y_test, and num_classes are placeholders for your dataset)
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

def tokenize_function(examples):
    return tokenizer(examples['text'], padding='max_length',
                      truncation=True, max_length=128)

train_dataset = Dataset.from_dict({'text': X_train, 'label': y_train})
eval_dataset  = Dataset.from_dict({'text': X_test, 'label': y_test})
train_dataset = train_dataset.map(tokenize_function, batched=True)
eval_dataset  = eval_dataset.map(tokenize_function, batched=True)

# Model
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=num_classes
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1_score(labels, preds, average='weighted')
    }

training_args = TrainingArguments(
    output_dir='./bert-classifier',
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    warmup_steps=100,
    weight_decay=0.01,
    learning_rate=2e-5,        # low LR for fine-tuning pre-trained weights
    eval_strategy='epoch',     # named evaluation_strategy in older transformers releases
    save_strategy='epoch',
    load_best_model_at_end=True,
    fp16=True                  # faster training on GPU
)

trainer = Trainer(
    model=model, args=training_args,
    train_dataset=train_dataset, eval_dataset=eval_dataset,
    compute_metrics=compute_metrics
)
trainer.train()

Q13. What are sentence embeddings? How do sentence-transformers work?

Sentence-BERT (SBERT): Fine-tunes BERT in a siamese setup on natural language inference (NLI) and semantic textual similarity (STS) data, producing sentence embeddings whose cosine similarity tracks semantic similarity.

from sentence_transformers import SentenceTransformer, util
import torch

# Load a sentence encoder
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Encode sentences
sentences = [
    "How do I crack a machine learning interview?",
    "Tips for ML interview preparation",
    "What is gradient descent?",
    "Best recipe for biryani"
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Semantic similarity search
query = "ML interview tips"
query_emb = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_emb, embeddings)[0]

# Top matches
for idx in scores.topk(3).indices:
    print(f"{scores[idx]:.3f}: {sentences[idx]}")
# The two interview-related sentences should rank first;
# exact scores depend on the model version

Q14. What is zero-shot and few-shot classification in NLP?

Method	Labeled Examples	How
Zero-shot	0	Use LLM or NLI model; provide label descriptions
Few-shot	3-10	Include examples in prompt; in-context learning
Full fine-tuning	Many	Traditional supervised training
Parameter-efficient fine-tuning	Moderate	LoRA, prompt tuning

from transformers import pipeline

# Zero-shot via NLI entailment
classifier = pipeline('zero-shot-classification',
                        model='facebook/bart-large-mnli')

text = "The new iPhone camera system produces stunning portrait photos."
labels = ['technology', 'photography', 'sports', 'politics', 'food']

result = classifier(text, candidate_labels=labels)
print({label: f"{score:.3f}" for label, score in
        zip(result['labels'], result['scores'])})

# Few-shot with LLM (in-context learning)
from openai import OpenAI
client = OpenAI()

few_shot_prompt = """Classify the sentiment as positive, negative, or neutral.

Text: The pizza was cold and tasteless.
Sentiment: negative

Text: The service was quick but the food was average.
Sentiment: neutral

Text: Absolutely loved the ambience and the staff was wonderful!
Sentiment: positive

Text: The product arrived late and the packaging was damaged.
Sentiment:"""

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': few_shot_prompt}]
)
print(response.choices[0].message.content)  # the predicted sentiment label

Q15. What is RAG (Retrieval Augmented Generation)? Build a simple RAG pipeline.

RAG pipeline:
1. Offline: chunk documents -> embed -> store in vector DB
2. Online: embed query -> ANN search -> retrieve top-k chunks -> LLM generates answer
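The "chunk documents" step in the offline phase is often probed in interviews. Here is a minimal sketch of a word-window chunker with overlap; the function name, the word-based splitting, and the default sizes are assumptions for illustration, and production systems often chunk by tokens or by sentence and section boundaries instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.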

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from openai import OpenAI

# Build index
encoder = SentenceTransformer('all-MiniLM-L6-v2')
documents = [
    "Python is a general-purpose programming language created by Guido van Rossum.",
    "Machine learning is a subset of artificial intelligence focused on learning from data.",
    "Neural networks are computational models inspired by the human brain.",
    "Transformers use self-attention to process sequences in parallel."
]

doc_embeddings = encoder.encode(documents)
index = faiss.IndexFlatIP(doc_embeddings.shape[1])  # inner product = cosine on normalized vecs
faiss.normalize_L2(doc_embeddings)
index.add(doc_embeddings)

def rag_answer(query, top_k=3):
    # Retrieve
    query_emb = encoder.encode([query])
    faiss.normalize_L2(query_emb)
    scores, indices = index.search(query_emb, top_k)
    context = "\n".join([documents[i] for i in indices[0]])

    # Generate
    client = OpenAI()
    prompt = f"""Answer the question using the context below.

Context:
{context}

Question: {query}
Answer:"""

    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0
    )
    return response.choices[0].message.content

print(rag_answer("What are transformers in machine learning?"))

Q16. What is masked language modeling (MLM)? How does BERT use it for pre-training?

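BERT selects about 15% of the input tokens as prediction targets. Of those selected tokens:

80%: replaced with [MASK]
10%: replaced with a random token
10%: kept unchanged (forces the model to build contextual representations for every token, since any token may be a prediction target)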
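The 80/10/10 rule can be sketched in a few lines of plain Python. This is a simplified illustration, not BERT's actual data pipeline: it assumes the usual 15% selection rate, selects each token independently, and uses a tiny made-up vocabulary for the random replacements.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (model_inputs, labels); labels are None where no prediction is required."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                        # token not selected as a target
        labels[i] = tok                     # the model must predict the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = "[MASK]"            # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return inputs, labels

tokens = "the model learns to predict the hidden words from context".split()
inputs, labels = mask_tokens(tokens, vocab=["cat", "run", "blue"], mask_prob=0.3)
print(inputs)
print(labels)
```

The loss is computed only at positions where `labels` is not None, which is why MLM extracts a training signal from roughly 15% of tokens per pass.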

from transformers import BertTokenizer, BertForMaskedLM
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

text = "PapersAdda is the best platform for [MASK] preparation."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Find [MASK] position and get predictions
mask_idx = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero().item()
top_tokens = logits[0, mask_idx].topk(5).indices
print([tokenizer.convert_ids_to_tokens([t.item()])[0] for t in top_tokens])
# Prints the five highest-scoring fill-in tokens for the [MASK] slot

Q17. How do you evaluate NLP models beyond accuracy?

Task	Primary Metrics	Secondary
Text classification	F1 (macro/weighted), AUC	Accuracy, precision/recall
NER	Entity-level F1 (exact match)	Token-level F1
Machine translation	BLEU, ChrF	BERTScore, COMET
Summarization	ROUGE-1, ROUGE-2, ROUGE-L	BERTScore, human eval
Language modeling	Perplexity	Downstream task accuracy
QA	Exact Match, F1	Human eval
LLM quality	MT-Bench, AlpacaEval	LLM-as-judge

from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from bert_score import score as bert_score_fn

# Toy strings so the snippet runs end to end
reference_summary = "The scheme pays eligible farmers Rs 6,000 per year in three installments."
generated_summary = "Eligible farmers receive Rs 6,000 a year, paid in three installments."

# ROUGE for summarization: score(target, prediction)
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
scores = scorer.score(reference_summary, generated_summary)
print(f"ROUGE-1 F: {scores['rouge1'].fmeasure:.3f}")
print(f"ROUGE-L F: {scores['rougeL'].fmeasure:.3f}")

# BERTScore (better semantic similarity): candidates first, references second
P, R, F = bert_score_fn([generated_summary], [reference_summary], lang='en',
                          model_type='microsoft/deberta-xlarge-mnli')
print(f"BERTScore F1: {F.mean().item():.3f}")

# BLEU for translation: a list of tokenized references and one tokenized hypothesis
references = [["the", "cat", "sat", "on", "the", "mat"]]
hypothesis = ["the", "cat", "is", "on", "the", "mat"]
smooth = SmoothingFunction().method1
bleu = sentence_bleu(references, hypothesis, smoothing_function=smooth)
print(f"BLEU: {bleu:.3f}")
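The QA row in the table lists Exact Match and F1, which interviewers sometimes ask candidates to implement. Below is a simplified sketch in the spirit of SQuAD-style scoring. It only lowercases and splits on whitespace; the official evaluation script also strips punctuation and articles, so treat this as an approximation.

```python
from collections import Counter

def normalize(text):
    """Simplified normalization: lowercase and split on whitespace."""
    return text.lower().split()

def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Cat", "the cat"))
print(round(token_f1("the cat sat", "the cat sat on the mat"), 3))
```

F1 gives partial credit when the predicted span is shorter or longer than the gold answer, while Exact Match is all-or-nothing. Reporting both shows how often the model is nearly right.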

Q18. What is question answering in NLP? What are extractive vs generative QA?

Type	Output	Model	Example
Extractive QA	Span from context	BERT (start/end token prediction)	SQuAD
Generative QA	Generated answer	T5, GPT, LLM + RAG	Open-domain QA
Knowledge-based	From structured KB	SPARQL, entity linking	Wikidata QA

from transformers import pipeline

# Extractive QA
qa = pipeline('question-answering',
               model='deepset/roberta-base-squad2')

context = """PapersAdda is an online platform that helps freshers prepare
for placement interviews. It covers aptitude tests, coding rounds,
HR interviews, and technical rounds for companies like TCS, Infosys, Wipro,
Accenture, Capgemini, and more."""

result = qa(question="What does PapersAdda help with?", context=context)
print(f"Answer: {result['answer']}")
print(f"Score: {result['score']:.3f}")

# Generative QA with LLM + RAG
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use retrieved context as input; model generates free-form answer
# (see RAG question above for full implementation)

Q19. What is attention in the context of NLP before transformers? Explain Bahdanau attention.

e_ij     = score(s_{i-1}, h_j)        # alignment energy (learned)
alpha_ij = softmax over j of e_ij     # attention weights, sum to 1 across source positions
c_i      = sum_j(alpha_ij * h_j)      # context vector (soft attention over source)

This allows the decoder to "focus on" relevant parts of the input at each generation step.

import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.enc_proj  = nn.Linear(enc_dim, attn_dim)
        self.dec_proj  = nn.Linear(dec_dim, attn_dim)
        self.score_fc  = nn.Linear(attn_dim, 1)

    def forward(self, encoder_states, decoder_hidden):
        """
        encoder_states: [B, T_src, enc_dim]
        decoder_hidden: [B, dec_dim]
        """
        # Project and compute alignment
        enc_proj = self.enc_proj(encoder_states)                 # [B, T, attn]
        dec_proj = self.dec_proj(decoder_hidden).unsqueeze(1)    # [B, 1, attn]
        energy   = torch.tanh(enc_proj + dec_proj)               # broadcast add
        scores   = self.score_fc(energy).squeeze(-1)             # [B, T]
        weights  = torch.softmax(scores, dim=-1)                 # [B, T]
        context  = (weights.unsqueeze(-1) * encoder_states).sum(dim=1)  # [B, enc_dim]
        return context, weights

Q20. How does T5 (Text-to-Text Transfer Transformer) work?

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')

# Translation
input_text = "translate English to French: How are you doing today?"
input_ids = tokenizer(input_text, return_tensors='pt').input_ids
outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Summarization (long_text is a placeholder for the document to summarize)
summary_input = "summarize: " + long_text
input_ids = tokenizer(summary_input, return_tensors='pt',
                       max_length=512, truncation=True).input_ids
outputs = model.generate(input_ids, max_new_tokens=100, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Classification (framed as generation)
clf_input = "mnli hypothesis: The cat is outside. premise: The cat is inside."
input_ids = tokenizer(clf_input, return_tensors='pt').input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# The label comes back as text: "contradiction", "entailment", or "neutral"

Q21. What is coreference resolution? Why is it hard?

"Priya went to the store. She bought milk. Her friend came too." -> Priya, She, Her all refer to the same entity.

Why it is hard:

Requires world knowledge ("The trophy couldn't fit in the suitcase because it was too big" - what is "it"?)
Requires contextual understanding across sentences
Ambiguous pronouns

import spacy

# en_core_web_trf alone does not resolve coreference; add a component such as
# coreferee or spaCy's experimental coref pipeline on top of it
nlp = spacy.load("en_core_web_trf")

# Modern approach: LLM for coreference
prompt = """Find all coreference chains in the text.
Text: Rahul is a software engineer at Infosys. He joined in 2023.
His team works on banking software. They deliver quarterly.

Chains:"""

# Expected: [Rahul, He, His], [His team, They]

Q22. What is text augmentation and when is it useful for NLP?

Technique	Method	Preserves Label?
Synonym replacement	Replace n words with synonyms (WordNet)	Usually yes
Back-translation	Translate to another language and back	Usually yes
Easy Data Augmentation (EDA)	Swap, insert, delete, replace	Usually (deleting or swapping a word such as "not" can flip the label)
Contextual insertion (BERT)	Use MLM to insert plausible words	Usually yes
LLM paraphrase	Ask GPT-4 to rephrase	Usually yes (spot-check samples)

import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas

text = "The machine learning model achieved excellent performance."

# Synonym replacement
syn_aug = naw.SynonymAug(aug_src='wordnet', aug_p=0.3)
print(syn_aug.augment(text))

# Contextual word insertion with BERT
bert_aug = naw.ContextualWordEmbsAug(model_path='bert-base-uncased',
                                      action='insert', aug_p=0.2)
print(bert_aug.augment(text))

# Back-translation (requires translation model or API)
# English -> French -> English
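Two of the EDA operations from the table, random swap and random deletion, need no external library. The sketch below is a minimal version with illustrative function names. The synonym-based operations are omitted because they need a thesaurus such as WordNet.

```python
import random

def random_swap(words, n_swaps, rng):
    """Swap the positions of two randomly chosen words, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]

rng = random.Random(42)
words = "the machine learning model achieved excellent performance".split()
swapped = random_swap(words, n_swaps=1, rng=rng)
shortened = random_deletion(words, p=0.2, rng=rng)
print(" ".join(swapped))
print(" ".join(shortened))
```

Both operations can change meaning (deleting "not" from a review, for example), so it is worth spot-checking augmented samples before adding them to a training set.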

HARD: Advanced NLP (Questions 23-30)

Q23. How do you fine-tune an LLM with LoRA for NLP tasks?

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
from datasets import Dataset

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    'mistralai/Mistral-7B-v0.1',
    torch_dtype=torch.bfloat16,
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-v0.1')
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration
lora_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, lora_config)

# Training
training_args = TrainingArguments(
    output_dir='./mistral-finetune',
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
    save_strategy='epoch'
)

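# train_dataset: a datasets.Dataset with a 'text' column (placeholder)
# Argument names below follow older TRL releases; newer ones move
# dataset_text_field and the sequence-length setting into SFTConfig
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=512,
    args=training_args
)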
trainer.train()
model.save_pretrained('./mistral-lora-adapter')  # save only LoRA weights
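A common follow-up is "how many parameters does LoRA actually train?" The arithmetic below uses the r=16 and lora_alpha=32 values from the config above on a single 4096 x 4096 projection. The layer size is an illustrative assumption, not a statement about every targeted layer in the model.

```python
def lora_param_counts(d_in, d_out, r):
    """Full weight matrix size vs. the two low-rank factors A (r x d_in) and B (d_out x r)."""
    full = d_in * d_out
    lora = r * (d_in + d_out)
    return full, lora

r, lora_alpha = 16, 32
full, lora = lora_param_counts(d_in=4096, d_out=4096, r=r)
scale = lora_alpha / r  # the low-rank update B @ A is multiplied by alpha / r

print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}  scale: {scale}")
```

For this layer the adapter trains 131,072 parameters instead of 16,777,216, under 1% of the original matrix. That is why saving only the LoRA weights produces a checkpoint measured in megabytes rather than gigabytes.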

Q24. What is machine translation? How do modern neural MT systems work?

Key components:

Subword tokenization (shared BPE vocabulary across language pairs in multilingual models)
Encoder processes source tokens bidirectionally
Decoder generates target tokens autoregressively, attending to encoder states
Beam search at inference

from transformers import MarianMTModel, MarianTokenizer

# Helsinki-NLP models cover 1,000+ language pairs
model_name = 'Helsinki-NLP/opus-mt-en-hi'   # English to Hindi
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

texts = ["Machine learning is transforming every industry.",
         "PapersAdda helps you prepare for placements."]

inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
translated = model.generate(**inputs, num_beams=5)
results = tokenizer.batch_decode(translated, skip_special_tokens=True)
for src, tgt in zip(texts, results):
    print(f"EN: {src}")
    print(f"HI: {tgt}")
    print()

Q25. What is cross-lingual transfer learning? How does mBERT work?

mBERT (Multilingual BERT):

BERT architecture trained on 104 languages jointly
Shared WordPiece vocabulary across all languages
MLM objective on all languages (no cross-lingual signal; transfer is emergent)
Works surprisingly well for NER, classification across languages

XLM-RoBERTa (better than mBERT in 2026):

Trained on 100 languages, 2.5TB of text
Uses SentencePiece tokenization (language-agnostic)
No NSP, larger batch, more data = significantly better

from transformers import pipeline

# Cross-lingual NER: fine-tuned on English, usable on the other languages XLM-R was pre-trained on
nlp_ner = pipeline('ner', model='xlm-roberta-large-finetuned-conll03-english',
                    aggregation_strategy='simple')

# Works on English
print(nlp_ner("Apple was founded by Steve Jobs in Cupertino."))

# Also works on other languages (zero-shot transfer), with no language-specific fine-tuning
print(nlp_ner("Angela Merkel wurde in Hamburg geboren."))

Q26. How do you build a text classification system that handles 1 billion documents?

Production architecture:

1. Embedding phase (offline):
   - Use a fast encoder (MiniLM-L6: 6 layers, 384-dim, 5x faster than BERT-base)
   - Embed all documents, store in vector store (Qdrant, Weaviate, pgvector)
   - Batch embed on GPU: throughput ~5,000 docs/sec with MiniLM

2. Classification phase (online, three tiers):
   Tier 1 (fast): TF-IDF + LogReg or hashed features -> resolves about 90% of cases in <1ms
   Tier 2 (accurate): ANN retrieval -> find k-nearest labeled examples -> majority vote
   Tier 3 (hard cases only): fine-tuned BERT, run asynchronously

3. Active learning loop:
   - Flag low-confidence predictions for human review
   - Retrain classifier weekly with new labels

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import SGDClassifier
import numpy as np

# Fast production encoder
encoder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

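# Streaming classification with online learning
clf = SGDClassifier(loss='hinge', random_state=42, n_jobs=-1)

# partial_fit needs the full label set up front; these labels are placeholders
ALL_CLASSES = np.array(['billing', 'support', 'spam'])

def process_batch(texts, labels=None):
    embeddings = encoder.encode(texts, batch_size=128, show_progress_bar=False)
    if labels is not None:
        # classes is required on the first partial_fit call
        clf.partial_fit(embeddings, labels, classes=ALL_CLASSES)
    # predict() raises NotFittedError until at least one labeled batch has been seen
    return clf.predict(embeddings)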
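
The tiered design above amounts to a confidence cascade: each tier answers only when it is confident and otherwise hands the document to the next, slower tier, with anything still uncertain flagged for the active learning loop. A minimal sketch follows; the three tier functions, their keyword rules, and the 0.9 threshold are hypothetical placeholders rather than tuned values.

```python
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])
Tier = Callable[[str], Prediction]

def cascade_classify(text: str, tiers: List[Tier], threshold: float = 0.9):
    """Return (label, index of the tier that answered, needs_review)."""
    label = "unknown"
    for i, tier in enumerate(tiers):
        label, conf = tier(text)
        if conf >= threshold:
            return label, i, False
    # No tier was confident: keep the last tier's label and flag for human review
    return label, len(tiers) - 1, True

# Hypothetical stand-ins for the real tiers
def tier1_keywords(text: str) -> Prediction:
    return ("spam", 0.99) if "free money" in text.lower() else ("ham", 0.5)

def tier2_knn(text: str) -> Prediction:
    return ("ham", 0.95) if "invoice" in text.lower() else ("ham", 0.6)

def tier3_bert(text: str) -> Prediction:
    return ("ham", 0.7)

tiers = [tier1_keywords, tier2_knn, tier3_bert]
r1 = cascade_classify("FREE MONEY now!!!", tiers)
r2 = cascade_classify("Your invoice is attached", tiers)
r3 = cascade_classify("hello?", tiers)
print(r1, r2, r3)
```

The threshold controls the cost and accuracy trade-off: raising it sends more traffic to the expensive tiers and to human review.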

Q27. What is the difference between semantic search and keyword search? How do you combine them?

Approach	How	Strength	Weakness
Keyword (BM25)	TF-IDF-based term matching	Exact terms, rare entities	Synonyms, paraphrases
Semantic (dense)	Cosine similarity of embeddings	Paraphrases, semantic meaning	Can miss exact keyword matches
Hybrid	BM25 + dense, fuse scores	Best of both	More complex

Hybrid search with Reciprocal Rank Fusion (RRF):

from rank_bm25 import BM25Okapi
import faiss
import numpy as np

def reciprocal_rank_fusion(bm25_results, dense_results, k=60):
    """
    bm25_results: [(doc_id, score), ...] sorted by BM25 rank
    dense_results: [(doc_id, score), ...] sorted by semantic rank
    """
    rrf_scores = {}
    for rank, (doc_id, _) in enumerate(bm25_results):
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)
    for rank, (doc_id, _) in enumerate(dense_results):
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

# Most production search engines (Elasticsearch, OpenSearch, Qdrant)
# support hybrid search natively in 2026
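
A small worked example helps show why RRF needs no score normalization: it uses ranks only, so BM25 scores and cosine similarities never have to share a scale. The sketch below restates the fusion rule in self-contained form over plain ranked lists; the document IDs are made up for illustration.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of doc IDs; each list is ordered best-first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

bm25_ranking = ["d1", "d2", "d3"]    # best-first by BM25
dense_ranking = ["d2", "d4", "d1"]   # best-first by cosine similarity

fused = rrf([bm25_ranking, dense_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Here d2 comes out on top because it ranks well in both lists (1/62 + 1/61), ahead of d1, which is first for BM25 but only third for dense retrieval (1/61 + 1/63).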

Q28. What is relation extraction and how is it approached in 2026?

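Relation extraction identifies semantic relationships between entities mentioned in text and outputs them as (subject, relation, object) triples. For example:

"Microsoft was founded by Bill Gates in 1975." -> (Microsoft, founded_by, Bill Gates), (Microsoft, founded_in, 1975)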

Approaches:

# Approach 1: BERT for relation classification (supervised)
from transformers import AutoModelForSequenceClassification

# Mark entities in text: "[E1] Microsoft [/E1] was founded by [E2] Bill Gates [/E2]"
# Fine-tune BERT to classify relationship type

# Approach 2: LLM-based (zero-shot in 2026)
from openai import OpenAI
client = OpenAI()

def extract_relations(text):
    prompt = f"""Extract all (subject, relation, object) triples from the text.
Format: subject | relation | object
One triple per line.

Text: {text}
Triples:"""

    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0
    )
    return response.choices[0].message.content

# Approach 3: Information extraction with spaCy
import spacy
nlp = spacy.load("en_core_web_trf")
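doc = nlp("Apple acquired a DeepMind-like startup for $1 billion.")
# Extract simple subject-verb-object candidates from the dependency parse
for token in doc:
    if token.pos_ == "VERB":
        subjects = [c.text for c in token.children if c.dep_ == "nsubj"]
        objects = [c.text for c in token.children if c.dep_ == "dobj"]
        print(subjects, token.lemma_, objects)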
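
The LLM-based approach above returns free text, so downstream code usually has to parse it into structured triples. A minimal sketch, assuming the model followed the "subject | relation | object" format requested in the prompt; parse_triples is a hypothetical helper, and malformed lines are skipped rather than repaired.

```python
from typing import List, Tuple

def parse_triples(raw: str) -> List[Tuple[str, str, str]]:
    """Parse 'subject | relation | object' lines into tuples."""
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Keep only lines with exactly three non-empty fields
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples

sample_output = """Microsoft | founded_by | Bill Gates
Microsoft | founded_in | 1975
this line is not a triple"""

triples = parse_triples(sample_output)
print(triples)
```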

Q29. What is text generation evaluation in the LLM era? How do you go beyond BLEU?

BLEU and ROUGE measure n-gram overlap with a reference. They fail for open-ended generation where many valid outputs exist.
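
A small illustration of that failure mode: the sketch below computes clipped unigram precision (the core of BLEU-1, without the brevity penalty) for a valid paraphrase and for a near-copy with the opposite meaning. The sentences are invented examples.

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: share of candidate tokens that appear in the reference."""
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand)
    overlap = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return overlap / len(cand) if cand else 0.0

reference = "the film was excellent"
paraphrase = "a truly great movie"        # same meaning, no shared words
wrong = "the film was not excellent"      # opposite meaning, high overlap

p_para = unigram_precision(paraphrase, reference)
p_wrong = unigram_precision(wrong, reference)
print(p_para, p_wrong)
```

The paraphrase scores 0.0 and the contradictory sentence scores 0.8, which is the opposite of what a human rater would say.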

Modern evaluation approaches:

Method	Description	Tool
BERTScore	Contextual embedding similarity	bert-score library
LLM-as-judge	Ask GPT-4/Claude to rate response	OpenAI API
MT-Bench	GPT-4 evaluates multi-turn chat	LMSYS benchmark
AlpacaEval	Win rate vs reference model	AlpacaEval framework
Human evaluation	Gold standard, expensive	Amazon Mechanical Turk
Task-specific metrics	Pass@k for code, exact match for QA	Custom

# LLM-as-judge (increasingly standard in 2026)
from openai import OpenAI

def evaluate_with_llm(question, generated_answer, reference_answer):
    client = OpenAI()
    prompt = f"""Rate the quality of the generated answer on a scale of 1-10.

Question: {question}
Reference Answer: {reference_answer}
Generated Answer: {generated_answer}

Score (1-10) and brief justification:"""

    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0
    )
    return response.choices[0].message.content

# For production LLM evaluation at scale, use Prometheus (open-source judge model)
# instead of paying for GPT-4 per call
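
Because the judge replies in free text, an evaluation pipeline has to pull the number out before it can aggregate scores. A minimal sketch, assuming the judge states its score before any other number in the reply; parse_judge_score is a hypothetical helper and returns None when no valid score is found.

```python
import re
from typing import Optional

def parse_judge_score(judge_output: str) -> Optional[int]:
    """Return the first standalone integer in 1..10 found in the reply, else None."""
    for match in re.finditer(r"\b(\d{1,2})\b", judge_output):
        value = int(match.group(1))
        if 1 <= value <= 10:
            return value
    return None

s1 = parse_judge_score("Score: 8/10. The answer is accurate but omits one detail.")
s2 = parse_judge_score("7 - mostly correct.")
s3 = parse_judge_score("I cannot rate this response.")
print(s1, s2, s3)
```

Replies that return None are worth logging and re-querying rather than silently dropping, since they would otherwise bias the average.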

Q30. How do you handle multilingual text in a production NLP system for Indian languages?

Stack for Indian language NLP in 2026:

1. Tokenization: IndicNLP library (language-aware) or SentencePiece
2. Base model: MuRIL (Google; 16 Indian languages plus English) or IndicBERT (AI4Bharat)
3. Translation: AI4Bharat IndicTrans2 (strong open-source translation for Indian languages)
4. Speech: Whisper (multilingual)
5. LLM: Sarvam-1, Namaste GPT, OpenHathi (Llama fine-tuned on Hindi)

from transformers import AutoTokenizer, AutoModel
import torch

# MuRIL: Multilingual Representations for Indian Languages
# Google's model trained on 17 languages (16 Indian languages plus English)
muril_tokenizer = AutoTokenizer.from_pretrained('google/muril-base-cased')
muril_model = AutoModel.from_pretrained('google/muril-base-cased')

# Handle code-switched text (Hindi + English)
text = "yeh model bahut achha hai for sentiment analysis"  # Hindi-English mix
inputs = muril_tokenizer(text, return_tensors='pt')
with torch.no_grad():
    embeddings = muril_model(**inputs).last_hidden_state[:, 0, :]  # CLS

# IndicBERT (AI4Bharat) for Indic languages
indic_tokenizer = AutoTokenizer.from_pretrained('ai4bharat/indic-bert')
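
For routing mixed-language input in production, a cheap first step is to tag each token by Unicode script before any model runs. A minimal sketch that checks only the Devanagari block (U+0900 to U+097F) and ASCII letters; token_script and tag_scripts are hypothetical helpers. Note that romanized Hindi, like the example above, is written in Latin script, so it would need a separate language-identification model.

```python
def token_script(token: str) -> str:
    """Label a token 'devanagari', 'latin', or 'other' by the majority of its characters."""
    deva = sum(1 for ch in token if "\u0900" <= ch <= "\u097f")
    latin = sum(1 for ch in token if ch.isascii() and ch.isalpha())
    if deva == 0 and latin == 0:
        return "other"
    return "devanagari" if deva >= latin else "latin"

def tag_scripts(text: str):
    return [(tok, token_script(tok)) for tok in text.split()]

mixed = "यह model बहुत अच्छा है"
tags = tag_scripts(mixed)
# More than one script among the tokens suggests code-switched text
is_code_switched = len({script for _, script in tags if script != "other"}) > 1
print(tags, is_code_switched)
```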

NLP Tools Comparison Table 2026

Task	Tool/Model	Notes
Tokenization	HuggingFace Tokenizers, SentencePiece	Fast, BPE and WordPiece
Classical NLP pipeline	spaCy	POS, NER, dependency, fast
Sentence embeddings	sentence-transformers	Semantic search
BERT fine-tuning	HuggingFace Transformers + Trainer	Standard in 2026
LLM inference	vLLM, TGI	Production serving
LLM fine-tuning	TRL + PEFT (LoRA)	Memory-efficient
Vector search	FAISS, Qdrant, pgvector	ANN for RAG
Evaluation	evaluate (HuggingFace)	ROUGE, BLEU, BERTScore

FAQ

Q: What is the best starting point for learning NLP in 2026?

A: Start with the HuggingFace NLP course (free). Then implement a BERT fine-tuning project on a classification task. Then build a small RAG system with FAISS and OpenAI.

Q: Is spaCy still relevant in 2026?

A: Yes, for classical NLP pipelines: POS tagging, dependency parsing, rule-based NER, and text preprocessing. For deep learning NLP tasks, HuggingFace is the standard.

Q: What is the difference between BERT and RoBERTa?

A: RoBERTa (Liu et al.) removes Next Sentence Prediction (NSP) from BERT, uses dynamic masking, and trains on much more data with larger batches. In the original paper it outperforms BERT on GLUE, SQuAD, and RACE.

Q: What is instruction tuning?

A: Fine-tuning a base LLM on instruction-response pairs so that it follows user instructions (e.g., "Summarize this text", "Write a poem about..."). InstructGPT, Alpaca, and FLAN-T5 use this technique.

Sources and review notes (reviewed 8 Jun 2026)

Verification window

Page last edited 8 Jun 2026 by Aditya Sharma. A review date records an editorial edit, not a guarantee that every external fact is still current.

