Here are all 50 questions with concise answers to make them easy to follow.
---
### 1. What is tokenization, and why is it important in LLMs?
**ANSWER:** Tokenization is the process of breaking text into smaller units called **tokens**. These units can be words, subwords, or characters. For example, the word 'tokenization' can be split into the subwords 'token' and 'ization'. This step is **crucial** because LLMs cannot understand raw text directly; instead, they process sequences of numbers that represent these tokens. Effective tokenization helps a model handle different languages, manage rare words, and keep the vocabulary size small, improving both **efficiency and performance**.
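As a rough illustration, here is a toy greedy longest-match subword tokenizer. The vocabulary below is hypothetical and hard-coded; real tokenizers such as BPE or WordPiece learn their vocabulary from data.

```python
# Hypothetical toy vocabulary; real tokenizers learn this from a corpus.
VOCAB = {"token", "ization", "un", "happi", "ness"}

def tokenize(word: str) -> list[str]:
    tokens, i = [], 0
    while i < len(word):
        # Greedily take the longest vocabulary entry starting at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

print(tokenize("tokenization"))  # ['token', 'ization']
print(tokenize("unhappiness"))   # ['un', 'happi', 'ness']
```

The single-character fallback is what lets subword tokenizers avoid out-of-vocabulary failures entirely: any string can be spelled out from smaller known pieces.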
### 2. What is LoRA and QLoRA?
**ANSWER:** LoRA and QLoRA are techniques designed to optimize the **fine-tuning** of Large Language Models (LLMs). Their focus is on reducing memory usage and increasing efficiency without compromising performance. **LoRA** is a parameter-efficient fine-tuning method that introduces a small number of new trainable parameters to modify the model's behavior without increasing its overall size. **QLoRA** builds on LoRA by incorporating **quantization** (such as 4-bit NormalFloat) to optimize memory further: the model's parameters are compressed, so even less memory is required.
### 3. What is beam search, and how does it differ from greedy decoding?
**ANSWER:** **Beam search** is a search algorithm used during text generation to find the most likely sequence of words. Whereas **greedy decoding** picks only the single highest-probability word at each step, beam search explores multiple possible sequences in parallel, maintaining a set of the top k candidates (beams). It balances finding high-probability sequences against exploring alternative paths, which yields more **coherent and contextually appropriate** outputs, especially in long-form text generation tasks.
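A minimal sketch of the difference, using a hypothetical toy "model" (a table of next-token log-probabilities standing in for a real LLM). Note how greedy decoding (k=1) commits to "the" early and misses the globally more likely sequence that beam search finds:

```python
import math

# Hypothetical toy model: log-probability of the next token given the
# previous token. A real LLM would supply these scores.
LOGPROBS = {
    "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"cat": math.log(0.5), "dog": math.log(0.5)},
    "a":   {"cat": math.log(0.9), "dog": math.log(0.1)},
    "cat": {"</s>": 0.0},
    "dog": {"</s>": 0.0},
}

def beam_search(k: int, steps: int = 3):
    beams = [(["<s>"], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in LOGPROBS.get(seq[-1], {}).items():
                candidates.append((seq + [tok], score + lp))
        # Keep only the k highest-scoring sequences (the "beams").
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

print(beam_search(k=1)[0])  # greedy decoding is beam search with k=1
print(beam_search(k=3)[0])  # finds "<s> a cat </s>" (p=0.36 > 0.30)
```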
### 4. Explain the concept of temperature in LLM text generation.
**ANSWER:** **Temperature** is a hyperparameter that controls randomness in text generation by adjusting the probability distribution over the possible next tokens. A **low temperature (close to 0)** makes the model highly deterministic, favoring the most probable tokens. A **high temperature (above 1)** flattens the distribution and increases diversity, allowing less probable tokens to be selected. For example, a temperature of 0.7 strikes a balance between creativity and coherence, suitable for generating outputs that are diverse yet sensible.
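The mechanism is just a division of the logits by T before the softmax; a small self-contained sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Dividing logits by T before softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At T=0.5 the top token dominates; at T=2.0 the probabilities move toward uniform, so sampling becomes more diverse.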
### 5. What is masked language modeling, and how does it contribute to model pretraining?
**ANSWER:** **Masked language modeling (MLM)** is a training objective in which some input tokens are randomly masked (hidden) and the model must predict them from the surrounding context. This forces the model to learn contextual relationships between words, enhancing its grasp of language semantics. MLM is commonly used in models such as BERT, which are pretrained with this objective to develop a deep understanding of language before fine-tuning.
### 6. What are Sequence-to-Sequence Models?
**ANSWER:** **Sequence-to-Sequence (Seq2Seq) models** are a type of neural network architecture designed to transform one sequence of data into another. They are common in tasks where the input and output lengths vary, such as **machine translation, text summarization, and speech recognition**.
### 7. How do autoregressive models differ from masked models in LLM training?
**ANSWER:** **Autoregressive models (such as GPT)** generate text one token at a time, with each new token predicted from the tokens generated so far. This sequential approach is ideal for tasks like text generation. **Masked models (such as BERT)** instead predict randomly masked tokens within a sentence, using both left and right context. Autoregressive models excel at generative tasks, while masked models are better suited to understanding and classification tasks.
### 8. What role do embeddings play in LLMs, and how are they initialized?
**ANSWER:** **Embeddings** are dense, continuous vector representations of tokens that capture semantic and syntactic information. They map discrete tokens into a high-dimensional space where they can serve as input to neural networks. Embeddings are typically **initialized randomly** or with pretrained vectors such as Word2Vec or GloVe. During training, they are fine-tuned to capture task-specific nuances, which improves model performance.
### 9. What is next sentence prediction, and how is it useful in language modeling?
**ANSWER:** **Next Sentence Prediction (NSP)** is a key technique in language modeling, used notably in the training of large models like BERT. NSP helps the model understand the relationship between two sentences, which is important for tasks such as question answering, dialogue generation, and information retrieval. During pretraining, the model is fed pairs of sentences: 50% of the time the second sentence is the actual next sentence (a positive pair), and 50% of the time it is a random sentence (a negative pair). The model must classify whether the second sentence genuinely follows the first, which improves its overall language understanding.
### 10. Explain the difference between top-k sampling and nucleus (top-p) sampling in LLMs.
**ANSWER:** **Top-k sampling** restricts the model's choices at each step to the k most probable tokens, introducing controlled randomness. For example, setting k=10 means the model considers only the 10 most likely tokens. **Nucleus sampling (top-p sampling)** takes a more dynamic approach: it selects the smallest set of tokens whose cumulative probability exceeds a threshold p (e.g., 0.9). This builds a flexible candidate set that adapts to the context, promoting both diversity and coherence in the generated text.
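The two filters can be sketched in a few lines over a hypothetical next-token distribution (renormalization after filtering is the step both share):

```python
def top_k_filter(probs, k):
    # Keep only the k most probable tokens, then renormalize.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability >= p.
    kept, cum = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        cum += pr
        if cum >= p:
            break
    total = sum(kept.values())
    return {tok: pr / total for tok, pr in kept.items()}

# Hypothetical next-token probabilities.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
print(top_k_filter(probs, 2))   # {'the': 0.625, 'a': 0.375}
print(top_p_filter(probs, 0.9)) # keeps 'the', 'a', 'cat'
```

Notice that top-p keeps three tokens here but would keep fewer for a sharper distribution, which is exactly the "dynamic candidate set" property.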
### 11. How does prompt engineering influence the output of LLMs?
**ANSWER:** **Prompt engineering** means designing input prompts so that they effectively guide an LLM's output. LLMs are highly sensitive to input phrasing, so a well-designed prompt can significantly influence the quality and relevance of the response. For example, adding context or specific instructions to a prompt can improve accuracy on tasks like summarization or question answering. Prompt engineering is especially useful in **zero-shot and few-shot learning** scenarios, where task-specific examples are scarce.
### 12. How can catastrophic forgetting be mitigated in large language models (LLMs)?
**ANSWER:** **Catastrophic forgetting** occurs when an LLM forgets previously learned tasks while learning new ones, limiting its versatility. Several strategies are used to mitigate it:
* **Rehearsal methods:** The model is retrained on a mix of old and new data, helping it retain knowledge of earlier tasks.
* **Elastic Weight Consolidation (EWC):** Certain model weights are marked as important so that critical knowledge is protected while new tasks are learned.
* **Modular approaches (Progressive Networks, OFELs):** New modules are introduced for new tasks, so the LLM can learn without overwriting existing knowledge.
### 13. What is model distillation, and how is it applied to LLMs?
**ANSWER:** **Model distillation** is a technique in which a smaller, simpler model (the student) is trained to replicate the behavior of a larger, more complex model (the teacher). In the context of LLMs, the student learns from the teacher's **soft predictions** rather than hard labels, which lets it capture nuanced knowledge. This approach maintains similar performance while reducing computational requirements and memory usage, making it ideal for deploying LLMs on resource-constrained devices such as mobile phones.
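The soft-label part of a distillation loss is just a cross-entropy between the teacher's temperature-softened distribution and the student's; a minimal sketch with hypothetical logits (full recipes also mix in a hard-label term, omitted here):

```python
import math

def softmax(logits, T=1.0):
    # Softmax with temperature T; higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-top classes.
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # Cross-entropy between the teacher's soft prediction and the
    # student's: minimized when the student matches the teacher.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]  # hypothetical logits for one token
print(distillation_loss(teacher, [2.8, 1.1, 0.3]))  # small: student agrees
print(distillation_loss(teacher, [0.2, 1.0, 3.0]))  # larger: student disagrees
```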
### 14. How do LLMs handle out-of-vocabulary (OOV) words?
**ANSWER:** **Out-of-vocabulary (OOV)** words are words the model never saw during training. LLMs address this with **subword tokenization** techniques such as Byte-Pair Encoding (BPE) and WordPiece, which break OOV words into smaller, known subword units. For example, the word 'unhappiness' might be tokenized into 'un', 'happi', and 'ness'. This lets the model understand and generate words it has never seen, using only these subword components.
### 15. How does the Transformer architecture overcome the challenges faced by traditional Sequence-to-Sequence models?
**ANSWER:** The Transformer architecture overcomes the key limitations of traditional Seq2Seq models in several ways:
* **Parallelization:** Transformers use self-attention to process tokens in parallel, speeding up both training and inference.
* **Long-Range Dependencies:** Self-attention lets Transformers capture long-range dependencies effectively; the model can focus on any part of the sequence, regardless of distance.
* **Positional Encoding:** Ensures the model understands token order.
* **Efficiency and Scalability:** Transformers scale better to large datasets and long sequences.
* **Context Bottleneck:** Transformers let the decoder attend to all encoder outputs, improving context retention.
### 16. What is overfitting in machine learning, and how can it be prevented?
**ANSWER:** **Overfitting** occurs when a machine learning model performs well on training data but poorly on unseen or test data. This happens because the model has learned not only the underlying patterns but also the noise and outliers in the data. Techniques to prevent it:
* **Regularization (L1, L2):** Add a penalty term to the loss function.
* **Dropout:** Randomly deactivate neurons during training.
* **Data Augmentation:** Expand the training dataset with variations.
* **Early Stopping:** Stop training once the validation loss stops decreasing.
* **Simpler Models:** Reduce complexity by using fewer features, parameters, or layers.
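Early stopping, the last-but-one technique above, can be sketched with a simulated validation-loss curve (the numbers are made up for illustration):

```python
def train_with_early_stopping(val_losses, patience=2):
    # Stop when validation loss hasn't improved for `patience` epochs;
    # return the best epoch and its loss.
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best

# Simulated validation losses: improving, then overfitting sets in.
print(train_with_early_stopping([0.9, 0.7, 0.6, 0.65, 0.7, 0.8]))  # (2, 0.6)
```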
### 17. What are Generative and Discriminative models?
**ANSWER:** In NLP, generative and discriminative models are two key model types:
* **Generative Models:** Learn the underlying data distribution and can generate new samples from it. They model the joint probability distribution of inputs and outputs (\(P(x,y)\)). Example: a language model that predicts the next word.
* **Discriminative Models:** Focus on learning the decision boundary between classes. They model the conditional probability of outputs given inputs (\(P(y|x)\)) in order to classify new examples accurately. Example: a sentiment analysis model.
### 18. How is GPT-4 different from its predecessors like GPT-3 in terms of capabilities and applications?
**ANSWER:** GPT-4 brings several advancements over GPT-3:
* **Improved Understanding:** GPT-4 is widely reported (though not confirmed by OpenAI) to have on the order of 1 trillion parameters, far more than GPT-3's 175 billion.
* **Multimodal Capabilities:** GPT-4 can process both text and images, whereas GPT-3 was text-only.
* **Larger Context Window:** GPT-4 can handle contexts up to 32,768 tokens (roughly 25,000 words), compared with the 2,048-token limit of the original GPT-3 (4,096 for GPT-3.5).
* **Better Accuracy and Fine-Tuning:** GPT-4 is more factually accurate and produces less harmful content.
* **Language Support:** GPT-4 improved multilingual performance, achieving higher accuracy across 26 evaluated languages.
### 19. What are positional encodings in the context of large language models?
**ANSWER:** **Positional encodings** are essential in LLMs because transformer architectures do not capture sequence order on their own. Transformers process tokens simultaneously through self-attention, so they are unaware of token order. Positional encodings supply the information the model needs to understand the order of words.
* **Additive Approach:** The encodings are added to the input word embeddings, merging the static representations with positional data.
* **Sinusoidal Functions:** Models generate these encodings with trigonometric functions (sine and cosine) based on each token's position and dimension.
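The sinusoidal scheme from "Attention Is All You Need" can be written directly from its formula; a minimal sketch:

```python
import math

def positional_encoding(pos, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** ((i // 2 * 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
print(positional_encoding(1, 4))  # each position gets a distinct vector
```

These vectors are then added element-wise to the token embeddings (the additive approach above).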
### 20. What is Multi-head attention?
**ANSWER:** **Multi-head attention** is an enhancement of single-head attention that lets the model attend to information from different representation subspaces simultaneously. Instead of using a single attention mechanism, multi-head attention projects the queries, keys, and values into multiple subspaces through separate learned linear transformations. The attention function is applied to each projected version in parallel, producing multiple output vectors that are then combined into the final result. This approach improves the model's ability to capture complex patterns and relationships.
### 21. Derive the softmax function and explain its role in attention mechanisms.
**ANSWER:** The softmax function transforms a vector of real numbers into a probability distribution. For an input vector \(x\), the softmax of the i-th element is defined as \(\text{softmax}(x_i) = \exp(x_i) / \sum_j \exp(x_j)\). This guarantees that every output value lies between 0 and 1 and that the values sum to 1, so they can be interpreted as probabilities. In attention mechanisms, softmax is applied to the attention scores to normalize them. This lets the model assign different levels of importance to different tokens when generating output, focusing on the most relevant parts of the input sequence.
### 22. How is the dot product used in self-attention, and what are its implications for computational efficiency?
**ANSWER:** In self-attention, the **dot product** is used to calculate the similarity (alignment) between query (Q) and key (K) vectors. Attention scores are computed as Attention(Q, K, V) = softmax(QK^T / √d_k)V. The dot product helps the model decide which tokens to focus on. Although effective, its quadratic complexity O(n²) in the sequence length is a challenge for long sequences, which is why more efficient approximations are being developed.
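A minimal pure-Python sketch of scaled dot-product attention with toy 2-dimensional vectors (one query, two keys/values), just to make the QK^T → scale → softmax → weighted-sum pipeline concrete:

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]          # transpose of K
    scores = [[s / math.sqrt(d_k) for s in row]   # QK^T / sqrt(d_k)
              for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]    # row-wise softmax
    return matmul(weights, V)                     # weighted sum of values

Q = [[1.0, 0.0]]                  # one query, aligned with the first key
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0], [0.0]]
print(attention(Q, K, V))         # weight on V[0] exceeds weight on V[1]
```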
### 23. Explain cross-entropy loss and why it is commonly used in language modeling.
**ANSWER:** **Cross-entropy loss** measures the difference between the predicted probability distribution and the true distribution (the one-hot encoding of the correct token). Its formula is \(L = - \sum_i y_i \log(\hat{y}_i)\), where \(y_i\) is the true label and \(\hat{y}_i\) the predicted probability. Cross-entropy penalizes incorrect predictions heavily, encouraging the model to push the probability of the correct class toward 1. In language modeling, this ensures the model predicts the correct token in the sequence with high confidence.
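With a one-hot target the sum collapses to a single term, -log of the probability given to the correct token; a small sketch:

```python
import math

def cross_entropy(true_dist, pred_dist):
    # L = -sum(y_i * log(y_hat_i)); with one-hot targets this reduces
    # to -log of the probability assigned to the correct token.
    return -sum(y * math.log(p) for y, p in zip(true_dist, pred_dist) if y > 0)

y_true = [0, 1, 0]            # correct token is index 1 (one-hot)
print(cross_entropy(y_true, [0.1, 0.8, 0.1]))  # ~0.223: confident and right
print(cross_entropy(y_true, [0.8, 0.1, 0.1]))  # ~2.303: confident and wrong
```

The second case's loss is ten times larger, which is the "heavy penalty" that drives learning.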
### 24. How do you compute the gradient of the loss function with respect to embeddings?
**ANSWER:** To compute the gradient of the loss \(L\) with respect to an embedding vector \(e\), we apply the **chain rule**: \(dL/de = (dL/d\text{logits}) \cdot (d\text{logits}/de)\). Here \(dL/d\text{logits}\) is the gradient of the loss with respect to the output logits, and \(d\text{logits}/de\) is the gradient of the logits with respect to the embeddings. **Backpropagation** propagates these gradients back through the network's layers, adjusting the embedding vectors to minimize the loss.
### 25. What is the role of the Jacobian matrix in backpropagation through a transformer model?
**ANSWER:** The **Jacobian matrix** holds the partial derivatives of every element of a vector-valued function's output with respect to every input. In backpropagation, it captures how each element of the output vector changes with each input. For transformer models, the Jacobian is essential for computing gradients of multi-dimensional outputs, ensuring that every parameter (including weights and embeddings) is updated correctly to minimize the loss function.
### 26. Explain the concept of eigenvalues and eigenvectors in the context of matrix factorization for dimensionality reduction.
**ANSWER:** Eigenvalues and eigenvectors are fundamental to understanding the structure of matrices. For a matrix \(A\), an eigenvector \(v\) and eigenvalue \(\lambda\) satisfy the equation \(Av = \lambda v\). In dimensionality-reduction techniques such as PCA (Principal Component Analysis), the eigenvectors represent the principal components, and the eigenvalues indicate how much variance each component captures. Selecting the components with the largest eigenvalues reduces dimensionality while preserving most of the variance in the data.
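One way to see \(Av = \lambda v\) in action is power iteration, a classic method for finding the dominant eigenvalue/eigenvector pair (a sketch, not how PCA libraries actually do it):

```python
import math

def power_iteration(A, steps=100):
    # Repeatedly apply A and renormalize; the vector converges to the
    # eigenvector with the largest-magnitude eigenvalue.
    v = [1.0] * len(A)
    for _ in range(steps):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v estimates the eigenvalue.
    eigenvalue = sum(vi * sum(a * x for a, x in zip(row, v))
                     for vi, row in zip(v, A))
    return eigenvalue, v

A = [[2.0, 0.0], [0.0, 1.0]]     # eigenvalues 2 and 1
val, vec = power_iteration(A)
print(val, vec)                   # ~2.0 and a vector along [1, 0]
```

In PCA terms, running this on a covariance matrix would yield the first principal component.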
### 27. How is the KL divergence used in evaluating LLM outputs?
**ANSWER:** **KL divergence (Kullback-Leibler divergence)** measures the difference between two probability distributions: a true distribution \(P\) and a predicted distribution \(Q\). Formula: \(KL(P\|Q) = \sum_x P(x) \log(P(x)/Q(x))\). In LLMs, it measures how far the model's predicted distribution deviates from the target distribution. A low KL divergence means the model's predictions closely match the true labels, making it a useful metric for evaluating and fine-tuning language models.
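The formula translates directly into code; a sketch with made-up distributions showing that KL is zero only when the distributions match:

```python
import math

def kl_divergence(P, Q):
    # KL(P || Q) = sum P(x) * log(P(x) / Q(x)); terms with P(x) = 0
    # contribute nothing by convention.
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

P = [0.7, 0.2, 0.1]
print(kl_divergence(P, P))                 # 0.0: identical distributions
print(kl_divergence(P, [0.5, 0.3, 0.2]))   # > 0: Q drifts from P
print(kl_divergence(P, [0.2, 0.3, 0.5]))   # larger still
```

Note that KL is asymmetric: KL(P||Q) generally differs from KL(Q||P).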
### 28. Derive the formula for the derivative of the ReLU activation function and discuss its significance.
**ANSWER:** The **ReLU (Rectified Linear Unit)** function is defined as \(f(x) = \max(0, x)\). Its derivative is \(f'(x) = 1\) if \(x > 0\), and 0 otherwise. ReLU introduces non-linearity while remaining computationally cheap. Its sparsity (zero output for negative inputs) helps mitigate the **vanishing gradient problem**, which makes it a popular choice in deep learning models, including LLMs.
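The function and its derivative in two lines each (taking the usual convention of gradient 0 at x = 0, where ReLU is not differentiable):

```python
def relu(x):
    return max(0.0, x)          # f(x) = max(0, x)

def relu_grad(x):
    # f'(x) = 1 if x > 0 else 0 (subgradient 0 chosen at x = 0)
    return 1.0 if x > 0 else 0.0

print([relu(x) for x in (-2.0, 0.0, 3.0)])       # [0.0, 0.0, 3.0]
print([relu_grad(x) for x in (-2.0, 0.0, 3.0)])  # [0.0, 0.0, 1.0]
```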
### 29. What is the chain rule in calculus, and how does it apply to gradient descent in deep learning?
**ANSWER:** The **chain rule** states how to differentiate a composite function \(f(g(x))\): \(\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)\). In deep learning, the chain rule underlies **backpropagation**, which computes the gradient of the loss function with respect to every parameter, layer by layer. This lets gradient descent update the weights efficiently, propagating error signals backward through the network.
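A concrete check with hypothetical functions f(u) = u² and g(x) = 3x + 1, comparing the chain-rule gradient against a numerical derivative:

```python
def f(u):  return u * u          # f(u) = u^2,  f'(u) = 2u
def g(x):  return 3 * x + 1      # g(x) = 3x+1, g'(x) = 3

def composite_grad(x):
    # d/dx f(g(x)) = f'(g(x)) * g'(x) = 2*g(x) * 3
    return 2 * g(x) * 3

# Verify against a central-difference numerical derivative.
x, h = 2.0, 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)
print(composite_grad(x))         # 42.0
print(round(numeric, 3))         # ~42.0
```

Backpropagation applies exactly this composition-of-derivatives idea, one layer at a time, from the loss back to the inputs.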
### 30. How do you compute the attention scores in a transformer, and what is their mathematical interpretation?
**ANSWER:** Attention scores in a transformer are computed as Attention(Q, K, V) = softmax(QK^T / √d_k)V, where Q (queries), K (keys), and V (values) are learned representations of the input. The dot product QK^T measures the similarity between queries and keys. Scaling by √d_k keeps the values from growing excessively large, which keeps gradients stable. The softmax function normalizes the scores, emphasizing the most relevant tokens for each query and guiding the model's focus during generation.
### 31. In what ways does Gemini's architecture optimize training efficiency and stability compared to other multimodal LLMs like GPT-4?
**ANSWER:** Gemini's architecture optimizes training efficiency and stability compared with other multimodal models such as GPT-4 in several ways:
* **Unified Multimodal Design:** Gemini integrates text and image processing in a single model, improving parameter sharing and reducing complexity.
* **Cross-Modality Attention:** Enhanced interactions between text and images lead to better learning and more stable training.
* **Data-Efficient Pretraining:** Using self-supervised and contrastive learning, Gemini trains efficiently even with less labeled data.
* **Balanced Objectives:** Better synchronization of the text and image losses keeps training stable and convergence smooth.
### 32. What are different types of Foundation Models?
**ANSWER:** Foundation models are large-scale AI models trained on vast amounts of unlabeled data using unsupervised methods. They are designed to learn general-purpose knowledge that can be applied to a wide range of tasks. Common types:
* **Language Models:** Tasks: Machine translation, text summarization, question answering. Examples: BERT, GPT-3.
* **Computer Vision Models:** Tasks: Image classification, object detection, image segmentation. Examples: ResNet, VGGNet.
* **Generative Models:** Tasks: Creative writing, image generation, music composition. Examples: DALL-E, Imagen.
* **Multimodal Models:** Tasks: Image captioning, visual question answering. Examples: CLIP, Flamingo.
### 33. How does Parameter-Efficient Fine-Tuning (PEFT) prevent catastrophic forgetting in LLMs?
**ANSWER:** **Parameter-Efficient Fine-Tuning (PEFT)** helps prevent catastrophic forgetting in LLMs by updating only a small set of task-specific parameters while keeping most of the model's original parameters frozen (unchanged). This lets the model adapt to new tasks without overwriting previously learned knowledge, so it retains its core capabilities while efficiently learning new information.
### 34. What are the key steps involved in the Retrieval-Augmented Generation (RAG) pipeline?
**ANSWER:** The key steps of the Retrieval-Augmented Generation (RAG) pipeline are:
1. **Retrieval:** The query is encoded and compared against precomputed document embeddings to retrieve relevant documents.
2. **Ranking:** The retrieved documents are ranked by their relevance to the query.
3. **Generation:** The top-ranked documents are provided to the LLM as context, enabling it to generate more informed and accurate responses. This hybrid approach enhances the model's capability by incorporating external knowledge during generation.
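The retrieval-and-ranking steps can be sketched with cosine similarity over a couple of hypothetical precomputed embeddings (a real system would use an embedding model and a vector store):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical precomputed document embeddings.
DOCS = {
    "doc_llm":     [0.9, 0.1, 0.0],
    "doc_cooking": [0.0, 0.2, 0.9],
}

def retrieve(query_embedding, k=1):
    # Rank all documents by similarity to the query; keep the top k.
    ranked = sorted(DOCS.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

query = [0.8, 0.2, 0.1]    # pretend embedding of "What is an LLM?"
context = retrieve(query)  # these documents are then fed to the LLM
print(context)
```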
### 35. How does the Mixture of Experts (MoE) technique improve LLM scalability?
**ANSWER:** The **Mixture of Experts (MoE)** technique improves LLM scalability by using a **gating function** that activates only a few expert models (sub-networks) for each input rather than the whole model. This selective activation means:
* **Lower computational load:** Only a few experts are active for each query, minimizing resource usage.
* **High performance is maintained:** The model dynamically selects the most relevant experts for each input, ensuring task complexity is handled effectively.
MoE makes it possible to scale LLMs efficiently, building models with billions of parameters while keeping computational costs under control.
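The gating idea above can be sketched with trivial stand-in experts (plain functions here; in a real MoE layer each expert is a feed-forward sub-network and the gate logits come from a learned projection):

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

# Hypothetical experts: each is just a function of the input.
EXPERTS = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]

def moe_forward(x, gate_logits, top_k=1):
    # The gate picks the top-k experts per input; only those experts
    # run, which is what keeps computation sparse.
    gates = softmax(gate_logits)
    top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]
    total = sum(gates[i] for i in top)
    # Combine the chosen experts' outputs, weighted by renormalized gates.
    return sum(gates[i] / total * EXPERTS[i](x) for i in top)

print(moe_forward(3.0, gate_logits=[0.1, 2.0, 0.3], top_k=1))  # expert 1: 6.0
```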
### 36. What is Chain-of-Thought (CoT) prompting, and how does it improve complex reasoning in LLMs?
**ANSWER:** **Chain-of-Thought (CoT) prompting** helps LLMs handle complex reasoning by encouraging them to break tasks down into smaller, sequential steps. It improves performance by:
* **Simulating human-like reasoning:** CoT asks the model to take a step-by-step approach, much as humans work through complex problems.
* **Enhancing multi-step task performance:** It is particularly effective for tasks involving logical reasoning or multi-step calculations.
* **Increasing accuracy:** By guiding the model through a structured thought process, CoT reduces errors and improves performance on intricate queries. CoT improves the interpretability and reliability of LLMs on tasks that require deeper reasoning.
### 37. What is the difference between discriminative AI and Generative AI?
**ANSWER:**
* **Predictive/Discriminative AI:** Focuses on predicting or classifying data based on existing data. It models the conditional probability \(P(y|x)\), where \(y\) is the target variable and \(x\) the input features. Examples: classification tasks (image recognition), regression, spam detection.
* **Generative AI:** Focuses on generating new data samples that resemble the training data. It models the joint probability \(P(x,y)\), which lets it create new instances of data. Examples: generating text, images, or music with GANs, VAEs, or GPT.
### 38. How does knowledge graph integration enhance LLMs?
**ANSWER:** Integrating knowledge graphs with LLMs enhances performance by adding structured, factual knowledge. Key benefits:
* **Factual accuracy:** The model can cross-check information against the knowledge graph, reducing hallucinations (fabricated information) and improving correctness.
* **Enhanced reasoning:** Knowledge graphs support logical reasoning through the relationships between entities, making complex queries easier to handle.
* **Contextual understanding:** Structured data helps the model understand context and relationships, improving response quality. This integration is especially valuable for tasks such as question answering, entity recognition, and recommendation systems.
### 39. What is zero-shot learning, and how does it apply to LLMs?
**ANSWER:** **Zero-shot learning** lets LLMs perform tasks they were never explicitly trained on, leveraging the model's broad understanding of language and general concepts. No task-specific fine-tuning is required; instead, the model generates relevant outputs from the instructions given in the prompt. Examples:
* **Text classification:** The model can categorize text without specific training, simply by understanding the prompt's context.
* **Translation or summarization:** LLMs can translate or summarize text from the given instructions alone, without task-specific fine-tuning. This demonstrates LLMs' ability to generalize across tasks, making them highly versatile.
### 40. How does Adaptive Softmax speed up large language models?
**ANSWER:** **Adaptive Softmax** accelerates LLMs by grouping words into frequency classes, so less computation is spent on infrequent words. This lowers overall computational cost while preserving accuracy, making it an effective way to manage large vocabularies efficiently.
### 41. What is the vanishing gradient problem, and how does the Transformer architecture address it?
**ANSWER:** The **vanishing gradient problem** occurs when gradients shrink as they are propagated backward, preventing deep networks from learning effectively, especially in models like RNNs that handle long sequences. Transformers address it with:
* **Self-Attention Mechanism:** Captures relationships between all tokens in the sequence simultaneously, avoiding sequential dependencies and preventing gradients from shrinking across time steps.
* **Residual Connections:** Skip connections between layers let gradients flow directly, keeping them strong during backpropagation.
* **Layer Normalization:** Normalizes the inputs within each layer, stabilizing gradient updates and preventing vanishing or exploding gradients.
These innovations let deep models learn efficiently, even on long sequences, overcoming the limitations of earlier architectures.
### 42. Explain the concept of 'few-shot learning' in LLMs and its advantages.
**ANSWER:** **Few-shot learning** is an LLM's ability to understand and tackle new tasks from only a handful of examples. It is possible because of the model's extensive pre-training, which lets it generalize from limited data. Main benefits:
* **Reduced Data Needs:** Few examples are needed for good performance, minimizing the need for large, task-specific datasets.
* **Increased Flexibility:** The model can easily adapt to a variety of tasks with minimal additional training.
* **Cost Efficiency:** Less data and shorter training times lower the costs of data collection and computation.
### 43. You're working on an LLM, and it starts generating offensive or factually incorrect outputs. How would you diagnose and address this issue?
**ANSWER:** If an LLM is generating offensive or factually incorrect outputs, I would first:
1. **Analyze the patterns,** inspect the input prompts, and assess whether the issue stems from biases or gaps in the training data.
2. **Review the preprocessing pipeline** for errors or biases, and examine the **dataset for imbalances**.
3. **Evaluate the model's architecture, hyperparameters, and fine-tuning** to identify any structural issues.
Possible solutions include **adversarial training, debiasing techniques, data augmentation,** or **retraining** on a more balanced dataset.
### 44. How is the encoder different from the decoder?
**ANSWER:** In the Transformer architecture, the encoder and decoder serve different purposes:
* **Encoder:** Processes the input data and transforms it into a set of abstract representations. It is essentially responsible for **understanding the input**.
* **Decoder:** Takes these representations and **generates the final output**. During generation it uses both the information from the encoder and the elements it has already generated in the sequence.
### 45. What are the main differences between LLMs and traditional statistical language models?
**ANSWER:**
* **Architecture:** LLMs are based on transformers and self-attention, which capture long-range dependencies easily; traditional models (e.g., N-grams, HMMs) struggle with them.
* **Scale:** LLMs have billions of parameters and train on massive datasets, yielding better generalization; traditional models are smaller and task-specific.
* **Training:** LLMs undergo unsupervised pre-training followed by fine-tuning; traditional models rely on supervised learning with labeled data for each task.
* **Context:** LLMs produce contextual embeddings, so a word's meaning shifts with its context; traditional models use static embeddings.
### 46. What is a 'context window'?
**ANSWER:** In large language models (LLMs), the **'context window'** is the span of text (measured in tokens or words) that the model can process at any one time while generating or interpreting language. It matters because it influences the model's ability to produce coherent and contextually relevant responses. A **larger context window** lets the model incorporate more surrounding information, but it also increases **computational demands**, creating a trade-off between performance and resource efficiency.
### 47. What is a hyperparameter?
**ANSWER:** A **hyperparameter** is a parameter that is set before the training process begins and influences how the model is trained. It controls aspects of the training process and is chosen by the developer or researcher based on prior knowledge or experimentation. Common examples:
* The train-test split ratio.
* The learning rate in optimization algorithms (e.g., gradient descent).
* The choice of optimization algorithm (e.g., the Adam optimizer).
* The choice of activation function (e.g., Sigmoid, ReLU, Tanh).
### 48. Can you explain the concept of attention mechanisms in transformer models?
**ANSWER:** At a high level, the **attention** mechanism lets the model focus on different parts of the input sequence while making predictions. Rather than treating every token equally, the model learns to **'attend'** to the relevant words that contribute most to the current prediction, regardless of their position. For example, in the sentence "The dog chased the ball because it was fast", the attention mechanism helps the model infer from context that 'it' most likely refers to 'the ball'.
### 49. What are Large Language Models?
**ANSWER:** A **Large Language Model (LLM)** is an AI system trained on vast amounts of text so that it can understand, generate, and predict human-like language. It learns patterns, context, and relationships in the data in order to produce relevant, coherent responses. LLMs can handle many kinds of tasks, such as answering questions, summarizing text, performing translations, and creative writing.
### 50. What are some common challenges associated with using LLMs?
**ANSWER:** Several common challenges come with using LLMs:
* **Computational Resources:** They require substantial computing power and memory for both training and deployment.
* **Bias and Fairness:** LLMs can learn and reproduce biases from their training data, leading to unfair outputs.
* **Interpretability:** Their complex, opaque nature makes it challenging to understand and explain why they made a particular decision.
* **Data Privacy:** Training on large datasets can raise data privacy and security concerns.
* **Cost:** Developing, training, and deploying LLMs can be expensive, which may limit their use for smaller organizations.
