<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Tour with GenAI]]></title><description><![CDATA[Tour with GenAI]]></description><link>https://blogging.pritombiswas.com</link><generator>RSS for Node</generator><lastBuildDate>Mon, 18 May 2026 17:43:40 GMT</lastBuildDate><atom:link href="https://blogging.pritombiswas.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Semantic Routing]]></title><description><![CDATA[What is Semantic Routing?
We learnt about “Logical Routing” previously. If you haven’t read it yet, please read it first; today’s article builds heavily on it.
A Scenario:
One day, Jack went to a library and asked the librarian this question:“I need help ...]]></description><link>https://blogging.pritombiswas.com/semantic-routing</link><guid isPermaLink="true">https://blogging.pritombiswas.com/semantic-routing</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><category><![CDATA[2Articles1Week]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Sat, 14 Jun 2025 11:58:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749835058013/01034aad-d8fd-4269-8336-24f286b8b5bd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-semantic-routing">What is Semantic Routing?</h2>
<p>We learnt about “<a target="_blank" href="https://blogging.pritombiswas.com/logical-routing">Logical Routing</a>” previously. If you haven’t read it yet, please read it first; today’s article builds heavily on it.</p>
<h3 id="heading-a-scenario">A Scenario:</h3>
<p>One day, Jack went to a library and asked:<br /><strong><em>“I need help making my web pages more interactive.”</em></strong></p>
<p>There were two librarians in the library, Logical Lara and Semantic Sara, and each managed her own section.</p>
<hr />
<p>Logical Lara went strictly by the rules. She pulled out her rulebook and checked the question against it:</p>
<ul>
<li><p>Contains "web"? ✓ Check the Web Development section</p>
</li>
<li><p>Contains "interactive"? ✓ Check the JavaScript shelf</p>
</li>
<li><p>Contains "pages"? ✓ Look in the HTML documentation</p>
</li>
</ul>
<p>Logical Lara handed Jack a stack of advanced JavaScript framework manuals, DOM manipulation guides, and complex API references. Technically correct, but overwhelming for someone who probably just needs to understand basic event handling.</p>
<hr />
<p>Semantic Sara, on the other hand, thought for a while. She didn’t reach for a handbook. Instead, she analyzed Jack’s question and noticed that only the words “web pages” and “interactive” pointed to a context. There was nothing more specific, such as what kind of interactivity he wanted or which framework he was using. From this analysis, she concluded:</p>
<p><em>"This person sounds like they're at the beginning of their interactive web development journey. They're not asking about specific frameworks or advanced concepts—they want to understand how to make things happen when users click, type, or interact with their web pages."</em></p>
<p>Sara walked Jack directly to a beginner-friendly section with interactive tutorials, started him off with simple click handlers, and showed him a progression path from basic interactions to more complex features. She understood not just his words, but his <strong>intent, context, and level of expertise</strong>.</p>
<hr />
<div data-node-type="callout">
<div data-node-type="callout-emoji">🤔</div>
<div data-node-type="callout-text">Now, which result do you think is more reasonable?</div>
</div>

<p>What Semantic Sara did was the foundation of Semantic Routing: <strong>Analysis.</strong></p>
<h3 id="heading-definition">Definition:</h3>
<p>Semantic routing is an intelligent routing technique that makes decisions based on the <strong>meaning and context</strong> of queries rather than just keyword matching. Unlike logical routing, which uses predefined rules and patterns, semantic routing understands the <strong>intent, context, and semantic relationships</strong> within the text.</p>
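<p>To make the contrast with logical routing concrete, here is a toy Python sketch. The three-number “embeddings” in <code>TOY_EMBEDDINGS</code> are made-up values standing in for a real model’s output, and every name here is hypothetical. It only illustrates the idea: a keyword rule misses a paraphrase that a similarity score still catches.</p>
<pre><code class="lang-python">import math

# Hypothetical, hand-made 3-D "embeddings" for illustration only.
# A real system would get these vectors from a pre-trained model.
TOY_EMBEDDINGS = {
    "login not working": [0.9, 0.1, 0.0],
    "authentication help": [0.8, 0.2, 0.1],
    "css layout help": [0.1, 0.9, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_route(query):
    """Logical routing: a fixed rule that fires only on an exact keyword."""
    return "authentication" if "authentication" in query else "general"


def semantic_route(query, routes):
    """Semantic routing: pick the route whose description is closest in meaning."""
    q = TOY_EMBEDDINGS[query]
    return max(routes, key=lambda name: cosine(q, TOY_EMBEDDINGS[routes[name]]))


routes = {"authentication": "authentication help", "html_css": "css layout help"}
query = "login not working"
print(keyword_route(query))           # general (no keyword match)
print(semantic_route(query, routes))  # authentication (closest meaning)
</code></pre>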
<h2 id="heading-deep-dive">Deep Dive:</h2>
<h3 id="heading-how-does-it-work">How does it work?</h3>
<p>Let’s walk through it with the query “<strong><em>Help with web interactivity</em></strong>”:</p>
<ol>
<li><p><strong>Query Analysis:</strong></p>
<ol>
<li><p><strong>Pre-processing:</strong></p>
<ul>
<li><p>Text normalization (lowercase, remove special characters)</p>
</li>
<li><p>Tokenization: ["help", "with", "web", "interactivity"]</p>
</li>
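<li><p><strong>Stop word removal</strong> (remove common, low-information words such as <code>help</code> and <code>with</code>):<br />  Result: <code>["web", "interactivity"]</code></p>
<p>  Stop words carry little topical meaning, so they contribute little to matching. Many sentence-embedding models handle them well, though, so this step is optional; the code later in this article skips it.</p>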
</li>
</ul>
</li>
<li><p><strong>Embedding Generation:</strong></p>
<ul>
<li><p>Uses pre-trained language models (BERT, Sentence-BERT, OpenAI embeddings)</p>
</li>
<li><p>Converts text to a high-dimensional vector: <code>[0.2, 0.8, -0.1, 0.4, ...]</code></p>
</li>
<li><p>Captures semantic meaning, not just keywords</p>
</li>
</ul>
</li>
<li><p><strong>Context Analysis:</strong></p>
<ul>
<li><p><strong>Intent Detection:</strong> "Help-seeking" + "Learning-oriented"</p>
</li>
<li><p><strong>Domain Analysis:</strong> Web development</p>
</li>
<li><p><strong>Complexity Assessment:</strong> Beginner level (simple language, broad request)</p>
</li>
</ul>
</li>
</ol>
</li>
<li><p><strong>Knowledge Base (KB) Representation:</strong></p>
<p> Each knowledge base is represented as a collection of semantic vectors that capture the meaning and context of its content. Together, these vectors form the central repository against which incoming queries are semantically compared. Generally, they are computed once, when the system is first set up. For example:</p>
<pre><code class="lang-markdown"> # HTML/CSS KB Vector: [0.4, 0.8, 0.3, ...]
<span class="hljs-bullet"> -</span> High values for: "visual", "interaction", "beginner", "elements"
<span class="hljs-bullet"> -</span> Represents: Basic web interactivity, styling, simple events

 # JavaScript KB Vector: [0.1, 0.9, 0.2, ...]  
<span class="hljs-bullet"> -</span> High values for: "programming", "functions", "advanced", "logic"
<span class="hljs-bullet"> -</span> Represents: Programming concepts, complex interactions

 # React KB Vector: [0.3, 0.7, 0.1, ...]
<span class="hljs-bullet"> -</span> High values for: "components", "framework", "interactive", "state"
<span class="hljs-bullet"> -</span> Represents: Framework-specific interactivity
</code></pre>
</li>
<li><p><strong>Similarity Calculation:</strong></p>
<ul>
<li><p><strong>Cosine Similarity Calculation:</strong></p>
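<p>  $$\text{similarity} = \frac{\vec{A} \cdot \vec{B}}{|\vec{A}| \times |\vec{B}|}$$</p>
<p>  where <em>A</em> is the query vector and <em>B</em> is a knowledge base vector. Scores range from -1 to 1, and a higher score means the two texts are closer in meaning.</p>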
</li>
<li><p><strong>Results:</strong></p>
<ul>
<li><p>HTML/CSS KB: <strong>0.87</strong> (highest - captures "basic web interactivity")</p>
</li>
<li><p>React KB: <strong>0.62</strong> (good match for "interactivity" but more advanced)</p>
</li>
<li><p>JavaScript KB: <strong>0.45</strong> (relevant but too programming-focused)</p>
</li>
<li><p>Node.js KB: <strong>0.33</strong> (server-side, not directly interactive)</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<p>        (Dummy values for simulation.)</p>
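<p>The pre-processing step described above can be sketched in plain Python. The stop word list is a hypothetical, hand-picked set for this example that deliberately includes <code>help</code>. Real pipelines often start from a library list, such as NLTK’s English stop words, and then add domain-specific words.</p>
<pre><code class="lang-python">import re

# Hypothetical stop word list for this example only.
# It deliberately includes "help", which is not a typical stop word.
STOP_WORDS = {"i", "need", "help", "with", "a", "an", "the", "my", "to"}


def preprocess(query):
    """Normalize, tokenize, and remove stop words from a query."""
    normalized = re.sub(r"[^a-z0-9\s]", "", query.lower())  # lowercase, drop special characters
    tokens = normalized.split()                               # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]         # stop word removal


print(preprocess("Help with web interactivity"))  # ['web', 'interactivity']
</code></pre>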
<ol start="4">
<li><p><strong>Intelligent Decision Making:</strong></p>
<ul>
<li><p><strong>Ranking System:</strong></p>
<ol>
<li><p>Sort by Similarity Scores</p>
</li>
<li><p>Apply Confidence Thresholds (minimum 0.5)</p>
</li>
<li><p>Check Score Gaps (clear winner vs ambiguous)</p>
</li>
<li><p>Validate with Context (beginner vs advanced)</p>
</li>
</ol>
</li>
<li><p><strong>Final Decision:</strong> Route to <strong>HTML/CSS KB</strong> because:</p>
<ul>
<li><p>Highest semantic similarity (0.87)</p>
</li>
<li><p>Matches beginner intent</p>
</li>
<li><p>Covers basic web interactivity concepts</p>
</li>
<li><p>Appropriate starting point for the user's journey</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
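<p>Here is a minimal sketch of the four-step ranking system, run on the dummy scores above. The <code>KB_LEVEL</code> tags, the <code>min_gap</code> of 0.1, and the function name <code>decide_route</code> are assumptions made for illustration. Only the 0.5 threshold comes from the steps above.</p>
<pre><code class="lang-python"># Hypothetical difficulty tags per knowledge base (used in step 4).
KB_LEVEL = {"html_css": "beginner", "react": "advanced",
            "javascript": "advanced", "nodejs": "advanced"}


def decide_route(scores, user_level=None, threshold=0.5, min_gap=0.1):
    """Rank knowledge bases by similarity and pick a route (a sketch)."""
    # 1. Sort by similarity score, highest first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # 2. Keep only knowledge bases that pass the confidence threshold.
    candidates = [(name, s) for name, s in ranked if s >= threshold]
    if not candidates:
        return "general", "below threshold"
    # 3. Check the gap between the top two scores: clear winner vs ambiguous.
    best_name, best_score = candidates[0]
    runner_up = candidates[1][1] if len(candidates) > 1 else 0.0
    verdict = "clear winner" if best_score - runner_up >= min_gap else "ambiguous"
    # 4. Validate with context: if the winner doesn't fit the user's level,
    #    fall back to the best-scoring candidate that does.
    if user_level and KB_LEVEL.get(best_name) != user_level:
        for name, _ in candidates:
            if KB_LEVEL.get(name) == user_level:
                return name, "adjusted for " + user_level
    return best_name, verdict


scores = {"html_css": 0.87, "react": 0.62, "javascript": 0.45, "nodejs": 0.33}
print(decide_route(scores, user_level="beginner"))  # ('html_css', 'clear winner')
</code></pre>
<p>The score-gap check does not change the route on its own. It only labels the decision, so a caller could, for example, ask a clarifying question when the result is “ambiguous”.</p>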
<p>So, the final decision would be: <strong>“<em>For the query ‘Help with web interactivity’, the semantic routing system would route to the HTML/CSS Knowledge Base because it has the highest semantic similarity (0.87) and best matches the beginner-level intent for learning basic web interactivity concepts.</em>”</strong></p>
<h3 id="heading-when-to-semantic-routing">When to Use Semantic Routing?</h3>
<p>Semantic routing excels when a query is abstract or loosely worded, so its intent has to be interpreted rather than read directly from keywords. For example:</p>
<ul>
<li><p><strong>Natural Language Queries:</strong> <em>"How do I make my website respond to user clicks?"</em></p>
</li>
<li><p><strong>Ambiguous or Paraphrased Queries:</strong> <em>"Fix broken authentication"</em> vs <em>"Login not working"</em> vs <em>"User verification issues"</em></p>
</li>
<li><p><strong>Cross-Domain Queries:</strong> <em>"Best practices for securing user data in web apps"</em></p>
</li>
<li><p><strong>Beginner-Friendly Routing:</strong> <em>"Help me understand how websites work"</em></p>
</li>
<li><p><strong>Intent-Heavy Queries:</strong> <em>"I'm struggling with responsive design on mobile."</em></p>
</li>
<li><p><strong>Synonym and Variation Handling:</strong> <em>"API endpoints"</em> vs <em>"REST services"</em> vs <em>"web services"</em></p>
</li>
</ul>
<h3 id="heading-when-not-to">When Not to Use It?</h3>
<p>When a query names specific technologies or technical terms, or already states its main concept explicitly, semantic routing does not add much. Simple rule-based (logical) routing can usually handle it.</p>
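<p>One possible way to act on this is to try cheap exact-match rules first and fall back to semantic routing only when no rule fires. The sketch below rests on several assumptions. The <code>EXPLICIT_ROUTES</code> table and <code>hybrid_route</code> function are hypothetical. The semantic router is passed in as any callable; a placeholder stands in for it here.</p>
<pre><code class="lang-python">import re

# Hypothetical rules: explicit technology names map straight to a route.
EXPLICIT_ROUTES = {"react": "react", "node.js": "nodejs", "jwt": "authentication"}


def hybrid_route(query, semantic_router):
    """Use cheap exact-match rules first; fall back to semantic routing."""
    tokens = re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", query.lower())
    for token in tokens:
        if token in EXPLICIT_ROUTES:
            return EXPLICIT_ROUTES[token]
    return semantic_router(query)


def stand_in_semantic_router(query):
    """Placeholder for a real semantic router such as route_query()."""
    return "semantic-route"


print(hybrid_route("React hooks tutorial", stand_in_semantic_router))            # react
print(hybrid_route("Make my site respond to clicks", stand_in_semantic_router))  # semantic-route
</code></pre>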
<h2 id="heading-lets-do-some-code">Let’s Write Some Code:</h2>
<ol>
<li><p><strong>Preprocessing:</strong></p>
<pre><code class="lang-python"> <span class="hljs-comment"># This snippet is part of route_query(), shown in full in step 3.</span>
 <span class="hljs-comment"># For simplicity, no custom preprocessing is done: the model tokenizes the text internally.</span>
 query_vector = self.model.encode([query.strip()])[<span class="hljs-number">0</span>] <span class="hljs-comment"># Embed the query as a vector</span>

 similarities = {}
 query_norm = np.linalg.norm(query_vector)

 <span class="hljs-keyword">if</span> query_norm == <span class="hljs-number">0</span>:
     <span class="hljs-keyword">return</span> {
         <span class="hljs-string">"error"</span>: <span class="hljs-string">"Invalid query vector"</span>,
         <span class="hljs-string">"routed_to"</span>: <span class="hljs-string">"general"</span>,
         <span class="hljs-string">"confidence"</span>: <span class="hljs-number">0.0</span>
     }
</code></pre>
</li>
<li><p><strong>Knowledge Base setup:</strong></p>
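<pre><code class="lang-python"> <span class="hljs-comment"># Module-level imports used by the router</span>
 <span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
 <span class="hljs-keyword">from</span> sentence_transformers <span class="hljs-keyword">import</span> SentenceTransformer

 <span class="hljs-comment"># Set up the knowledge bases when the router class is initialized.</span>
 <span class="hljs-comment"># In a real system, each knowledge base would be a large collection of files stored in a database.</span>
 <span class="hljs-comment"># For simplicity, each one is a short keyword description in a dictionary.</span>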
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
         self.model = SentenceTransformer(<span class="hljs-string">'all-MiniLM-L6-v2'</span>)
         self.knowledge_bases = {
             <span class="hljs-string">"html_css"</span>: <span class="hljs-string">"HTML CSS styling layout beginner web design interactive elements"</span>,
             <span class="hljs-string">"javascript"</span>: <span class="hljs-string">"JavaScript programming functions DOM events advanced coding"</span>,
             <span class="hljs-string">"react"</span>: <span class="hljs-string">"React components hooks state JSX frontend framework"</span>,
             <span class="hljs-string">"nodejs"</span>: <span class="hljs-string">"Node.js server backend API express database"</span>,
             <span class="hljs-string">"authentication"</span>: <span class="hljs-string">"login security JWT tokens password user auth"</span>
         }

         print(<span class="hljs-string">"Semantic Router Starting"</span>)
         self.kb_vectors = {}
         <span class="hljs-keyword">for</span> name, description <span class="hljs-keyword">in</span> self.knowledge_bases.items():
             vector = self.model.encode([description])[<span class="hljs-number">0</span>]
             self.kb_vectors[name] = vector

         print(<span class="hljs-string">"Semantic Router Ready"</span>)
</code></pre>
</li>
<li><p><strong>Similarity Calculation:</strong></p>
<pre><code class="lang-python"> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">route_query</span>(<span class="hljs-params">self, query</span>):</span>
         <span class="hljs-string">"""Route a query to the best knowledge base"""</span>
         <span class="hljs-comment"># Some error handling</span>
         <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> query <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> query.strip():
             <span class="hljs-keyword">return</span> {
                 <span class="hljs-string">"error"</span>: <span class="hljs-string">"Query cannot be empty"</span>,
                 <span class="hljs-string">"routed_to"</span>: <span class="hljs-string">"general"</span>,
                 <span class="hljs-string">"confidence"</span>: <span class="hljs-number">0.0</span>
             }

         <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> self.kb_vectors:
             <span class="hljs-keyword">return</span> {
                 <span class="hljs-string">"error"</span>: <span class="hljs-string">"No knowledge bases available"</span>,
                 <span class="hljs-string">"routed_to"</span>: <span class="hljs-string">"general"</span>,
                 <span class="hljs-string">"confidence"</span>: <span class="hljs-number">0.0</span>
             }

         <span class="hljs-keyword">try</span>:
             <span class="hljs-comment"># Query embedding: same as the preprocessing snippet in step 1</span>
             query_vector = self.model.encode([query.strip()])[<span class="hljs-number">0</span>]

             similarities = {}
             query_norm = np.linalg.norm(query_vector)

             <span class="hljs-keyword">if</span> query_norm == <span class="hljs-number">0</span>:
                 <span class="hljs-keyword">return</span> {
                     <span class="hljs-string">"error"</span>: <span class="hljs-string">"Invalid query vector"</span>,
                     <span class="hljs-string">"routed_to"</span>: <span class="hljs-string">"general"</span>,
                     <span class="hljs-string">"confidence"</span>: <span class="hljs-number">0.0</span>
                 }

             <span class="hljs-comment"># Similarity search starts:</span>
             <span class="hljs-keyword">for</span> kb_name, kb_vector <span class="hljs-keyword">in</span> self.kb_vectors.items():
                 kb_norm = np.linalg.norm(kb_vector)

                 <span class="hljs-keyword">if</span> kb_norm == <span class="hljs-number">0</span>:
                     similarities[kb_name] = <span class="hljs-number">0.0</span>
                 <span class="hljs-keyword">else</span>:
                     <span class="hljs-comment"># Cosine similarity: (A · B) / (|A| * |B|)</span>
                     similarity = np.dot(query_vector, kb_vector) / (query_norm * kb_norm)
                     similarities[kb_name] = similarity

             <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> similarities:
                 <span class="hljs-keyword">return</span> {
                     <span class="hljs-string">"error"</span>: <span class="hljs-string">"No similarities calculated"</span>,
                     <span class="hljs-string">"routed_to"</span>: <span class="hljs-string">"general"</span>,
                     <span class="hljs-string">"confidence"</span>: <span class="hljs-number">0.0</span>
                 }

             best_kb = max(similarities, key=similarities.get)
             best_score = similarities[best_kb]

             threshold = <span class="hljs-number">0.3</span>  <span class="hljs-comment"># Minimum similarity to accept a route; tune it for your model</span>
             confidence = best_score
             <span class="hljs-keyword">if</span> best_score &lt; threshold:
                 best_kb = <span class="hljs-string">"general"</span>  <span class="hljs-comment"># Weak match: fall back to a general handler</span>

             <span class="hljs-keyword">return</span> {
                 <span class="hljs-string">"query"</span>: query.strip(),
                 <span class="hljs-string">"routed_to"</span>: best_kb,
                 <span class="hljs-string">"confidence"</span>: confidence,
                 <span class="hljs-string">"all_scores"</span>: similarities
             }

         <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
             <span class="hljs-keyword">return</span> {
                 <span class="hljs-string">"error"</span>: <span class="hljs-string">f"Routing failed: <span class="hljs-subst">{str(e)}</span>"</span>,
                 <span class="hljs-string">"routed_to"</span>: <span class="hljs-string">"general"</span>,
                 <span class="hljs-string">"confidence"</span>: <span class="hljs-number">0.0</span>
             }
     <span class="hljs-comment"># Similarity search ends</span>
</code></pre>
</li>
<li><p><strong>Decision Making (Demo Function):</strong></p>
<pre><code class="lang-python"> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">demo</span>(<span class="hljs-params">router, query</span>):</span>
     <span class="hljs-string">"""Test the Semantic Routing"""</span>
     result = router.route_query(query)
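     print(result)

     <span class="hljs-comment"># Error results have no 'query' or 'all_scores' keys, so stop early</span>
     <span class="hljs-keyword">if</span> <span class="hljs-string">"error"</span> <span class="hljs-keyword">in</span> result:
         print(<span class="hljs-string">f"\n⚠️ Error: <span class="hljs-subst">{result[<span class="hljs-string">'error'</span>]}</span>"</span>)
         <span class="hljs-keyword">return</span>
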
     print(<span class="hljs-string">f"\n📰 Query: <span class="hljs-subst">{result[<span class="hljs-string">'query'</span>]}</span>"</span>)
     print(<span class="hljs-string">f"➡️ Routed to: <span class="hljs-subst">{result[<span class="hljs-string">'routed_to'</span>]}</span>"</span>)
     print(<span class="hljs-string">f"🙌 Confidence: <span class="hljs-subst">{result[<span class="hljs-string">'confidence'</span>]}</span>"</span>)
     print(<span class="hljs-string">"\nAll Scores: "</span>)
     <span class="hljs-keyword">for</span> name, score <span class="hljs-keyword">in</span> result[<span class="hljs-string">'all_scores'</span>].items():
         print(<span class="hljs-string">f"<span class="hljs-subst">{name}</span>: score-&gt; <span class="hljs-subst">{score}</span>"</span>)
</code></pre>
</li>
<li><p><strong>Demo Input-Output:</strong></p>
<pre><code class="lang-powershell"> Semantic Router Starting
 Semantic Router Ready
 🧪Initiating Test: 
 &gt; What is js?    
 {<span class="hljs-string">'query'</span>: <span class="hljs-string">'What is js?'</span>, <span class="hljs-string">'routed_to'</span>: <span class="hljs-string">'javascript'</span>, <span class="hljs-string">'confidence'</span>: np.float32(<span class="hljs-number">0.43704033</span>), <span class="hljs-string">'all_scores'</span>: {<span class="hljs-string">'html_css'</span>: np.float32(<span class="hljs-number">0.17255959</span>), <span class="hljs-string">'javascript'</span>: np.float32(<span class="hljs-number">0.43704033</span>), <span class="hljs-string">'react'</span>: np.float32(<span class="hljs-number">0.31851408</span>), <span class="hljs-string">'nodejs'</span>: np.float32(<span class="hljs-number">0.22914742</span>), <span class="hljs-string">'authentication'</span>: np.float32(<span class="hljs-number">0.18344694</span>)}}

 📰 Query: What is js?
 ➡️ Routed to: javascript
 🙌 Confidence: <span class="hljs-number">0.4370403289794922</span>

 All Scores: 
 html_css: score-&gt; <span class="hljs-number">0.17255958914756775</span>
 javascript: score-&gt; <span class="hljs-number">0.4370403289794922</span>
 react: score-&gt; <span class="hljs-number">0.3185140788555145</span>
 nodejs: score-&gt; <span class="hljs-number">0.22914741933345795</span>
 authentication: score-&gt; <span class="hljs-number">0.1834469437599182</span>
 &gt; Javascript interactivity tutorial
 {<span class="hljs-string">'query'</span>: <span class="hljs-string">'Javascript interactivity tutorial'</span>, <span class="hljs-string">'routed_to'</span>: <span class="hljs-string">'javascript'</span>, <span class="hljs-string">'confidence'</span>: np.float32(<span class="hljs-number">0.5200579</span>), <span class="hljs-string">'all_scores'</span>: {<span class="hljs-string">'html_css'</span>: np.float32(<span class="hljs-number">0.40155196</span>), <span class="hljs-string">'javascript'</span>: np.float32(<span class="hljs-number">0.5200579</span>), <span class="hljs-string">'react'</span>: np.float32(<span class="hljs-number">0.1519815</span>), <span class="hljs-string">'nodejs'</span>: np.float32(<span class="hljs-number">0.0759646</span>), <span class="hljs-string">'authentication'</span>: np.float32(<span class="hljs-number">0.040414095</span>)}}

 📰 Query: Javascript interactivity tutorial
 ➡️ Routed to: javascript
 🙌 Confidence: <span class="hljs-number">0.5200579166412354</span>

 All Scores:
 html_css: score-&gt; <span class="hljs-number">0.4015519618988037</span>
 javascript: score-&gt; <span class="hljs-number">0.5200579166412354</span>
 react: score-&gt; <span class="hljs-number">0.15198150277137756</span>
 nodejs: score-&gt; <span class="hljs-number">0.07596459984779358</span>
 authentication: score-&gt; <span class="hljs-number">0.04041409492492676</span>
 &gt; javascript authentication
 {<span class="hljs-string">'query'</span>: <span class="hljs-string">'javascript authentication'</span>, <span class="hljs-string">'routed_to'</span>: <span class="hljs-string">'authentication'</span>, <span class="hljs-string">'confidence'</span>: np.float32(<span class="hljs-number">0.51601857</span>), <span class="hljs-string">'all_scores'</span>: {<span class="hljs-string">'html_css'</span>: np.float32(<span class="hljs-number">0.16553222</span>), <span class="hljs-string">'javascript'</span>: np.float32(<span class="hljs-number">0.4025679</span>), <span class="hljs-string">'react'</span>: np.float32(<span class="hljs-number">0.14441179</span>), <span class="hljs-string">'nodejs'</span>: np.float32(<span class="hljs-number">0.20198642</span>), <span class="hljs-string">'authentication'</span>: np.float32(<span class="hljs-number">0.51601857</span>)}}

 📰 Query: javascript authentication
 ➡️ Routed to: authentication
 🙌 Confidence: <span class="hljs-number">0.5160185694694519</span>

 All Scores:
 html_css: score-&gt; <span class="hljs-number">0.1655322164297104</span>
 javascript: score-&gt; <span class="hljs-number">0.40256789326667786</span>
 react: score-&gt; <span class="hljs-number">0.14441178739070892</span>
 nodejs: score-&gt; <span class="hljs-number">0.2019864171743393</span>
 authentication: score-&gt; <span class="hljs-number">0.5160185694694519</span>
 &gt; What is python?
 {<span class="hljs-string">'query'</span>: <span class="hljs-string">'What is python?'</span>, <span class="hljs-string">'routed_to'</span>: <span class="hljs-string">'general'</span>, <span class="hljs-string">'confidence'</span>: np.float32(<span class="hljs-number">0.07313205</span>), <span class="hljs-string">'all_scores'</span>: {<span class="hljs-string">'html_css'</span>: np.float32(<span class="hljs-number">0.06834164</span>), <span class="hljs-string">'javascript'</span>: np.float32(<span class="hljs-number">0.07313205</span>), <span class="hljs-string">'react'</span>: np.float32(<span class="hljs-number">0.043729357</span>), <span class="hljs-string">'nodejs'</span>: np.float32(<span class="hljs-number">0.04890592</span>), <span class="hljs-string">'authentication'</span>: np.float32(<span class="hljs-literal">-0</span>.<span class="hljs-number">04147682</span>)}} 

 📰 Query: What is python?
 ➡️ Routed to: general
 🙌 Confidence: <span class="hljs-number">0.07313205301761627</span>

 All Scores:
 html_css: score-&gt; <span class="hljs-number">0.06834164261817932</span>
 javascript: score-&gt; <span class="hljs-number">0.07313205301761627</span>
 react: score-&gt; <span class="hljs-number">0.04372935742139816</span>
 nodejs: score-&gt; <span class="hljs-number">0.04890592023730278</span>
 authentication: score-&gt; <span class="hljs-literal">-0</span>.<span class="hljs-number">04147681966423988</span>
</code></pre>
</li>
</ol>
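<p>The snippets above are methods of a router class, so they will not run on their own. Below is a condensed, self-contained sketch of the same idea; it is not the author’s full code (linked below). It assumes an <code>encode</code> callable that maps a list of strings to a list of vectors, such as <code>SentenceTransformer("all-MiniLM-L6-v2").encode</code>, and it computes cosine similarity in plain Python instead of NumPy. The <code>toy_encode</code> function is a hypothetical word-count encoder for trying the router without downloading a model. It is not a semantic embedding.</p>
<pre><code class="lang-python">import math


class SemanticRouter:
    """Condensed semantic router (a sketch, not the author's full code)."""

    def __init__(self, encode, knowledge_bases, threshold=0.3):
        # encode: any callable mapping a list of strings to a list of vectors.
        self.encode = encode
        self.threshold = threshold
        names = list(knowledge_bases)
        vectors = encode([knowledge_bases[n] for n in names])  # embed all KBs in one batch
        self.kb_vectors = dict(zip(names, vectors))

    @staticmethod
    def _cosine(a, b):
        """Cosine similarity; returns 0.0 if either vector has zero length."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def route_query(self, query):
        if not query or not query.strip() or not self.kb_vectors:
            return {"error": "Empty query or no knowledge bases",
                    "routed_to": "general", "confidence": 0.0}
        query_vector = self.encode([query.strip()])[0]
        scores = {name: float(self._cosine(query_vector, vec))
                  for name, vec in self.kb_vectors.items()}
        best_kb = max(scores, key=scores.get)
        confidence = scores[best_kb]
        if self.threshold > confidence:  # too weak a match: use a general fallback
            best_kb = "general"
        return {"query": query.strip(), "routed_to": best_kb,
                "confidence": confidence, "all_scores": scores}


# Hypothetical toy encoder: counts vocabulary words (NOT a semantic embedding).
VOCAB = ["javascript", "js", "login", "password", "css", "layout"]


def toy_encode(texts):
    return [[text.lower().split().count(word) for word in VOCAB] for text in texts]


router = SemanticRouter(toy_encode, {
    "javascript": "javascript js programming",
    "authentication": "login password security",
    "html_css": "css layout styling",
})
print(router.route_query("forgot my login password")["routed_to"])  # authentication
</code></pre>
<p>Passing the encoder in, rather than creating the model inside the class, keeps the routing logic testable and lets you swap embedding models without touching the router.</p>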
<p>That’s it: a basic “Semantic Routing” system.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">I have left out a lot of the complicated parts here, such as stop word removal, context analysis, and score-gap checks. You will need to implement those yourself.</div>
</div>

<h3 id="heading-full-code">Full Code:</h3>
<p><a target="_blank" href="https://gist.github.com/Pritom2357/c5b20849c57c0813cfeeecf15d2e19a0">See the full code here.</a></p>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>Semantic Routing and Logical Routing each have their own use cases; which one to choose depends heavily on the context.</p>
]]></content:encoded></item><item><title><![CDATA[Logical Routing]]></title><description><![CDATA[What is Logical Routing?
Previously, we saw what “Routing” means in RAG, so we’ll start directly with the definition here.
Definition:
Logical routing is a query handling approach that uses explicit rules, patterns, and conditional logic to deter...]]></description><link>https://blogging.pritombiswas.com/logical-routing</link><guid isPermaLink="true">https://blogging.pritombiswas.com/logical-routing</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><category><![CDATA[2Articles1Week]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Thu, 12 Jun 2025 19:48:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749833267491/fba5a321-2532-4e14-b9c7-b2c3c036cf78.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-logical-routing">What is Logical Routing?</h2>
<p>Previously, we saw what “<a target="_blank" href="https://blogging.pritombiswas.com/advanced-rag-routing#heading-introduction">Routing</a>” means in RAG, so we’ll start directly with the definition here.</p>
<h3 id="heading-definition">Definition:</h3>
<p><strong>Logical routing</strong> is a query handling approach that uses explicit rules, patterns, and conditional logic to determine where to direct incoming queries in an information system. Unlike semantic approaches that analyze meaning, logical routing relies on <strong><em>precise, predefined</em></strong> criteria to make routing decisions.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Simply speaking, “Logical Routing” guides queries based on <strong>specific rules</strong> that are <strong>set in advance</strong>.</div>
</div>

<h3 id="heading-core-parts">Core Parts:</h3>
<p>Logical routing mainly consists of these 4 parts:</p>
<ol>
<li><p><strong>Rule Engine:</strong></p>
<p> The central component that evaluates conditions and executes routing decisions based on <strong>predefined</strong> logic.</p>
</li>
<li><p><strong>Pattern Matchers:</strong></p>
<p> Tools that identify specific patterns in queries (keywords, phrases, question types).</p>
</li>
<li><p><strong>Decision Trees:</strong></p>
<p> Structured flowcharts that guide the routing process through a series of yes/no questions.</p>
</li>
<li><p><strong>Routing Destinations:</strong></p>
<p> The various endpoints where queries can be directed (knowledge bases, specialized handlers, etc.)</p>
</li>
</ol>
<h3 id="heading-how-does-it-work">How does it work?</h3>
<p>Look at the following diagram:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749740572709/d03e361d-7b50-40a8-bb5f-2188e33895e9.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Previously, we learnt that “Query Analysis &amp; Clarification”, “Route Decision”, “Execution”, and “Response Generation” are the fundamental workflows of Routing.</div>
</div>

<p>Let’s take an example and see what exactly happens here:</p>
<p><strong><em>Query: “How do I implement authentication in a React app with Node.js backend?”</em></strong></p>
<ol>
<li><p><strong>Pre-processing:</strong></p>
<ul>
<li><p>Generally, the query is converted to lowercase.</p>
</li>
<li><p>Then the query is tokenized: ["how", "do", "i", "implement", "authentication", "in", "a", "react", "app", "with", "node.js", "backend"]</p>
</li>
</ul>
</li>
<li><p><strong>Logical Routing:</strong></p>
<ol>
<li><p><strong>Rule Matching:</strong></p>
<ul>
<li><p>The system checks <strong><em>predefined</em></strong> rules in order of specificity, most specific first.</p>
</li>
<li><p>Matches rule: "IF query contains React AND Node.js AND authentication → route to full-stack authentication guides"</p>
</li>
</ul>
</li>
<li><p><strong>Pattern Matching:</strong></p>
<ul>
<li><p><strong>Languages/Frameworks identified</strong>: "React", "Node.js"</p>
</li>
<li><p><strong>Concepts identified</strong>: "authentication"</p>
</li>
<li><p><strong>Operation type</strong>: "implementation" (how-to)</p>
</li>
</ul>
</li>
<li><p><strong>Decision Trees:</strong></p>
<p> Look at this:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749742052561/af411dad-3842-4bd6-921b-28931ad7037e.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Route Decisions:</strong></p>
<ul>
<li>Based on the rule match, the system selects "javascript_fullstack_auth" as the routing destination.</li>
</ul>
</li>
</ol>
</li>
<li><p><strong>Response Phase:</strong></p>
<p> The query is directed specifically to the "JavaScript Full-Stack Authentication Documentation" section, which contains:</p>
<ul>
<li><p>React frontend authentication patterns</p>
</li>
<li><p>Node.js backend authentication implementations</p>
</li>
<li><p>JWT/session management guides</p>
</li>
<li><p>Security best practices</p>
</li>
</ul>
</li>
</ol>
<p>    The system retrieves information specifically from this targeted section rather than searching the entire database, resulting in:</p>
<ul>
<li><p>More relevant results</p>
</li>
<li><p>Faster response times</p>
</li>
<li><p>Content specifically about implementing authentication in React+Node.js applications.</p>
</li>
</ul>
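<p>The walkthrough above can be sketched in plain Python: tokenize, match patterns, then fire the most specific rule whose conditions all hold. The pattern tables and the extra rules (<code>react_auth</code>, <code>authentication</code>) are hypothetical. Only the <code>javascript_fullstack_auth</code> destination comes from the example. Decision trees are left out to keep the sketch short.</p>
<pre><code class="lang-python">import re

# Hypothetical pattern tables (pattern matchers).
FRAMEWORK_PATTERNS = {"react": ["react"], "nodejs": ["node.js", "nodejs", "node"]}
CONCEPT_PATTERNS = {"authentication": ["authentication", "auth", "login"]}

# Hypothetical rule engine: rules are checked from most to least specific.
RULES = [
    ({"react", "nodejs", "authentication"}, "javascript_fullstack_auth"),
    ({"react", "authentication"}, "react_auth"),
    ({"authentication"}, "authentication"),
]


def preprocess(query):
    """Lowercase and tokenize, keeping dotted names such as node.js intact."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", query.lower())


def match_patterns(tokens):
    """Return the labels (frameworks and concepts) whose patterns appear in the tokens."""
    found = set()
    for table in (FRAMEWORK_PATTERNS, CONCEPT_PATTERNS):
        for label, patterns in table.items():
            if any(p in tokens for p in patterns):
                found.add(label)
    return found


def logical_route(query):
    """Return the destination of the first rule whose labels are all present."""
    found = match_patterns(preprocess(query))
    for required, destination in RULES:
        if required.issubset(found):
            return destination
    return "general"  # destination when no rule fires


query = "How do I implement authentication in a React app with Node.js backend?"
print(preprocess(query))
print(logical_route(query))  # javascript_fullstack_auth
</code></pre>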
<div data-node-type="callout">
<div data-node-type="callout-emoji">❓</div>
<div data-node-type="callout-text">Quick question: Is it necessary to use “Rule Matching”, “Pattern Matching”, and “Decision Trees” together in all the queries? If yes, why? If not, why?</div>
</div>

<h3 id="heading-is-it-always-good">Is it always good?</h3>
<p>I’ll answer that in the next article. Now, let’s write some code 🤓</p>
<h2 id="heading-lets-code">Let’s Code:</h2>
<p>In this part, we will implement Logical Routing, using a dummy Qdrant DB to simulate what happens when we use this kind of routing.</p>
<ol>
<li><p><strong>Feed the context (Give Knowledge):</strong></p>
<pre><code class="lang-python"> <span class="hljs-comment"># These should be valid files in your system</span>
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_init_knowledge_base</span>(<span class="hljs-params">self</span>):</span>
         <span class="hljs-string">"""Initialize the knowledge base"""</span>
         <span class="hljs-comment"># Let's work with a dummy knowledge base. In real case, there might be some website data, database data and so on</span>
         self.kb_files = {
             <span class="hljs-string">"javascript"</span>: <span class="hljs-string">"javascript_docs.pdf"</span>,
             <span class="hljs-string">"python"</span>: <span class="hljs-string">"python_docs.pdf"</span>,
             <span class="hljs-string">"ruby"</span>: <span class="hljs-string">"ruby_docs.pdf"</span>,
             <span class="hljs-string">"react"</span>: <span class="hljs-string">"react_docs.pdf"</span>,
             <span class="hljs-string">"nodejs"</span>: <span class="hljs-string">"nodejs_docs.pdf"</span>, 
             <span class="hljs-string">"django"</span>: <span class="hljs-string">"django_docs.pdf"</span>,
             <span class="hljs-string">"api"</span>: <span class="hljs-string">"api_docs.pdf"</span>,
             <span class="hljs-string">"database"</span>: <span class="hljs-string">"database_docs.pdf"</span>,
             <span class="hljs-string">"authentication"</span>: <span class="hljs-string">"auth_docs.pdf"</span>,
             <span class="hljs-string">"general"</span>: <span class="hljs-string">"general_web_docs.pdf"</span>
         }

         <span class="hljs-comment"># This should be a proper knowledge base when implemented with real information</span>
         self.knowledge_bases = {}
</code></pre>
</li>
<li><p><strong>Define the patterns (Define Rules):</strong></p>
<pre><code class="lang-python"> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_init_patterns</span>(<span class="hljs-params">self</span>):</span>
         <span class="hljs-comment"># Ideally, these patterns should be generated in large numbers using LLMs and</span>
         <span class="hljs-comment"># stored in the vector store beforehand.</span>
         <span class="hljs-string">"""Initialize pattern matching rules"""</span>
         <span class="hljs-comment"># Language patterns</span>
         self.language_patterns = {
             <span class="hljs-string">"javascript"</span>: [<span class="hljs-string">"javascript"</span>, <span class="hljs-string">"js"</span>, <span class="hljs-string">"ecmascript"</span>, <span class="hljs-string">".js"</span>],
             <span class="hljs-string">"python"</span>: [<span class="hljs-string">"python"</span>, <span class="hljs-string">"py"</span>, <span class="hljs-string">".py"</span>, <span class="hljs-string">"pip"</span>],
             <span class="hljs-string">"ruby"</span>: [<span class="hljs-string">"ruby"</span>, <span class="hljs-string">"rails"</span>, <span class="hljs-string">"erb"</span>, <span class="hljs-string">"gem"</span>, <span class="hljs-string">".rb"</span>]
         }

         <span class="hljs-comment"># Framework patterns</span>
         self.framework_patterns = {
             <span class="hljs-string">"react"</span>: [<span class="hljs-string">"react"</span>, <span class="hljs-string">"jsx"</span>, <span class="hljs-string">"component"</span>, <span class="hljs-string">"hook"</span>, <span class="hljs-string">"props"</span>, <span class="hljs-string">"state"</span>],
             <span class="hljs-string">"nodejs"</span>: [<span class="hljs-string">"node"</span>, <span class="hljs-string">"nodejs"</span>, <span class="hljs-string">"npm"</span>, <span class="hljs-string">"express"</span>, <span class="hljs-string">"package.json"</span>],
             <span class="hljs-string">"django"</span>: [<span class="hljs-string">"django"</span>, <span class="hljs-string">"drf"</span>, <span class="hljs-string">"django-rest-framework"</span>]
         }

         <span class="hljs-comment"># Concept patterns</span>
         self.concept_patterns = {
             <span class="hljs-string">"api"</span>: [<span class="hljs-string">"api"</span>, <span class="hljs-string">"rest"</span>, <span class="hljs-string">"endpoint"</span>, <span class="hljs-string">"http"</span>, <span class="hljs-string">"request"</span>, <span class="hljs-string">"response"</span>],
             <span class="hljs-string">"database"</span>: [<span class="hljs-string">"database"</span>, <span class="hljs-string">"db"</span>, <span class="hljs-string">"sql"</span>, <span class="hljs-string">"query"</span>, <span class="hljs-string">"mongodb"</span>, <span class="hljs-string">"schema"</span>, <span class="hljs-string">"model"</span>],
             <span class="hljs-string">"authentication"</span>: [<span class="hljs-string">"auth"</span>, <span class="hljs-string">"login"</span>, <span class="hljs-string">"jwt"</span>, <span class="hljs-string">"token"</span>, <span class="hljs-string">"session"</span>, <span class="hljs-string">"password"</span>]
         }

         <span class="hljs-comment"># Operation patterns</span>
         self.operation_patterns = {
             <span class="hljs-string">"how_to"</span>: [<span class="hljs-string">"how to"</span>, <span class="hljs-string">"how do i"</span>, <span class="hljs-string">"steps to"</span>, <span class="hljs-string">"guide for"</span>, <span class="hljs-string">"tutorial"</span>, <span class="hljs-string">"implement"</span>],
             <span class="hljs-string">"definition"</span>: [<span class="hljs-string">"what is"</span>, <span class="hljs-string">"define"</span>, <span class="hljs-string">"explain"</span>, <span class="hljs-string">"meaning of"</span>, <span class="hljs-string">"understand"</span>, <span class="hljs-string">"concept of"</span>],
             <span class="hljs-string">"comparison"</span>: [<span class="hljs-string">"vs"</span>, <span class="hljs-string">"versus"</span>, <span class="hljs-string">"compare"</span>, <span class="hljs-string">"difference between"</span>, <span class="hljs-string">"better than"</span>, <span class="hljs-string">"pros and cons"</span>],
             <span class="hljs-string">"troubleshooting"</span>: [<span class="hljs-string">"fix"</span>, <span class="hljs-string">"error"</span>, <span class="hljs-string">"bug"</span>, <span class="hljs-string">"issue"</span>, <span class="hljs-string">"problem"</span>, <span class="hljs-string">"not working"</span>, <span class="hljs-string">"debug"</span>]
         }
</code></pre>
</li>
<li><p><strong>Match the Patterns (Pattern Matching):</strong></p>
<pre><code class="lang-python"> <span class="hljs-comment"># Match the patterns with pre-defined rules.</span>
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_match_patterns</span>(<span class="hljs-params">self, query, pattern_dict</span>):</span>
         <span class="hljs-string">"""Match query against a pattern dictionary and return the matched results"""</span>
         query_lower = query.lower()
         matches = []
         matched_patterns = {}

         <span class="hljs-keyword">for</span> category, patterns <span class="hljs-keyword">in</span> pattern_dict.items():
             <span class="hljs-keyword">for</span> pattern <span class="hljs-keyword">in</span> patterns:
                 <span class="hljs-keyword">if</span> pattern <span class="hljs-keyword">in</span> query_lower:
                     <span class="hljs-keyword">if</span> category <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> matches:
                         matches.append(category)
                         matched_patterns[category] = [pattern]
                     <span class="hljs-keyword">else</span>:
                         matched_patterns[category].append(pattern)

         <span class="hljs-keyword">return</span> matches, matched_patterns
</code></pre>
</li>
<li><p><strong>Analyze Query (between rule matching and routing):</strong></p>
<pre><code class="lang-python">def analyze_query(self, query):
    """Analyze the query and collect every matched category"""
    languages, lang_patterns = self._match_patterns(query, self.language_patterns)
    frameworks, framework_patterns = self._match_patterns(query, self.framework_patterns)
    concepts, concept_patterns = self._match_patterns(query, self.concept_patterns)
    operations, op_patterns = self._match_patterns(query, self.operation_patterns)

    analysis = {
        "languages": languages,
        "frameworks": frameworks,
        "concepts": concepts,
        "operations": operations,
        "matched_patterns": {
            "languages": lang_patterns,
            "frameworks": framework_patterns,
            "concepts": concept_patterns,
            "operations": op_patterns
        },
        "original_query": query
    }

    return analysis
</code></pre>
</li>
<li><p><strong>Route the query:</strong></p>
<pre><code class="lang-python"># Routing starts here
def route_query(self, query):
    """Apply logical routing rules to determine the best available knowledge base"""
    analysis = self.analyze_query(query=query)
    route_info = {
        "query": query,
        "analysis": analysis,
        "route_decision": None,
        "decision_path": [],
        "knowledge_base": None
    }

    # 1. A language was detected: prefer a compatible framework, else the language itself
    if analysis["languages"]:
        primary_language = analysis["languages"][0]
        route_info["decision_path"].append(f"Language detected: {primary_language}")
        if analysis["frameworks"]:
            framework = analysis["frameworks"][0]

            compatible = False
            if primary_language == "javascript" and framework in ["react", "nodejs"]:
                compatible = True
            elif primary_language == "python" and framework == "django":
                compatible = True

            if compatible:
                route_info["decision_path"].append(f"Compatible framework found: {framework}")
                route_info["route_decision"] = framework
                route_info["knowledge_base"] = self._get_kb(framework)
                return route_info

        route_info["route_decision"] = primary_language
        route_info["knowledge_base"] = self._get_kb(primary_language)
        return route_info

    # 2. No language, but a framework was detected
    elif analysis["frameworks"]:
        framework = analysis["frameworks"][0]
        route_info["decision_path"].append(f"Framework detected: {framework}")
        route_info["route_decision"] = framework
        route_info["knowledge_base"] = self._get_kb(framework)
        return route_info

    # 3. Only a concept was detected
    elif analysis["concepts"]:
        concept = analysis["concepts"][0]
        route_info["decision_path"].append(f"Concept detected: {concept}")
        route_info["route_decision"] = concept
        route_info["knowledge_base"] = self._get_kb(concept)
        return route_info

    # 4. Nothing specific: fall back to the general knowledge base
    route_info["decision_path"].append("No specific domain detected, using general knowledge base")
    route_info["route_decision"] = "general"
    route_info["knowledge_base"] = self._get_kb("general")
    return route_info


# Routing done. Now search the chosen knowledge base in the vector store
def search(self, query, k=5):
    """Route the query and perform the search"""
    route_info = self.route_query(query)
    kb = route_info["knowledge_base"]

    if not kb:
        print("No valid knowledge base found for routing decision")
        # Keep the return type consistent with the success path below
        return {"routing": route_info, "results": []}

    results = kb.search(query, k=k)

    return {
        "routing": route_info,
        "results": results
    }
</code></pre>
</li>
</ol>
<ol start="6">
<li><p><strong>Dummy Checker Function:</strong></p>
<pre><code class="lang-python"> <span class="hljs-comment"># Dummy checking function. In a real deployment, real queries will be sent to the server and the backend </span>
 <span class="hljs-comment"># will validate and route it.</span>
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">demonstration_logical_routing</span>():</span>
     <span class="hljs-string">"""Show examples of the logical routing system with real queries"""</span>
     router = LogicalRouter()

     example_queries = [
         <span class="hljs-string">"How do I create an array in JavaScript?"</span>,
         <span class="hljs-string">"What's the best way to connect Python to a SQL database?"</span>,
         <span class="hljs-string">"How to fix React component rendering error"</span>,
         <span class="hljs-string">"What is authentication in web applications?"</span>,
         <span class="hljs-string">"Explain the difference between Node.js and Django"</span>,
         <span class="hljs-string">"How to implement REST APIs?"</span>,
         <span class="hljs-string">"What is a closure in JavaScript?"</span>,
         <span class="hljs-string">"Django vs Flask - which should I use?"</span>
     ]

     print(<span class="hljs-string">"\n"</span> + <span class="hljs-string">"="</span>*<span class="hljs-number">80</span>)
     print(<span class="hljs-string">" "</span>*<span class="hljs-number">30</span> + <span class="hljs-string">"LOGICAL ROUTING DEMO"</span>)
     print(<span class="hljs-string">"="</span>*<span class="hljs-number">80</span> + <span class="hljs-string">'\n'</span>)

     <span class="hljs-keyword">for</span> index, query <span class="hljs-keyword">in</span> enumerate(example_queries, <span class="hljs-number">1</span>):
         print(<span class="hljs-string">f"Query no. <span class="hljs-subst">{index}</span>: <span class="hljs-subst">{query}</span>"</span>)
         print(<span class="hljs-string">'\n'</span>)

         route_info = router.route_query(query)
         print(<span class="hljs-string">"\nQUERY ANALYSIS:"</span>)

         <span class="hljs-keyword">for</span> category, items <span class="hljs-keyword">in</span> route_info[<span class="hljs-string">"analysis"</span>].items():
             <span class="hljs-keyword">if</span> category <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> [<span class="hljs-string">"mathched_patters"</span>, <span class="hljs-string">"original_query"</span>]:
                 print(<span class="hljs-string">f"-&gt; <span class="hljs-subst">{category.capitalize()}</span>: <span class="hljs-subst">{<span class="hljs-string">', '</span>.join(items)}</span>"</span>)

         print(<span class="hljs-string">"\n ➡️ Routing Decision Paths: "</span>)
         <span class="hljs-keyword">for</span> step <span class="hljs-keyword">in</span> route_info[<span class="hljs-string">"decision_path"</span>]:
             print(<span class="hljs-string">f" -&gt; <span class="hljs-subst">{step}</span>"</span>)

         print(<span class="hljs-string">f"\n 🤚 Final Decision Here: <span class="hljs-subst">{route_info[<span class="hljs-string">"route_decision"</span>]}</span>"</span>)

         <span class="hljs-keyword">if</span> route_info[<span class="hljs-string">"knowledge_base"</span>]:
             kb_name = route_info[<span class="hljs-string">"route_decision"</span>]
             print(<span class="hljs-string">f" Routed to Knowledge base: <span class="hljs-subst">{kb_name}</span>"</span>)
         <span class="hljs-keyword">else</span>:
             print(<span class="hljs-string">" No valid knowledge base found"</span>)

         print(<span class="hljs-string">"-"</span>*<span class="hljs-number">80</span>)
</code></pre>
</li>
<li><p><strong>Demo Response:</strong></p>
</li>
</ol>
<pre><code class="lang-plaintext">Initializing the logical router with base path
Logical router base path initilalized successfully

================================================================================
                              LOGICAL ROUTING DEMO
================================================================================

Query no. 1: How do I create an array in JavaScript?


Initializing knowldege base: javascript
Creating new collection: 'javascript_docs'
Error creating new collection: {e}

QUERY ANALYSIS:
-&gt; Languages: javascript
-&gt; Frameworks:
-&gt; Concepts:
-&gt; Operations: how_to
-&gt; Matched_patterns: languages, frameworks, concepts, operations

 ➡️ Routing Decision Paths:
 -&gt; Language detected: javascript

 🤚 Final Decision Here: javascript
 Routed to Knowledge base: javascript
--------------------------------------------------------------------------------
Query no. 2: What's the best way to connect Python to a SQL database?


Initializing knowldege base: python
Creating new collection: 'python_docs'
Error creating new collection: {e}

QUERY ANALYSIS:
-&gt; Languages: python
-&gt; Frameworks:
-&gt; Concepts: database
-&gt; Operations:
-&gt; Matched_patterns: languages, frameworks, concepts, operations

 ➡️ Routing Decision Paths:
 -&gt; Language detected: python

 🤚 Final Decision Here: python
 Routed to Knowledge base: python
--------------------------------------------------------------------------------
Query no. 3: How to fix React component rendering error


Initializing knowldege base: react
Creating new collection: 'react_docs'
Error creating new collection: {e}

QUERY ANALYSIS:
-&gt; Languages:
-&gt; Frameworks: react
-&gt; Concepts:
-&gt; Operations: how_to, troubleshooting
-&gt; Matched_patterns: languages, frameworks, concepts, operations

 ➡️ Routing Decision Paths:
 -&gt; Framework detected: react

 🤚 Final Decision Here: react
 Routed to Knowledge base: react
--------------------------------------------------------------------------------
Query no. 4: What is authentication in web applications?


Initializing knowldege base: authentication
Creating new collection: 'auth_docs'
Error creating new collection: {e}

QUERY ANALYSIS:
-&gt; Languages:
-&gt; Frameworks:
-&gt; Concepts: authentication
-&gt; Operations: definition
-&gt; Matched_patterns: languages, frameworks, concepts, operations

 ➡️ Routing Decision Paths:
 -&gt; Concept detected: authentication

 🤚 Final Decision Here: authentication
 Routed to Knowledge base: authentication
--------------------------------------------------------------------------------
Query no. 5: Explain the difference between Node.js and Django


Initializing knowldege base: nodejs
Creating new collection: 'nodejs_docs'
Error creating new collection: {e}

QUERY ANALYSIS:
-&gt; Languages: javascript
-&gt; Frameworks: nodejs, django
-&gt; Concepts:
-&gt; Operations: definition, comparison
-&gt; Matched_patterns: languages, frameworks, concepts, operations

 ➡️ Routing Decision Paths:
 -&gt; Language detected: javascript
 -&gt; Compatible framework found: nodejs

 🤚 Final Decision Here: nodejs
 Routed to Knowledge base: nodejs
--------------------------------------------------------------------------------
Query no. 6: How to implement REST APIs?


Initializing knowldege base: api
Creating new collection: 'api_docs'
Error creating new collection: {e}

QUERY ANALYSIS:
-&gt; Languages:
-&gt; Frameworks:
-&gt; Concepts: api
-&gt; Operations: how_to
-&gt; Matched_patterns: languages, frameworks, concepts, operations

 ➡️ Routing Decision Paths:
 -&gt; Concept detected: api

 🤚 Final Decision Here: api
 Routed to Knowledge base: api
--------------------------------------------------------------------------------
Query no. 7: What is a closure in JavaScript?



QUERY ANALYSIS:
-&gt; Languages: javascript
-&gt; Frameworks:
-&gt; Concepts:
-&gt; Operations: definition
-&gt; Matched_patterns: languages, frameworks, concepts, operations

 ➡️ Routing Decision Paths:
 -&gt; Language detected: javascript

 🤚 Final Decision Here: javascript
 Routed to Knowledge base: javascript
--------------------------------------------------------------------------------
Query no. 8: Django vs Flask - which should I use?


Initializing knowldege base: django
Creating new collection: 'django_docs'
Error creating new collection: {e}

QUERY ANALYSIS:
-&gt; Languages:
-&gt; Frameworks: django
-&gt; Concepts:
-&gt; Operations: comparison
-&gt; Matched_patterns: languages, frameworks, concepts, operations

 ➡️ Routing Decision Paths:
 -&gt; Framework detected: django

 🤚 Final Decision Here: django
 Routed to Knowledge base: django
--------------------------------------------------------------------------------
</code></pre>
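<p>One thing the demo does not show: <code>_match_patterns</code> uses plain substring checks, so a short pattern like <code>"py"</code> also fires inside words such as "happy". A possible variant, sketched below as an assumption rather than the gist's code, only accepts a pattern when it stands alone as a word or phrase. The function name <code>match_patterns_whole_words</code> is hypothetical.</p>
<pre><code class="lang-python">import re


def match_patterns_whole_words(query, pattern_dict):
    """Variant of _match_patterns that only accepts whole-word or whole-phrase hits.

    A pattern matches only when it is surrounded by non-word characters
    (or the start/end of the query), so "py" no longer matches inside "happy".
    """
    query_lower = query.lower()
    matches = []
    matched_patterns = {}
    for category, patterns in pattern_dict.items():
        for pattern in patterns:
            regex = r"(?:^|\W)" + re.escape(pattern.lower()) + r"(?:\W|$)"
            if re.search(regex, query_lower):
                matched_patterns.setdefault(category, []).append(pattern)
                if category not in matches:
                    matches.append(category)
    return matches, matched_patterns


language_patterns = {
    "python": ["python", "py", ".py", "pip"],
    "javascript": ["javascript", "js", ".js"],
}
print(match_patterns_whole_words("I am happy with my setup", language_patterns))  # ([], {})
# ".js" is preceded by "e" in "node.js", so only the standalone "js" matches here.
print(match_patterns_whole_words("Explain Node.js streams", language_patterns))
</code></pre>
<p>The trade-off is that whole-word matching also stops <code>"hook"</code> from matching "hooks", so plural forms would need their own patterns.</p>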
<h3 id="heading-full-code">Full Code:</h3>
<p><a target="_blank" href="https://gist.github.com/Pritom2357/dd9cbb2d3c7eec07ab2b40bacf5473c8">Get the full code here (Github Gist)</a></p>
<p>The gist also contains a vector store implementation, which you can ignore here. I am running the Qdrant store locally with Docker.</p>
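<p>In the demo output, each knowledge base is initialized the first time a route needs it. Query 1 initializes <code>javascript</code>, and Query 7 reuses it without re-initializing. The <code>_get_kb</code> method is not shown in this article, so the sketch below is only an assumption about how such lazy loading could work. The <code>InMemoryKB</code> and <code>KnowledgeBaseCache</code> classes are hypothetical stand-ins for the Qdrant-backed code.</p>
<pre><code class="lang-python">class InMemoryKB:
    """Hypothetical stand-in for a Qdrant-backed knowledge base."""

    def __init__(self, name, source_file):
        self.name = name
        self.source_file = source_file
        self.chunks = []  # a real KB would load, chunk, and embed source_file here

    def search(self, query, k=5):
        # Placeholder ranking: just return the first k stored chunks.
        return self.chunks[:k]


class KnowledgeBaseCache:
    """Create each knowledge base on first request, then reuse it."""

    def __init__(self, kb_files):
        self.kb_files = kb_files
        self.knowledge_bases = {}

    def get_kb(self, name):
        if name not in self.kb_files:
            return None  # unknown route: the caller reports "No valid knowledge base found"
        if name not in self.knowledge_bases:
            print(f"Initializing knowledge base: {name}")
            self.knowledge_bases[name] = InMemoryKB(name, self.kb_files[name])
        return self.knowledge_bases[name]


cache = KnowledgeBaseCache({"javascript": "javascript_docs.pdf", "general": "general_web_docs.pdf"})
first = cache.get_kb("javascript")   # prints the initialization message
second = cache.get_kb("javascript")  # cached: no message this time
print(first is second)  # True
</code></pre>
<p>Lazy creation keeps startup cheap when only a few routes are ever used. The cost is a slower first query for each route.</p>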
<h3 id="heading-additional-resource">Additional Resource:</h3>
<p><a target="_blank" href="https://github.com/labdmitriy/llm-rag/blob/master/notebooks/rag-from-scratch/10-01-logical-routing.ipynb">llm-rag (github link)</a></p>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>Next, we will look at Semantic Routing and compare it with Logical Routing.</p>
]]></content:encoded></item><item><title><![CDATA[Advanced RAG: Routing]]></title><description><![CDATA[Introduction:
Previously, we learned about some “Query Translation” Techniques: “ How to break a query and get the gist of it? “. In this article, we’re gonna work on the data that is stored in our Database.
Problem statement:
Suppose we have a huge ...]]></description><link>https://blogging.pritombiswas.com/advanced-rag-routing</link><guid isPermaLink="true">https://blogging.pritombiswas.com/advanced-rag-routing</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Tue, 10 Jun 2025 17:56:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749573228557/4901adc3-25e1-4e4c-8d74-fa408df6fee2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction:</h2>
<p>Previously, we learned about some “Query Translation” techniques: how to break a query down and get the gist of it. In this article, we will work on the data stored in our database.</p>
<h3 id="heading-problem-statement">Problem statement:</h3>
<p>Suppose we have a huge database containing information about JavaScript, Node.js, Python, Ruby, Rust, and everything else in modern web development. Now, let’s play a quick Q&amp;A.</p>
<ul>
<li><p>When I search for something in there, does it search the whole database?</p>
<p>  - Yes.</p>
</li>
<li><p>Does it help if I greatly improve the quality of my query before searching?</p>
<p>  - Not really. The query is better, but the search still scans the whole database; nothing makes the search itself more efficient.</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🤔</div>
<div data-node-type="callout-text">Then what should we do?</div>
</div>

<h3 id="heading-some-approach">Some approach:</h3>
<p>Let’s think of a simple approach. First, we tag our data chunks by domain. Then, when we search the database, we search only the targeted domains. This way, each search touches less data and costs less, right?</p>
<p>This is called <strong><em>“Routing”.</em></strong> Easy, right?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749575194315/c66c23de-c7bf-49cc-a2f4-870626d4acf3.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749576571373/e6642d53-7f5d-4b3d-a505-2ed2efe69cfb.png" alt class="image--center mx-auto" /></p>
<p>This is the basic process of “Routing”.</p>
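<p>The tag-then-filter idea can be sketched in a few lines. This is a toy under stated assumptions. The chunks and domain tags are made up, and word overlap stands in for the vector similarity a real store would compute. A store such as Qdrant would typically apply the domain restriction as a payload filter.</p>
<pre><code class="lang-python">import re

# Hypothetical chunks, each tagged with a domain when it was indexed.
CHUNKS = [
    {"domain": "javascript", "text": "Array.prototype.map returns a new array."},
    {"domain": "python", "text": "List comprehensions build lists concisely."},
    {"domain": "javascript", "text": "Closures capture variables from an outer scope."},
    {"domain": "ruby", "text": "Blocks are passed to methods with do...end."},
]


def tokens(text):
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"\w+", text.lower()))


def search(query, chunks, domain=None):
    """Return (matching texts, number of chunks scanned).

    With a domain, only chunks carrying that tag are scanned; that filter is the routing step.
    """
    candidates = [c for c in chunks if domain is None or c["domain"] == domain]
    query_words = tokens(query)
    hits = [c["text"] for c in candidates if query_words.intersection(tokens(c["text"]))]
    return hits, len(candidates)


print(search("How do closures work?", CHUNKS))                       # 4 chunks scanned, 2 hits
print(search("How do closures work?", CHUNKS, domain="javascript"))  # 2 chunks scanned, 1 hit
</code></pre>
<p>Without the filter, the stray word “do” pulls in an unrelated Ruby chunk. Routing to the <code>javascript</code> domain scans half the chunks and returns only the relevant one.</p>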
<h2 id="heading-routing">Routing:</h2>
<h3 id="heading-definition">Definition:</h3>
<p>Routing is a technique where the system <strong>intelligently decides which retrieval strategy, knowledge source, or processing path</strong> to use based on the characteristics of the incoming query.</p>
<p>Think of it like a smart receptionist:</p>
<ul>
<li><p>Medical questions → Route to a medical expert</p>
</li>
<li><p>Legal questions → Route to the legal department</p>
</li>
<li><p>Technical questions → Route to the engineering team</p>
</li>
<li><p>Simple questions → Route to the general information desk</p>
</li>
</ul>
<h3 id="heading-how-routing-works">How Routing Works:</h3>
<p>Usually, Routing works in these 4 steps:</p>
<ol>
<li><p><strong>Query Analysis and Classification:</strong></p>
<p> The router examines the incoming query to determine:</p>
<ul>
<li><p><strong>Query Type</strong>: Question, comparison, how-to, definition, etc.</p>
</li>
<li><p><strong>Domain</strong>: Technical, business, medical, legal, etc.</p>
</li>
<li><p><strong>Complexity</strong>: Simple factual vs complex analytical</p>
</li>
<li><p><strong>Intent</strong>: Information seeking, problem-solving, decision-making</p>
</li>
</ul>
</li>
<li><p><strong>Route Decision:</strong></p>
<p> Based on the analysis, the router decides:</p>
<ul>
<li><p><strong>Which retrieval method</strong> to use (vector search, keyword search, hybrid)</p>
</li>
<li><p><strong>Which knowledge source</strong> to query (general docs, technical docs, specific databases)</p>
</li>
<li><p><strong>Which processing strategy</strong> to apply (direct retrieval, decomposition, HyDE, parallel)</p>
</li>
<li><p><strong>Which model/prompt</strong> to use for generation</p>
</li>
</ul>
</li>
<li><p><strong>Execution:</strong></p>
<p> The query is sent down the chosen path with the appropriate configuration.</p>
</li>
<li><p><strong>Response:</strong></p>
<p> Results are formatted according to the route's specifications.</p>
</li>
</ol>
<h3 id="heading-some-procedural-examples">Some Procedural Examples:</h3>
<p>Let’s simulate this process with a question: “How do I deploy a React app?”</p>
<ol>
<li><p><strong>Query Analysis:</strong></p>
<ul>
<li><p><strong>Type</strong>: How-to/Procedural</p>
</li>
<li><p><strong>Domain</strong>: Web Development</p>
</li>
<li><p><strong>Complexity</strong>: Medium</p>
</li>
<li><p><strong>Intent</strong>: Problem-solving</p>
</li>
</ul>
</li>
<li><p><strong>Routing Decision:</strong></p>
<ul>
<li><p><strong>Knowledge Source</strong>: Development documentation</p>
</li>
<li><p><strong>Method</strong>: Keyword + semantic search</p>
</li>
<li><p><strong>Strategy</strong>: Step-by-step retrieval</p>
</li>
<li><p><strong>Response Format</strong>: Numbered instructions</p>
</li>
</ul>
</li>
<li><p><strong>Execution:</strong></p>
<p> The query goes to the chunks where development information is stored, runs some query retrieval techniques there, and gets the data.</p>
</li>
<li><p><strong>Response:</strong></p>
<p> Return the extracted data in the chosen response format (numbered instructions).</p>
</li>
</ol>
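<p>The four routing steps, applied to this example, can be sketched as plain functions. This is a minimal sketch. The keyword tables, the <code>RouteConfig</code> fields, and <code>fake_retrieve</code> are hypothetical, and complexity and intent detection are left out. A production router would more likely use an LLM or a trained classifier for step 1.</p>
<pre><code class="lang-python">from dataclasses import dataclass


@dataclass
class QueryAnalysis:
    query_type: str
    domain: str


@dataclass
class RouteConfig:
    knowledge_source: str
    method: str
    response_format: str


# Hypothetical keyword tables used for classification.
TYPE_KEYWORDS = {"how_to": ["how do i", "how to", "steps to"], "definition": ["what is", "explain"]}
DOMAIN_KEYWORDS = {"web_development": ["react", "node", "javascript", "css"]}

# Hypothetical routing table that maps (query type, domain) to a route.
ROUTES = {
    ("how_to", "web_development"): RouteConfig("development_docs", "keyword+semantic", "numbered_steps"),
}
DEFAULT_ROUTE = RouteConfig("general_docs", "semantic", "paragraph")


def first_label(text, table, default):
    """Return the first label whose keywords appear in text."""
    for label, keywords in table.items():
        if any(keyword in text for keyword in keywords):
            return label
    return default


def analyze(query):
    """Step 1: query analysis and classification."""
    text = query.lower()
    return QueryAnalysis(first_label(text, TYPE_KEYWORDS, "question"),
                         first_label(text, DOMAIN_KEYWORDS, "general"))


def decide(analysis):
    """Step 2: route decision."""
    return ROUTES.get((analysis.query_type, analysis.domain), DEFAULT_ROUTE)


def execute(query, route, retrieve):
    """Step 3: execution against the chosen knowledge source."""
    return retrieve(route.knowledge_source, query, route.method)


def respond(chunks, route):
    """Step 4: format the results as the route specifies."""
    if route.response_format == "numbered_steps":
        return "\n".join(f"{i}. {chunk}" for i, chunk in enumerate(chunks, 1))
    return " ".join(chunks)


def fake_retrieve(source, query, method):
    # Hypothetical retriever: ignores its inputs and returns canned chunks.
    return ["Build the app for production.", "Upload the build output to a host."]


query = "How do I deploy a React app?"
route = decide(analyze(query))
print(route.knowledge_source)  # development_docs
print(respond(execute(query, route, fake_retrieve), route))
</code></pre>
<p>Keeping the route decision in a lookup table separates “what kind of query is this?” from “where should it go?”. Either side can then change without touching the other.</p>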
<h3 id="heading-types-of-routing">Types of Routing:</h3>
<p>Two types of routing are widely used in the industry:</p>
<ol>
<li><p>Logical Routing</p>
</li>
<li><p>Semantic Routing</p>
</li>
</ol>
<p>We will learn about them in detail in later articles.</p>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>Routing is essential when we need to handle a large amount of data. For smaller applications, though, it mostly adds complexity.</p>
]]></content:encoded></item><item><title><![CDATA[HyDE (Hypothetical Document Embeddings)]]></title><description><![CDATA[Previous Context
We saw how the “Parallel Query Retrieval” and “Query Decomposition” work. Let’s just see a recap:
Parallel Query Retrieval:
We asked this question, “What is fs?” and we got some questions like this from the LLM:

What is fs?

What is...]]></description><link>https://blogging.pritombiswas.com/hyde-hypothetical-document-embeddings</link><guid isPermaLink="true">https://blogging.pritombiswas.com/hyde-hypothetical-document-embeddings</guid><category><![CDATA[2Articles1Week]]></category><category><![CDATA[ChaiCode]]></category><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Mon, 09 Jun 2025 19:04:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749452442159/e6ad0cba-de53-4288-8cd4-b5f86dcd122c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-previous-context">Previous Context</h2>
<p>We saw how the “<a target="_blank" href="https://blogging.pritombiswas.com/parallel-query-fan-out-retrieval">Parallel Query Retrieval</a>” and “<a target="_blank" href="https://blogging.pritombiswas.com/query-decomposition">Query Decomposition</a>” work. Let’s just see a recap:</p>
<h3 id="heading-parallel-query-retrieval">Parallel Query Retrieval:</h3>
<p>We asked, “What is fs?”, and the LLM generated variants like these:</p>
<ol>
<li><p>What is fs?</p>
</li>
<li><p>What is the file system?</p>
</li>
<li><p>What is a file in Node.js?</p>
</li>
<li><p>How to create a file in Node.js?</p>
</li>
</ol>
<p>Really a straightforward question and some variants of it, right?</p>
<h3 id="heading-query-decomposition">Query Decomposition:</h3>
<p>We had a complex question: “<strong><em>What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?</em></strong>” and our LLM generated these questions:</p>
<ol>
<li><p>Compare React and Vue.js for large-scale projects</p>
</li>
<li><p>What are the pros and cons of using React for building large applications?</p>
</li>
<li><p>Is React or Vue.js better for developing complex web applications?</p>
</li>
<li><p>What are the benefits of using Vue.js over React in large-scale projects?</p>
</li>
</ol>
<p>Complex multi-topic queries made simple.</p>
<p>Now, let’s think about something complex…</p>
<h2 id="heading-a-new-scenario">A new scenario:</h2>
<h3 id="heading-some-experimentation">Some experimentation 🧑‍🔬:</h3>
<p>Suppose we have a long academic research paper on “LLM Models: Transformers and NLP” in hand, and we want to ask questions about it. Sample question:</p>
<p><strong><em>"How does transformer architecture improve natural language understanding?"</em></strong></p>
<ol>
<li><p><strong>Run Through Parallel Query Retrieval:</strong></p>
<ul>
<li><p>"How do transformers help with NLP?"</p>
</li>
<li><p>"What makes transformer architecture better for language?"</p>
</li>
<li><p>"Why are transformers good for natural language processing?"</p>
</li>
</ul>
</li>
</ol>
<p>    We got these questions. But research papers do not contain questions. They contain research findings, methodologies, and numerical results.</p>
    <div data-node-type="callout">
    <div data-node-type="callout-emoji">❓</div>
    <div data-node-type="callout-text">We will get some response, but will it be <em>contextual</em> to us?</div>
    </div>

<ol start="2">
<li><p><strong>Run through Query Decomposition:</strong></p>
<ul>
<li><p>"What is transformer architecture?"</p>
</li>
<li><p>"How does the attention mechanism work?"</p>
</li>
<li><p>"What are transformer benefits for NLP?"</p>
</li>
</ul>
</li>
</ol>
<p>    We got some broken-down, or “decomposed”, queries. But these are still questions, and what we need is reference material that lets us search through the paper for the relevant information.</p>
    <div data-node-type="callout">
    <div data-node-type="callout-emoji">❓</div>
    <div data-node-type="callout-text">Again, we will get some response, but will it be that helpful?</div>
    </div>

<ol start="3">
<li><p><strong>Let’s try a new approach:</strong></p>
<p> This time, we first generate a pseudo-answer to the question, and then use the references/keywords in that answer to search our vector store. Suppose the LLM generates this response:</p>
<p> “Transformer architecture revolutionizes natural language understanding through self-attention mechanisms that capture long-range dependencies more effectively than recurrent neural networks. The multi-head attention allows the model to focus on different representation subspaces simultaneously, enabling better contextual understanding…”</p>
<p> This gives us keywords like:</p>
<ul>
<li><p>"self-attention mechanism"</p>
</li>
<li><p>"long-range dependencies"</p>
</li>
<li><p>"recurrent neural networks"</p>
</li>
<li><p>"subspace" and</p>
</li>
<li><p>"contextual understanding."</p>
<p>  This gives us more relevant data to search through, which increases our chances of finding the information we need, right?</p>
</li>
</ul>
</li>
</ol>
    <div data-node-type="callout">
    <div data-node-type="callout-emoji">💡</div>
    <div data-node-type="callout-text">This example might be a bit overwhelming. Think of it like this: “Instead of generating more questions, we generate a pseudo-answer first and then search the document for the real answer.”</div>
    </div>

<p>    This method is called “HyDE (Hypothetical Document Embeddings)”.</p>
<h3 id="heading-definition">Definition:</h3>
<p>HyDE stands for <strong>Hypothetical Document Embeddings</strong>. Instead of searching directly with the user's question, HyDE first generates a <strong>hypothetical answer</strong>, then embeds that generated answer and uses the embedding to search for relevant documents.</p>
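<p>To make the difference concrete, here is a minimal, self-contained sketch. It is not the author's implementation, and several pieces are assumptions: <code>embed</code> is a toy bag-of-words vector standing in for a real embedding model, <code>generate_hypothetical_answer</code> is a hypothetical stub that returns a hard-coded paragraph where a real system would call an LLM, and the three chunks are made-up text. It only shows that the raw question and the pseudo-answer land in very different places relative to the document chunks.</p>
<pre><code class="lang-python">import math
import re
from collections import Counter

# Made-up chunks standing in for passages of a research paper.
CHUNKS = [
    "Self-attention captures long-range dependencies better than recurrent neural networks.",
    "Multi-head attention attends to different representation subspaces at once.",
    "The dataset was collected from news articles between 2010 and 2015.",
]


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_hypothetical_answer(query):
    """Hypothetical stub for the LLM call; a real system would prompt a model with the query."""
    return (
        "Transformers use self-attention to capture long-range dependencies more "
        "effectively than recurrent neural networks, and multi-head attention "
        "attends to different representation subspaces."
    )


def hyde_search(query, chunks, top_k=2):
    """HyDE-style retrieval: embed a generated answer instead of the raw question."""
    pseudo_answer = generate_hypothetical_answer(query)
    answer_vec = embed(pseudo_answer)
    ranked = sorted(chunks, key=lambda chunk: cosine(answer_vec, embed(chunk)), reverse=True)
    return ranked[:top_k]


query = "How does transformer architecture improve natural language understanding?"

# Searching with the question itself: it shares no words with any chunk.
print([round(cosine(embed(query), embed(c)), 2) for c in CHUNKS])  # [0.0, 0.0, 0.0]

# Searching with the pseudo-answer: the two attention chunks come back.
for chunk in hyde_search(query, CHUNKS):
    print(chunk)
</code></pre>
<p>A real embedding model would not score the question at exactly zero, but the intuition HyDE relies on is the same: a generated answer tends to look more like an answer-bearing passage than the question does.</p>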
<h3 id="heading-why-hyde">Why HyDE?</h3>
<p>We have:</p>
<ul>
<li><p>Parallel Query Retrieval</p>
</li>
<li><p>Reciprocal Rank Fusion and</p>
</li>
<li><p>Query Decomposition</p>
</li>
</ul>
<p>These are powerful techniques. So, why do we need HyDE?</p>
<p>HyDE adds something <strong>extra powerful</strong>: <strong>contextual richness</strong>.</p>
<p>Why does that matter?</p>
<ul>
<li><p>Many technical documents spread information across sections.</p>
</li>
<li><p>Questions might miss key terms that appear only in answers.</p>
</li>
<li><p>Generating a pseudo-answer lets us <strong>"guess" what a good answer might look like</strong>, then <strong>search backwards</strong> to locate supporting material.</p>
</li>
</ul>
<p>This is the core idea behind <strong>HyDE</strong>: generating hypothetical documents (and embedding them) to find real ones.</p>
<p>This is the basic workflow of HyDE:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749469254809/ee8b1404-0b26-49d8-80b6-ed9c4b1bd96b.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">❓</div>
<div data-node-type="callout-text">There is a slight drawback in this approach. Can you identify that?</div>
</div>

<h3 id="heading-when-not-to-hyde">When not to HyDE?</h3>
<p>HyDE is most useful when the answer is scattered throughout the document and no passage states it directly. For simpler queries, HyDE is overkill: each pseudo-response is a full paragraph, so it usually burns more tokens than generating a few short questions. Use it when precision matters more than cost.</p>
<p>Plus, there is a drawback to this approach. Remember the pseudo-response we generated for the question about the “Research Paper”? The LLM generating that answer must already know something about the topic to produce useful keywords, right?</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">😶</div>
<div data-node-type="callout-text">HyDE usually does not work well with smaller language models. It works best with one of the “<strong><em>most updated and largest</em></strong>” language models.</div>
</div>

<p>Ok, enough with the ideas and intuitions. Let’s discuss the implementation:</p>
<h2 id="heading-lets-code">Let’s code 🥰:</h2>
<h3 id="heading-sequence">Sequence:</h3>
<ol>
<li><p>Nice system prompt</p>
</li>
<li><p>Take the user's query and generate a hypothetical “Paragraph”</p>
</li>
<li><p>Decompose that paragraph into smaller, focused statements</p>
</li>
<li><p>Generate some “Parallel Queries” for context matching</p>
</li>
<li><p>Vector Search and get your answer.</p>
</li>
</ol>
<p>Here, steps 1 and 2 are compulsory, steps 3 and 4 are just optimizations so that we do not miss any context, and step 5 does the actual retrieval.</p>
<h3 id="heading-code">Code:</h3>
<p>1 and 2:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">HyDE</span>(<span class="hljs-params">self, query</span>):</span>
        print(<span class="hljs-string">"HyDE Running 🧑‍🔬"</span>)
        <span class="hljs-keyword">try</span>:
            system_prompt = <span class="hljs-string">f"""
               Generate a comprehensive, expert-level answer to this query as if you're writing documentation or academic content.

                Query: "<span class="hljs-subst">{query}</span>"

                REQUIREMENTS:
                1. Write in professional, authoritative tone (like a domain expert)
                2. Generate exactly one well-structured paragraph (4-6 sentences)
                3. Include technical terminology and key concepts relevant to the field
                4. Cover the main topic plus 2-3 closely related subtopics
                5. Use declarative statements, not questions
                6. Write as if explaining to a knowledgeable audience

                RETURN FORMAT:
                {{
                    "original": "<span class="hljs-subst">{query}</span>",
                    "generated": "your expert paragraph here"
                }}

                Return ONLY valid JSON, no additional text.
            """</span>

            response = self.model.generate_content(system_prompt)

            <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> response <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> response.text:
                print(<span class="hljs-string">"No response was generated. "</span>)
                <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

            filtered_response = filter_response(response)

            <span class="hljs-keyword">try</span>:
                parsed_response = json.loads(filtered_response)
                <span class="hljs-keyword">return</span> parsed_response
            <span class="hljs-keyword">except</span> json.JSONDecodeError <span class="hljs-keyword">as</span> e:
                print(<span class="hljs-string">f"JSON parsing error: <span class="hljs-subst">{e}</span>"</span>)
                <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            print(<span class="hljs-string">f"Failed to run HyDE: <span class="hljs-subst">{e}</span>"</span>)
            <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>
</code></pre>
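<p>The function above calls <code>filter_response</code>, which lives in the full code and is not shown in this article. The following is a hypothetical sketch of what such a helper might do, assuming its job is to strip the Markdown code fence that models often wrap around JSON so that <code>json.loads</code> can parse the rest. It accepts either a plain string or an object with a <code>.text</code> attribute.</p>
<pre><code class="lang-python">import json

FENCE = "`" * 3  # a Markdown code fence (three backticks)


def filter_response(response):
    """Hypothetical helper: return the bare JSON text from an LLM reply.

    Accepts a plain string or any object with a .text attribute, and strips
    a surrounding Markdown code fence if the model added one.
    """
    text = getattr(response, "text", response).strip()
    if text.startswith(FENCE):
        # Drop the whole opening fence line, including a tag such as "json".
        text = text.split("\n", 1)[1] if "\n" in text else ""
    if text.endswith(FENCE):
        text = text[: -len(FENCE)]
    return text.strip()


raw = FENCE + 'json\n{"original": "q", "generated": "a paragraph"}\n' + FENCE
print(json.loads(filter_response(raw)))  # {'original': 'q', 'generated': 'a paragraph'}
</code></pre>
<p>Text without a fence passes through unchanged, so the helper is safe to call on every reply.</p>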
<p>3 to 5:</p>
<p>See the full code.</p>
<h3 id="heading-full-code">Full Code:</h3>
<p><a target="_blank" href="https://gist.github.com/Pritom2357/7504d4fab1a7977558dcdb4e644e53ba">See the full code</a> (mainly the main function, for clarification).</p>
<h3 id="heading-input-and-output-testing">Input and Output Testing:</h3>
<ol>
<li><p>Input:</p>
<pre><code class="lang-plaintext"> How does transformer architecture improve natural language understanding?
</code></pre>
</li>
<li><p>Output:</p>
<pre><code class="lang-plaintext"> HyDE Running 🧑‍🔬
 Transformer architecture significantly enhances natural language understanding (NLU) by leveraging
 self-attention mechanisms. Unlike recurrent neural networks (RNNs), transformers process all input 
 tokens simultaneously, enabling them to capture long-range dependencies and contextual information. 
 This allows for better representation of semantic relationships between words and phrases, resulting 
 in improved performance on tasks like machine translation, text summarization, and question answering. 
 Furthermore, transformers' parallel processing capabilities facilitate efficient training and inference, 
 making them a highly effective architecture for NLU tasks.

 Decomposing Query 🧠
 1: Transformers enhance natural language understanding using self-attention mechanisms.
 2: Transformers process all input tokens simultaneously, unlike recurrent neural networks.
 3: Transformers can capture long-range dependencies and contextual information.
 4: Transformers improve performance on tasks like machine translation, text summarization, and question answering.
 5: Transformers enable better representation of semantic relationships between words and phrases.
 6: Transformers facilitate efficient training and inference due to parallel processing capabilities.
 7: Transformers are a highly effective architecture for natural language understanding tasks.
 8: Transformers are an alternative to recurrent neural networks for natural language processing.
 9: Transformer architecture has advantages over recurrent neural networks for certain tasks.
 10: The use of self-attention in transformers is crucial for their effectiveness.

 # Parallel Query Generation also runs after this; its output is not shown here.
</code></pre>
 <div data-node-type="callout">
 <div data-node-type="callout-emoji">❓</div>
 <div data-node-type="callout-text">If you run the full code, the decomposed queries may come out as questions rather than single-line statements like these. Can you tweak it to generate statements like this?</div>
 </div>


</li>
</ol>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>We have seen a bunch of Query Translation methods so far: Parallel Query (Fan Out) Retrieval, Reciprocal Rank Fusion, Query Decomposition, and HyDE (Hypothetical Document Embeddings). All of them improve what we search with, or how we combine the search results, rather than producing the final answer. We are not doing anything fancy yet, just increasing our chances of finding the required data in our document so that we can do “further work” on it.</p>
<p>Before these LLMs, we had to do this kind of query matching manually, with code and conditionals. LLMs make this part much easier, though solid coding logic is still what engineering demands most. We just need to shape the LLM's response into a structure we can rely on, and one of the problems is solved.</p>
<p>In later parts, we will see how to optimize the whole automated pipeline further.</p>
]]></content:encoded></item><item><title><![CDATA[Query Decomposition]]></title><description><![CDATA[Previous context:
What we saw earlier
Remember, we did implement “Parallel Query Retrieval” and “Reciprocal Rank Fusion (RRF)”. There, we asked a question, “What is fs?” and the LLM generated some similar questions.
Now, in this article, let us ask t...]]></description><link>https://blogging.pritombiswas.com/query-decomposition</link><guid isPermaLink="true">https://blogging.pritombiswas.com/query-decomposition</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Sun, 08 Jun 2025 18:30:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749376610549/16c86d7a-6c7f-4220-8376-45a70e76bb96.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-previous-context">Previous context:</h2>
<h3 id="heading-what-we-saw-earlier">What we saw earlier</h3>
<p>Remember, we implemented “<a target="_blank" href="https://blogging.pritombiswas.com/parallel-query-fan-out-retrieval">Parallel Query Retrieval</a>” and “<a target="_blank" href="https://blogging.pritombiswas.com/reciprocate-rank-fusion-rrf">Reciprocal Rank Fusion (RRF)</a>”. There, we asked “What is fs?” and the LLM generated some similar questions.</p>
<p>Now, let us ask “<strong><em>What is React?</em></strong>” and run it through the Parallel Query Retrieval system. What similar questions will we get?</p>
<ol>
<li><p>What is React.js?</p>
</li>
<li><p>What is the React framework?</p>
</li>
<li><p>React JavaScript library explained</p>
</li>
<li><p>Introduction to React</p>
</li>
</ol>
<p>Like these, right?</p>
<h3 id="heading-lets-test-something">Let’s test something:</h3>
<p>Nice, now let us test with another question: “<strong><em>What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?</em></strong>” I got these similar queries from my system:</p>
<ol>
<li><p>Compare React and Vue.js for large-scale projects</p>
</li>
<li><p>What are the pros and cons of using React for building large applications?</p>
</li>
<li><p>Is React or Vue.js better for developing complex web applications?</p>
</li>
<li><p>What are the benefits of using Vue.js over React in large-scale projects?</p>
</li>
</ol>
<p>Look closely: most of the queries keep React.js and Vue.js together, and none of them asks about Vue.js on its own, so the retriever never gets to fetch knowledge about each one individually. This is fine if the document supplied to the RAG discusses them together and answers the question directly. But,</p>
<p>“<strong><em>What if the supplied document does not answer the question directly and covers React.js and Vue.js in separate places?</em></strong>”</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🛑</div>
<div data-node-type="callout-text">Yeah, new problem marked. How would you solve this?</div>
</div>

<h3 id="heading-interesting-right">Interesting, right?</h3>
<p>So, how can we fix this? Simple: divide the complex query into simpler ones. In a word, “DECOMPOSE THEM”!!!</p>
<h2 id="heading-what-is-query-decomposition">What is Query Decomposition?</h2>
<h3 id="heading-lets-try-something">Let’s try something:</h3>
<p>So, our previous “complex” query was this: “<strong><em>What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?</em></strong>” Can we break it into these queries?</p>
<ol>
<li><p>What are React advantages for large applications?</p>
</li>
<li><p>What are React disadvantages for large applications?</p>
</li>
<li><p>What are Vue.js advantages for large applications?</p>
</li>
<li><p>What are Vue.js disadvantages for large applications?</p>
</li>
</ol>
<p>Now, React.js and Vue.js appear in separate queries. So, even if they sit in different places in the given context, vector search can find each of them. In effect, we break one abstract query into several concrete ones.</p>
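<p>Here is a toy illustration under loudly simplified assumptions: a bag-of-words <code>embed</code> stands in for a real embedding model, the three chunks are made-up text, and each query may retrieve only its single best chunk. Under that budget, the compound question brings back only one of the two relevant chunks, while the decomposed queries each pull in their own.</p>
<pre><code class="lang-python">import math
import re
from collections import Counter

# Made-up document chunks: React and Vue.js are covered in separate places.
CHUNKS = [
    "React advantages: a large ecosystem and reusable components help large applications scale.",
    "Vue.js disadvantages: a smaller ecosystem can slow down large applications.",
    "Deployment guide: configure the server and environment variables.",
]


def embed(text):
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(CHUNKS, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)[:top_k]


compound = ("What are the advantages and disadvantages of React compared to Vue.js "
            "for building large-scale applications?")
decomposed = [
    "What are React advantages for large applications?",
    "What are React disadvantages for large applications?",
    "What are Vue.js advantages for large applications?",
    "What are Vue.js disadvantages for large applications?",
]

print(retrieve(compound))  # only the React chunk fits in the one-chunk budget

found = []
for sub_query in decomposed:
    for chunk in retrieve(sub_query):
        if chunk not in found:
            found.append(chunk)
print(found)  # both the React chunk and the Vue.js chunk
</code></pre>
<p>A real embedding model with a larger <code>top_k</code> behaves less starkly, but the design choice is the same: each sub-query gets its own retrieval slot, so a concept that appears in only one part of the document is not crowded out by the others.</p>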
<div data-node-type="callout">
<div data-node-type="callout-emoji">❓</div>
<div data-node-type="callout-text">Can you optimize this more? I mean, what if the given context does not have the exact query words in it?</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Of course, just generate some parallel queries of the decomposed queries !!!</div>
</div>

<p>The main workflow kind of looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749378463668/b5f65097-e918-4257-ac42-5a1a5e9b3af0.png" alt class="image--center mx-auto" /></p>
<p>Yeah, this is “Query Decomposition” !!!</p>
<h3 id="heading-definition">Definition:</h3>
<p>Query Decomposition is a technique where you break down a complex, multi-faceted user question into smaller, more focused sub-questions before performing retrieval in a RAG system.</p>
<h3 id="heading-some-diagrams">Some diagrams:</h3>
<p>Look at the following diagram for better understanding.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749379539459/83c2c93b-00fb-4e36-ab7a-4358479c99b7.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749379983981/98c6a793-7f66-421e-9b0c-dc5cbc927b45.png" alt class="image--center mx-auto" /></p>
<p>These are the intuitions and main mechanisms of “Query Decomposition”.</p>
<h3 id="heading-why-query-decomposition">Why Query Decomposition:</h3>
<p>These are the main reasons to decompose a query:</p>
<ol>
<li><p><strong>Vector Search Limitations:</strong></p>
<p> When a single query asks about multiple distinct concepts, its embedding becomes a blend of all of them, so it may not sit close to any one chunk that covers just one of those concepts.</p>
</li>
<li><p><strong>Improved Retrieval Coverage:</strong></p>
<p> A single query might miss some concepts, whereas several focused sub-queries retrieve more subject-specific chunks, which usually leads to more accurate results.</p>
</li>
<li><p><strong>Reduced Semantic Confusion:</strong></p>
<p> Complex queries sometimes mix concepts that are semantically close but actually different (React and Vue.js are both front-end JavaScript tools, yet they work differently). Decomposing the query keeps them apart, which reduces this confusion and leads to clearer, unambiguous answers.</p>
</li>
<li><p><strong>Better Document Relevance:</strong></p>
<p> Dividing the query into smaller sub-queries helps retrieve chunks from subject-specific sections, which keeps the retrieved context relevant to the question.</p>
</li>
</ol>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Isn’t this like asking different specialists to answer the questions in their own fields?</div>
</div>

<h2 id="heading-lets-code">Let’s code:</h2>
<h3 id="heading-flow">Flow:</h3>
<ol>
<li><p>Give a nice system prompt.</p>
</li>
<li><p>Take the user query</p>
</li>
<li><p>Generate Decomposed Query</p>
</li>
<li><p>Run Parallel Query Retrieval on them</p>
</li>
<li><p>Search on the Vector Store using the queries and Retrieve the response/docs</p>
</li>
<li><p>Use the Original Query and the Retrieved context to get the results</p>
</li>
</ol>
<h3 id="heading-code">Code:</h3>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🔴</div>
<div data-node-type="callout-text">I am using a Docker container for the vector store, so you also need to set that up.</div>
</div>

<p>1 to 3. <strong>Decompose Query Function</strong></p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">decomposeQuery</span>(<span class="hljs-params">self, query, number_of_queries = <span class="hljs-number">3</span></span>):</span>
        print(<span class="hljs-string">"Decomposing Query 🧠"</span>)
        <span class="hljs-keyword">try</span>:
              <span class="hljs-comment"># Nice prompt</span>
            system_prompt = <span class="hljs-string">f"""
                You are a helpful AI assistant who decomposes the given complex <span class="hljs-subst">{query}</span> into simpler queries using its keywords at the given number = <span class="hljs-subst">{number_of_queries}</span>.

                METHOD:
                1. Firstly, analyze the complex query and extract its keywords and split the distinct keywords.
                2. Secondly, make new queries using the keywords. ALWAYS remember to keep one distinct topic in a single query.
                3. Thirdly, Then return the queries in the given format.
                4. ALWAYS remember that each should only take one line.
                5. TRY to make as straight-forward as possible. Each query should consist the gist of the original query.

                EXAMPLE:
                "original": "What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?"
                "generated":
                    1. What are React advantages for large applications?
                    2. What are React disadvantages for large applications?
                    3. What are Vue.js advantages for large applications?
                    4. What are Vue.js disadvantages for large applications?

                RETURN FORMAT
                You only need to return the queries in this json format:
                {{
                    "original": "<span class="hljs-subst">{query}</span>",
                    "generated": [
                        "generated_1",
                        "generated_2",
                        "generated_3"
                    ]
                }}
                ONLY return in the given format.
            """</span>

            response = self.model.generate_content(system_prompt)

            <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> response <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> response.text:
                print(<span class="hljs-string">"No response from model"</span>)
                <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

            filtered_response = filter_response(response)

            <span class="hljs-keyword">try</span>:
                parsed_response = json.loads(filtered_response)
                <span class="hljs-keyword">return</span> parsed_response
            <span class="hljs-keyword">except</span> json.JSONDecodeError <span class="hljs-keyword">as</span> e:
                print(<span class="hljs-string">f"JSON parsing error: <span class="hljs-subst">{e}</span>"</span>)
                <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            print(<span class="hljs-string">f"Query Decomposition failed: <span class="hljs-subst">{e}</span>"</span>)
            <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>
</code></pre>
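<p><code>decomposeQuery</code> returns whatever JSON the model produced, and the model does not always follow the requested count (the prompt's example shows four queries while <code>number_of_queries</code> defaults to 3). A small guard like the following, a hypothetical helper that is not part of the full code, keeps only non-empty string queries and can optionally cap how many are used:</p>
<pre><code class="lang-python">def extract_queries(parsed, max_queries=None):
    """Hypothetical guard for decomposeQuery's output.

    Returns the non-empty string entries of parsed["generated"], stripped,
    optionally capped at max_queries. Anything malformed yields an empty list.
    """
    if not isinstance(parsed, dict):
        return []
    generated = parsed.get("generated")
    if not isinstance(generated, list):
        return []
    queries = [q.strip() for q in generated if isinstance(q, str) and q.strip()]
    return queries if max_queries is None else queries[:max_queries]


parsed = {"original": "React vs Vue.js?", "generated": ["React pros?", "  ", 42, "Vue.js pros? "]}
print(extract_queries(parsed))  # ['React pros?', 'Vue.js pros?']
</code></pre>
<p>Returning an empty list instead of <code>None</code> lets the next step loop over the result without an extra check.</p>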
<ol start="4">
<li><p><strong>Run a Parallel Query on the decomposed queries:</strong></p>
<pre><code class="lang-python"> <span class="hljs-comment"># Calling this multiple times to get the paralled queries:</span>
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">hybridQuery</span>(<span class="hljs-params">self, query, number_of_queries</span>):</span>
         print(<span class="hljs-string">"Hybrid Query Initiating 🤓"</span>)
         <span class="hljs-keyword">try</span>:
             response = self.generateParallelQuery(query=query, number_of_queries=number_of_queries)

             <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> response:
                 print(<span class="hljs-string">"Hybridization failed"</span>)
                 <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

             <span class="hljs-keyword">return</span> response
         <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
             print(<span class="hljs-string">f"Hybrid Query Generation Failed: <span class="hljs-subst">{e}</span>"</span>)
             <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

 <span class="hljs-comment"># Parallel Query Generation Function:</span>
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generateParallelQuery</span>(<span class="hljs-params">self, query, number_of_queries = <span class="hljs-number">3</span></span>):</span>
         print(<span class="hljs-string">"Generating Parallel Query 🤔"</span>)
         <span class="hljs-keyword">try</span>:
             system_prompt = <span class="hljs-string">f"""
                 You are a helpful AI assistant who generates <span class="hljs-subst">{number_of_queries}</span> queries with similar topics of the given query=<span class="hljs-subst">{query}</span>.

                 METHOD:
                 1. You get a query, analyze it and find the keywords in that.
                 2. You generate similar words based on the keywords. Extract the keywords from the whole <span class="hljs-subst">{query}</span> and then decide what to make.
                 3. You make similar query like <span class="hljs-subst">{query}</span> using the newly generated keywords
                 4. The generated queries will not exceed one line.
                 5. Keep them as straightforward as possible

                 EXAMPLE:
                 original: "What is fs in Node.js?"
                 generated:
                     1. "What is file system?"
                     2. "What are files in Node.js?"
                     3. "How to make files in Node.js?"

                 RETURN FORMAT
                 You only need to return the queries in this json format:
                 {{
                     "original": "<span class="hljs-subst">{query}</span>",
                     "generated": [
                         "generated_1",
                         "generated_2",
                         "generated_3"
                     ]
                 }}

                 Return ONLY valid JSON, no additional text.
             """</span>

             response = self.model.generate_content(
                 system_prompt
             )

             <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> response <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> response.text:
                 print(<span class="hljs-string">"No response from model"</span>)
                 <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

             filtered_response = filter_response(response)

             <span class="hljs-keyword">try</span>:
                 parsed_response = json.loads(filtered_response)
                 <span class="hljs-keyword">return</span> parsed_response
             <span class="hljs-keyword">except</span> json.JSONDecodeError <span class="hljs-keyword">as</span> e:
                 print(<span class="hljs-string">f"JSON parsing error: <span class="hljs-subst">{e}</span>"</span>)
                 <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

         <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
             print(<span class="hljs-string">f"Problem occurred while generating the response: <span class="hljs-subst">{e}</span>"</span>)
             <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>
</code></pre>
</li>
</ol>
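<p>Step 4 runs the parallel-query generator once per decomposed query, which produces overlapping lists. One way to merge them before the vector search in step 5 is sketched below; it is an assumption, not the author's code. <code>generate_parallel</code> is any callable that takes a query and a count and returns a list of strings; inside the class above it could be something like <code>lambda q, n: (self.hybridQuery(q, n) or {}).get("generated", [])</code>. <code>fake_parallel</code> is a hypothetical stub used only for the demo.</p>
<pre><code class="lang-python">def fan_out(decomposed, generate_parallel, per_query=3):
    """Merge each decomposed query with its parallel variants into one
    de-duplicated list (case-insensitive), keeping first-seen order."""
    seen = set()
    merged = []
    for sub_query in decomposed:
        variants = generate_parallel(sub_query, per_query) or []
        for q in [sub_query] + list(variants):
            key = q.strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(q.strip())
    return merged


def fake_parallel(query, n):
    """Hypothetical stub standing in for the LLM-backed parallel query generator."""
    return [f"{query} (variant {i + 1})" for i in range(n)]


decomposed = [
    "What are React advantages for large applications?",
    "What are Vue.js advantages for large applications?",
]
for q in fan_out(decomposed, fake_parallel, per_query=2):
    print(q)
</code></pre>
<p>Keeping each original sub-query alongside its variants means a failed generation call still leaves at least one query per concept for the vector search.</p>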
<p>5 and 6. <strong>See the full code.</strong></p>
<h3 id="heading-full-code">Full Code:</h3>
<p><a target="_blank" href="https://gist.github.com/Pritom2357/8b5d3170799c719d30dfff0fcc584f47">See the full Code here</a></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>“Query Decomposition” is another optimization method for handling complex, multi-part queries, and it keeps the generated queries from all repeating the same combined question. For simpler queries, Parallel Query Retrieval is good enough. But Query Decomposition lets the retrieval process specialize: each sub-query targets one concept.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">❓</div>
<div data-node-type="callout-text">Can you tweak the Query Decomposition function code so that it uses different specialized characters (Doctors, Engineers, etc.) to get specialized answers?</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Hint: Agents</div>
</div>]]></content:encoded></item><item><title><![CDATA[Reciprocate Rank Fusion (RRF)]]></title><description><![CDATA[Remember:
Previously, we made a RAG model with Parallel Query Retrieval (Fan Out system). If you did not read it, just click on the name “Parallel Query Retrieval.” I will take some context from there.
Context:
You did give a documentation of Node.js...]]></description><link>https://blogging.pritombiswas.com/reciprocate-rank-fusion-rrf</link><guid isPermaLink="true">https://blogging.pritombiswas.com/reciprocate-rank-fusion-rrf</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[Chaiaurcode]]></category><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Sat, 07 Jun 2025 13:00:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749281904097/7ab702c6-56e3-4a74-bc5c-efebf7a5e84d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-remember">Remember:</h2>
<p>Previously, we made a RAG model with <a target="_blank" href="https://blogging.pritombiswas.com/parallel-query-fan-out-retrieval">Parallel Query Retrieval (Fan Out system)</a>. If you did not read it, just click on the name <a target="_blank" href="https://blogging.pritombiswas.com/parallel-query-fan-out-retrieval">“Parallel Query Retrieval.”</a> I will take some context from there.</p>
<h3 id="heading-context">Context:</h3>
<p>You gave the RAG model the Node.js documentation and asked, “What is fs?” But the documentation did not contain the word “fs”. To solve this, the LLM generated some similar questions (parallel query generation):</p>
<ul>
<li><p>What is the file system?</p>
</li>
<li><p>What is a file in Node.js?</p>
</li>
<li><p>How to create a file in Node.js?</p>
</li>
</ul>
<p>Then you searched these terms in the “Vector Store” and found this:</p>
<ol>
<li><p>You could not find anything for the question “What is fs?”</p>
</li>
<li><p>You found 2 similarities for the question “What is the file system?” (just denoting: one yellow, one blue)</p>
</li>
<li><p>You found one similarity for the question “What is a file in Node.js?” (one blue)</p>
</li>
<li><p>You found three similarities for the question “How to create a file in Node.js?” (one blue, one yellow, one red)</p>
</li>
</ol>
<p>The diagram was like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749145417858/e9568925-bad2-40be-8199-0019218ca404.png?auto=compress,format&amp;format=webp" alt /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Observe the rankings of the files: the positions they appear in for each query, and how many times they appear in total.</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">❓</div>
<div data-node-type="callout-text">Can you notice a major flaw in the response? The blue file appeared the most times but ranked second. Will the response be dependable in this context?</div>
</div>

<p>Yeah, this response is somewhat optimized, but in a large context, it will lose its relevance and credibility, right? This problem leads us to a new technique: “Reciprocal Rank Fusion”.</p>
<h2 id="heading-what-is-reciprocate-rank-fusion-rrf">What is Reciprocal Rank Fusion (RRF)?</h2>
<h3 id="heading-some-insights">Some insights:</h3>
<p>Some hefty words, huh? Well, <strong>“<em>Reciprocal Rank Fusion</em>”</strong> just means <strong>“<em>Rank Them</em>”</strong>, simple!!!</p>
<p>Now, the million-dollar question: “How will you rank them?” There should be two criteria, right?</p>
<ol>
<li><p>Sort the results by their total number of appearances, in descending order.</p>
</li>
<li><p>If two results appear the same number of times, sort them by the positions they appeared in (1st before 2nd before 3rd).</p>
</li>
</ol>
<p>This is the main idea. It’s time to do a dry run. Look at the diagram below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749286547910/df45b990-6b22-4cbc-b172-2d14f40be21f.png" alt class="image--center mx-auto" /></p>
<p>Now, notice the observations:</p>
<ol>
<li><p>We found 2 similarities for the question “What is the file system?” (just denoting: one yellow, one blue). Yellow was 1st, Blue was 2nd.</p>
</li>
<li><p>We found two similarities for the question “What is a file in Node.js?” (one blue, one red). Blue 1st, and Red 2nd.</p>
</li>
<li><p>We found three similarities for the question “How to create a file in Node.js?” (one blue, one yellow, one red). Blue was 1st, Yellow was 2nd, and Red was 3rd.</p>
</li>
</ol>
<p>So, the final observation:</p>
<ul>
<li><p>Blue has 3 appearances. So, it will be placed in the first position.</p>
</li>
<li><p>Yellow and Red have both appeared twice. But Yellow held better positions (1st and 2nd) than Red (2nd and 3rd). So, Yellow will be 2nd.</p>
</li>
<li><p>Red will be 3rd.</p>
</li>
</ul>
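<p>The two criteria listed earlier can be sketched in Python as follows. The tie-break is one reading of “order of appearance”: compare each item's positions from best to worst, so an item ranked 1st and 2nd beats one ranked 2nd and 3rd. The color names are placeholders for the chunks in the diagram.</p>
<pre><code class="lang-python">from collections import defaultdict


def rank_by_count_then_position(result_lists):
    """Rank items by how many lists they appear in (more first), breaking ties
    by their positions sorted best-first (e.g. [1, 2] beats [2, 3])."""
    positions = defaultdict(list)
    for results in result_lists:
        for pos, item in enumerate(results, start=1):
            positions[item].append(pos)
    return sorted(positions, key=lambda item: (-len(positions[item]), sorted(positions[item])))


# Retrieved chunks per query, best first (colors as in the diagram).
results = [
    ["yellow", "blue"],         # "What is the file system?"
    ["blue", "red"],            # "What is a file in Node.js?"
    ["blue", "yellow", "red"],  # "How to create a file in Node.js?"
]
print(rank_by_count_then_position(results))  # ['blue', 'yellow', 'red']
</code></pre>
<p>This works for three small lists, but it needs hand-made tie-break rules; the RRF formula below folds both criteria into a single score.</p>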
<p>Wow, ranking done!!! Now, your result has more relevance than just retrieving data in parallel.</p>
<h3 id="heading-intuition">Intuition:</h3>
<p>If a document ranks higher in multiple lists (even if it doesn't appear in all), it’s probably important and gets a better final score.</p>
<h3 id="heading-definition">Definition:</h3>
<p><strong>Reciprocal Rank Fusion (RRF)</strong> is a simple yet powerful method used to combine multiple ranked lists of documents (or responses, answers, items, etc.) into a single fused ranking.</p>
<h3 id="heading-formula">Formula:</h3>
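<p>$$\text{RRF\_score}(d) = \sum_{i=1}^{n} \frac{1}{k + \text{rank}_i(d)}$$</p><p>Here, n is the number of ranked lists (one per query), rank<sub>i</sub>(d) is the position of document d in the i-th list (1 for the top result), and k is a constant (usually 60). In words: for every list the document appears in, add k to its rank, take the reciprocal, and add everything up. A list that does not contain the document contributes 0. Let’s take an example to understand this:</p>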
<pre><code class="lang-markdown"><span class="hljs-section">## User query:</span>
"What is polymorphism?"

<span class="hljs-section">## LLM-generated queries:</span>
queries = [
<span class="hljs-code">    "What is polymorphism?",
    "Types of polymorphism in OOP",
    "How does polymorphism work?",
    "Polymorphism examples"
]
</span>

<span class="hljs-section">## Example results (position 1 = top result):</span>
 Query 1 results: [Page15, Page16, Page18, Page20]
 Query 2 results: [Page16, Page15, Page17, Page19]
 Query 3 results: [Page15, Page18, Page16, Page21]
 Query 4 results: [Page17, Page15, Page20, Page16]

<span class="hljs-strong">**Page15 and Page16 appear in all 4 lists; Page17, Page18 and Page20 appear in 2 lists each; Page19 and Page21 appear once.**</span>

<span class="hljs-section">## RRF calculations (k = 60; a list that misses the page adds 0):</span>
Page15_RRF = 1/(60+1) + 1/(60+2) + 1/(60+1) + 1/(60+2) = 0.0164 + 0.0161 + 0.0164 + 0.0161 = 0.0650
Page16_RRF = 1/(60+2) + 1/(60+1) + 1/(60+3) + 1/(60+4) = 0.0161 + 0.0164 + 0.0159 + 0.0156 = 0.0640
Page17_RRF = 0 + 1/(60+3) + 0 + 1/(60+1) = 0 + 0.0159 + 0 + 0.0164 = 0.0323
Page18_RRF = 1/(60+3) + 0 + 1/(60+2) + 0 = 0.0159 + 0 + 0.0161 + 0 = 0.0320
Page20_RRF = 1/(60+4) + 0 + 0 + 1/(60+3) = 0.0156 + 0 + 0 + 0.0159 = 0.0315
Page19_RRF = Page21_RRF = 1/(60+4) = 0.0156

<span class="hljs-section">## Final ranking:</span>
Page15 (0.0650) &gt; Page16 (0.0640) &gt; Page17 (0.0323) &gt; Page18 (0.0320) &gt; Page20 (0.0315) &gt; Page19 = Page21 (0.0156)
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Look closely: here, a higher score means higher relevance. Why?</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">🧠</div>
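<div data-node-type="callout-text">Every list a document appears in adds a positive term, and a better rank adds a larger term. So, the more lists a document appears in, and the higher it ranks in them, the bigger the sum. Take some input and test it yourself.</div>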
</div>
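<p>If you want to test it yourself, here is a minimal Python sketch of the formula, run on the polymorphism example above. The function name <code>reciprocal_rank_fusion</code> is my own choice, and page names stand in for real documents.</p>
<pre><code class="lang-python">from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1 / (k + rank)  # lists that miss a document simply add nothing
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


result_lists = [
    ["Page15", "Page16", "Page18", "Page20"],  # "What is polymorphism?"
    ["Page16", "Page15", "Page17", "Page19"],  # "Types of polymorphism in OOP"
    ["Page15", "Page18", "Page16", "Page21"],  # "How does polymorphism work?"
    ["Page17", "Page15", "Page20", "Page16"],  # "Polymorphism examples"
]

for page, score in reciprocal_rank_fusion(result_lists):
    print(f"{page}: {score:.4f}")
</code></pre>
<p>Notice that Page17 edges out Page18: both appear in two lists, but Page17 is ranked 1st in one of them, while Page18’s best rank is 2nd.</p>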

<h3 id="heading-why-k60">Why k=60?</h3>
<p>Look at the examples below:</p>
<pre><code class="lang-markdown"><span class="hljs-section"># Without k constant:</span>
Rank 1: 1/1 = 1.0
Rank 2: 1/2 = 0.5    ← 50% drop! Too harsh!
Rank 3: 1/3 = 0.33   ← 67% drop from rank 1

<span class="hljs-section"># With k=60:</span>
Rank 1: 1/(60+1) = 1/61 = 0.0164
Rank 2: 1/(60+2) = 1/62 = 0.0161  ← Only 2% drop
Rank 3: 1/(60+3) = 1/63 = 0.0159  ← Only 3% drop from rank 1
</code></pre>
<p>So, we add a constant to keep the score from dropping sharply between neighboring ranks. The value 60 was chosen empirically (it is the value used in the paper that introduced RRF), and it is a good default in most situations.</p>
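<p>To see the effect of k in numbers, this small sketch computes how much a document’s contribution shrinks compared with rank 1. The helper <code>drop_from_rank_1</code> is hypothetical, written just for this comparison.</p>
<pre><code class="lang-python">def drop_from_rank_1(rank, k):
    """Relative drop of 1/(k + rank) compared with 1/(k + 1), as a fraction."""
    return 1 - (1 / (k + rank)) / (1 / (k + 1))


for k in (0, 60):
    drops = [f"rank {r}: {drop_from_rank_1(r, k):.1%}" for r in (2, 3, 10)]
    print(f"k={k}:", ", ".join(drops))

# k=0: rank 2: 50.0%, rank 3: 66.7%, rank 10: 90.0%
# k=60: rank 2: 1.6%, rank 3: 3.2%, rank 10: 12.9%
</code></pre>
<p>With k=60, even a rank-10 result keeps about 87% of the rank-1 weight, so appearing in many lists matters more than the exact position within one list.</p>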
<h3 id="heading-what-does-the-word-reciprocal-mean">🧠What does the word “Reciprocal” mean?</h3>
<p>Great question!</p>
<p>When we were analyzing and combining query results, we needed a way to fairly rank documents that appeared across different queries. Instead of simply assigning points based on ranks and summing them up, <strong>we used the <em>reciprocal</em> of the rank values</strong>.</p>
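<p>Why? Because taking the <strong>inverse (or reciprocal)</strong> of a rank gives <strong>higher weight to top-ranked results</strong> while still allowing lower-ranked results to contribute. And since RRF only looks at ranks, not at the raw similarity scores, it does not matter that different queries or retrievers produce scores on different scales. This makes the fused ranking more stable and robust.</p>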
<p>In the RRF formula, we take the rank of a document (like <em>Page15</em>), add a constant k (usually 60), and then <strong>invert</strong> the result. We repeat this for each query where the document appears and sum them up. This <strong>"reciprocation"</strong> is where the name <strong>Reciprocal Rank Fusion</strong> comes from.</p>
<p>In short:</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⬅</div>
<div data-node-type="callout-text"><em>Reciprocal</em> means “inverse,” and using the inverse rank helps us fuse results in a smart and balanced way.</div>
</div>

<p>Our general discussion is done. Let’s move to some code:</p>
<h2 id="heading-some-examples">Some examples:</h2>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🔴</div>
<div data-node-type="callout-text">I am using a Docker container in my system for the vector store.</div>
</div>

<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">perform_parallel_search_with_rrf</span>(<span class="hljs-params">vector_store, queries, k_per_queries, rrf_k=<span class="hljs-number">60</span>, min_similarity_threshold = <span class="hljs-number">0.4</span></span>):</span>
    <span class="hljs-string">"""Search with multiple queries and fuse the ranked results with Reciprocal Rank Fusion (RRF)"""</span>
    all_documents = {}
    query_results = {}

    <span class="hljs-comment"># Run one search per query. (This loop runs them one after another; the queries are independent, so they could also run concurrently.)</span>
    <span class="hljs-keyword">for</span> index, query <span class="hljs-keyword">in</span> enumerate(queries, <span class="hljs-number">1</span>):
        print(<span class="hljs-string">f"Running search on query <span class="hljs-subst">{index}</span>: <span class="hljs-subst">{query}</span>"</span>)
        query_results[query] = []
        response = vector_store.search(query, k=k_per_queries)

        <span class="hljs-comment"># Check the relevance of each result using the score returned by the search (Qdrant via LangChain).</span>
        <span class="hljs-comment"># Whether lower or higher is better depends on the collection's distance metric:</span>
        <span class="hljs-comment"># for distances such as Euclidean, lower is better; for cosine similarity, higher is better.</span>
        <span class="hljs-comment"># This code assumes lower is better; flip the comparison if your setup differs. See: https://qdrant.tech/documentation/concepts/search/</span>
        <span class="hljs-comment"># This filter is not part of RRF itself (it is an extra step).</span>
        relevant_results = []
        <span class="hljs-keyword">for</span> rank, (document, score) <span class="hljs-keyword">in</span> enumerate(response, <span class="hljs-number">1</span>):
            <span class="hljs-keyword">if</span> score &lt;= min_similarity_threshold:
                relevant_results.append((rank, document, score))
            <span class="hljs-keyword">else</span>:
                print(<span class="hljs-string">f"Result <span class="hljs-subst">{rank}</span> of query <span class="hljs-subst">{index}</span> is not relevant enough (score: <span class="hljs-subst">{score}</span>)"</span>)

        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> relevant_results:
            print(<span class="hljs-string">"No relevant results found, sorry"</span>)
            <span class="hljs-keyword">continue</span>

        <span class="hljs-comment"># Ranking begins</span>
        <span class="hljs-comment"># We first calculate in which position the document appears for a query</span>
        <span class="hljs-keyword">for</span> original_rank, document, score <span class="hljs-keyword">in</span> relevant_results:
            content_hash = hash(document.page_content[:<span class="hljs-number">100</span>].strip())
            page_number = document.metadata.get(<span class="hljs-string">'page'</span>, <span class="hljs-string">'unknown'</span>)
            doc_id = <span class="hljs-string">f"page_<span class="hljs-subst">{page_number}</span>_<span class="hljs-subst">{content_hash}</span>"</span>

            <span class="hljs-keyword">if</span> doc_id <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> all_documents:
                all_documents[doc_id] = {
                    <span class="hljs-string">'document'</span>: document,
                    <span class="hljs-string">'content'</span>: document.page_content,
                    <span class="hljs-string">'page'</span>: document.metadata.get(<span class="hljs-string">'page'</span>, <span class="hljs-string">'N/A'</span>),
                    <span class="hljs-string">'source'</span>: document.metadata.get(<span class="hljs-string">'source'</span>, <span class="hljs-string">'N/A'</span>),
                    <span class="hljs-string">'query_ranks'</span>: {},
                    <span class="hljs-string">'similarity_scores'</span>: {},
                    <span class="hljs-string">'rrf_contributions'</span>: {},
                    <span class="hljs-string">'queries_appeared'</span>: []
                }

            all_documents[doc_id][<span class="hljs-string">'query_ranks'</span>][query] = original_rank
            all_documents[doc_id][<span class="hljs-string">'similarity_scores'</span>][query] = score <span class="hljs-comment"># We save the Qdrant search score in here</span>
            all_documents[doc_id][<span class="hljs-string">'queries_appeared'</span>].append(query)

            rrf_contribution = <span class="hljs-number">1</span>/(original_rank+rrf_k) <span class="hljs-comment"># We calculate the RRF contribution for each document for each query</span>
            all_documents[doc_id][<span class="hljs-string">'rrf_contributions'</span>][query] = rrf_contribution

            query_results[query].append({
                <span class="hljs-string">'doc_id'</span>: doc_id,
                <span class="hljs-string">'rank'</span>: original_rank,
                <span class="hljs-string">'similarity_score'</span>: score,
                <span class="hljs-string">'rrf_contribution'</span>: rrf_contribution
            })

    rrf_results = []

    <span class="hljs-comment"># Sum each document's RRF contributions and collect some summary statistics.</span>
    <span class="hljs-comment"># The documents are sorted by their total RRF score at the end.</span>
    <span class="hljs-keyword">for</span> doc_id, doc_data <span class="hljs-keyword">in</span> all_documents.items():
        total_rrf_score = sum(doc_data[<span class="hljs-string">'rrf_contributions'</span>].values())
        num_of_queries_appeared = len(doc_data[<span class="hljs-string">'queries_appeared'</span>])
        avg_rank = sum(doc_data[<span class="hljs-string">'query_ranks'</span>].values())/num_of_queries_appeared
        avg_similarity = sum(doc_data[<span class="hljs-string">'similarity_scores'</span>].values())/num_of_queries_appeared
        best_rank = min(doc_data[<span class="hljs-string">'query_ranks'</span>].values())
        best_similarity = min(doc_data[<span class="hljs-string">'similarity_scores'</span>].values())

        consensus_score = num_of_queries_appeared/len(queries)

        rrf_results.append({
            <span class="hljs-string">'document'</span>: doc_data[<span class="hljs-string">'document'</span>],
            <span class="hljs-string">'content'</span>: doc_data[<span class="hljs-string">'content'</span>],
            <span class="hljs-string">'page'</span>: doc_data[<span class="hljs-string">'page'</span>],
            <span class="hljs-string">'source'</span>: doc_data[<span class="hljs-string">'source'</span>],
            <span class="hljs-string">'rrf_score'</span>: total_rrf_score,
            <span class="hljs-string">'consensus_score'</span>: consensus_score,
            <span class="hljs-string">'num_queries_appeared'</span>: num_of_queries_appeared,
            <span class="hljs-string">'avg_rank'</span>: avg_rank,
            <span class="hljs-string">'best_rank'</span>: best_rank,
            <span class="hljs-string">'avg_similarity'</span>: avg_similarity,
            <span class="hljs-string">'best_similarity'</span>: best_similarity,
            <span class="hljs-string">'query_ranks'</span>: doc_data[<span class="hljs-string">'query_ranks'</span>],
            <span class="hljs-string">'similarity_scores'</span>: doc_data[<span class="hljs-string">'similarity_scores'</span>],
            <span class="hljs-string">'rrf_contributions'</span>: doc_data[<span class="hljs-string">'rrf_contributions'</span>],
            <span class="hljs-string">'queries_appeared'</span>: doc_data[<span class="hljs-string">'queries_appeared'</span>]
        })

    rrf_results.sort(key=<span class="hljs-keyword">lambda</span> x:x[<span class="hljs-string">'rrf_score'</span>], reverse=<span class="hljs-literal">True</span>) <span class="hljs-comment"># We are sorting here.</span>

    <span class="hljs-keyword">return</span> rrf_results
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Read the comments in the code; they explain each step.</div>
</div>
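<p>One design note on <code>doc_id</code>: Python’s built-in <code>hash()</code> of a string is randomized per process (unless <code>PYTHONHASHSEED</code> is set), so the IDs above are only stable within a single run. That is fine for fusing results in memory, but if you ever cache or log these IDs, a <code>hashlib</code> digest is deterministic. A minimal sketch, with <code>make_doc_id</code> as a hypothetical helper that keeps the same “page number + first 100 characters” idea:</p>
<pre><code class="lang-python">import hashlib


def make_doc_id(page_number, page_content, prefix_chars=100):
    """Deterministic ID: page number plus a SHA-256 digest of the first characters."""
    snippet = page_content[:prefix_chars].strip()
    digest = hashlib.sha256(snippet.encode("utf-8")).hexdigest()[:16]
    return f"page_{page_number}_{digest}"


print(make_doc_id(3, "Polymorphism means many forms..."))
</code></pre>
<p>The same input always produces the same ID, in every run and on every machine.</p>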

<h3 id="heading-full-code">Full Code:</h3>
<p><a target="_blank" href="https://gist.github.com/Pritom2357/93b08ddb2af9beac37707f7ad9569e3b">See the full code here.</a></p>
<h2 id="heading-some-observations">Some observations:</h2>
<ol>
<li><p>Did you notice that I applied the “RRF” technique on “Parallel Query Retrieval”? Is it necessary?</p>
</li>
<li><p>Is there any problem with this technique? Will it generate the expected results every time? When will this technique “hallucinate”/“fail”?</p>
</li>
<li><p>Can you test this implementation with different inputs?</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>“Reciprocal Rank Fusion” is just a fancy name. Under the hood, it merges the ranked result lists of several queries into one ranking, rewarding documents that rank high in many lists. Is it enough? I cannot say. There are a bunch of other techniques that build on these ideas, and I will try to cover them in later articles. Stay tuned!</p>
]]></content:encoded></item><item><title><![CDATA[Parallel Query (Fan Out) Retrieval]]></title><description><![CDATA[Introduction:
Previously, we learnt what RAG is and why Query Translation is important. Now, we will learn about a popular technique of Query Translation: Parallel Query Retrieval.
What is Parallel Query Retrieval?
Some Backstory:
We know that RAG wo...]]></description><link>https://blogging.pritombiswas.com/parallel-query-fan-out-retrieval</link><guid isPermaLink="true">https://blogging.pritombiswas.com/parallel-query-fan-out-retrieval</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[chai-code ]]></category><category><![CDATA[Chaiaurcode]]></category><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Fri, 06 Jun 2025 18:55:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749142968387/7a693133-129d-4b9a-96b7-5606964f1c5c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction:</h2>
<p>Previously, we learnt what <a target="_blank" href="https://blogging.pritombiswas.com/introduction-to-rag-101">RAG</a> is and why <a target="_blank" href="https://blogging.pritombiswas.com/advanced-rag-query-translation">Query Translation</a> is important. Now, we will learn about a popular technique of Query Translation: Parallel Query Retrieval.</p>
<h2 id="heading-what-is-parallel-query-retrieval">What is Parallel Query Retrieval?</h2>
<h3 id="heading-some-backstory">Some Backstory:</h3>
<p>We know that RAG works with some context (documents, web pages, or anything that has relevant data). Now, suppose we give a Node.js documentation file to the RAG system as context. The user might ask this:<br /><em>“What is fs?”</em></p>
<p>As humans, we understand that the user wants to know about the “File System” in Node.js. But what if the Node.js documentation never uses the word “fs” and has “file system” written everywhere instead? Will the retriever find a good match for the query? And will the RAG system perform well?</p>
<p>Probably not, right? We need to take care of this.</p>
<p>We can solve this problem with the following process:</p>
<ul>
<li><p>The user asks the question.</p>
</li>
<li><p>We prompt the LLM to generate a few similar questions. Here, the questions might be:</p>
<ol>
<li><p>What is fs?</p>
</li>
<li><p>What is the file system?</p>
</li>
<li><p>What is a file in Node.js?</p>
</li>
<li><p>How to create a file in Node.js?</p>
</li>
</ol>
</li>
<li><p>We search for each generated question in the vector store (or whatever database holds the embedded documentation).</p>
</li>
<li><p>We find matching documents in the vector store. Suppose:</p>
<ol>
<li><p>We could not find anything for the question “What is fs?”</p>
</li>
<li><p>We find two matches for the question “What is the file system?” (denoted as one yellow, one blue)</p>
</li>
<li><p>We find one match for the question “What is a file in Node.js?” (one blue)</p>
</li>
<li><p>We find three matches for the question “How to create a file in Node.js?” (one blue, one yellow, one red)</p>
</li>
</ol>
</li>
<li><p>We then de-duplicate the results and keep only the unique documents.</p>
</li>
<li><p>Finally, we pass the three unique documents (yellow, blue, and red) to the LLM as context, and it answers the user’s query.</p>
</li>
</ul>
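<p>The fan-out and de-duplication steps above can be sketched with a hard-coded stand-in for the vector store. In this minimal sketch, <code>FAKE_INDEX</code>, <code>fake_search</code> and the color labels are hypothetical; a real system would query the vector store instead.</p>
<pre><code class="lang-python"># Hypothetical search results for each generated query (color = document).
FAKE_INDEX = {
    "What is fs?": [],
    "What is the file system?": ["yellow", "blue"],
    "What is a file in Node.js?": ["blue"],
    "How to create a file in Node.js?": ["blue", "yellow", "red"],
}


def fake_search(query):
    """Stand-in for a vector-store similarity search."""
    return FAKE_INDEX.get(query, [])


def fan_out_and_merge(queries):
    """Search every query, then keep each document once, in first-seen order."""
    unique_docs = []
    for query in queries:
        for doc in fake_search(query):
            if doc not in unique_docs:
                unique_docs.append(doc)
    return unique_docs


print(fan_out_and_merge(list(FAKE_INDEX)))  # ['yellow', 'blue', 'red']
</code></pre>
<p>These three unique documents are what gets passed to the LLM as context.</p>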
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749145417858/e9568925-bad2-40be-8199-0019218ca404.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🤔</div>
<div data-node-type="callout-text">Did we solve our problem? Can the LLM answer our question now?</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Yes, the problem is solved for now. Look, the documentation did not contain “fs”, but it did contain “file system”, and the generated queries now use that wording. So, the right chunks get retrieved, and the LLM can answer the user’s question, right?</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">❓</div>
<div data-node-type="callout-text">Is this enough?</div>
</div>

<h3 id="heading-definition">Definition:</h3>
<p><strong>Parallel Query Retrieval</strong>, also known as <strong>Fan Out Retrieval</strong>, is a method where <strong>multiple variants</strong> of the same user query are created and sent <strong>in parallel</strong> to different or the same retrieval systems. The goal is to <strong>maximize recall</strong> and <strong>diversify</strong> the retrieved documents, ultimately helping the LLM generate more informed and accurate answers.</p>
<h3 id="heading-why-fan-out">Why Fan Out?</h3>
<p>The term “fan-out” is borrowed from systems design and networking. It means:</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🌐</div>
<div data-node-type="callout-text">Spreading a single input into multiple parallel paths or processes.</div>
</div>

<p>In Parallel Query Retrieval, we take a single user input, generate multiple queries from it, and send them down multiple paths, literally “<em>fanning out the queries</em>”. It’s like asking 4 or 5 experts the same question. Interesting, isn’t it?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749148937842/3475e121-85ad-4f5e-859f-3dc9697ad1f4.png" alt class="image--center mx-auto" /></p>
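<p>In the example code further down, the generated queries are searched one after another in a loop. If you want the fan-out to be literally parallel, the searches can be submitted to a thread pool. A minimal sketch, assuming a <code>search(query)</code> function that is safe to call from several threads; <code>slow_search</code> is a hypothetical stand-in that just sleeps to mimic network latency:</p>
<pre><code class="lang-python">import time
from concurrent.futures import ThreadPoolExecutor


def slow_search(query):
    """Hypothetical stand-in for a vector-store search call."""
    time.sleep(0.1)  # pretend this is a network round trip
    return [f"doc for: {query}"]


def fan_out(queries, search=slow_search, max_workers=4):
    """Run one search per query concurrently; results come back in query order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search, queries))


queries = ["What is fs?", "What is the file system?", "How to create a file in Node.js?"]
start = time.perf_counter()
results = fan_out(queries)
print(results)
print(f"took {time.perf_counter() - start:.2f}s")  # roughly one search's latency, not three
</code></pre>
<p>Threads fit here because each search mostly waits on I/O (the vector store or an embedding API), and <code>pool.map</code> returns the results in the same order as the queries.</p>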
<h2 id="heading-some-examples">Some Examples:</h2>
<p>Let’s divide the process into parts to understand it better:</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🔴</div>
<div data-node-type="callout-text">I am using a Docker container in my system for the vector store.</div>
</div>

<ol>
<li><p>Parallel Query Generation:</p>
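<pre><code class="lang-python"> <span class="hljs-keyword">import</span> json
 <span class="hljs-keyword">import</span> google.generativeai <span class="hljs-keyword">as</span> genai

 <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ParallelQuery</span>:</span>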
     <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, api_key</span>):</span>
         genai.configure(api_key=api_key)
         self.model=genai.GenerativeModel(<span class="hljs-string">'gemini-1.5-flash-001'</span>)

     <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generateParallelQuery</span>(<span class="hljs-params">self, query, number_of_queries = <span class="hljs-number">3</span></span>):</span>
         <span class="hljs-keyword">try</span>:
             system_prompt = <span class="hljs-string">f"""
                 You are a helpful AI assistant who generates <span class="hljs-subst">{number_of_queries}</span> queries with similar topics of the given query=<span class="hljs-subst">{query}</span>.

                 METHOD:
                 1. You get a query, analyze it and find the keywords in that.
                 2. You generate similar words based on the keywords.
                 3. You make similar query like <span class="hljs-subst">{query}</span> using the newly generated keywords

                 EXAMPLE:
                 original: "What is fs in Node.js?"
                 generated:
                     1. "What is file system?"
                     2. "What are files in Node.js?"
                     3. "How to make files in Node.js?"

                 RETURN FORMAT
                 You only need to return the queries in this json format:
                 {{
                     "original": "<span class="hljs-subst">{query}</span>",
                     "generated": [
                         "generated_1",
                         "generated_2",
                         "generated_3"
                     ]
                 }}

                 Return ONLY valid JSON, no additional text.
             """</span>

             response = self.model.generate_content(
                 system_prompt
             )

             <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> response <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> response.text:
                 print(<span class="hljs-string">"No response from model"</span>)
                 <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

             filtered_response = filter_response(response) <span class="hljs-comment"># filter_response() is defined in the full code (linked below)</span>

             <span class="hljs-keyword">try</span>:
                 parsed_response = json.loads(filtered_response)
                 <span class="hljs-keyword">return</span> parsed_response
             <span class="hljs-keyword">except</span> json.JSONDecodeError <span class="hljs-keyword">as</span> e:
                 print(<span class="hljs-string">f"JSON parsing error: <span class="hljs-subst">{e}</span>"</span>)
                 <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

         <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
             print(<span class="hljs-string">f"Problem occurred while generating the response: <span class="hljs-subst">{e}</span>"</span>)
             <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>
</code></pre>
</li>
</ol>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Look at the system prompt closely, and you will understand. The rest of the code just calls the model and parses its JSON reply.</div>
</div>
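<p>The reply is cleaned by <code>filter_response</code> from the full code before parsing. LLMs often wrap JSON in a markdown code fence even when told not to, so a cleaning step like that is worth having. Here is a hypothetical version of such a helper (not necessarily what <code>filter_response</code> does), which you could call as <code>extract_json(response.text)</code>:</p>
<pre><code class="lang-python">import json

FENCE = "`" * 3  # the markdown code-fence marker LLMs often wrap JSON in


def extract_json(raw_text):
    """Hypothetical helper: strip an optional markdown fence, then parse JSON."""
    text = raw_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (e.g. the json language tag) ...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ... and the closing fence, if present.
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return json.loads(text)  # raises json.JSONDecodeError on bad input
</code></pre>
<p>Since it raises <code>json.JSONDecodeError</code> on malformed output, the existing <code>except json.JSONDecodeError</code> branch would still handle bad replies.</p>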

<ol start="2">
<li><p><strong>Searching References:</strong></p>
<pre><code class="lang-python"> <span class="hljs-comment"># Main Parallel Search Function</span>

 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">perform_parallel_search</span>(<span class="hljs-params">vector_store, queries, k_per_queries</span>):</span>
     <span class="hljs-string">"""Perform search with multiple queries and combine results"""</span>
     all_results = []

     <span class="hljs-keyword">for</span> index, query <span class="hljs-keyword">in</span> enumerate(queries, <span class="hljs-number">1</span>):
         print(<span class="hljs-string">f"Running search on query: <span class="hljs-subst">{index}</span>"</span>)
         response = vector_store.search(query, k=k_per_queries)

         <span class="hljs-keyword">for</span> (document, score) <span class="hljs-keyword">in</span> response:
             all_results.append({
                 <span class="hljs-string">'query'</span>: query,
                 <span class="hljs-string">'document'</span>: document,
                 <span class="hljs-string">'score'</span>: score,
                 <span class="hljs-string">'content'</span>: document.page_content,
                 <span class="hljs-string">'page'</span>: document.metadata.get(<span class="hljs-string">'page'</span>, <span class="hljs-string">'N/A'</span>),
                 <span class="hljs-string">'source'</span>: document.metadata.get(<span class="hljs-string">'source'</span>, <span class="hljs-string">'N/A'</span>)
             })

     all_results.sort(key=<span class="hljs-keyword">lambda</span> x:x[<span class="hljs-string">'score'</span>]) <span class="hljs-comment"># ascending: assumes a lower score means more relevant (a distance metric)</span>
     unique_results = remove_duplicate_results(all_results) <span class="hljs-comment"># helper that removes duplicate chunks; see the full code linked below</span>

     print(<span class="hljs-string">f"Total result's length: <span class="hljs-subst">{len(unique_results)}</span>"</span>)
     <span class="hljs-keyword">return</span> unique_results

 <span class="hljs-comment"># Function to search in Vector Store:</span>

 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">search</span>(<span class="hljs-params">self, query, k=<span class="hljs-number">5</span></span>):</span>
         <span class="hljs-string">"""Search the vector store for relevant data"""</span>
         <span class="hljs-keyword">try</span>:
             <span class="hljs-keyword">if</span> hasattr(self, <span class="hljs-string">'vector_store'</span>) <span class="hljs-keyword">and</span> self.vector_store:
                 store = self.vector_store
             <span class="hljs-keyword">else</span>:
                 print(<span class="hljs-string">"Creating a new retriever..."</span>)
                 store = self._retrieve()
                 <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> store:
                     <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Failed to create a retriever...."</span>)

             results = store.similarity_search_with_score(query, k=k)
             print(<span class="hljs-string">f"Found <span class="hljs-subst">{len(results)}</span> results for the given query"</span>)
             <span class="hljs-keyword">return</span> results
         <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
             print(<span class="hljs-string">f"Failed to search on the store: <span class="hljs-subst">{e}</span>"</span>)
             <span class="hljs-keyword">return</span> []

 <span class="hljs-comment"># This is in the VectorStore defined in the full code.</span>
</code></pre>
</li>
<li><p><strong>Main Function:</strong></p>
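<pre><code class="lang-python"> <span class="hljs-keyword">import</span> os

 <span class="hljs-comment"># ParallelQuery, VectorStore and perform_parallel_search are defined above and in the full code.</span>
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>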
     request = input(<span class="hljs-string">"Query&gt; "</span>)
     number_of_queries = int(input(<span class="hljs-string">"Number of queries&gt; "</span>) <span class="hljs-keyword">or</span> <span class="hljs-string">"3"</span>)

     gemini_api = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)

     <span class="hljs-keyword">try</span>:
         gemini = ParallelQuery(api_key=gemini_api)
         vector_store = VectorStore(<span class="hljs-string">"Lecture 3 - Polymorphism_250520_224757.pdf"</span>)
     <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
         print(<span class="hljs-string">f"Error occurred while setting up API and vector store: <span class="hljs-subst">{e}</span>"</span>)
         <span class="hljs-keyword">return</span>

     response = gemini.generateParallelQuery(request, number_of_queries)
     total_queries = [request] <span class="hljs-comment"># start with the user's own query; this also works if generation failed (response is None)</span>

     <span class="hljs-keyword">if</span> response:
         print(<span class="hljs-string">f"\nOriginal: <span class="hljs-subst">{response[<span class="hljs-string">'original'</span>]}</span>"</span>)
         <span class="hljs-keyword">for</span> index, query <span class="hljs-keyword">in</span> enumerate(response[<span class="hljs-string">'generated'</span>]):
             print(<span class="hljs-string">f"<span class="hljs-subst">{index+<span class="hljs-number">1</span>}</span>: <span class="hljs-subst">{query}</span>"</span>)
             total_queries.append(query)
     <span class="hljs-keyword">else</span>:
         print(<span class="hljs-string">"No response returned\n"</span>)

     results = perform_parallel_search(vector_store, total_queries, <span class="hljs-number">5</span>)

     <span class="hljs-keyword">for</span> index, result <span class="hljs-keyword">in</span> enumerate(results, <span class="hljs-number">1</span>):
         print(<span class="hljs-string">f"<span class="hljs-subst">{index}</span>: <span class="hljs-subst">{result[<span class="hljs-string">'content'</span>]}</span>"</span>)
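         print(<span class="hljs-string">f"In page: <span class="hljs-subst">{result[<span class="hljs-string">'page'</span>]}</span>"</span>)

 <span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
     main()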
</code></pre>
</li>
</ol>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🧠</div>
<div data-node-type="callout-text">These are the basics of parallel query retrieval: <em>user question → generate parallel queries → search the vector store with each one → merge the unique results into a richer context.</em></div>
</div>
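<p><code>perform_parallel_search</code> relies on <code>remove_duplicate_results</code>, which lives in the full code. Here is a hypothetical version of such a helper, assuming each result is a dict with a <code>'content'</code> key and that the list is already sorted best-first (as <code>perform_parallel_search</code> does before calling it). It simply keeps the first occurrence of each chunk:</p>
<pre><code class="lang-python">def remove_duplicate_results(results):
    """Keep the first (best-scored) entry for each distinct chunk text."""
    seen_contents = set()
    unique_results = []
    for result in results:
        key = result["content"].strip()
        if key in seen_contents:
            continue  # same chunk already returned by an earlier (better) match
        seen_contents.add(key)
        unique_results.append(result)
    return unique_results


sample = [
    {"query": "What is fs?", "content": "The fs module...", "score": 0.21},
    {"query": "What is the file system?", "content": "The fs module...", "score": 0.25},
    {"query": "What is the file system?", "content": "Reading files...", "score": 0.30},
]
print([r["score"] for r in remove_duplicate_results(sample)])  # [0.21, 0.3]
</code></pre>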

<h3 id="heading-full-code">Full Code:</h3>
<p><a target="_blank" href="https://gist.github.com/Pritom2357/c0ab0378ca161832c6fcc1e91763e272">See the full code here.</a></p>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>So, Parallel Query (Fan Out) Retrieval - some fancy name, huh? Actually, it is just a way to improve retrieval, and therefore the final answer. There are a lot of other techniques out there, and I will go through them one by one. For now, stay tuned.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">❓</div>
<div data-node-type="callout-text">I could actually make the results more relevant. Can you tell me how?</div>
</div>]]></content:encoded></item><item><title><![CDATA[Advanced RAG: Query Translation]]></title><description><![CDATA[🤔Let’s think back
We learned about the “Basic RAG System” beforehand, right? If you did not read that, read from here. We know, a RAG system consists of these parts:

Indexing + Retrieval

Augmentation and

Generation


But these parts are not so us...]]></description><link>https://blogging.pritombiswas.com/advanced-rag-query-translation</link><guid isPermaLink="true">https://blogging.pritombiswas.com/advanced-rag-query-translation</guid><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><category><![CDATA[ChaiCode]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Thu, 05 Jun 2025 15:54:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749136104625/05db7f09-8916-4478-9471-780b0c90f4bc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-lets-think-back">🤔Let’s think back</h2>
<p>We learned about the “Basic RAG System” earlier, right? If you haven’t read that article, read it from <a target="_blank" href="https://blogging.pritombiswas.com/introduction-to-rag-101">here</a>. We know that a RAG system consists of these parts:</p>
<ul>
<li><p>Indexing + Retrieval</p>
</li>
<li><p>Augmentation and</p>
</li>
<li><p>Generation</p>
</li>
</ul>
<p>But these parts alone often don’t produce the best results. So, we add some optimization layers on top of them, like this:</p>
<ul>
<li><p>Query Translation</p>
</li>
<li><p>Routing</p>
</li>
<li><p>Query Construction</p>
</li>
<li><p>Indexing + Retrieval</p>
</li>
<li><p>Augmentation and</p>
</li>
<li><p>Generation</p>
</li>
</ul>
<p>All these layers are added to get the best results from the LLM. In this article, we will learn about <strong>“Query Translation”</strong> and why it is needed.</p>
<h2 id="heading-what-is-query-translation">What is Query Translation?</h2>
<h3 id="heading-general-discussion">📖General Discussion:</h3>
<p>We know that RAG answers from some context (documents, the web, or anything else that has relevant data). Now, let’s say we have given the Node.js documentation to the RAG as its context. The user might ask:<br /><em>“What is fs?”</em></p>
<p>As humans, we understand that the user wants to know about the “File System” module in Node.js. But what if the Node.js documentation never uses the word “fs” and says “file system” everywhere instead? Will the retriever match the query to the right chunks, and will the RAG answer well?</p>
<p>Often, it will not.</p>
<p>This is why “Remodeling the User Query” matters. We reshape/enhance/remodel the user’s query before retrieval to get better output. This method is called <em>“Query Translation”.</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749138328519/83b92f6a-a3be-498e-af35-d56fc6872b02.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-definition">➡️Definition:</h3>
<p>In the context of Retrieval-Augmented Generation (RAG), query translation is the <strong>process of transforming a user’s query into a more optimized form to improve the retrieval of relevant information.</strong></p>
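<p>To make the “fs” example concrete, here is a minimal sketch of a query rewrite. The glossary <code>ABBREVIATIONS</code> and the word-overlap score are hypothetical stand-ins: real systems usually ask an LLM to rewrite the query and compare embeddings rather than raw words.</p>
<pre><code class="lang-python">import re

# Hypothetical glossary: shorthand users type, mapped to the wording the docs use.
ABBREVIATIONS = {"fs": "file system", "env": "environment variables"}

def words(text):
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def translate_query(query):
    """Rewrite the query by expanding known abbreviations (a stand-in for an LLM rewrite)."""
    return " ".join(ABBREVIATIONS.get(w, w) for w in words(query))

def overlap_score(query, chunk):
    """Naive relevance score: how many distinct words the query and chunk share."""
    return len(set(words(query)).intersection(words(chunk)))

chunk = "The file system module lets you read and write files."
print(overlap_score("What is fs?", chunk))                   # 0
print(overlap_score(translate_query("What is fs?"), chunk))  # 2 ("file", "system")
print(translate_query("What is fs?"))                        # what is file system
</code></pre>
<p>The original query shares no words with the chunk, while the translated one shares “file” and “system”, so the retriever now has something to match on.</p>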
<h2 id="heading-methods-of-query-translation">📃Methods of Query Translation:</h2>
<p>There are many methods. These are the main ones:</p>
<ul>
<li><p>Parallel Query (Fan Out) Retrieval</p>
</li>
<li><p>Reciprocal Rank Fusion</p>
</li>
<li><p>Query Decomposition</p>
</li>
<li><p>HyDE (Hypothetical Document Embeddings)</p>
</li>
</ul>
<p>These methods are widely used in industry. I won’t discuss them in depth here; I will cover them in separate articles. Good luck.</p>
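<p>Fan-out retrieval and Reciprocal Rank Fusion are usually used together: several rewritten queries each return a ranked list, and RRF merges those lists. Below is a minimal sketch of the fusion step, assuming the ranked lists already exist (the document IDs are made up). It uses the usual RRF score, the sum of 1 / (k + rank) over all lists, with k = 60.</p>
<pre><code class="lang-python">from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Made-up results for three rewritten versions of the same user query.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)  # ['doc_b', 'doc_a', 'doc_c', 'doc_d']
</code></pre>
<p>Here <code>doc_b</code> wins because it ranks near the top of every list, even though <code>doc_a</code> is first in one of them.</p>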
]]></content:encoded></item><item><title><![CDATA[Introduction to RAG: 101]]></title><description><![CDATA[A common scenario:
You published a book on Generative AI on March 27th of this year. However, the Large Language Model (LLM) you’re using was last trained on March 26th. As we know, LLMs don’t have access to information beyond their last training cut...]]></description><link>https://blogging.pritombiswas.com/introduction-to-rag-101</link><guid isPermaLink="true">https://blogging.pritombiswas.com/introduction-to-rag-101</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Thu, 05 Jun 2025 09:59:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749059162763/d0e7e95f-b767-4843-ab6e-180a352cc75f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-a-common-scenario">A common scenario:</h2>
<p>You published a book on Generative AI on <strong>March 27th of this year</strong>. However, the Large Language Model (LLM) you’re using was last trained on <strong>March 26th</strong>. As we know, LLMs don’t have access to information beyond their last training cutoff.</p>
<p>So, from the model’s perspective, <strong>your book doesn’t exist</strong>—it’s invisible. You can't ask it questions about the book or expect it to summarize or reference its contents. This brings us to the key question:</p>
<p><strong><em>“How can I teach the LLM about my book?”</em></strong></p>
<p>There are several approaches to solve this problem:</p>
<ol>
<li><h3 id="heading-using-agents"><strong>Using Agents</strong></h3>
</li>
</ol>
<p>One way is to use <strong>agents</strong> that retrieve information from your book and present it to the user in response to queries. This can be effective in many cases.</p>
<p><strong>But is it feasible in all situations?</strong><br /><strong>Not always. Here’s why:</strong></p>
<p>If your book is extensive, the agent must search through the entire content, or at least across targeted indices, to find relevant information. This process can be <strong>resource-intensive</strong> and may not scale efficiently.</p>
<ol start="2">
<li><h3 id="heading-using-fine-tuning"><strong>Using Fine-Tuning</strong></h3>
</li>
</ol>
<p>Another approach is <strong>fine-tuning</strong>—training the LLM with your book's content so that it becomes familiar with the material and can respond to queries naturally.</p>
<p>Sounds ideal, right?</p>
<p><strong>But what if your book is updated frequently?</strong><br /><strong>Then this method becomes less efficient. Here’s why:</strong></p>
<p>Fine-tuning is both <strong>time-consuming and costly</strong>. Every time you update the book, you’d need to retrain the model with the new content, which is not practical if updates are frequent. In such cases, fine-tuning becomes a <strong>resource-draining</strong> solution.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🧐</div>
<div data-node-type="callout-text">Therefore, using only agents will not suffice, and relying solely on fine-tuning will prove expensive in the long run. It would be much easier if we could use these approaches selectively in our application, right? This is where RAG (Retrieval-Augmented Generation) comes in.</div>
</div>

<hr />
<h2 id="heading-what-is-rag">What is RAG?</h2>
<p>Retrieval-Augmented Generation is a hybrid framework that combines two key features of modern AI systems:</p>
<ol>
<li><p>Information retrieval and</p>
</li>
<li><p>Text generation</p>
</li>
</ol>
<p>It was first introduced by Facebook AI in 2020 to overcome the knowledge gaps of the LLMs. Let’s dissect the terms to understand better:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749113328262/badb625c-8cc1-4e31-8501-21c57065d4bc.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-1-retrieval">1. Retrieval:</h3>
<p>At its core, it means fetching relevant information from an external knowledge source (e.g., databases, vector stores, documents, websites, etc.) at the time of the query. Process:</p>
<ul>
<li><p>Instead of relying on what the model <em>“knows”</em>, it performs a search on the given sources.</p>
</li>
<li><p>The sources can be a vector store, a database, or anything that has the relevant information.</p>
</li>
<li><p>It is like asking the model to <em>“look something up”</em> before answering.</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🧠</div>
<div data-node-type="callout-text">Is it like <em>“ChatGPT meeting Google”?</em></div>
</div>

<h3 id="heading-2-augmentation">2. Augmentation:</h3>
<p>“Augmented” means <em>“enhanced with extra capabilities.”</em> In this context:</p>
<ul>
<li><p>The retrieved documents are injected into the model’s context (as prompts) before generating the answer.</p>
</li>
<li><p>Some extra processing (for example, formatting or ordering the retrieved chunks) is done on the retrieved data to help the model understand the context better.</p>
</li>
<li><p>This lets the model augment/enhance its “<em>knowledge base</em>” in real time.</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">The model gets smarter, not by “<em>training</em>” but by “<em>giving it helpful context</em>”.</div>
</div>

<h3 id="heading-3-generation">3. Generation:</h3>
<p>This step is easy. Process:</p>
<ul>
<li><p>Now, the model has the “context” it needs. Based on that context, it generates meaningful answers.</p>
</li>
<li><p>This is the actual question and answer phase, based on both the input query and the fetched context.</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🤔</div>
<div data-node-type="callout-text">Now, tell me: are agents used in this process? Do I need to train the model?</div>
</div>

<p>Yeah, this is the core process of a RAG system.</p>
<h2 id="heading-a-simple-application">A simple application:</h2>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⬅</div>
<div data-node-type="callout-text">Remember the book you published earlier? Let’s make a simple chat application on that book.</div>
</div>

<h3 id="heading-1-retrieval-1">1. Retrieval:</h3>
<p>I will follow these steps:</p>
<ol>
<li><p><strong>Fix the Data Source:</strong> Here, the data source is your book.</p>
</li>
<li><p><strong>Fragmentation/Chunking:</strong> Divide the data into smaller fragments/chunks so that I can do operations on the data efficiently. (Chunking itself is an art. Will get back to it in some other article, stay tuned 🥰)</p>
</li>
<li><p><strong>Embedding:</strong> Convert each chunk of the book into a vector (an embedding) using an embedding model, so that chunks can be compared by similarity.</p>
</li>
<li><p><strong>Store:</strong> Store the embeddings in a vector store (Qdrant, Pinecone, etc.).</p>
</li>
<li><p><strong>User query embed:</strong> Get the user query and embed that also. (Need similar things to search, right?)</p>
</li>
<li><p><strong>Search:</strong> Finally, search the vector store for the chunks whose embeddings are most similar to the query embedding.</p>
</li>
</ol>
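<p>The six steps above can be sketched in a few lines of Python. This is a toy, self-contained version: the “embedding” is just a bag-of-words count and the “vector store” is a Python list, both hypothetical stand-ins for a real embedding model and a store like Qdrant or Pinecone.</p>
<pre><code class="lang-python">import math
import re
from collections import Counter

# Step 1: the data source (a tiny stand-in for your book).
book = ("Generative AI models create new text and images. "
        "Retrieval augmented generation adds external knowledge at query time. "
        "Fine tuning changes the weights of a pre-trained model.")

def chunk_text(text):
    """Step 2: naive chunking, one sentence per chunk."""
    return [s.strip() for s in text.split(".") if s.strip()]

def embed(text):
    """Steps 3 and 5: toy 'embedding' as word counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 4: the "vector store" is a list of (chunk, embedding) pairs.
store = [(chunk, embed(chunk)) for chunk in chunk_text(book)]

def search(query, top_k=2):
    """Steps 5 and 6: embed the query and return the most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

print(search("How does retrieval add knowledge?", top_k=1))
</code></pre>
<p>Swapping <code>embed</code> for a real embedding model and <code>store</code> for a vector database keeps the same shape: index once, then embed and search per query.</p>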
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749115835936/b17d245b-5092-4cca-b9c3-826e1b7959a9.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Extras: In the steps above, 1–4 are usually called <em>“Indexing”</em>, and 5–6 are called <em>“Retrieval”</em>.</div>
</div>

<h3 id="heading-2-augmentation-1">2. Augmentation:</h3>
<p>I would like to follow this procedure:</p>
<ul>
<li><p><strong>Prompt the AI:</strong> Feed the retrieved context to the LLM API: the chunks most similar to the query are inserted into the prompt.</p>
</li>
<li><p><strong>Generate similar queries:</strong> You can skip this part, but it is good to give the AI more context. What if the user writes a very vague query🤔?</p>
</li>
</ul>
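<p>A minimal sketch of the “Prompt the AI” step, assuming the chunks have already been retrieved. The function <code>build_prompt</code> and its instruction wording are illustrative, not a fixed template.</p>
<pre><code class="lang-python">def build_prompt(question, chunks):
    """Augmentation: number the retrieved chunks and place them before the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What does RAG add at query time?",
    ["Retrieval augmented generation adds external knowledge at query time"],
)
print(prompt)
</code></pre>
<p>Telling the model to stay inside the given context is a common way to reduce answers that ignore the retrieved chunks.</p>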
<p>Now, the LLM knows the context of the user’s query. 🙂</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749116418726/3d7714de-1814-44b4-9565-0fcf40aa3bdf.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-3-generation-1">3. Generation:</h3>
<ul>
<li><p>Feed the augmented prompt (with all the queries) to the LLM.</p>
</li>
<li><p>Get the result and return it to the user.</p>
</li>
</ul>
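<p>The generation step can be sketched as a function that sends the augmented prompt to an LLM. To keep it runnable without an API key, <code>llm</code> is any function from prompt text to answer text; <code>generate_answer</code> and <code>fake_llm</code> are hypothetical names. The commented lines show how a Gemini client (the <code>google-genai</code> package) could be plugged in instead.</p>
<pre><code class="lang-python">def generate_answer(question, chunks, llm):
    """Generation: build the augmented prompt, send it to the LLM, return the answer.

    `llm` is any function that takes a prompt string and returns text.
    """
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt).strip()

# With a real model, llm could wrap the google-genai client, for example:
#   from google import genai
#   client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
#   llm = lambda p: client.models.generate_content(model="gemini-2.0-flash-001", contents=p).text

def fake_llm(prompt):
    """Hypothetical stand-in model: answers only if the context reached the prompt."""
    if "query time" in prompt:
        return " It adds external knowledge at query time. "
    return "I don't know."

answer = generate_answer("What does RAG add?", ["RAG adds external knowledge at query time"], fake_llm)
print(answer)
</code></pre>
<p>Passing the model in as a function keeps the RAG logic independent of any one provider, and makes the pipeline easy to test offline.</p>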
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749116526804/7affb96e-e19d-47c4-82f1-2cde41251d09.png" alt class="image--center mx-auto" /></p>
<p>This is what happens inside the RAG for your book’s chatbot. Here’s the whole picture:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749116620251/e068389e-f2ef-4395-9bea-38d112565317.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-rag-matters">Why RAG matters:</h2>
<p>In this whole process, did I use any agents?</p>
<p>-Yes, in the retrieval part, right?</p>
<p>And how much did the agent have to handle? Only some smaller parts (chunks), right? Isn’t that more efficient than traversing the whole dataset?</p>
<p>-Of course.</p>
<p>And fine-tuning? Did I fine-tune the model at all? No, right?</p>
<p>So, in short, RAG suits most <em>“real-world”</em> applications and can work with live knowledge. It fits a wide range of situations nicely. 🙂</p>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>RAG itself is very complex. I have only shown a very basic use case. In future articles, I will discuss more complex systems. Stay tuned. 🥰🥰🥰</p>
]]></content:encoded></item><item><title><![CDATA[Fine Tuning and more...]]></title><description><![CDATA[What is Fine-Tuning?
First, let me create a scenario:
Suppose an LLM model trained its dataset on 25th March and you have started a business from 27th March of the same year. We all know that every model available now has a cut-off time, right? That ...]]></description><link>https://blogging.pritombiswas.com/fine-tuning-and-more</link><guid isPermaLink="true">https://blogging.pritombiswas.com/fine-tuning-and-more</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[ChaiCohort]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Tue, 03 Jun 2025 18:31:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748966579606/cca26b02-8d46-436a-98e8-352500657e03.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-fine-tuning">What is Fine-Tuning?</h2>
<p>First, let me create a scenario:</p>
<p>Suppose an LLM was trained on data collected up to 25th March, and you started a business on 27th March of the same year. We all know that every model available now has a knowledge cut-off, right? That means a pre-trained model only knows what was in its training data up to a fixed date; after that, it knows nothing. Since your business started after the cut-off, the LLM itself knows nothing about it. Now, you have a problem.</p>
<p><strong><em>“How can users get the latest/important information about your business?”</em></strong></p>
<p>You can solve this problem by several methods:</p>
<ul>
<li><p><strong>Use AI Agents:</strong> You can use agents to scrape data from the internet. But this works on a very shallow level and cannot answer any query that is not on the internet.</p>
</li>
<li><p><strong>Train the AI model:</strong> Another approach is to further train the LLM on your business data, so that users can query it and it can answer thoroughly about the business. This goes deeper than just using some agents.</p>
</li>
</ul>
<p>Here, you trained the model on your data and produced an adapted model that meets your needs. “<strong>THIS IS CALLED FINE-TUNING</strong>”. Let’s see the formal definition…</p>
<h3 id="heading-definition">Definition:</h3>
<p><em>“</em><strong><em>Fine-tuning</em></strong> <em>is the process of taking a</em> <strong><em>pre-trained model</em></strong> <em>(typically on a large, general dataset) and</em> <strong><em>further training it</em></strong> <em>on a</em> <strong><em>smaller, task-specific dataset</em></strong> <em>to adapt it to a particular problem.”</em></p>
<h2 id="heading-why-is-this-needed">Why is this needed?</h2>
<p>Simply put, to adapt an LLM to specific needs. It also helps in these ways:</p>
<ul>
<li><p>It reduces computing costs and training time compared with training a model from scratch</p>
</li>
<li><p>It works with much smaller datasets than training from scratch needs</p>
</li>
<li><p>It often performs better on the target task than the general base model</p>
</li>
</ul>
<h2 id="heading-process-of-fine-tuning">Process of Fine-Tuning:</h2>
<p>To fine-tune a model, these steps are followed:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748971698274/4abb4e9c-6883-4460-86f9-05ae7ac9fa52.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-methods-of-fine-tuning">Methods of Fine-Tuning:</h2>
<p>There are several methods of fine-tuning:</p>
<ul>
<li><p>Full Fine-Tuning (also known as Full Parameter Fine-Tuning)</p>
</li>
<li><p>Partial/Layer-wise Fine-Tuning</p>
</li>
<li><p>LoRA Fine-Tuning</p>
</li>
<li><p>PEFT (Parameter-Efficient Fine-Tuning)</p>
</li>
</ul>
<p>Now, let’s elaborate on some of these.</p>
<ol>
<li><h3 id="heading-full-fine-tuning"><strong>Full Fine-Tuning:</strong></h3>
<p> In full fine-tuning, you adjust all the actual weights of the pre-trained LLM through forward propagation, loss calculation, back propagation, and then a weight update.</p>
<p> This method can give the most accurate results, since every weight can adapt to the new data. It works well for smaller models, but it's not as efficient for larger ones. Why?</p>
<p> Because you need to update the entire LLM, and training a whole model is very costly in terms of hardware and time. If you want to train a model often, it will use a lot of resources, which isn't practical.</p>
</li>
<li><h3 id="heading-lora-low-rank-adaptation"><strong>LoRA (Low-Rank Adaptation) :</strong></h3>
<p> Earlier, we saw that training the entire model (the actual LLM) is very expensive. So, what if instead of changing the whole model, we keep its original weights frozen and learn only a small, separate set of numbers that store the <em>difference</em> our data needs? Then, when we ask the model something, we add these differences to the original weights to get the desired answer. This is the idea behind the “<strong><em>Low-Rank Adaptation</em></strong>” method.</p>
<p> A little bit confusing, right?</p>
<p> Let’s answer this first: “How do LLMs generate responses?”</p>
<p> -Isn’t it just next-token prediction, computed by passing numbers (vectors) through layers of weight matrices?</p>
<p> -Yes.</p>
<p> So, in the end, everything operates on numbers, right? So, if we learn how much the weights need to shift for the model to produce our desired tokens, and then add that shift to the original weights, won’t we get our desired response? Yes, we will. This is the main idea behind this process. Let’s see the diagrams:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748974429337/1515ad3d-cc77-48f8-b0c9-ec97de619348.png" alt class="image--center mx-auto" /></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748974473463/707c4f55-20ea-4509-bc06-1877908f798a.png" alt class="image--center mx-auto" /></p>
<p> <em>The first diagram runs for the first time and trains the new LLM model with fine-tuned data. For each query, the second diagram then runs.</em></p>
<p> This process is very time-efficient. You do not change the original LLM; you train only the small set of “difference” weights and add them on top. Simple!</p>
<p> It is also much lighter on memory than full fine-tuning, because far fewer parameters are trained. The trade-off: the update is restricted to a low rank, so it has less capacity than full fine-tuning and can fall short on tasks that need large changes to the model’s behavior.</p>
</li>
</ol>
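<p>The four steps of full fine-tuning can be seen on a toy “model” with just two weights. This is a sketch under made-up assumptions: the “pre-trained” weights and the task data are invented, and a real LLM has billions of weights, but the loop (forward propagation, loss, back propagation, update of every weight) is the same idea.</p>
<pre><code class="lang-python"># "Pre-trained" weights of a one-neuron model y = w * x + b (made-up values).
w, b = 2.0, 0.0

# Small task-specific dataset following y = 3x + 1 (also made up).
data = [(0.0, 1.0), (1.0, 4.0), (2.0, 7.0), (3.0, 10.0)]

learning_rate = 0.05
for epoch in range(2000):
    grad_w = grad_b = 0.0
    loss = 0.0
    for x, y in data:
        pred = w * x + b                       # 1. forward propagation
        error = pred - y
        loss += error ** 2 / len(data)         # 2. loss calculation (mean squared error)
        grad_w += 2 * error * x / len(data)    # 3. back propagation: dLoss/dw
        grad_b += 2 * error / len(data)        #    and dLoss/db
    w -= learning_rate * grad_w                # 4. weight update: every weight changes
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))  # 3.0 1.0
</code></pre>
<p>In full fine-tuning, this update touches every weight of the model, which is exactly why it becomes expensive at LLM scale.</p>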
<p>I will not discuss the other two here; I will leave them to you!</p>
<h3 id="heading-some-insight-on-lora">Some insight on LoRA:</h3>
<p>Let’s say we have a weight matrix in an LLM with dimensions <strong>m × n</strong>. Fine-tuning such large matrices directly can be computationally expensive and memory-intensive.</p>
<p>This is where <strong>LoRA (Low-Rank Adaptation)</strong> shines.</p>
<p>Instead of updating the full <strong>m × n</strong> matrix during fine-tuning, LoRA learns a <strong>“delta” matrix</strong> (ΔW): a learned adjustment to the original weights, which stay frozen. For many tasks, this delta turns out to be <strong>low-rank</strong>, meaning it can be captured by a small number of directions instead of needing all m × n values to change independently.</p>
<p>Here’s the clever part:<br />Rather than learning the full delta matrix, LoRA represents it as the product of two smaller matrices of shapes <strong>m × r</strong> and <strong>r × n</strong>, where <strong>r ≪ min(m, n)</strong>. Fine-tuning is applied only to these smaller matrices. During inference, they are multiplied and added back to the original weights, reconstructing the adapted transformation.</p>
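<p>A small numeric sketch of the LoRA idea in plain Python. The sizes and the values of <code>B</code> and <code>A</code> are made up (in real LoRA, B is initialized to zero and both factors are learned); the point is the shapes, the merge W + BA, and the parameter count.</p>
<pre><code class="lang-python">import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]

m, n, r = 6, 4, 1  # weight shape m x n and LoRA rank r (r much smaller than m and n)

random.seed(0)
W = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]  # frozen pre-trained weights

# The two small trainable matrices. These values stand in for what training would learn.
B = [[0.1 * (i + 1)] for i in range(m)]  # shape m x r
A = [[1.0, -1.0, 0.5, 0.0]]              # shape r x n

delta_W = matmul(B, A)       # full m x n update, but its rank is at most r
W_adapted = add(W, delta_W)  # merged weights used at inference: W + B A

print("full update parameters:", m * n)   # 24
print("LoRA parameters:", m * r + r * n)  # 10
</code></pre>
<p>At real sizes the saving is large: a 4096 × 4096 weight matrix has 16,777,216 entries, while rank-8 LoRA factors for it have 8 × (4096 + 4096) = 65,536.</p>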
<p>This approach:</p>
<ul>
<li><p><strong>Preserves most of the performance</strong> of full fine-tuning</p>
</li>
<li><p><strong>Minimizes memory and compute overhead</strong></p>
</li>
<li><p>Allows <strong>parameter-efficient fine-tuning</strong> even for very large models</p>
</li>
</ul>
<p>That’s the core idea: <strong>train small, plug back smartly</strong>. LoRA makes large-scale model adaptation practical and scalable.</p>
<h2 id="heading-use-cases-of-fine-tuning">Use cases of Fine-Tuning:</h2>
<ul>
<li><p>Heavily used in chatbot training</p>
</li>
<li><p>Code completion for specific languages</p>
</li>
<li><p>Image classification systems (e.g., in the medical sector)</p>
</li>
<li><p>etc.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>Fine-tuning is a way to adapt a model to data specific to a system. There are many other methods (like agents, RAG, etc.), but for certain needs, where an LLM needs an extra, specialized layer for a specific use case, fine-tuning works well.</p>
]]></content:encoded></item><item><title><![CDATA[Let's make our Agent]]></title><description><![CDATA[What is an Agent?
An agent is something that can automatically perform tasks, reason, and generate results.
We have seen that LLM models and AIs are like brains that can think, reason, and answer questions. But this isn't very practical on its own. W...]]></description><link>https://blogging.pritombiswas.com/lets-make-our-agent</link><guid isPermaLink="true">https://blogging.pritombiswas.com/lets-make-our-agent</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[Chaiaurcode]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Sat, 31 May 2025 06:06:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748668899034/018d8a97-7a1a-4893-89be-4f5268095e44.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-an-agent">What is an Agent?</h2>
<p>An agent is something that can automatically perform tasks, reason, and generate results.</p>
<p>We have seen that LLM models and AIs are like brains that can think, reason, and answer questions. But this isn't very practical on its own. What would you do with just that? Create a chatbot and chat all day?</p>
<p>Here comes the concept of AI Agents, which is much broader than just LLMs! The main idea is to let LLMs take actions, like giving the AI hands and legs so it can do its tasks.</p>
<p>So, a common definition of AI Agents is:<br />“An AI agent is <strong><mark>a software system designed to interact with its environment, gather information, and perform tasks autonomously to achieve predetermined goals set by humans or other systems.</mark></strong>”</p>
<h2 id="heading-how-does-it-work">How does it work?</h2>
<p>Ok, we are done with the definition. Now let’s look at how it works. A common breakdown describes five core components of AI Agents:</p>
<ol>
<li><p><strong>Perception System:</strong> Agents receive input from users or sensors. (Generally, the user query)</p>
</li>
<li><p><strong>Reasoning Engine:</strong> The LLM that processes information and makes decisions. (The AI models)</p>
</li>
<li><p><strong>Tool Use:</strong> The ability to call external functions. (We will get back to it.)</p>
</li>
<li><p><strong>Decision Framework:</strong> A structured workflow: plan → action → observe → output</p>
<ol>
<li><p>Plan: Decides what to do based on the query</p>
</li>
<li><p>Action: Calls appropriate functions with specific parameters</p>
</li>
<li><p>Observe: Processes the results from function calls</p>
</li>
<li><p>Output: Provides final responses to users</p>
</li>
</ol>
</li>
<li><p><strong>Memory:</strong> The agent maintains conversation history to track context.</p>
</li>
</ol>
<p>This is the basic workflow of an agent. Now, let’s make a simple weather agent of our own :)</p>
<h2 id="heading-our-first-ai-agent">Our First AI Agent:</h2>
<p>I am gonna explain the code step by step. Don’t forget to read the comments.</p>
<p>Just some general imports:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

load_dotenv()
GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
client = genai.Client(api_key=GEMINI_API_KEY)
<span class="hljs-comment"># Install the dependencies first: pip install google-genai python-dotenv requests</span>
</code></pre>
<ol>
<li><p><strong>Perception system:</strong> Input from the user.</p>
<pre><code class="lang-python"> user_query = input(<span class="hljs-string">'&gt; '</span>)
</code></pre>
</li>
<li><p><strong>Reasoning Engine:</strong> I am using the Gemini API here. (It has a free tier :) )</p>
<pre><code class="lang-python"> system_prompt = <span class="hljs-string">f"""
     You're a helpful AI assistant who is specialized in resolving user query.
     You work on plan, action, observe, output mode.

     Available tools: <span class="hljs-subst">{list(available_functions.keys())}</span>
     Tool descriptions:
     - get_weather(city: str): Returns weather information for a given city

     IMPORTANT RULES:
     - Return ONLY ONE step per response, not multiple steps
     - Start with "plan" step first
     - Wait for next input before proceeding to next step
     - When step is "action", you MUST specify function and input

     Output JSON Format (return only ONE):
     {{
         "step": "plan|action|observe|output",
         "content": "description of what you're doing",
         "function": "function name (only for action step)",
         "input": "function parameter (only for action step)"
     }}

     User Query: """</span>
</code></pre>
<p> This is the system prompt, which tells the model which step to produce next and in what JSON format. In a word, it makes the model reason through the whole process based on the user query.</p>
</li>
<li><p><strong>Tool Use:</strong> I am using a weather API, and the LLM will call the API when needed.</p>
<pre><code class="lang-python"> <span class="hljs-comment">## This is the main part for the function call</span>
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_weather</span>(<span class="hljs-params">city: str</span>):</span>
     response = requests.get(<span class="hljs-string">f"https://wttr.in/<span class="hljs-subst">{city}</span>?format=%C:%t"</span>, timeout=<span class="hljs-number">10</span>)

     <span class="hljs-keyword">if</span>(response.status_code == <span class="hljs-number">200</span>):
         <span class="hljs-comment"># wttr.in answers with "condition:temperature"; split only on the first ':'</span>
         data = response.text.split(<span class="hljs-string">':'</span>, <span class="hljs-number">1</span>)
         situation = data[<span class="hljs-number">0</span>]
         temp = data[<span class="hljs-number">1</span>]
         <span class="hljs-keyword">return</span> <span class="hljs-string">f"weather situation: <span class="hljs-subst">{situation}</span> and temperature: <span class="hljs-subst">{temp}</span>"</span>
     <span class="hljs-keyword">else</span>:
         <span class="hljs-comment"># Return the error as text so the agent can observe it (print() would return None)</span>
         <span class="hljs-keyword">return</span> <span class="hljs-string">"API failed to get weather data"</span>

 <span class="hljs-comment">## This is the object for function listing. Notice in the system_prompt, I am listing the available functions there</span>
 available_functions = {
     <span class="hljs-string">"get_weather"</span>: get_weather
 }
</code></pre>
</li>
<li><p><strong>Decision Framework + Memory:</strong></p>
<p> Observe the code closely.</p>
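<pre><code class="lang-python">     <span class="hljs-comment">## Setup (example values): conversation_history is the agent's memory</span>
     conversation_history = <span class="hljs-string">""</span>
     step_count = <span class="hljs-number">0</span>
     max_steps = <span class="hljs-number">10</span>

     <span class="hljs-keyword">while</span> step_count &lt; max_steps: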
         <span class="hljs-comment">## Memorising the previous prompts</span>
         full_prompt = system_prompt + user_query + <span class="hljs-string">"\n"</span> + conversation_history

         response = client.models.generate_content(
             model=<span class="hljs-string">"gemini-2.0-flash-001"</span>,
             contents=full_prompt
         )

         print(<span class="hljs-string">f"AI Response: <span class="hljs-subst">{response.text}</span>\n"</span>)

         parsed = parse_response(response.text) <span class="hljs-comment">## An additional function for parsing, will give it later.</span>
         <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> parsed:
             print(<span class="hljs-string">"Failed to parse response"</span>)
             <span class="hljs-keyword">break</span>

         <span class="hljs-comment">## Here, the main game begins. First the function checks the step name and its content. </span>
         <span class="hljs-comment">## Based on the name and content, it decides its action.</span>
         <span class="hljs-comment">## If step is calling for an action aka. funciton, it calls a function</span>
         <span class="hljs-comment">## If step is calling for output, it stops.</span>
         <span class="hljs-comment">## No action for plan and observe, as it will just be handled and will do nothing (printing it though)</span>

         step = parsed.get(<span class="hljs-string">"step"</span>)
         content = parsed.get(<span class="hljs-string">"content"</span>)

         <span class="hljs-keyword">if</span> step == <span class="hljs-string">"action"</span>:
             function_name = parsed.get(<span class="hljs-string">"function"</span>)
             function_input = parsed.get(<span class="hljs-string">"input"</span>)

             <span class="hljs-keyword">if</span> function_name <span class="hljs-keyword">in</span> available_functions:
                 result = available_functions[function_name](function_input)
                 observation = <span class="hljs-string">f"Function <span class="hljs-subst">{function_name}</span> returned: <span class="hljs-subst">{result}</span>"</span>
                 print(<span class="hljs-string">f"Function Call: <span class="hljs-subst">{function_name}</span> ('<span class="hljs-subst">{function_input}</span>')"</span>)
                 print(<span class="hljs-string">f"Result: <span class="hljs-subst">{result}</span>\n"</span>)

                 conversation_history += <span class="hljs-string">f"\nObservation: <span class="hljs-subst">{observation}</span>"</span>

             <span class="hljs-keyword">else</span>:
                 print(<span class="hljs-string">f"Function <span class="hljs-subst">{function_name}</span> not available"</span>)
                 <span class="hljs-keyword">break</span>

         <span class="hljs-keyword">elif</span> step == <span class="hljs-string">"output"</span>:
             print(<span class="hljs-string">"=== FINAL ANSWER ==="</span>)
             print(content)
             <span class="hljs-keyword">break</span>

         conversation_history += <span class="hljs-string">f"\nstep: <span class="hljs-subst">{response.text}</span>"</span>
         step_count += <span class="hljs-number">1</span>

     <span class="hljs-keyword">if</span> step_count &gt;= max_steps:
         print(<span class="hljs-string">"Maximum steps reached"</span>)
</code></pre>
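<p> As the comments explain, this loop executes the decision framework: it asks the model for one step at a time, runs a tool on "action", stops on "output", and appends every step to <code>conversation_history</code>, which acts as the agent’s memory.</p>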
</li>
</ol>
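<p>If you want to try the decision loop without an API key, one option is to make the model call pluggable. The sketch below is a hypothetical, trimmed-down <code>run_agent</code> (not part of the original code) that follows the same plan, action, observe, output loop, driven by a scripted fake model and a fake <code>get_weather</code>. Passing a function that calls <code>client.models.generate_content</code> would make it talk to Gemini again.</p>
<pre><code class="lang-python">import json

def run_agent(user_query, generate, tools, max_steps=10):
    """Minimal plan / action / observe / output loop with a pluggable model.

    generate(prompt) must return ONE JSON step as text, like the system prompt asks.
    Returns the final answer, or None if max_steps is reached.
    """
    history = ""  # the agent's memory
    for _ in range(max_steps):
        parsed = json.loads(generate(user_query + "\n" + history))
        history += "\nstep: " + json.dumps(parsed)
        if parsed["step"] == "action":
            result = tools[parsed["function"]](parsed["input"])
            history += "\nObservation: " + str(result)
        elif parsed["step"] == "output":
            return parsed["content"]
        # "plan" and "observe" steps are only recorded in the history
    return None

# Scripted fake model: returns the next prepared step and ignores the prompt.
script = iter([
    {"step": "plan", "content": "The user wants the weather, so call get_weather."},
    {"step": "action", "content": "Calling get_weather", "function": "get_weather", "input": "London"},
    {"step": "observe", "content": "Got the weather data."},
    {"step": "output", "content": "It is partly cloudy and +18°C in London."},
])

def fake_generate(prompt):
    return json.dumps(next(script))

def fake_get_weather(city):
    return f"weather situation: Partly cloudy and temperature: +18°C ({city})"

answer = run_agent("What is the weather in London?", fake_generate, {"get_weather": fake_get_weather})
print(answer)
</code></pre>
<p>Keeping the model behind a plain function makes the loop easy to test, and the <code>max_steps</code> cap still protects against a model that never reaches "output".</p>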
<p>Yeah, this is the main workflow of an agent. Here is the whole code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

load_dotenv()
GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
client = genai.Client(api_key=GEMINI_API_KEY)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_weather</span>(<span class="hljs-params">city: str</span>):</span>
    response = requests.get(<span class="hljs-string">f"https://wttr.in/<span class="hljs-subst">{city}</span>?format=%C:%t"</span>, timeout=<span class="hljs-number">10</span>)

    <span class="hljs-keyword">if</span>(response.status_code == <span class="hljs-number">200</span>):
        <span class="hljs-comment"># wttr.in answers with "condition:temperature"; split only on the first ':'</span>
        data = response.text.split(<span class="hljs-string">':'</span>, <span class="hljs-number">1</span>)
        situation = data[<span class="hljs-number">0</span>]
        temp = data[<span class="hljs-number">1</span>]
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"weather situation: <span class="hljs-subst">{situation}</span> and temperature: <span class="hljs-subst">{temp}</span>"</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-comment"># Return the error as text so the agent can observe it (print() would return None)</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">"API failed to get weather data"</span>

available_functions = {
    <span class="hljs-string">"get_weather"</span>: get_weather
}


system_prompt = <span class="hljs-string">f"""
    You're a helpful AI assistant who is specialized in resolving user query.
    You work on plan, action, observe, output mode.

    Available tools: <span class="hljs-subst">{list(available_functions.keys())}</span>
    Tool descriptions:
    - get_weather(city: str): Returns weather information for a given city

    IMPORTANT RULES:
    - Return ONLY ONE step per response, not multiple steps
    - Start with "plan" step first
    - Wait for next input before proceeding to next step
    - When step is "action", you MUST specify function and input

    Output JSON Format (return only ONE):
    {{
        "step": "plan|action|observe|output",
        "content": "description of what you're doing",
        "function": "function name (only for action step)",
        "input": "function parameter (only for action step)"
    }}

    User Query: """</span>


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">parse_response</span>(<span class="hljs-params">response_text</span>):</span>
    <span class="hljs-string">"""Extract JSON from the response"""</span>
    <span class="hljs-keyword">try</span>:
        text = response_text.replace(<span class="hljs-string">"```json"</span>, <span class="hljs-string">""</span>).replace(<span class="hljs-string">'```'</span>, <span class="hljs-string">""</span>)
        lines = text.strip().split(<span class="hljs-string">'\n'</span>)

        <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> lines:
            line = line.strip()
            <span class="hljs-keyword">if</span> line.startswith(<span class="hljs-string">'{'</span>) <span class="hljs-keyword">and</span> line.endswith(<span class="hljs-string">'}'</span>):
                <span class="hljs-keyword">try</span>:
                    <span class="hljs-keyword">return</span> json.loads(line)
                <span class="hljs-keyword">except</span> json.JSONDecodeError:
                    <span class="hljs-keyword">continue</span>
        start = text.find(<span class="hljs-string">'{'</span>)
        <span class="hljs-keyword">if</span> start != <span class="hljs-number">-1</span>:
            brace_count = <span class="hljs-number">0</span>
            <span class="hljs-keyword">for</span> i, char <span class="hljs-keyword">in</span> enumerate(text[start:], start):
                <span class="hljs-keyword">if</span> char == <span class="hljs-string">'{'</span>:
                    brace_count += <span class="hljs-number">1</span>
                <span class="hljs-keyword">elif</span> char == <span class="hljs-string">'}'</span>:
                    brace_count -= <span class="hljs-number">1</span>
                    <span class="hljs-keyword">if</span> brace_count == <span class="hljs-number">0</span>:
                        json_str = text[start: i+<span class="hljs-number">1</span>]
                        <span class="hljs-keyword">return</span> json.loads(json_str)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        print(<span class="hljs-string">f"Parse error: <span class="hljs-subst">{e}</span>"</span>)

    <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run_agent</span>(<span class="hljs-params">user_query</span>):</span>
    conversation_history = <span class="hljs-string">""</span>
    step_count = <span class="hljs-number">0</span>
    max_steps = <span class="hljs-number">100</span>

    print(<span class="hljs-string">f"User Query: <span class="hljs-subst">{user_query}</span>\n"</span>)

    <span class="hljs-keyword">while</span> step_count &lt; max_steps:
        full_prompt = system_prompt + user_query + <span class="hljs-string">"\n"</span> + conversation_history

        response = client.models.generate_content(
            model=<span class="hljs-string">"gemini-2.0-flash-001"</span>,
            contents=full_prompt
        )

        print(<span class="hljs-string">f"AI Response: <span class="hljs-subst">{response.text}</span>\n"</span>)

        parsed = parse_response(response.text)
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> parsed:
            print(<span class="hljs-string">"Failed to parse response"</span>)
            <span class="hljs-keyword">break</span>

        step = parsed.get(<span class="hljs-string">"step"</span>)
        content = parsed.get(<span class="hljs-string">"content"</span>)

        <span class="hljs-keyword">if</span> step == <span class="hljs-string">"action"</span>:
            function_name = parsed.get(<span class="hljs-string">"function"</span>)
            function_input = parsed.get(<span class="hljs-string">"input"</span>)

            <span class="hljs-keyword">if</span> function_name <span class="hljs-keyword">in</span> available_functions:
                result = available_functions[function_name](function_input)
                observation = <span class="hljs-string">f"Function <span class="hljs-subst">{function_name}</span> returned: <span class="hljs-subst">{result}</span>"</span>
                print(<span class="hljs-string">f"Function Call: <span class="hljs-subst">{function_name}</span> ('<span class="hljs-subst">{function_input}</span>')"</span>)
                print(<span class="hljs-string">f"Result: <span class="hljs-subst">{result}</span>\n"</span>)

                conversation_history += <span class="hljs-string">f"\nObservation: <span class="hljs-subst">{observation}</span>"</span>

            <span class="hljs-keyword">else</span>:
                print(<span class="hljs-string">f"Function <span class="hljs-subst">{function_name}</span> not available"</span>)
                <span class="hljs-keyword">break</span>

        <span class="hljs-keyword">elif</span> step == <span class="hljs-string">"output"</span>:
            print(<span class="hljs-string">"=== FINAL ANSWER ==="</span>)
            print(content)
            <span class="hljs-keyword">break</span>

        conversation_history += <span class="hljs-string">f"\nstep: <span class="hljs-subst">{response.text}</span>"</span>
        step_count += <span class="hljs-number">1</span>

    <span class="hljs-keyword">if</span> step_count &gt;= max_steps:
        print(<span class="hljs-string">"Maximum steps reached"</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    <span class="hljs-comment"># Read the query only when the file is run directly, not on import</span>
    user_query = input(<span class="hljs-string">'&gt; '</span>)
    run_agent(user_query)
</code></pre>
<h3 id="heading-sample-input">Sample Input:</h3>
<pre><code class="lang-python">&gt; What <span class="hljs-keyword">is</span> the weather <span class="hljs-keyword">in</span> Satkhira?
</code></pre>
<h3 id="heading-sample-output">Sample Output:</h3>
<pre><code class="lang-python">User Query: What <span class="hljs-keyword">is</span> the weather <span class="hljs-keyword">in</span> Satkhira?

AI Response: ```json
{
        <span class="hljs-string">"step"</span>: <span class="hljs-string">"plan"</span>,
        <span class="hljs-string">"content"</span>: <span class="hljs-string">"I need to get the weather information for Satkhira. I will use the get_weather tool to get the weather information."</span>,   
        <span class="hljs-string">"function"</span>: null,
        <span class="hljs-string">"input"</span>: null
}
```


AI Response: ```json
{
        <span class="hljs-string">"step"</span>: <span class="hljs-string">"action"</span>,
        <span class="hljs-string">"content"</span>: <span class="hljs-string">"Get weather information for Satkhira"</span>,
        <span class="hljs-string">"function"</span>: <span class="hljs-string">"get_weather"</span>,
        <span class="hljs-string">"input"</span>: <span class="hljs-string">"Satkhira"</span>
}
```

Function Call: get_weather (<span class="hljs-string">'Satkhira'</span>)
Result: weather situation: Overcast <span class="hljs-keyword">and</span> temparature: +<span class="hljs-number">31</span>°C

AI Response: step: ```json
{
        <span class="hljs-string">"step"</span>: <span class="hljs-string">"observe"</span>,
        <span class="hljs-string">"content"</span>: <span class="hljs-string">"The weather in Satkhira is Overcast and the temperature is +31°C."</span>,
        <span class="hljs-string">"function"</span>: null,
        <span class="hljs-string">"input"</span>: null
}
```

AI Response: ```json
{
        <span class="hljs-string">"step"</span>: <span class="hljs-string">"output"</span>,
        <span class="hljs-string">"content"</span>: <span class="hljs-string">"The weather in Satkhira is Overcast and the temperature is +31°C."</span>,
        <span class="hljs-string">"function"</span>: null,
        <span class="hljs-string">"input"</span>: null
}
```

=== FINAL ANSWER ===
The weather <span class="hljs-keyword">in</span> Satkhira <span class="hljs-keyword">is</span> Overcast <span class="hljs-keyword">and</span> the temperature <span class="hljs-keyword">is</span> +<span class="hljs-number">31</span>°C.
</code></pre>
<p>The agent extracts the city name, passes it to the <code>get_weather</code> function, reads the result returned by the API, and shows the final answer.</p>
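<p>To see the control flow of <code>run_agent</code> without calling Gemini or wttr.in, here is a minimal offline sketch. It assumes a scripted list of JSON steps in place of the model and a stub in place of the weather API; <code>fake_model</code>, <code>stub_get_weather</code>, and <code>run_agent_offline</code> are hypothetical names used only for illustration.</p>
<pre><code class="lang-python">import json

# Hypothetical stub standing in for the real wttr.in call, so the sketch runs offline
def stub_get_weather(city):
    return f"weather situation: Overcast and temperature: +31°C in {city}"

available_functions = {"get_weather": stub_get_weather}

# Hypothetical scripted "model": plan, action, observe, then output
SCRIPTED_STEPS = [
    {"step": "plan", "content": "Use get_weather to look up Satkhira."},
    {"step": "action", "content": "Fetch the weather", "function": "get_weather", "input": "Satkhira"},
    {"step": "observe", "content": "The tool returned the weather data."},
    {"step": "output", "content": "The weather in Satkhira is Overcast and +31°C."},
]

def fake_model(history):
    # Return the next scripted step as JSON text, like an LLM response would be
    return json.dumps(SCRIPTED_STEPS[len(history)])

def run_agent_offline(max_steps=10):
    history = []       # parsed steps produced so far
    observations = []  # results returned by tool calls
    for _ in range(max_steps):
        parsed = json.loads(fake_model(history))
        history.append(parsed)
        if parsed["step"] == "action":
            tool = available_functions.get(parsed["function"])
            if tool is None:
                return None  # unknown tool: stop, as run_agent does
            observations.append(tool(parsed["input"]))
        elif parsed["step"] == "output":
            return parsed["content"], observations
    return None  # step limit reached without a final answer

answer, tool_results = run_agent_offline()
print(answer)
print(tool_results)
</code></pre>
<p>Swapping <code>fake_model</code> for a <code>client.models.generate_content(...)</code> call and feeding the history back into the prompt as text gives essentially the structure of <code>run_agent</code> above.</p>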
<h2 id="heading-conclusion">Conclusion:</h2>
<p>Seems pretty ordinary, right? This is just one example of how agents work. Try giving more complex queries, like finding the average temperature of three districts, or getting the temperatures of several districts and displaying them in a table. Then you'll see how much more capable an agent is than a single API call. Just imagine the possibilities if the agent could query a database or the entire Internet.</p>
]]></content:encoded></item><item><title><![CDATA[Different Prompting Styles]]></title><description><![CDATA[What is Prompting?
Prompting, also known as prompt engineering, is the process of providing various inputs (such as text, images, or documents) to an AI model (or large language model, or LLM) to achieve the desired output.
We usually ask an AI model...]]></description><link>https://blogging.pritombiswas.com/different-prompting-styles</link><guid isPermaLink="true">https://blogging.pritombiswas.com/different-prompting-styles</guid><category><![CDATA[ChaiCode]]></category><category><![CDATA[Chaiaurcode]]></category><category><![CDATA[genai]]></category><category><![CDATA[GenAI Cohort]]></category><dc:creator><![CDATA[Pritom Biswas]]></dc:creator><pubDate>Fri, 30 May 2025 09:34:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748586281025/258ad878-a4d5-4367-9ba1-b9ac8ccaa9e4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-prompting"><strong>What is Prompting?</strong></h2>
<p>Prompting, also known as prompt engineering, is the process of providing various inputs (such as text, images, or documents) to an AI model (or large language model, or LLM) to achieve the desired output.</p>
<p>We usually ask an AI model a question, and it gives us an answer. But the answer is often better when the prompt is written in the right format: the model can only work with what we give it, so how we phrase and structure the input shapes the output. This is where "<strong><em>Prompt Engineering</em></strong>" comes in.</p>
<p>Different AI models on the market follow different prompting structures. Here are some examples:</p>
<ul>
<li><h3 id="heading-openaihttpsplatformopenaicomdocsguidestextapi-moderesponses"><a target="_blank" href="https://platform.openai.com/docs/guides/text?api-mode=responses">OpenAI</a></h3>
</li>
</ul>
<pre><code class="lang-plaintext">[
    {
        "role": "system",
        "content": "some system prompt" // e.g. "You are a helpful assistant that answers in bullet points."
    },
    {
        "role": "user",
        "content": "some user prompt" // e.g. "Explain how solar panels work."
    }
]
</code></pre>
<ul>
<li><a target="_blank" href="https://medium.com/@eboraks/llama-2-prompt-engineering-extracting-information-from-articles-examples-45158ff9bd23">Llama 2</a></li>
</ul>
<pre><code class="lang-plaintext">&lt;s&gt;
[INST]
    &lt;&lt;SYS&gt;&gt;
        You are a helpful, concise assistant that answers technical questions. //system prompt
    &lt;&lt;/SYS&gt;&gt;

    How does a binary search tree work? //user prompt
[/INST]
</code></pre>
<p>Grok and Gemini use a broadly similar role-based structure to OpenAI's.</p>
<h2 id="heading-types-of-prompting-techniques">Types of Prompting Techniques</h2>
<p>We've seen how different AI models expect their inputs to be formatted. When we write a prompt, we not only use the right format but also try to get the best possible answer from the LLM. The following prompting styles help make LLM answers more useful and user-friendly:</p>
<ul>
<li><p>Direct Answer Prompting</p>
</li>
<li><p>Zero-Shot Prompting</p>
</li>
<li><p>Few-Shot Prompting</p>
</li>
<li><p>Instruction Prompting</p>
</li>
<li><p>Contextual Prompting</p>
</li>
<li><p>Persona-Based Prompting</p>
</li>
<li><p>Role-Playing Prompting</p>
</li>
<li><p>Chain-of-Thought (CoT) Prompting</p>
</li>
<li><p>Self-Consistency Prompting</p>
</li>
<li><p>Multimodal Prompting</p>
</li>
</ul>
<p>Now let me briefly describe each of these, with short code examples.</p>
<ol>
<li><p><strong>Direct Answer Prompting:</strong></p>
<p> Direct prompting means giving the model a clear, specific request <code>without including examples</code> to guide its output. It is essentially <strong><em>“Just Ask”</em></strong>.</p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> os
 <span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
 <span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

 load_dotenv()

 GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
 client = genai.Client(api_key=GEMINI_API_KEY)

 direct_prompts = [
     <span class="hljs-string">"Explain what direct prompting is"</span>
 ]

 response = client.models.generate_content(
     model=<span class="hljs-string">'gemini-2.0-flash-001'</span>, 
     contents=direct_prompts
 )

 print(response.text)
</code></pre>
</li>
<li><p><strong>Zero-Shot Prompting:</strong></p>
<p> Like Direct Prompting, it gives no examples. The key difference is that Zero-Shot Prompting explicitly names the task to perform (such as classify, translate, or summarize), whereas Direct Prompting simply asks the question.</p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> os
 <span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
 <span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

 load_dotenv()

 GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
 client = genai.Client(api_key=GEMINI_API_KEY)
 zero_shot_prompts = [
     <span class="hljs-string">"Classify this review as positive or negative: 'I absolutely loved this restaurant, the food was amazing!'"</span>,
     <span class="hljs-string">"Translate the following English text to French: 'Hello, how are you doing today?'"</span>,
     <span class="hljs-string">"Summarize this paragraph in one sentence: 'Artificial intelligence has made significant strides in recent years. Machine learning models can now perform tasks that were once thought to require human intelligence. This has led to breakthroughs in various fields including healthcare, finance, and transportation.'"</span>,
     <span class="hljs-string">"Extract the main entities from this sentence: 'Apple CEO Tim Cook announced the new iPhone at their headquarters in Cupertino last Tuesday.'"</span>,
     <span class="hljs-string">"Answer this question with yes or no: 'Is the sun larger than the earth?'"</span>
 ]
 <span class="hljs-comment">## Here, Classify, Translate, Summarize, Extract, and Answer specify the task to perform.</span>

 response = client.models.generate_content(
     model=<span class="hljs-string">'gemini-2.0-flash-001'</span>, 
     contents=zero_shot_prompts
 )

 print(response.text)
</code></pre>
</li>
<li><p><strong>Few-Shot Prompting:</strong></p>
<p> Unlike zero-shot prompting (where you only specify the task), few-shot prompting includes <code>demonstration examples</code> in the prompt itself. The model can then follow the pattern set by these examples when responding to new inputs.</p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> os
 <span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
 <span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

 load_dotenv()

 GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
 client = genai.Client(api_key=GEMINI_API_KEY)
 few_shot_prompts = [
     <span class="hljs-string">"""Classify the sentiment as positive, negative, or neutral:

 Example 1:
 Text: "This movie was absolutely terrible."
 Sentiment: Negative

 Example 2:
 Text: "I had a wonderful time at the restaurant."
 Sentiment: Positive

 Example 3:
 Text: "The weather is cloudy today."
 Sentiment: Neutral

 Now classify this:
 Text: "The service was slow but the food was delicious."
 Sentiment:"""</span>,

     <span class="hljs-string">"""Translate English to French:

 English: Hello, how are you?
 French: Bonjour, comment allez-vous?

 English: I love artificial intelligence.
 French: J'aime l'intelligence artificielle.

 English: What time is the meeting tomorrow?
 French:"""</span>
 ]
 <span class="hljs-comment">## Here, along with the task specifier, worked examples with their expected answers are given, so the output follows their pattern</span>

 response = client.models.generate_content(
     model=<span class="hljs-string">'gemini-2.0-flash-001'</span>, 
     contents=few_shot_prompts
 )

 print(response.text)
</code></pre>
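<p> The same sentiment prompt can also be assembled programmatically. Below is a minimal sketch, assuming the demonstrations are stored as (text, label) pairs; <code>build_few_shot_prompt</code> is a hypothetical helper, not part of the Gemini SDK.</p>
<pre><code class="lang-python">def build_few_shot_prompt(instruction, examples, query):
    # Start with the task instruction
    lines = [instruction, ""]
    # Add each demonstration as a numbered input/output pair
    for i, (text, label) in enumerate(examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(f'Text: "{text}"')
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # End with the new input and an empty slot the model is expected to fill
    lines.append("Now classify this:")
    lines.append(f'Text: "{query}"')
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("This movie was absolutely terrible.", "Negative"),
    ("I had a wonderful time at the restaurant.", "Positive"),
    ("The weather is cloudy today.", "Neutral"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment as positive, negative, or neutral:",
    examples,
    "The service was slow but the food was delicious.",
)
print(prompt)
</code></pre>
<p> Keeping the examples as data makes it easy to add, remove, or reorder demonstrations without rewriting the prompt by hand.</p>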
</li>
<li><p><strong>Instruction Prompting:</strong></p>
<p> Instruction prompting provides the model with specific guidelines about:</p>
<ul>
<li><p>The task to perform</p>
</li>
<li><p>The exact steps to follow</p>
</li>
<li><p>The formatting of the output</p>
</li>
<li><p>Constraints and requirements</p>
</li>
<li><p>Evaluation criteria</p>
</li>
</ul>
</li>
</ol>
<p>    Unlike zero-shot and few-shot prompting, it adds one more element: <strong><em>“the exact steps to follow to reach the conclusion”</em></strong>.</p>
<pre><code class="lang-python">    <span class="hljs-keyword">import</span> os
    <span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
    <span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

    load_dotenv()

    GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
    client = genai.Client(api_key=GEMINI_API_KEY)
    instruction_prompts = [
        <span class="hljs-string">"""Write a product description for a wireless headphone. Follow these instructions:
    1. Keep it under 100 words
    2. Highlight at least 3 key features
    3. Include battery life information
    4. Target audience is young professionals
    5. End with a call to action
    6. Do not mention price"""</span>,

        <span class="hljs-string">"""Analyze the following customer feedback and do exactly as instructed:
    Feedback: "I've been using your app for 3 months. It's mostly good but crashes sometimes and the dark mode hurts my eyes."

    Instructions:
    1. Identify all issues mentioned
    2. Rate severity of each issue (Low/Medium/High)
    3. Suggest one specific solution for each issue
    4. Format your response as a table with columns: Issue, Severity, Solution
    5. Add a brief conclusion with exactly 2 sentences"""</span>,

        <span class="hljs-string">"""Create a 5-day meal plan following these requirements:
    1. Each day must include breakfast, lunch, and dinner
    2. All meals must be vegetarian
    3. Include calorie count for each meal
    4. No meal should repeat during the 5 days
    5. Include at least one protein source in each meal
    6. Format in a clear, readable structure with days as headings"""</span>
    ]
    <span class="hljs-comment">## Here, each task comes with explicit, numbered steps and constraints that the output must satisfy</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_response</span>(<span class="hljs-params">prompt</span>):</span>
        response = client.models.generate_content(
            model = <span class="hljs-string">'gemini-2.0-flash-001'</span>,
            contents=prompt
        )
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"Prompt: \n<span class="hljs-subst">{prompt}</span>\n\nResponse:\n<span class="hljs-subst">{response.text}</span>\n<span class="hljs-subst">{<span class="hljs-string">'='</span>*<span class="hljs-number">50</span>}</span>\n"</span>

    <span class="hljs-keyword">for</span> prompt <span class="hljs-keyword">in</span> instruction_prompts:
        print(get_response(prompt))
</code></pre>
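<p>Because the instructions are just a numbered list attached to the task, they can be kept as data as well. A minimal sketch that rebuilds the first prompt above (without its indentation); <code>build_instruction_prompt</code> is a hypothetical helper for illustration:</p>
<pre><code class="lang-python">def build_instruction_prompt(task, instructions):
    # Number each instruction so the model can follow, and a reader can check, them one by one
    numbered = [f"{i}. {rule}" for i, rule in enumerate(instructions, start=1)]
    return task + " Follow these instructions:\n" + "\n".join(numbered)

prompt = build_instruction_prompt(
    "Write a product description for a wireless headphone.",
    [
        "Keep it under 100 words",
        "Highlight at least 3 key features",
        "Include battery life information",
        "Target audience is young professionals",
        "End with a call to action",
        "Do not mention price",
    ],
)
print(prompt)
</code></pre>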
<ol start="5">
<li><p><strong>Contextual Prompting:</strong></p>
<p> Contextual Prompting is similar to Instruction Prompting, but here the <code>clear context of a situation</code> is given. Here is an example. Question: Which is greater, 9.8 or 9.11?<br /> Context 1, the ordinary number system: of course, 9.80 &gt; 9.11.<br /> Context 2, a book's table of contents: here 9.8 means the 8th lesson of chapter 9 and 9.11 means the 11th lesson of chapter 9, so 9.11 is the greater (later) one.</p>
<p> So, depending on the context, the correct answer can change, and this is where Contextual Prompting helps.</p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> os
 <span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
 <span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

 load_dotenv()

 GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
 client = genai.Client(api_key=GEMINI_API_KEY)

 contextual_prompts = [
     <span class="hljs-string">"""Context: You are reviewing code for a junior developer who is learning Python. They have just submitted their first attempt at writing a function that calculates the factorial of a number.

 Code:
 def factorial(n):
     if n == 0:
         return 1
     else:
         return n * factorial(n-1)

 Question: What feedback would you give this developer about their factorial function?"""</span>
 ]
 <span class="hljs-comment">## Here, background information is provided before asking the question</span>

 response = client.models.generate_content(
     model=<span class="hljs-string">'gemini-2.0-flash-001'</span>, 
     contents=contextual_prompts[<span class="hljs-number">0</span>]
 )

 print(response.text)
</code></pre>
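<p> The 9.8 vs 9.11 example can be made concrete in plain Python: the same two labels give different answers depending on whether they are read as decimal numbers or as chapter.lesson section numbers. <code>as_section</code> is a hypothetical helper written for this illustration.</p>
<pre><code class="lang-python"># Context 1: ordinary decimal numbers
print(max(9.8, 9.11))  # 9.8

# Context 2: "chapter.lesson" section numbers, compared part by part
def as_section(label):
    chapter, lesson = label.split(".")
    return (int(chapter), int(lesson))

print(max("9.8", "9.11", key=as_section))  # 9.11
</code></pre>
<p> Without context, the model has to guess which reading is meant; the context in the prompt removes that ambiguity.</p>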
</li>
<li><p><strong>Persona-Based Prompting:</strong></p>
<p> It follows the structure of Contextual Prompting but adds an extra layer: a specific person’s tone, role, character, or viewpoint.</p>
<p> In general, in persona-based prompting, you:</p>
<ul>
<li><p>Define a specific role or character for the AI to embody</p>
</li>
<li><p>Specify characteristics, expertise, or background of this persona</p>
</li>
<li><p>Frame questions that the persona should answer from their perspective</p>
</li>
<li><p>Get responses that reflect the knowledge and communication style of that persona</p>
</li>
</ul>
</li>
</ol>
<pre><code class="lang-python">    <span class="hljs-keyword">import</span> os
    <span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
    <span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

    load_dotenv()

    GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
    client = genai.Client(api_key=GEMINI_API_KEY)

    persona_prompts = [
        <span class="hljs-string">"""Persona: You are a cybersecurity expert with 15 years of experience in network security and ethical hacking. You specialize in explaining complex security concepts in simple terms. Your answer starts with, "Hey There, whatcha? I am here to help. No Worries, 'kay..."

    Question: What are the most important steps a small business should take to protect themselves from ransomware attacks?"""</span>,

        <span class="hljs-string">"""Persona: You are a professional chef who specializes in Italian cuisine. You've worked in 5-star restaurants in Rome and have published several cookbooks on authentic Italian cooking. You are hot-tempered, and if anybody asks an unnecessary question, you boil over.

    Question: What's your secret to making the perfect homemade pasta dough?"""</span>,

        <span class="hljs-string">"""Persona: You are a quantum physicist working at a leading research institution. You have a knack for explaining complicated physics concepts to non-scientists. You are a warm person and a romanticist who loves poetic metaphors.

    Question: How would you explain quantum entanglement to someone with no background in physics?"""</span>
    ]
    <span class="hljs-comment">## Here, a specific role/character is defined for the AI to adopt when answering</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_response</span>(<span class="hljs-params">prompt</span>):</span>
        response = client.models.generate_content(
            model = <span class="hljs-string">'gemini-2.0-flash-001'</span>,
            contents=prompt
        )
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"Prompt: \n<span class="hljs-subst">{prompt}</span>\n\nResponse:\n<span class="hljs-subst">{response.text}</span>\n<span class="hljs-subst">{<span class="hljs-string">'='</span>*<span class="hljs-number">50</span>}</span>\n"</span>

    <span class="hljs-keyword">for</span> prompt <span class="hljs-keyword">in</span> persona_prompts:
        print(get_response(prompt))
</code></pre>
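<p>The persona prompts above share one shape: a persona block (role, expertise, tone) followed by the question. A minimal sketch of that template, where <code>build_persona_prompt</code> is a hypothetical helper:</p>
<pre><code class="lang-python">def build_persona_prompt(role, traits, question):
    # Persona block: who the model is, plus expertise and tone
    persona = "Persona: You are " + role + ". " + " ".join(traits)
    # The question is asked to that persona
    return persona + "\n\nQuestion: " + question

prompt = build_persona_prompt(
    "a professional chef who specializes in Italian cuisine",
    ["You've worked in 5-star restaurants in Rome.", "You are hot-tempered."],
    "What's your secret to making the perfect homemade pasta dough?",
)
print(prompt)
</code></pre>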
<ol start="7">
<li><p><strong>Role-Playing Prompting:</strong></p>
<p> Role-playing prompting involves placing the AI in a specific scenario and asking it to respond as if it were a character within that scenario. Unlike persona-based prompting (which focuses on expertise and traits), role-playing emphasizes interactive scenarios and situational responses.</p>
<p> In role-playing prompting, you:</p>
<ul>
<li><p>Create a specific scenario or situation</p>
</li>
<li><p>Cast the AI in a particular role within that scenario</p>
</li>
<li><p>Often include other characters or elements for interaction</p>
</li>
<li><p>Ask the AI to respond as if the scenario were real</p>
</li>
</ul>
</li>
</ol>
<pre><code class="lang-python">    <span class="hljs-keyword">import</span> os
    <span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
    <span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

    load_dotenv()

    GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
    client = genai.Client(api_key=GEMINI_API_KEY)

    roleplay_prompts = [
        <span class="hljs-string">"""Role-play: You are a medieval blacksmith in a fantasy kingdom. A young adventurer has entered your shop looking for their first sword but doesn't have much money. They're asking about the different types of weapons you sell.

    Respond as the blacksmith would in this scenario."""</span>,

        <span class="hljs-string">"""Role-play: You are a time traveler from the year 2300 who has just arrived in 2025. You're speaking with someone who is curious about what the future is like. You're trying not to reveal too much to avoid changing the timeline.

    How do you respond to their questions about future technology?"""</span>,

        <span class="hljs-string">"""Role-play: You are the captain of a spaceship that has just received a distress signal from a nearby planet known to be dangerous. Your crew is divided on whether to investigate or ignore it. You need to make a decision and explain it to your crew.

    What do you say to your crew?"""</span>
    ]
    <span class="hljs-comment">## Here, the AI is placed in a specific scenario with contextual details</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_response</span>(<span class="hljs-params">prompt</span>):</span>
        response = client.models.generate_content(
            model=<span class="hljs-string">'gemini-2.0-flash-001'</span>,
            contents=prompt
        )
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"Prompt: \n<span class="hljs-subst">{prompt}</span>\n\nResponse:\n<span class="hljs-subst">{response.text}</span>\n<span class="hljs-subst">{<span class="hljs-string">'='</span>*<span class="hljs-number">50</span>}</span>\n"</span>

    <span class="hljs-keyword">for</span> prompt <span class="hljs-keyword">in</span> roleplay_prompts:
        print(get_response(prompt))
</code></pre>
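<p>Since role-playing is interactive, the scenario is usually followed by a running dialogue, with the AI's character speaking next. A minimal sketch of such a prompt; <code>build_roleplay_prompt</code> and the sample dialogue line are hypothetical:</p>
<pre><code class="lang-python">def build_roleplay_prompt(scenario, ai_role, turns):
    # Scenario first, then the dialogue so far as "Speaker: line"
    lines = ["Role-play: " + scenario, ""]
    for speaker, line in turns:
        lines.append(f"{speaker}: {line}")
    # Leave the AI character's turn open so the model replies in character
    lines.append(f"{ai_role}:")
    return "\n".join(lines)

prompt = build_roleplay_prompt(
    "You are a medieval blacksmith in a fantasy kingdom.",
    "Blacksmith",
    [("Adventurer", "I need my first sword, but I only have a few coins.")],
)
print(prompt)
</code></pre>
<p>Each new exchange can be appended to <code>turns</code>, which keeps the model inside the same scenario across turns.</p>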
<ol start="8">
<li><p><strong>Chain-of-Thought (CoT) Prompting:</strong></p>
<p> Chain-of-thought prompting is a technique that encourages the AI to show its reasoning step by step before giving a final answer. It is particularly effective for complex problems that need multi-step reasoning. It resembles Instruction Prompting, but here each reasoning step builds on the result of the previous one (you can see this behavior in some OpenAI models). Its main goal is to make the reasoning process visible.</p>
<p> In Chain-of-Thought prompting, you:</p>
<ul>
<li><p>Ask the model to "think step by step" before answering</p>
</li>
<li><p>Encourage showing intermediate reasoning and calculations</p>
</li>
<li><p>Break down complex problems into logical sequences</p>
</li>
<li><p>Follow the reasoning process from start to conclusion</p>
</li>
</ul>
</li>
</ol>
<pre><code class="lang-python">    <span class="hljs-keyword">import</span> os
    <span class="hljs-keyword">from</span> google <span class="hljs-keyword">import</span> genai
    <span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

    load_dotenv()

    GEMINI_API_KEY = os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>)
    client = genai.Client(api_key=GEMINI_API_KEY)

    cot_prompts = [
        <span class="hljs-string">"""Solve this math problem. Think step by step before giving your final answer.

    Problem: If a store is selling a shirt for $45 after applying a 25% discount, what was the original price of the shirt?"""</span>,

        <span class="hljs-string">"""Consider this logical puzzle. Think step by step through the reasoning process. Use each step's reasoning as the basis for the next step.

    Puzzle: Jack is looking at Anne, and Anne is looking at George. Jack is married, George is unmarried. Is a married person looking at an unmarried person? Explain your reasoning."""</span>,

        <span class="hljs-string">"""Analyze whether this argument is valid. Think step by step through your analysis.

    Argument: All mammals are warm-blooded. All whales are mammals. Therefore, all whales are warm-blooded."""</span>
    ]
    <span class="hljs-comment">## Here, the AI is explicitly asked to show its reasoning process step by step and follow each step's reasoning</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_response</span>(<span class="hljs-params">prompt</span>):</span>
        response = client.models.generate_content(
            model=<span class="hljs-string">'gemini-2.0-flash-001'</span>,
            contents=prompt
        )
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"Prompt: \n<span class="hljs-subst">{prompt}</span>\n\nResponse:\n<span class="hljs-subst">{response.text}</span>\n<span class="hljs-subst">{<span class="hljs-string">'='</span>*<span class="hljs-number">50</span>}</span>\n"</span>

    <span class="hljs-keyword">for</span> prompt <span class="hljs-keyword">in</span> cot_prompts:
        print(get_response(prompt))
</code></pre>
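<p>To check the model's answers to the first two prompts, the same step-by-step reasoning can be written out directly. This is a verification sketch, not part of the prompting code:</p>
<pre><code class="lang-python"># Problem 1: sale price = original * (1 - discount), so original = sale price / (1 - discount)
sale_price = 45
discount = 0.25
original_price = sale_price / (1 - discount)
print(original_price)  # 60.0

# Problem 2: Anne's marital status is unknown, so check both possibilities
married = {"Jack": True, "George": False}
looking_at = [("Jack", "Anne"), ("Anne", "George")]

answers = []
for anne_married in (True, False):
    status = dict(married, Anne=anne_married)
    # Is some married person looking at an unmarried person?
    answers.append(any(status[a] and not status[b] for a, b in looking_at))
print(all(answers))  # True: the answer is "yes" whether or not Anne is married
</code></pre>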
<ol start="9">
<li><p><strong>Self-Consistency Prompting and Multimodal Prompting:</strong></p>
<p> I will discuss these topics in another article.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>There are quite a few prompting techniques in use. Each has its own use cases, and a single application may combine several of them depending on its needs. All of them are structured ways to get the best out of any LLM.</p>
]]></content:encoded></item></channel></rss>