Query Decomposition

Complex queries made simple


Previous context:

What we saw earlier

Recall that we implemented “Parallel Query Retrieval” and “Reciprocal Rank Fusion (RRF)”. There, we asked the question “What is fs?” and the LLM generated several similar questions.

Now, in this article, let us ask the question “What is React?” and run it through the Parallel Query Retrieval system. What similar questions will we get?

  1. What is React.js?

  2. What is the React framework?

  3. React JavaScript library explained

  4. Introduction to React

Like these, right?

Nice. Now let us test with a harder question: “What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?” My system returned these similar queries:

  1. Compare React and Vue.js for large-scale projects

  2. What are the pros and cons of using React for building large applications?

  3. Is React or Vue.js better for developing complex web applications?

  4. What are the benefits of using Vue.js over React in large-scale projects?

Look closely: nearly all of these queries mention React and Vue.js together, and the one that does not simply drops Vue.js. None of them splits the two apart so that knowledge about each can be retrieved individually. This is fine if the document supplied to the RAG system discusses them together and directly answers the question. But,

“What if the supplied document does not directly answer the question and covers React and Vue.js in separate places?”

🛑
Yeah, a new problem spotted. How would you solve this?

Interesting, right?

So, how can we get around this? Simple: divide the complex query into simpler ones. In a word, “DECOMPOSE THEM” !!!

What is Query Decomposition?

Let’s try something:

So, our previous “complex” query was this: “What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?” Can we break it into these queries?

  1. What are React advantages for large applications?

  2. What are React disadvantages for large applications?

  3. What are Vue.js advantages for large applications?

  4. What are Vue.js disadvantages for large applications?

Now React and Vue.js are handled separately. So, even if they appear in different places in the given context, we can run a vector search for each one and still get results. This is like breaking an abstract query into more concrete ones.

Can you optimize this more? I mean, what if the given context does not have the exact query words in it?
💡
Of course, just generate some parallel queries of the decomposed queries !!!
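Here is a minimal sketch of that idea. It assumes two hypothetical callables: `decompose(query)` returns a list of sub-queries and `expand(sub_query)` returns a list of parallel queries. The `fan_out` name and the fake functions are illustrative stand-ins for the LLM calls, not part of the article's code.

```python
from typing import Callable, List


def fan_out(query: str,
            decompose: Callable[[str], List[str]],
            expand: Callable[[str], List[str]]) -> List[str]:
    """Decompose a query, then add parallel queries for each sub-query."""
    all_queries: List[str] = []
    seen = set()
    for sub_query in decompose(query):
        # Keep the sub-query itself, followed by its rephrasings.
        for candidate in [sub_query, *expand(sub_query)]:
            key = candidate.strip().lower()
            if key and key not in seen:  # drop blanks and duplicates
                seen.add(key)
                all_queries.append(candidate.strip())
    return all_queries


# Stand-ins for the LLM calls (hypothetical, for illustration only).
def fake_decompose(query: str) -> List[str]:
    return ["What are React advantages for large applications?",
            "What are Vue.js advantages for large applications?"]


def fake_expand(sub_query: str) -> List[str]:
    # Returns one rephrasing plus a duplicate, to show de-duplication.
    return [sub_query.replace("advantages", "benefits"), sub_query]


queries = fan_out("React vs Vue.js for large apps?", fake_decompose, fake_expand)
for q in queries:
    print(q)
```

De-duplicating before retrieval matters because every extra query costs one more vector search.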

The main workflow kind of looks like this:

Yeah, this is “Query Decomposition” !!!

Definition:

Query Decomposition is a technique where you break down a complex, multi-faceted user question into smaller, more focused sub-questions before performing retrieval in a RAG system.

These are the intuitions and main mechanisms of “Query Decomposition”.

Why Query Decomposition:

These are the main reasons to decompose a query:

  1. Vector Search Limitations:

    When a single query asks about several distinct concepts, its embedding blends them together, so vector search struggles to match passages that cover only one of those concepts.

  2. Improved Retrieval Coverage:

    A single query might miss some concepts, whereas focused sub-queries each retrieve subject-specific data, which ultimately leads to more accurate answers.

  3. Reduced Semantic Confusion:

    Complex queries sometimes include concepts that are semantically close but actually distinct. Decomposing the query reduces this confusion and yields clearer, less ambiguous results.

  4. Better Document Relevance:

    Splitting the query into smaller sub-queries retrieves data from subject-specific parts of the document, which keeps the retrieved context relevant.
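The first point can be illustrated with a toy example. This sketch uses plain word counts as a stand-in for real embeddings (an assumption made only to keep it self-contained; real systems use dense vectors from an embedding model), and compares how well a combined query and a focused sub-query match a chunk that talks only about React.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Real systems use dense vectors."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A chunk that only discusses React.
react_chunk = embed("react advantages large applications component reuse ecosystem")

# The combined query mixes both frameworks; the focused one does not.
combined = embed("advantages disadvantages react vue large applications")
focused = embed("react advantages large applications")

combined_score = cosine(combined, react_chunk)
focused_score = cosine(focused, react_chunk)
print(f"combined query vs React chunk: {combined_score:.2f}")
print(f"focused query vs React chunk:  {focused_score:.2f}")
```

The focused query scores higher because the extra terms in the combined query add length to its vector without adding overlap with the React-only chunk.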

💡
Isn’t this like asking different specialists to answer different questions, each in their own field?

Let’s code:

Flow:

  1. Write a clear system prompt.

  2. Take the user query.

  3. Generate the decomposed queries.

  4. Run Parallel Query Retrieval on them.

  5. Search the vector store with those queries and retrieve the matching docs.

  6. Use the original query and the retrieved context to generate the final answer.

Code:

🔴
I am running the vector store in a Docker container, so you will need to set one up as well.

1 to 3. Decompose Query Function

# Assumes `import json` at the top of the module; filter_response() is a helper from the full code (not shown here).
def decomposeQuery(self, query, number_of_queries = 3):
        print("Decomposing Query 🧠")
        try:
            # System prompt: tells the model how to split the query and what to return
            system_prompt = f"""
                You are a helpful AI assistant who decomposes the given complex query "{query}" into {number_of_queries} simpler queries using its keywords.

                METHOD:
                1. First, analyze the complex query, extract its keywords, and separate the distinct topics.
                2. Second, write new queries using those keywords. ALWAYS keep exactly one distinct topic in a single query.
                3. Third, return the queries in the given format.
                4. ALWAYS keep each query on one line.
                5. TRY to make each query as straightforward as possible. Each query should preserve the gist of the original query.

                EXAMPLE:
                "original": "What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?"
                "generated":
                    1. What are React advantages for large applications?
                    2. What are React disadvantages for large applications?
                    3. What are Vue.js advantages for large applications?
                    4. What are Vue.js disadvantages for large applications?

                RETURN FORMAT
                You only need to return the queries in this json format:
                {{
                    "original": "{query}",
                    "generated": [
                        "generated_1",
                        "generated_2",
                        "generated_3"
                    ]
                }}
                ONLY return in the given format.
            """

            response = self.model.generate_content(system_prompt)

            if not response or not response.text:
                print("No response from model")
                return None

            filtered_response = filter_response(response)

            try:
                parsed_response = json.loads(filtered_response)
                return parsed_response
            except json.JSONDecodeError as e:
                print(f"JSON parsing error: {e}")
                return None

        except Exception as e:
            print(f"Query Decomposition failed: {e}")
            return None
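The `filter_response` helper is not shown in this article. Models often wrap JSON in a Markdown code fence, so a helper like it presumably cleans the reply before `json.loads`. Below is a minimal sketch of that kind of clean-up; `strip_code_fences` and `parse_generated` are hypothetical names, and they take plain text rather than the model's response object.

```python
import json
from typing import List

FENCE = "`" * 3  # a Markdown code fence marker


def strip_code_fences(text: str) -> str:
    """Return text without a surrounding Markdown code fence, if there is one."""
    cleaned = text.strip()
    if len(cleaned) >= 6 and cleaned.startswith(FENCE) and cleaned.endswith(FENCE):
        inner = cleaned[3:-3]
        # Drop an optional language tag such as "json" on the first line.
        first_line, _, rest = inner.partition("\n")
        if first_line.strip() == "" or first_line.strip().isalpha():
            inner = rest
        return inner.strip()
    return cleaned


def parse_generated(text: str) -> List[str]:
    """Parse the model's JSON reply and return the generated queries (or [])."""
    try:
        data = json.loads(strip_code_fences(text))
    except json.JSONDecodeError:
        return []
    generated = data.get("generated", []) if isinstance(data, dict) else []
    if not isinstance(generated, list):
        return []
    # Keep only non-empty strings, so one bad entry does not break retrieval.
    return [q.strip() for q in generated if isinstance(q, str) and q.strip()]


reply = FENCE + 'json\n{"original": "q", "generated": ["a", " b ", 3]}\n' + FENCE
print(parse_generated(reply))
```

Returning an empty list instead of `None` lets the caller loop over the result without an extra check.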
4. Run a Parallel Query on the decomposed queries:

     # Call this once per decomposed query to get its parallel queries:
     def hybridQuery(self, query, number_of_queries):
             print("Hybrid Query Initiating 🤓")
             try:
                 response = self.generateParallelQuery(query=query, number_of_queries=number_of_queries)
    
                 if not response:
                     print("Hybridization failed")
                     return None
    
                 return response
             except Exception as e:
                 print(f"Hybrid Query Generation Failed: {e}")
                 return None
    
     # Parallel Query Generation Function:
     def generateParallelQuery(self, query, number_of_queries = 3):
             print("Generating Parallel Query 🤔")
             try:
                 system_prompt = f"""
                     You are a helpful AI assistant who generates {number_of_queries} queries with similar topics of the given query={query}.
    
                     METHOD:
                     1. You get a query; analyze it and find its keywords.
                     2. You generate similar words based on those keywords. Extract the keywords from the whole {query} and then decide what to make.
                     3. You make queries similar to {query} using the newly generated keywords.
                     4. The generated queries must not exceed one line.
                     5. Keep them as straightforward as possible.
    
                     EXAMPLE:
                     original: "What is fs in Node.js?"
                     generated:
                         1. "What is file system?"
                         2. "What are files in Node.js?"
                         3. "How to make files in Node.js?"
    
                     RETURN FORMAT
                     You only need to return the queries in this json format:
                     {{
                         "original": "{query}",
                         "generated": [
                             "generated_1",
                             "generated_2",
                             "generated_3"
                         ]
                     }}
    
                     Return ONLY valid JSON, no additional text.
                 """
    
                 response = self.model.generate_content(
                     system_prompt
                 )
    
                 if not response or not response.text:
                     print("No response from model")
                     return None
    
                 filtered_response = filter_response(response)
    
                 try:
                     parsed_response = json.loads(filtered_response)
                     return parsed_response
                 except json.JSONDecodeError as e:
                     print(f"JSON parsing error: {e}")
                     return None
    
             except Exception as e:
                 print(f"Problem occurred while generating the response: {e}")
                 return None
    

5 and 6. See the full code.
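For orientation, steps 5 and 6 can be sketched roughly as follows. This is not the code from the full repository: the in-memory `DOCS` list and keyword-overlap `fake_search` are stand-ins for the Docker-hosted vector store, and `retrieve_context` and `build_final_prompt` are hypothetical names chosen for illustration.

```python
from typing import Callable, List


def retrieve_context(queries: List[str],
                     search: Callable[[str, int], List[str]],
                     top_k: int = 2) -> List[str]:
    """Step 5: run every query against the store and merge the unique chunks."""
    chunks: List[str] = []
    for query in queries:
        for chunk in search(query, top_k):
            if chunk not in chunks:
                chunks.append(chunk)
    return chunks


def build_final_prompt(original_query: str, chunks: List[str]) -> str:
    """Step 6: combine the ORIGINAL query with the retrieved context."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {original_query}"
    )


# Tiny in-memory stand-in for the vector store (keyword overlap, not real vectors).
DOCS = [
    "React has a large ecosystem and reusable components.",
    "React needs extra libraries for routing and state.",
    "Vue.js has a gentle learning curve.",
]


def tokens(text: str) -> set:
    """Lowercase words with surrounding punctuation removed."""
    return {word.strip(".?,:") for word in text.lower().split()}


def fake_search(query: str, top_k: int) -> List[str]:
    scored = [(len(tokens(query) & tokens(doc)), doc) for doc in DOCS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:top_k] if score > 0]


sub_queries = ["What are React advantages?", "What are Vue.js advantages?"]
context_chunks = retrieve_context(sub_queries, fake_search)
prompt = build_final_prompt("React vs Vue.js: pros and cons?", context_chunks)
print(prompt)
```

Note that the final prompt uses the original question, not the sub-queries: the sub-queries exist only to gather context.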

Full Code:

See the full Code here

Conclusion

“Query Decomposition” is another optimization method that handles more complex queries and saves the LLM from redundant work. For simpler queries, Parallel Query Retrieval is good enough, but Query Decomposition lets the retrieval step specialize on each sub-topic.

Can you tweak the Query Decomposition function code so that it uses different specialized characters (Doctors, Engineers, etc.) to get specialized answers?
💡
Hint: Agents

Tour with GenAI

Part 5 of 12

This series explores how LLMs like ChatGPT go beyond chat, diving into automation, from sending requests to getting intelligent responses. Learn how real-world LLM-powered systems are built behind the scenes.
