Query Decomposition
Complex queries made simple

Previous context:
What we saw earlier
Earlier, we implemented “Parallel Query Retrieval” and “Reciprocal Rank Fusion (RRF)”. There, we asked the question “What is fs?” and the LLM generated some similar questions.
Now, in this article, let us ask a different question: “What is React?” If we run it through the Parallel Query Retrieval system, what similar questions will we get?
What is React.js?
What is the React framework?
React JavaScript library explained
Introduction to React
Like these, right?
Let’s test something:
Nice, now let us test with another question: “What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?” I got these similar queries from my system:
Compare React and Vue.js for large-scale projects
What are the pros and cons of using React for building large applications?
Is React or Vue.js better for developing complex web applications?
What are the benefits of using Vue.js over React in large-scale projects?
Look closely: every query mentions React.js and Vue.js together. None of them separates the two so that the retriever can fetch knowledge about each one individually. This is fine if the document I supplied to the RAG system discusses them together and directly answers the question. But,
“What if the supplied document does not directly answer the question and covers React.js and Vue.js in separate places?”
Interesting, right?
So, how can we solve this problem? Simple: divide the complex query into simpler ones. In a word, “DECOMPOSE THEM” !!!
What is Query Decomposition?
Let’s try something:
So, our previous “complex” query was this: “What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?” Can we break it into these queries?
What are React advantages for large applications?
What are React disadvantages for large applications?
What are Vue.js advantages for large applications?
What are Vue.js disadvantages for large applications?
Now React.js and Vue.js each have their own queries. So, even if they appear in different places in the given context, we can run a vector search for each query and get results. This is like breaking one abstract query into several more concrete ones.
The main workflow kind of looks like this:

Yeah, this is “Query Decomposition” !!!
Definition:
Query Decomposition is a technique where you break down a complex, multi-faceted user question into smaller, more focused sub-questions before performing retrieval in a RAG system.
Some diagrams:
Look at the following diagram for better understanding.


These are the intuitions and main mechanisms of “Query Decomposition”.
Why Query Decomposition:
These are the main reasons to decompose a query:
Vector Search Limitations:
When a single query asks about multiple distinct concepts, vector embeddings struggle to match it against the parts of the document that cover each concept and to establish connections between them.
Improved Retrieval Coverage:
A single query might miss some concepts, whereas separate sub-queries retrieve more subject-specific data, which ultimately produces more accurate results.
Reduced Semantic Confusion:
Complex queries sometimes include multiple concepts that look semantically close but actually mean different things. Decomposing the query reduces this confusion and leads to clear, unambiguous answers.
Better Document Relevance:
Dividing the query into smaller sub-queries helps retrieve data from the subject-specific parts of the document, which keeps the retrieved context relevant to the question.
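To make these reasons a bit more concrete, here is a small sketch. It is only a toy: the `embed` function below is a plain word-count vector that I am assuming as a stand-in for a real embedding model, and the two documents are made up for the illustration. Real embeddings behave differently in the details, but the intuition is similar: the focused sub-query sits closer to the React-only passage than the long combined query does.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding' (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up passage that only talks about React.
react_doc = "React advantages: a large ecosystem and reusable components for large applications"

complex_query = (
    "What are the advantages and disadvantages of React compared to Vue.js "
    "for building large-scale applications?"
)
focused_query = "What are React advantages for large applications?"

complex_score = cosine(embed(complex_query), embed(react_doc))
focused_score = cosine(embed(focused_query), embed(react_doc))

print(f"complex query vs React passage: {complex_score:.3f}")
print(f"focused query vs React passage: {focused_score:.3f}")
```

The complex query carries extra words (Vue.js, disadvantages, compared, and so on) that the React passage does not contain, so its similarity to that passage is lower than the focused query's.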
Let’s code:
Flow:
1. Give a nice system prompt.
2. Take the user query.
3. Generate the decomposed queries.
4. Run Parallel Query Retrieval on them.
5. Search the vector store using the queries and retrieve the response/docs.
6. Use the original query and the retrieved context to get the results.
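Before looking at the real functions, the flow above can be sketched end to end. This is a minimal sketch, not the code from this project: `answer_with_decomposition` and all the `fake_*` helpers are hypothetical names I made up so the example runs without an LLM or a vector store.

```python
from typing import Callable, List


def answer_with_decomposition(
    query: str,
    decompose: Callable[[str], List[str]],
    expand: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Hypothetical orchestration of the flow; every step is passed in as a function."""
    sub_queries = decompose(query)              # steps 1 to 3
    all_queries: List[str] = []
    for sub_query in sub_queries:               # step 4: similar queries per sub-query
        all_queries.append(sub_query)
        all_queries.extend(expand(sub_query))
    context: List[str] = []
    for q in all_queries:                       # step 5: search and collect unique chunks
        for chunk in search(q):
            if chunk not in context:
                context.append(chunk)
    return generate(query, context)             # step 6: original query + context


# Tiny stand-ins (assumptions) in place of the LLM and the vector store.
DOCS = {
    "react": "React has a large ecosystem.",
    "vue": "Vue.js has a gentle learning curve.",
}


def fake_decompose(query: str) -> List[str]:
    return ["What are React advantages?", "What are Vue.js advantages?"]


def fake_expand(sub_query: str) -> List[str]:
    return [sub_query.replace("advantages", "benefits")]


def fake_search(q: str) -> List[str]:
    return [text for key, text in DOCS.items() if key in q.lower()]


def fake_generate(query: str, context: List[str]) -> str:
    return f"Q: {query}\nContext: {' | '.join(context)}"


result = answer_with_decomposition(
    "Compare React and Vue.js advantages",
    fake_decompose, fake_expand, fake_search, fake_generate,
)
print(result)
```

Passing each step in as a function keeps the sketch independent of any particular model or vector store; in the real system those slots are filled by the class methods shown next.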
Code:
1 to 3. Decompose Query Function
# Method of the RAG class (see the full code). It needs `import json` at the
# top of the file and the `filter_response` helper defined in the full code.
def decomposeQuery(self, query, number_of_queries=3):
    print("Decomposing Query 🧠")
    try:
        # Nice prompt
        system_prompt = f"""
        You are a helpful AI assistant who decomposes the given complex {query} into simpler queries using its keywords at the given number = {number_of_queries}.
        METHOD:
        1. Firstly, analyze the complex query and extract its keywords and split the distinct keywords.
        2. Secondly, make new queries using the keywords. ALWAYS remember to keep one distinct topic in a single query.
        3. Thirdly, return the queries in the given format.
        4. ALWAYS remember that each query should only take one line.
        5. TRY to make them as straightforward as possible. Each query should contain the gist of the original query.
        EXAMPLE:
        "original": "What are the advantages and disadvantages of React compared to Vue.js for building large-scale applications?"
        "generated":
        1. What are React advantages for large applications?
        2. What are React disadvantages for large applications?
        3. What are Vue.js advantages for large applications?
        4. What are Vue.js disadvantages for large applications?
        RETURN FORMAT
        You only need to return the queries in this json format:
        {{
            "original": "{query}",
            "generated": [
                "generated_1",
                "generated_2",
                "generated_3"
            ]
        }}
        ONLY return in the given format.
        """
        response = self.model.generate_content(system_prompt)
        if not response or not response.text:
            print("No response from model")
            return None
        # Clean the raw model output before parsing it as JSON
        filtered_response = filter_response(response)
        try:
            parsed_response = json.loads(filtered_response)
            return parsed_response
        except json.JSONDecodeError as e:
            print(f"JSON parsing error: {e}")
            return None
    except Exception as e:
        print(f"Query Decomposition failed: {e}")
        return None
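The `filter_response` helper is called above but not shown in this section. As a rough idea of what such a helper might do, here is a hypothetical sketch (my assumption, not the version from the full code): LLMs often wrap JSON in a Markdown code fence or add extra words around it, so the helper pulls out just the JSON part before `json.loads` runs.

```python
import json
import re
from types import SimpleNamespace

# Built with string multiplication so this example does not contain a literal fence.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def filter_response(response) -> str:
    """Hypothetical helper: return only the JSON part of the model's text."""
    text = response.text.strip()
    match = FENCED_BLOCK.search(text)
    if match:
        # The model wrapped its answer in a code fence: keep the inside.
        return match.group(1)
    # Otherwise keep everything from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        return text[start:end + 1]
    return text


# SimpleNamespace stands in for the model's response object (only .text is used).
fenced = SimpleNamespace(
    text=FENCE + 'json\n{"original": "q", "generated": ["a", "b"]}\n' + FENCE
)
chatty = SimpleNamespace(text='Here you go: {"original": "q", "generated": ["a"]} Thanks!')

print(json.loads(filter_response(fenced)))
print(json.loads(filter_response(chatty)))
```

If neither pattern matches, the text is returned unchanged, so the `json.JSONDecodeError` branch in `decomposeQuery` still gets a chance to report the problem.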
4. Run a Parallel Query on the decomposed queries:
# Calling this multiple times to get the parallel queries:
def hybridQuery(self, query, number_of_queries):
    print("Hybrid Query Initiating 🤓")
    try:
        response = self.generateParallelQuery(query=query, number_of_queries=number_of_queries)
        if not response:
            print("Hybridization failed")
            return None
        return response
    except Exception as e:
        print(f"Hybrid Query Generation Failed: {e}")
        return None

# Parallel Query Generation Function:
def generateParallelQuery(self, query, number_of_queries=3):
    print("Generating Parallel Query 🤔")
    try:
        system_prompt = f"""
        You are a helpful AI assistant who generates {number_of_queries} queries with similar topics of the given query={query}.
        METHOD:
        1. You get a query, analyze it and find the keywords in that.
        2. You generate similar words based on the keywords. Extract the keywords from the whole {query} and then decide what to make.
        3. You make similar query like {query} using the newly generated keywords
        4. The generated queries will not exceed one line.
        5. Keep them as straightforward as possible
        EXAMPLE:
        original: "What is fs in Node.js?"
        generated:
        1. "What is file system?"
        2. "What are files in Node.js?"
        3. "How to make files in Node.js?"
        RETURN FORMAT
        You only need to return the queries in this json format:
        {{
            "original": "{query}",
            "generated": [
                "generated_1",
                "generated_2",
                "generated_3"
            ]
        }}
        Return ONLY valid JSON, no additional text.
        """
        response = self.model.generate_content(system_prompt)
        if not response or not response.text:
            print("No response from model")
            return None
        # Clean the raw model output before parsing it as JSON
        filtered_response = filter_response(response)
        try:
            parsed_response = json.loads(filtered_response)
            return parsed_response
        except json.JSONDecodeError as e:
            print(f"JSON parsing error: {e}")
            return None
    except Exception as e:
        print(f"Problem occurred while generating the response: {e}")
        return None
5 and 6. See the full code.
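To give a feel for steps 5 and 6 without the whole file, here is one way they could look. Treat it as a sketch under assumptions: I am assuming each query returns a ranked list of text chunks, that the lists are merged with Reciprocal Rank Fusion (with the commonly used constant k = 60), and that `build_final_prompt` is a made-up helper name. The sample chunks are invented too.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each chunk scores 1 / (k + rank) per list it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk in enumerate(ranking, start=1):
            scores[chunk] += 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk: scores[chunk], reverse=True)


def build_final_prompt(original_query: str, chunks: List[str], top_n: int = 3) -> str:
    """Hypothetical step 6: answer the ORIGINAL query using the fused context."""
    context = "\n".join(f"- {chunk}" for chunk in chunks[:top_n])
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {original_query}"
    )


# Invented search results, one ranked list per (sub-)query.
per_query_results = [
    ["React scales well with component reuse.", "React needs extra libraries for routing."],
    ["React needs extra libraries for routing.", "React scales well with component reuse."],
    ["React needs extra libraries for routing.", "Vue.js ships an official router."],
]

fused = reciprocal_rank_fusion(per_query_results)
prompt = build_final_prompt(
    "What are the advantages and disadvantages of React compared to Vue.js "
    "for building large-scale applications?",
    fused,
)
print(prompt)
```

Note that the final prompt uses the original complex question, not the sub-queries: the sub-queries only exist to collect better context.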
Full Code:
Conclusion
“Query Decomposition” is another optimization method that handles more complex queries and saves the LLM from redundancy. For simpler queries, Parallel Query Retrieval is good enough, but Query Decomposition lets the retrieval step specialize on each part of a complex question.



