The Correct Way to Use Chain-of-Thought Prompting: Avoiding Common Pitfalls
I recently attended an AI in Finance conference and was surprised to discover that many researchers are using chain-of-thought (CoT) prompting incorrectly. This powerful technique can significantly improve reasoning in LLMs — but only when implemented properly.
Key Takeaways
- Zero-shot CoT requires two separate prompting rounds, not one.
- Combining reasoning and the final answer in a single prompt introduces answer bleeding.
- CoT is for structured reasoning; explainable prompting is for human-readable justification — know when to use which.
What is Zero-Shot Chain-of-Thought?
Zero-shot CoT involves two distinct rounds of prompting without using any task-specific examples:
- Round 1 — Prompt the model to generate step-by-step reasoning.
- Round 2 — Explicitly ask for the final answer based on that reasoning.
This differs from few-shot CoT, which includes labeled examples.
Why two rounds? Separating reasoning from the final answer prevents the model from "anchoring" on a premature conclusion and then rationalizing backwards.
Example Question
Consider the question: “A company just announced a 20% dividend increase while simultaneously reporting declining revenues. Is this news good or bad?”
Incorrect: Single-Stage Approach
Common mistake: Many researchers combine reasoning and answer extraction into a single prompt. This lets the model peek at its own conclusion while it is still "reasoning."
WRONG

```python
# Single prompt — reasoning and answer are entangled
response = llm.generate(
    prompt="Let's think step by step: A company just announced..."
)
# Output includes both reasoning AND final answer in one response
```
Correct: Two-Stage Approach
CORRECT

```python
# STEP 1: Trigger reasoning only
reasoning_prompt = (
    "Q: A company announced a 20% dividend increase "
    "but declining revenues... "
    "A: Let's think step by step."
)
intermediate_response = llm.generate(reasoning_prompt)

# STEP 2: Extract the final answer from that reasoning
answer_prompt = f"""
Based on this analysis: '{intermediate_response}'
Is the news good or bad? Answer ONLY 'good' or 'bad'."""
final_answer = llm.generate(answer_prompt)
```
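The two stages above can be wrapped into a single helper so the flow is reusable. The `llm` object in the snippets is hypothetical, so this sketch takes any `generate` callable and stubs one out for demonstration; the stub's replies are invented, not real model output.

```python
def two_stage_cot(question: str, generate) -> tuple[str, str]:
    """Run zero-shot CoT in two rounds and return (reasoning, answer)."""
    # Round 1: elicit step-by-step reasoning only; no answer is requested
    reasoning = generate(f"Q: {question}\nA: Let's think step by step.")
    # Round 2: extract a constrained final answer from that reasoning
    answer = generate(
        f"Based on this analysis: '{reasoning}'\n"
        "Is the news good or bad? Answer ONLY 'good' or 'bad'."
    )
    return reasoning, answer

# Hypothetical stand-in for a real model, used only to make the sketch runnable
def fake_generate(prompt: str) -> str:
    if "Answer ONLY" in prompt:
        return "bad"  # extraction round: constrained answer
    return "Step 1: A higher dividend with falling revenue raises the payout ratio..."

reasoning, answer = two_stage_cot(
    "A company announced a 20% dividend increase but declining revenues. "
    "Is this news good or bad?",
    fake_generate,
)
```

Passing `generate` as a parameter keeps the two-stage logic independent of any particular LLM client.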
Why This Matters
| Concern | Single-Stage | Two-Stage (Correct) |
|---|---|---|
| Answer bleeding | Model sees its conclusion while reasoning | Reasoning is isolated from the answer |
| Transparency | Tangled output, hard to audit | Clean separation of logic and decision |
| Hallucination risk | Higher — model may fabricate justifications | Lower — reasoning is evaluated independently |
Explainable Prompting vs. Chain-of-Thought
Although both aim to improve interpretability, they serve different purposes:
| Feature | Explainable Prompting | Chain-of-Thought |
|---|---|---|
| Goal | Human-readable justification | Structured multi-step reasoning |
| Output | Single response with embedded rationale | Two-step: reasoning → answer |
| Best for | Summaries, end-user reports | Complex logic, quantitative analysis |
| Prompt style | "Explain why…" | "Let's think step by step" |
Rule of thumb: Use explainable prompting when the audience needs to understand the conclusion. Use CoT when correctness and traceability matter more than readability.
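The difference in prompt style comes down to the template. The exact wording below is an illustrative assumption, not a fixed API, but it shows how the two techniques diverge at the prompt level:

```python
# Explainable prompting: one response, rationale embedded for a human reader
explainable_prompt = (
    "The company raised its dividend while revenues fell. "
    "State whether this is good or bad news, and explain why in plain language."
)

# Chain-of-thought: round-1 prompt only; the answer is extracted in round 2
cot_prompt = (
    "Q: The company raised its dividend while revenues fell. "
    "Is this good or bad news?\n"
    "A: Let's think step by step."
)
```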
Implementation Checklist
- Never include “so the answer is…” in the initial reasoning prompt.
- Always split into two separate prompts/responses.
- Validate intermediate reasoning before extracting the final answer.
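The last checklist item can be automated with a lightweight gate before round 2. The heuristics here (minimum length, leaked-answer phrases) are illustrative assumptions; tune them for your task:

```python
def reasoning_is_valid(
    reasoning: str,
    banned_phrases: tuple[str, ...] = ("so the answer is",),
) -> bool:
    """Reject reasoning that is empty, too short, or already states an answer."""
    text = reasoning.strip().lower()
    if len(text) < 20:  # assumed threshold: too short to be genuine step-by-step reasoning
        return False
    # Reject reasoning that leaks a conclusion, which defeats the two-stage split
    return not any(phrase in text for phrase in banned_phrases)
```

Only call the round-2 extraction prompt when this check passes; otherwise regenerate the reasoning.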
References
- Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022.
- Kojima, T. et al. (2022). Large Language Models are Zero-Shot Reasoners. NeurIPS 2022.
