Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models
Authors: Yongjiang Liu, Haoxi Li, Xiaosong Ma, Jie Zhang, Song Guo
Source: arXiv 2507.02663
Published: 2025-07-03
Added: 2026-03-09 05:44 UTC
Abstract / Extracted Text
Recent Large Reasoning Models (LRMs) excel at complex reasoning tasks but often suffer from overthinking, generating overly long and redundant reasoning trajectories. To explore its essence, our empirical analysis reveals that LRMs are limited in recognizing task properties (i.e., difficulty levels) before solving a problem, as humans do, leading to a one-size-fits-all reasoning process. Inspired by this, a natural and pressing question emerges: can we explicitly bootstrap such an ability to alleviate overthinking in LRMs? In this paper, we propose Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively inspires LRMs' difficulty cognition and redundancy cognition. Specifically, we first inject difficulty hypnosis into output prefixes to guide the model toward adaptive reasoning depth, training on a hybrid dataset that mixes short and long reasoning paths. We then incorporate redundancy hypnosis, which supervises the intermediate reasoning steps to identify and eliminate unnecessary reasoning patterns. Experiments on 7B/14B/32B models demonstrate that TH2T reduces inference costs by over 70% on easy tasks and 40% on hard tasks while maintaining stable performance. The resulting outputs exhibit clear signs of difficulty awareness and reduced redundancy (e.g., less reflection and looping).
Latest Summary
Key Findings
- Overthinking in LRMs: Large Reasoning Models (LRMs) tend to "overthink," frequently producing needlessly long and redundant reasoning chains, especially failing to calibrate their depth of reasoning to match task difficulty.
- Cognitive Limitation: Empirical analysis shows that LRMs lack the intrinsic ability to autonomously recognize problem difficulty, resulting in a monolithic, one-size-fits-all reasoning approach.
- Dual-Process Inspiration: The solution is motivated by human "dual-process" cognition theory, fast/intuitive (System 1) vs. slow/analytical (System 2), and the need to allocate reasoning effort appropriately based on an initial difficulty assessment.
- TH2T Framework: The Think-How-to-Think (TH2T) methodology is introduced as a two-stage fine-tuning process to progressively inspire both difficulty and redundancy cognition in LRMs using self-hypnosis signals.
- Stage 1: Injects "Difficulty Hypnosis" into outputs to stimulate difficulty assessment and concise reasoning on easier tasks.
- Stage 2: Adds "Redundancy Hypnosis" during reasoning steps to suppress unnecessary reflective/looping structures, reducing detours.
- Performance Improvements: TH2T reduces inference cost (token length) by over 70% on easy tasks and about 40% on hard tasks, with negligible or positive impact on accuracy.
- Difficulty-Awareness: Post-TH2T models display accurate (>90%) autonomous difficulty calibration, leading to tailored reasoning depth and high internal decision confidence.
- Redundancy Reduction: TH2T dramatically lowers the frequency of redundant reflections and looping, especially in easy and moderately hard tasks.
- Generalization: Efficiency gains and robust performance with TH2T persist across different LRM backbones and are validated on diverse mathematical and QA benchmarks.
- Ablation Studies: Both components are crucial: Difficulty Hypnosis mainly enables concise reasoning on easy tasks, while Redundancy Hypnosis further improves handling of hard tasks.
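The two training stages above can be sketched as a data-construction step. The snippet below is a minimal illustration of Stage 1 only: it pairs easy questions with short reasoning paths and hard questions with long ones, each prefixed by a "difficulty hypnosis" cue. The prefix wording, the `<think>` delimiter, and the `build_stage1_example` helper are illustrative assumptions, not the paper's exact format.

```python
# Hypothetical Stage-1 data construction for TH2T-style fine-tuning.
# The hypnosis prefix strings below are assumptions for illustration;
# the paper's actual cue text and chat template may differ.

EASY_PREFIX = "<think> This problem looks easy; a short solution suffices. "
HARD_PREFIX = "<think> This problem looks hard; careful multi-step reasoning is needed. "

def build_stage1_example(question, short_path, long_path, is_easy):
    """Return one SFT example whose target opens with a difficulty
    self-assessment, followed by a reasoning path of matching depth."""
    if is_easy:
        target = EASY_PREFIX + short_path
    else:
        target = HARD_PREFIX + long_path
    return {"prompt": question, "target": target}

# Mixing short and long trajectories yields the hybrid dataset the
# summary describes (easy -> concise path, hard -> full path).
dataset = [
    build_stage1_example("2 + 2 = ?", "4. </think> 4", None, is_easy=True),
    build_stage1_example(
        "Prove the sum of two odd numbers is even.",
        None,
        "Let a = 2m+1, b = 2n+1; then a+b = 2(m+n+1). </think> Even.",
        is_easy=False,
    ),
]
```

Stage 2 (redundancy hypnosis) would additionally edit the intermediate steps of the long paths to remove reflective or looping segments before supervision; that step is omitted here since it depends on the paper's redundancy annotations.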
Practical Takeaways
- Promote Metacognitive Skills in LRMs: Explicitly train models to appraise task difficulty and adapt reasoning depth, rather than applying universal response strategies.
- Leverage Output-Level Interventions: Use targeted output signals ("self-hypnosis" cues) during training to shape not only how models answer, but also how they internally regulate their reasoning process.
- Stage-Wise Fine-Tuning: Employ a two-stage approach: first inject high-level global planning (difficulty awareness), then enforce fine-grained local correction (redundancy awareness).
- Reduce Computational Waste: Significant reductions in tokens and latency can be achieved without sacrificing task accuracy by teaching LRMs to avoid over-elaboration, especially on simple tasks.
- Design for Flexibility and Robustness: Built-in output cues are more reliable and robust than prompt-based controls, improving instruction following and consistency.
- Use Internal Metrics for Validation: First-token confidence and structural response analysis can serve as intrinsic signals validating metacognitive improvements.
- Apply to Diverse Tasks and Models: The TH2T strategy generalizes well across model sizes, domains, and both in-distribution and out-of-distribution benchmarks.
- Continuous Difficulty Modelling: While discrete difficulty levels work well for interpretability, future work should explore more granular and model-adaptive difficulty estimation.
- Combine with Other Efficiency Techniques: TH2T can complement reinforcement learning or other model compression/fine-tuning strategies for broader efficiency and quality gains in LLM deployment.
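One of the internal validation signals listed above, first-token confidence, can be computed as the softmax probability the model assigns to its chosen first token. The sketch below is an illustrative proxy assuming you have the raw logits for the first decoding step (e.g., via a generation API that returns scores); it is not the paper's evaluation code.

```python
import math

def first_token_confidence(logits):
    """Softmax probability of the argmax token at the first decoding
    step. A high value suggests the model is confident in its initial
    (e.g., difficulty-assessment) token; an illustrative metric only."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)

# A peaked logit vector yields high confidence; a flat one yields low.
peaked = first_token_confidence([10.0, 0.0, 0.0])
flat = first_token_confidence([0.0, 0.0, 0.0])
```

In practice one would average this quantity over a benchmark, comparing the pre- and post-TH2T model to check whether difficulty cognition has become more decisive.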