Forward-Backward Reasoning in Large Language Models for Mathematical Verification

1Southern University of Science and Technology, 2Hong Kong University of Science and Technology
3Huawei Noah’s Ark Lab, 4Peking University, 5Peng Cheng Laboratory

Abstract

Chain-of-Thought (CoT) prompting in large language models (LLMs) has shown promising performance on mathematical reasoning tasks. Recently, Self-Consistency samples a diverse set of reasoning chains with different answers and chooses the answer by majority voting. Though effective, its performance saturates quickly and cannot be further improved by sampling more reasoning chains. To address this problem, we propose to integrate backward reasoning into answer verification. We first mask a number in the question with ${\bf x}$. The LLM is then asked to predict the masked number with a candidate answer $\hat{A}_c$ embedded in the template: "If we know the answer to the above question is $\hat{A}_c$, what is the value of unknown variable ${\bf x}$?" The LLM is expected to predict the masked number successfully if the provided candidate answer is correct. To further improve performance, we propose FOBAR (FOrward-BAckward Reasoning), which combines forward and backward reasoning for verifying candidate answers. Experiments are performed on six standard mathematical data sets and three LLMs (text-davinci-003, GPT-3.5-Turbo, GPT-4). Results show that FOBAR achieves state-of-the-art performance. In particular, FOBAR outperforms Self-Consistency, which uses forward reasoning alone, demonstrating that combining forward and backward reasoning is better. It also outperforms existing verification methods, confirming the effectiveness of the simple template in backward reasoning and of the proposed combination.

Contributions

Introduce Backward Reasoning to Mathematical Verification

  1. A simple template is proposed to create backward questions: we mask a number in the question with ${\bf x}$ and append the template with a candidate answer
  2. We design a CoT prompt for the LLM to predict the masked number
  3. We estimate the probability of each candidate answer from the number of chains in the backward direction that predict the masked number correctly (see the sketch after this list)
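Below is a minimal Python sketch of the backward-verification step for a single masked question. The `backward_probability` helper and the `sample_fn`/`parse_fn` callables are our own illustrative assumptions (an LLM sampler and an answer extractor), not the paper's released code; in practice the counts would be aggregated over all masked positions of a question.

from typing import Callable, List, Optional

def backward_probability(
    masked_question: str,   # question with one number replaced by "x"
    candidate: str,         # candidate answer to verify
    masked_value: str,      # the number that was masked out
    sample_fn: Callable[[str, int], List[str]],  # assumed LLM sampler
    parse_fn: Callable[[str], Optional[str]],    # assumed answer extractor
    n_chains: int = 8,
) -> float:
    """Return the fraction of backward chains that recover the masked number."""
    prompt = (
        f"{masked_question} If we know the answer to the above question "
        f"is {candidate}, what is the value of unknown variable x?"
    )
    chains = sample_fn(prompt, n_chains)
    correct = sum(parse_fn(chain) == masked_value for chain in chains)
    return correct / n_chains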

Propose FOBAR to Combine Forward and Backward Reasoning for Verifying Candidate Answers

  1. Combine the forward and backward probabilities of candidate answers in a soft manner (see the sketch below): $$\mathbb{P} (\hat{A}_c) \propto (\mathbb{P} (\hat{A}_c; \text{forward}))^{\alpha} (\mathbb{P} (\hat{A}_c; \text{backward}))^{1-\alpha} $$
  2. When $\alpha=1$, FOBAR recovers the SOTA Self-Consistency (forward reasoning only)
  3. When $\alpha=0$, it recovers backward reasoning alone for verification
  4. The combination is flexible: other aggregation functions (e.g., the arithmetic mean) can also be used
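A minimal sketch of the combination rule, assuming the forward and backward probabilities have already been estimated as dictionaries over candidate answers; the `fobar_select` name is ours, chosen for illustration.

def fobar_select(p_forward: dict, p_backward: dict, alpha: float = 0.5) -> str:
    """Score each candidate by a weighted geometric mean of its forward and
    backward probabilities, then return the highest-scoring candidate.
    alpha = 1 recovers Self-Consistency; alpha = 0 uses backward reasoning only.
    """
    candidates = set(p_forward) | set(p_backward)
    scores = {
        a: p_forward.get(a, 0.0) ** alpha * p_backward.get(a, 0.0) ** (1 - alpha)
        for a in candidates
    }
    return max(scores, key=scores.get)

# Toy usage with made-up probabilities:
p_f = {"36": 0.6, "12": 0.4}   # from forward voting (Self-Consistency)
p_b = {"36": 0.9, "12": 0.1}   # from backward verification
print(fobar_select(p_f, p_b, alpha=0.5))   # prints "36"

With $\alpha = 0.5$ the score is the plain geometric mean; note that in Python `0.0 ** 0 == 1.0`, so setting $\alpha = 1$ indeed ignores the backward term.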

Extensive Experiments on Various Mathematical Data Sets

  1. FOBAR achieves SOTA performance
  2. FOBAR outperforms Self-Consistency, demonstrating that combining forward and backward reasoning is better than forward reasoning alone
  3. FOBAR achieves higher accuracy than Self-Verification, confirming that the simple template in backward reasoning and the proposed combination are more effective

Performance of Self-Consistency Saturates Quickly When Increasing #Paths $M_F$


Template for Generating Questions in Backward Reasoning

Question

Jim spends 2 hours watching TV and then decides to go to bed and reads for half as long. He does this 3 times a week. How many hours does he spend on TV and reading in 4 weeks? (ground-truth answer: 36)
(candidate answers: 36, 12)
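
For reference, each TV-and-reading session takes $2 + 2/2 = 3$ hours, so over 4 weeks Jim spends $3 \times 3 \times 4 = 36$ hours; the other candidate, 12, is incorrect.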

Template

If we know the answer to the above question is $\hat{A}_c$, what is the value of unknown variable ${\bf x}$?

Questions for Backward Reasoning (candidate answer: 36)

Jim spends ${\bf x}$ hours watching TV and then decides to go to bed and reads for half as long. He does this 3 times a week. How many hours does he spend on TV and reading in 4 weeks? If we know the answer to the above question is 36, what is the value of unknown variable ${\bf x}$?

Jim spends 2 hours watching TV and then decides to go to bed and reads for half as long. He does this ${\bf x}$ times a week. How many hours does he spend on TV and reading in 4 weeks? If we know the answer to the above question is 36, what is the value of unknown variable ${\bf x}$?

Jim spends 2 hours watching TV and then decides to go to bed and reads for half as long. He does this 3 times a week. How many hours does he spend on TV and reading in ${\bf x}$ weeks? If we know the answer to the above question is 36, what is the value of unknown variable ${\bf x}$?
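
With the correct candidate 36, each backward question recovers its masked number: $18{\bf x} = 36$ gives ${\bf x}=2$ hours, $12{\bf x} = 36$ gives ${\bf x}=3$ times, and $9{\bf x} = 36$ gives ${\bf x}=4$ weeks.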

Questions for Backward Reasoning (candidate answer: 12)

Jim spends ${\bf x}$ hours watching TV and then decides to go to bed and reads for half as long. He does this 3 times a week. How many hours does he spend on TV and reading in 4 weeks? If we know the answer to the above question is 12, what is the value of unknown variable ${\bf x}$?

Jim spends 2 hours watching TV and then decides to go to bed and reads for half as long. He does this ${\bf x}$ times a week. How many hours does he spend on TV and reading in 4 weeks? If we know the answer to the above question is 12, what is the value of unknown variable ${\bf x}$?

Jim spends 2 hours watching TV and then decides to go to bed and reads for half as long. He does this 3 times a week. How many hours does he spend on TV and reading in ${\bf x}$ weeks? If we know the answer to the above question is 12, what is the value of unknown variable ${\bf x}$?
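
With the incorrect candidate 12, the recovered values no longer match the masked numbers (e.g., $18{\bf x} = 12$ gives ${\bf x} = 2/3$ rather than 2), so 12 is voted down. Below is a minimal Python sketch of how such backward questions can be generated; it assumes numbers are located with a simple regex, and `make_backward_questions` is an illustrative helper, not the paper's released code.

import re

TEMPLATE = ("If we know the answer to the above question is {answer}, "
            "what is the value of unknown variable x?")

def make_backward_questions(question: str, candidate: str) -> list:
    """Mask each number in the question with 'x' in turn and append the
    verification template carrying the candidate answer."""
    questions = []
    for match in re.finditer(r"\d+", question):
        masked = question[:match.start()] + "x" + question[match.end():]
        questions.append(f"{masked} {TEMPLATE.format(answer=candidate)}")
    return questions

question = ("Jim spends 2 hours watching TV and then decides to go to bed "
            "and reads for half as long. He does this 3 times a week. How "
            "many hours does he spend on TV and reading in 4 weeks?")
for q in make_backward_questions(question, "36"):
    print(q)  # reproduces the three masked questions shown above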

Experiments

Main Results


Usefulness of Forward and Backward Reasoning


Ablation Study

BibTeX


@TechReport{jiang2023backward,
  title={Forward-Backward Reasoning in Large Language Models for Mathematical Verification},
  author={Jiang, Weisen and Shi, Han and Yu, Longhui and Liu, Zhengying and Zhang, Yu and Li, Zhenguo and Kwok, James T},
  type={Preprint},
  number={arXiv:2308.07758},
  year={2023}
}