Google AI has developed Minerva, a deep learning language model that can solve quantitative mathematical problems using step-by-step reasoning.

In a recently published paper, researchers described how Minerva was developed. They achieved state-of-the-art results by training a deep learning model on a large dataset of quantitative reasoning problems containing symbolic expressions. The resulting model, Minerva, can solve quantitative mathematical problems across STEM domains.

Minerva analyzes each question using natural language processing together with mathematical notation processing. It recalls relevant formulas and constants and produces step-by-step solutions that involve numerical calculation. It generates solutions combining symbolic manipulation and numerical calculation without relying on an external calculator to reach the final answer. Because sampling produces different candidate answers with different probabilities, Minerva uses majority voting over the sampled solutions to select the final answer. The following image shows an example of Minerva’s output for a quantitative mathematical problem.

**Test Minerva’s answer to a mathematical problem**
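The majority-voting step described above can be sketched in a few lines. This is a simplified illustration, not Minerva's actual code: `candidate_answers` stands in for the final answers parsed out of many sampled solutions.

```python
from collections import Counter

def majority_vote(candidate_answers):
    """Select the answer that appears most often among sampled solutions.

    Sketch of the voting idea only: the real pipeline also normalizes
    mathematically equivalent answers before counting.
    """
    counts = Counter(candidate_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Final answers parsed from, say, seven sampled solutions
samples = ["42", "42", "41", "42", "7", "42", "41"]
print(majority_vote(samples))  # → 42
```

Sampling many solutions and voting rewards answers the model reaches by multiple independent reasoning paths, which is why it outperforms taking the single highest-probability solution.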

Minerva was built on the Pathways Language Model (PaLM), a 540-billion-parameter, densely activated transformer language model, further trained on mathematical datasets such as arXiv papers and web pages containing LaTeX, MathJax, or other mathematical formats. So that the model can learn from symbolic data, mathematical notation is preserved in the training dataset rather than stripped out. This process is shown in the following diagram.

**Symbolic mathematical expressions are preserved for the training of Minerva**
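The preprocessing idea above can be illustrated with a toy cleaner. This is an assumption-laden sketch, not Google's pipeline: it strips HTML markup from a page while keeping `$...$` math spans verbatim, where a generic text cleaner would often discard them.

```python
import re

def clean_keep_math(html: str) -> str:
    """Toy preprocessing sketch: remove HTML tags but preserve LaTeX
    expressions (here, anything between $...$) exactly as written."""
    # Protect math spans with a placeholder before tag stripping
    math_spans = re.findall(r"\$[^$]*\$", html)
    protected = re.sub(r"\$[^$]*\$", "\x00MATH\x00", html)
    text = re.sub(r"<[^>]+>", "", protected)  # drop HTML tags
    for span in math_spans:                   # restore each math span in order
        text = text.replace("\x00MATH\x00", span, 1)
    return text

page = "<p>The energy is <span>$E = mc^2$</span> for a mass $m$.</p>"
print(clean_keep_math(page))  # → The energy is $E = mc^2$ for a mass $m$.
```

Keeping the notation intact is what lets the model learn to read and emit well-formed symbolic expressions at inference time.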

To benchmark Minerva’s performance, researchers used STEM benchmarks ranging from primary school to graduate level: MATH (high-school math competition problems), MMLU-STEM (the STEM portion of the Massive Multitask Language Understanding benchmark, covering topics such as engineering, chemistry, mathematics, and physics at high-school and college level), and GSM8k (grade-school mathematics problems involving basic arithmetic operations that a talented middle-school student could solve). Minerva shows strong performance on MATH and MMLU-STEM, as shown in the following graphs:

**Minerva’s performance**
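Scoring on these benchmarks boils down to comparing the model's final answer against the reference answer for each problem. A minimal sketch of that accuracy computation (the actual evaluation also normalizes mathematically equivalent forms before comparing):

```python
def accuracy(predictions, references):
    """Fraction of problems whose predicted final answer exactly
    matches the reference answer (a simplified scoring rule)."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Hypothetical parsed final answers vs. benchmark references
preds = ["1/2", "3", "x^2", "7"]
golds = ["1/2", "3", "2x", "7"]
print(accuracy(preds, golds))  # → 0.75
```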

One important limitation of Minerva is that its responses cannot be verified automatically. As stated in the blog post:

> Our approach to quantitative reasoning is not grounded in formal mathematics. Minerva parses questions and generates answers using a mixture of natural language and LaTeX mathematical expressions, with no explicit underlying mathematical structure. This approach has an important limitation, in that the model’s answers cannot be automatically verified. Even when the final answer is known and can be verified, the model can arrive at a correct final answer using incorrect reasoning steps which cannot be automatically detected. This limitation is not present in formal methods for theorem proving (see, e.g., Coq, Isabelle, HOL, Lean, Metamath, and Mizar).
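The limitation is easy to demonstrate with a toy checker. A verifier that only inspects the final answer (as below) accepts a solution even when an intermediate step is wrong; the function and its line-based answer format are illustrative assumptions, not part of Minerva.

```python
def check_final_answer(solution_text: str, gold: str) -> bool:
    """Compare only the last line of a generated solution against the
    reference answer; the reasoning steps above it are never checked."""
    final_line = solution_text.strip().splitlines()[-1]
    return final_line == gold

# The second step is wrong, yet the final answer happens to be correct
flawed_solution = "2 + 2 = 4\n4 * 1 = 5\n5 - 1 = 4\n4"
print(check_final_answer(flawed_solution, "4"))  # → True, despite the bad step
```

A formal theorem prover avoids this by requiring every step to be machine-checkable, which is exactly the structure Minerva's free-form output lacks.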

To promote NLP models for quantitative reasoning, Google AI released an interactive sample explorer so that the public can explore Minerva’s capabilities.

Using natural language processing and deep learning for mathematical reasoning is a challenging research area. Other papers with source code in this area include graph-to-tree learning and goal-directed tree-structured neural models for math word problems, and Papers with Code lists further papers with source code in this domain for further reading.