Why do we care about predicting results? Isn’t the point of studies to determine actual results?
They list a few ways it may help; this one seems to be the main one:
We foresee a future in which LLMs serve as forward-looking generative models of the scientific literature. LLMs can be part of larger systems that assist researchers in determining the best experiment to conduct next. One key step towards achieving this vision is demonstrating that LLMs can identify likely results. For this reason, BrainBench involved a binary choice between two possible results. LLMs excelled at this task, which brings us closer to systems that are practically useful. In the future, rather than simply selecting the most likely result for a study, LLMs can generate a set of possible results and judge how likely each is. Scientists may interactively use these future systems to guide the design of their experiments.
I think this isn't really about the predictions themselves. Prediction is just a means to benchmark the AI. You can either ask it questions to probe its stored knowledge, or test whether it can look forward, reason, and jump to a conclusion; in other words, predict something. They tested how well it performed at that, not because these predictions are useful in themselves, but because they can be used to measure the AI's capabilities at tasks like this.
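For concreteness, here is a minimal sketch of how a binary-choice probe like the one in the quoted passage could be scored with an LLM: compare the model's average per-token log-probability of each candidate result and pick the likelier one. This is an assumption about the mechanism, not the authors' actual code; the model (`gpt2`), the context sentence, and the two candidate results are all illustrative placeholders.

```python
# Hedged sketch: score two candidate results by the model's average
# per-token log-probability and pick the likelier one. Model, context,
# and candidates below are illustrative, not from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_logprob(text: str) -> float:
    """Mean per-token log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()  # loss is the mean negative log-likelihood

# Hypothetical study setup with two mutually exclusive outcomes.
context = "Blocking dopamine receptors in the striatum "
result_a = context + "impaired reward-based learning."
result_b = context + "improved reward-based learning."

# The candidate the model finds more probable is its "prediction".
choice = result_a if avg_logprob(result_a) > avg_logprob(result_b) else result_b
print(choice)
```

The point of the sketch is that the benchmark never needs the prediction to be true; it only needs a score that can be compared against the known, published result.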