It’s only open source if the training data is too, and it probably isn’t, is it?
I don’t know, though DeepSeek talk of theirs being “fully” open-source.
Part of the advantage of doing this (apart from helping bleed your rivals dry) is to get the benefit of others working on your model. So it makes sense to maximise openness and access.
There are a few ways they say it may help; this one seems to be the main one.
We foresee a future in which LLMs serve as forward-looking generative models of the scientific literature. LLMs can be part of larger systems that assist researchers in determining the best experiment to conduct next. One key step towards achieving this vision is demonstrating that LLMs can identify likely results. For this reason, BrainBench involved a binary choice between two possible results. LLMs excelled at this task, which brings us closer to systems that are practically useful. In the future, rather than simply selecting the most likely result for a study, LLMs can generate a set of possible results and judge how likely each is. Scientists may interactively use these future systems to guide the design of their experiments.