The correctness issue is the trickiest one, I suspect, although I imagine the quality of the training data plays a huge role there. One of the problems with commercial western LLMs is that they just throw garbage scraped off the internet at them indiscriminately. However, if you trained a model specifically on quality medical data that's been curated, that would be a very different story. The other thing you can do is tag the dataset with metadata references to the actual studies or cases, so that when an answer is produced it can be matched against those sources.
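To make that last idea concrete, here's a toy sketch of what "tag the data, then match answers back to sources" could look like. Everything in it is hypothetical: the record structure, the field names, the placeholder DOIs, and the crude keyword-overlap scoring are just illustrations, not a real medical retrieval system.

```python
import re
from dataclasses import dataclass


@dataclass
class CuratedRecord:
    source_id: str  # e.g. a study DOI or a case identifier
    summary: str    # curated text the model was trained or grounded on


# Toy curated corpus (placeholder content, not real studies).
CORPUS = [
    CuratedRecord("doi:10.0000/example-1",
                  "iron deficiency anemia fatigue pallor low ferritin"),
    CuratedRecord("doi:10.0000/example-2",
                  "hypothyroidism fatigue weight gain cold intolerance elevated tsh"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase and strip punctuation so overlap isn't thrown off by formatting."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_answer_to_sources(answer: str, corpus: list[CuratedRecord], top_k: int = 2):
    """Score each curated record by keyword overlap with the model's answer,
    so the answer can be traced back to the studies/cases that support it."""
    answer_terms = tokenize(answer)
    scored = []
    for record in corpus:
        overlap = len(answer_terms & tokenize(record.summary))
        scored.append((overlap, record.source_id))
    scored.sort(reverse=True)
    return [source_id for overlap, source_id in scored[:top_k] if overlap > 0]


if __name__ == "__main__":
    model_answer = "Symptoms are consistent with hypothyroidism; check TSH and free T4."
    print(match_answer_to_sources(model_answer, CORPUS))
    # -> ['doi:10.0000/example-2']
```

A real setup would presumably use proper retrieval (embeddings, citations emitted by the model, etc.) rather than keyword overlap, but the principle is the same: every answer gets checked against the curated sources it's supposed to rest on.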
Ultimately, this isn't fundamentally different from what a human doctor does. They learn how to correlate symptoms with common ailments, and then use their experience to make a call on what the problem might be. Then, for more serious cases, the patient undergoes testing to confirm the diagnosis. So, I can see a similar process working with LLMs, where they come up with the most likely explanations for the symptoms provided, and they might even do a better job, since they work with vastly more data than any human can. That can act as a way to focus further investigation. I would imagine you'd still want a human in the loop, but you could save a lot of time doing the initial assessment this way.
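As a rough illustration of that workflow (not a proposal for how it has to be built), here's what the "model proposes, clinician confirms" loop might look like. The `ask_model` stub, the confidence numbers, and the field names are all assumptions made up for the example.

```python
def ask_model(symptoms: list[str]) -> list[tuple[str, float]]:
    """Stand-in for an LLM call that returns candidate explanations with rough
    confidence scores; a real system would call an actual model here."""
    return [("hypothyroidism", 0.62), ("anemia", 0.21), ("depression", 0.09)]


def initial_assessment(symptoms: list[str], review_threshold: float = 0.5) -> dict:
    """Produce a ranked differential to focus further testing.
    The model never acts on its own; a clinician always signs off."""
    candidates = ask_model(symptoms)
    top_label, top_score = candidates[0]
    return {
        "candidates": candidates,        # full ranked list for the clinician
        "suggested_focus": top_label,    # where to direct the first tests
        "low_confidence": top_score < review_threshold,
        "needs_clinician_review": True,  # human in the loop, always
    }


if __name__ == "__main__":
    print(initial_assessment(["fatigue", "weight gain", "cold intolerance"]))
```

The point isn't the specific scores; it's that the model's output is just a starting point that narrows the search, and the confirmation step stays with a person.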