Some people are naively amazed at AI scoring 99% on bar and medical exams, when all it is doing is reproducing correct answers from internet discussions of the exam questions. A new AI benchmark called “Humanity’s Last Exam” has stumped top models. It will take independent reasoning to get 100% on this test. When that day comes, does it mean AGI will be here?
Finer point, but it’s not measuring independent reasoning; afaik they’re still fully incapable of that. This test measures esoteric knowledge, like hummingbird anatomy and the ability to translate ancient Palmyrene writing.
Current LLMs should eventually be able to ace this sort of test as their training data grows. They could still be incapable of independent reasoning, though.
A test for independent reasoning could be something like giving it all the evidence for a never-before-discussed criminal case and asking whether the accused is innocent or guilty based on the evidence. This would require a large amount of context, an understanding of human societies, and the ability to infer from those what the evidence represents. Would it understand that a sound alibi means the accused is likely innocent? Would it actually grasp the simple concept that a physical person cannot be in two different places simultaneously, unlike how a quantum particle can seem to be? A person understands this very intuitively, but an LLM does not yet comprehend what “location” even is, even if it can provide a perfect dictionary definition of the term and talk about it by repeating others’ conversations.
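The alibi inference described here boils down to one constraint: a person cannot occupy two distinct locations at the same time. A minimal sketch of that check (all names and data are hypothetical, invented for illustration, not from any real case):

```python
# Hypothetical sketch: encode the "can't be in two places at once" constraint
# that underlies alibi reasoning. If a verified sighting places the accused
# somewhere other than the crime scene at the time of the crime, the alibi
# excludes them as the perpetrator.
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    person: str
    location: str
    time: int  # minutes since midnight, for simplicity


def alibi_excludes(crime: Sighting, alibi: Sighting) -> bool:
    """True if the alibi is inconsistent with the accused committing the crime."""
    return (
        alibi.person == crime.person
        and alibi.time == crime.time
        and alibi.location != crime.location
    )


crime = Sighting("accused", "warehouse", time=1320)
alibi = Sighting("accused", "restaurant", time=1320)
print(alibi_excludes(crime, alibi))  # → True
```

The point of the sketch is that the rule itself is trivial to state; what the comment is arguing is that an LLM would need to recognize, unprompted, that this is the relevant constraint buried in a pile of unstructured evidence.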
It will take independent reasoning to get 100% on this test
And an entire university staff. They went around asking a bunch of PhDs, “What’s the hardest question you can think of?” I like to think I have independent reasoning, and I doubt I could answer even one question on this exam correctly, much less 10%.
This doesn’t prove AI lacks independent reasoning; it just proves it doesn’t have the obscure knowledge needed to reason about the questions.
Do you think the bar does not require independent reasoning? Granted, I’ve never taken it, but most high-level standardized tests require a lot of reasoning. If you had a completely open book / internet access and took the SAT/ACT without any ability to reason, you’d still fail horribly on the science and math sections.
Anyways, still an interesting project.
No, because this test will now be discussed and invalidated for that purpose.
They say their answer to this issue is that they’ve released public sample questions, while the real test questions are kept private.
https://agi.safe.ai/