• Lugh (OP)
    3 days ago

    I’d put money on humans scoring even less on subjects they’ve never heard of.

    What they are testing is the ability to reason. The AI, or human, can still use the internet to look up the answer. Here’s a sample question that illustrates the distinction.

    Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.

    • Not_mikey@slrpnk.net
      3 days ago

      Failing that question doesn’t mean it can’t reason independently; it just means it doesn’t have the knowledge to reason about it. That question basically asks whether you know how many paired tendons are attached to each of those bones and whether you can add them up. If the AI, like 99.999% of people, doesn’t know how many tendons are attached to those bones, it can’t reason its way to the answer.

      If you give the AI a similar question about something it knows, it can reason through it fine. For example, the question:

      How many legs do 13 humans, 4 cats and 63 dogs have in total?

      ChatGPT 4o gives the answer:

      To calculate the total number of legs:

      Humans: Each human has 2 legs. 13 × 2 = 26 legs.

      Dogs: Each dog has 4 legs. 63 × 4 = 252 legs.

      Cats: Each cat has 4 legs. 4 × 4 = 16 legs.

      Now, add them together: 26 + 252 + 16 = 294.

      Total legs = 294.

      I guess I can’t guarantee it’s never seen this question before, but I’d say the odds are pretty low, and the odds that it’s doing independent reasoning, as you call it, are high.
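
For anyone who wants to double-check the arithmetic in the quoted answer, here is a minimal Python sketch. The names `LEGS_PER_ANIMAL` and `total_legs` are made up for illustration, and the legs-per-animal values are the same everyday assumptions the quoted answer uses (2 per human, 4 per cat, 4 per dog).

```python
# Legs per animal: everyday assumptions, matching the quoted answer.
# LEGS_PER_ANIMAL and total_legs are hypothetical names for this sketch.
LEGS_PER_ANIMAL = {"human": 2, "cat": 4, "dog": 4}


def total_legs(counts):
    """Return the total leg count for a mapping of animal -> head count."""
    return sum(LEGS_PER_ANIMAL[animal] * n for animal, n in counts.items())


# The question from the comment: 13 humans, 4 cats and 63 dogs.
counts = {"human": 13, "cat": 4, "dog": 63}
print(total_legs(counts))  # 26 + 16 + 252 = 294
```

The point it illustrates is the same one made above: once the per-animal facts are known, the rest is a lookup and a sum.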

    • FearfulSalad@ttrpg.network
      3 days ago

      That reads like the sort of thing Wolfram Alpha was designed to absolutely obliterate, if only the raw data representing each of those keywords had been loaded in.