It’s clear that companies are currently unable to make chatbots like ChatGPT comply with EU law when processing data about individuals. If a system cannot produce accurate and transparent results, it cannot be used to generate data about individuals. The technology has to follow the legal requirements, not the other way around.

  • GenderNeutralBro@lemmy.sdf.org
    6 months ago

    ChatGPT is not an information repository.

    ChatGPT is not an information repository.

    ChatGPT is not an information repository.

    The correct answer to this problem is not “we can’t correct it”; it is “this class of task is completely out of scope for ChatGPT, and we will do everything we can to make sure users understand that”. Unfortunately, OpenAI knows damn well this is how the public perceives and uses its product and seems happy to let this misconception persist.

    We do need laws to curb this, but it’s really more a marketing issue than a technological issue. The underlying technology is amazing; the applications built around it are mostly garbage. What we have here is a hype trainwreck.

    • gedaliyah@lemmy.world (OP, M)
      6 months ago

      Yet, LLMs are trained on data - an information repository. They are capable of accessing and recalling the contents of that information repository, and relaying information from that repository to an end user. An LLM may not be an information repository functionally, but legally it seems to have the capabilities to be classified as one. (I am neither a lawyer nor a programmer, and I am not in the EU.)

      The software breaks the law, and the people who built it knew that this was likely the case. It was developed as a research project, which has very different legal requirements from a consumer product. The courts might not ban the software outright, but they might issue some hefty fines, etc. Banning a product is not their only recourse.

      • CarbonatedPastaSauce@lemmy.world
        6 months ago

        > They are capable of accessing and recalling the contents of that information repository, and relaying information from that repository to an end user.

        This is not correct based on my understanding of LLMs, though I am certainly not an expert. As I understand it, they are basically a statistics exercise: the model predicts which word is likely to come next. They don’t ‘look stuff up’ in their training data. They probably don’t even have access to their training data once the model is complete. These models are trained on terabytes of data but are small enough to fit in memory, so it’s impossible for them to still have access to all of it. But it wouldn’t matter if they did, because that’s not how they work.

        • UnpluggedFridge@lemmy.world
          6 months ago

          LLMs do not look stuff up (except when they have an API that allows them to), but I think OP’s point still stands. The statistical next-token predictor metaphor is useful, but in many regards that’s what text and language are. If you can understand that certain words are linked to certain other words, then you should be able to appreciate that certain groups of words can be associated in a way that is functionally the same as data.

          I have not memorized the pytorch documentation, but I can use what I understand about pytorch and other libraries to infer specific aspects of the library that I am not familiar with. Functionally, this is no different than if I accessed the documentation directly. If I communicate this information to others I have functioned as a data repository. The repository works on a more abstract and error-prone level, but it works nonetheless.

          Here is another very concrete example: LLMs know George Washington’s birthday. Not because they look up that information, but because of the learned associations between George Washington, birthday, and his actual date of birth.
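
          To make the "learned associations" idea concrete, here is a rough sketch using a toy bigram model. It is nowhere near how a real LLM is built (those are neural networks, not count tables), and the corpus, `train`, and `complete` are all made up for illustration. The point is only that the model keeps word-to-word statistics, throws the sentences away, and can still reproduce a fact from them.

```python
from collections import Counter, defaultdict

END = "<end>"  # hypothetical end-of-sentence marker

# Tiny made-up training text (illustration only).
corpus = [
    "george washington was born in 1732",
    "george washington was the first president",
    "washington was born in 1732",
    "he lived in virginia",
]

def train(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split() + [END]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(counts, prompt, max_words=10):
    """Greedily append the most frequent follower of the last word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break
        next_word = followers.most_common(1)[0][0]
        if next_word == END:
            break
        words.append(next_word)
    return " ".join(words)

model = train(corpus)
del corpus  # the model keeps only the counts, not the sentences

print(complete(model, "george washington"))
# prints: george washington was born in 1732
```

          Nothing is looked up here, yet the date comes back out. The same table also turns the prompt "he" into "he lived in 1732", a sentence that was never in the training text, which is the abstract and error-prone part.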

          • CarbonatedPastaSauce@lemmy.world
            6 months ago

            > I can use what I understand about pytorch and other libraries to infer specific aspects of the library that I am not familiar with.

            This is what LLMs can’t do, though. They can’t use what they understand because they don’t understand anything. They can’t infer, they can’t reason, they can’t evaluate or compare. They can spit out words that make it look like they did those things, but they didn’t.