Futurology Today
Lugh to Futurology · English · 1 year ago

Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

www.nature.com

  • sbv@sh.itjust.works · 1 year ago

    So they’re saying AI is software?

    Maybe Volkswagen will start using it in their emissions control systems.

  • Daxtron2@startrek.website · 1 year ago

    An LLM trained on adversarial data behaves in an adversarial way. Shocking.

    • CanadaPlus · 1 year ago

      Yeah. For reference, they made a model with a back door, and then trained it not to respond in a backdoored way when it hasn’t been triggered. It worked, but it didn’t affect the back door much, and that means the model was technically acting more differently, and therefore more deceptively, when not triggered.

      Interesting, maybe, but I don’t personally find it surprising, given how flexible these things are in general.

  • Possibly linux@lemmy.zip · 1 year ago

    Great, we are all going to die

  • mateomaui@reddthat.com · 1 year ago

    Just… don’t hook it up to the defense grid.

    • Possibly linux@lemmy.zip · 1 year ago

      Sorry, too late for that

      • mateomaui@reddthat.com · 1 year ago

        Alright, I’ll be out back digging the bomb shelter.

        • Possibly linux@lemmy.zip · 1 year ago · edited

          It’s too late for that, honestly

          • mateomaui@reddthat.com · 1 year ago

            Alright, I’ll switch to digging holes for the family burial ground.
