• diffuselight@lemmy.world
    1 year ago

    The potential for cost reduction in this field is orders of magnitude. Look at LLaMA running on everything down to a Raspberry Pi two months after release.

    There are massive gains still to be made: once we have dedicated hardware for transformers, that’s orders of magnitude more.

    Notice how your phone can play back 24h of video but dies after 3h of browsing? That’s dedicated hardware codec support.

      • diffuselight@lemmy.world
        1 year ago

        The trajectory is such that current Llama 2 70B models are easily beating GPT-3.5 and are approaching GPT-4 performance. An A6000 can run them comfortably, and this is only a few months after release.

        Nah, the trajectory is not in favor of proprietary models, especially since they will have to be dumbed down more and more due to alignment.

        https://www.anyscale.com/blog/llama-2-is-about-as-factually-accurate-as-gpt-4-for-summaries-and-is-30x-cheaper?trk=feed_main-feed-card_feed-article-content

          • diffuselight@lemmy.world
            1 year ago

            A 30B model, which will be fine for specialized tasks, runs on a 3090 or any modern Mac today.

            At the current trajectory, we are months away from this being affordable.
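
            A rough back-of-the-envelope sketch of why these hardware claims are plausible. It assumes 4-bit quantized weights and roughly 20% extra memory for the KV cache and activations; both numbers are illustrative assumptions, not figures from this thread. The 24 GB (RTX 3090) and 48 GB (RTX A6000) capacities are the cards' spec-sheet VRAM sizes.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    # 1e9 params * (bits / 8) bytes each = params_billion * bits / 8 GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, overhead: float = 0.2) -> bool:
    """True if weights plus an assumed overhead fraction fit in vram_gb.

    `overhead` is a hypothetical allowance for KV cache and activations;
    real usage depends on context length and the runtime.
    """
    needed = weight_memory_gb(params_billion, bits_per_weight) * (1 + overhead)
    return needed <= vram_gb


if __name__ == "__main__":
    cases = [
        ("30B @ 4-bit on RTX 3090 (24 GB)", 30, 4, 24),
        ("30B @ 16-bit on RTX 3090 (24 GB)", 30, 16, 24),
        ("70B @ 4-bit on RTX A6000 (48 GB)", 70, 4, 48),
    ]
    for label, params, bits, vram in cases:
        gb = weight_memory_gb(params, bits)
        print(f"{label}: weights ~{gb:.1f} GB, fits={fits(params, bits, vram)}")
```

            Under these assumptions, the quantized 30B and 70B models fit on their respective cards, while an unquantized 16-bit 30B model would not fit on a single 3090.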

              • diffuselight@lemmy.world
                1 year ago

                I think at this point we are arguing belief.

                I actually work with this stuff daily, and there are a number of 30B models that exceed ChatGPT for specific tasks such as coding or content generation, especially when enhanced with a LoRA.

                airoboros-33b1gpt4-1.4.SuperHOT-8k, for example, comfortably outputs > 10 tokens/s on a 3090 and beats GPT-3.5 at writing stories, probably because it’s uncensored. It also has 8k context instead of 4k.
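
                As a sanity check on the tokens/s figure, here is a minimal sketch of a common rule of thumb: single-stream decoding is usually limited by memory bandwidth, since every generated token reads all the weights once. The 4-bit quantization and the roughly 936 GB/s bandwidth for a 3090 are assumptions for illustration, and the result is an upper bound, not a benchmark.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling: each token requires one full pass over the weights."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Assumed values: 33B parameters, 4-bit weights, ~936 GB/s memory bandwidth.
    size = weights_gb(33, 4)
    ceiling = max_tokens_per_second(936, size)
    print(f"weights ~{size:.1f} GB, ceiling ~{ceiling:.0f} tokens/s")
```

                Real throughput lands well below this ceiling because of compute, cache, and runtime overheads, so a measured rate above 10 tokens/s is consistent with it.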

                Several recent Llama 2 based models exceed ChatGPT on coding and classification tasks and are approaching GPT-4 territory. Google Bard has already been beaten to a pulp.

                The speed of advances is stunning.

                M-series Macs can run large LLMs via llama.cpp because of their unified memory architecture. In fact, a recent MacBook Air with 64GB can comfortably run most models. Even notebook AMD GPUs with shared memory have started running generative AI in the last week.

                You can follow along at chat.lmsys.org. Open source LLMs are only a few months old but have already started encroaching on the proprietary leaders, who have a head start of years.

                  • diffuselight@lemmy.world
                    1 year ago

                    I doubt someone who can’t google the price of a MacBook Air can afford or even operate anything remotely useful in the LLM space.