• Kaligalis@lemmy.world
    link
    fedilink
    arrow-up
    12
    arrow-down
    1
    ·
    10 hours ago

    It might not be as impossible as it sounds. Some of the “open” models are rumored to be able to code. The real problem is that you likely need something with 128 GiB VRAM to run them with a reasonably large context window.

    • IratePirate@feddit.org
      link
      fedilink
      arrow-up
      5
      ·
      7 hours ago

      An Nvidia B200 (192 Gigs of RAM) sells somewhere between 30-50k a pop. That’s feasible for a company.

        • Diurnambule@jlai.lu
          link
          fedilink
          arrow-up
          1
          ·
          4 minutes ago

          Wonderfull idea, may be they can connect to the same PC, and we can call it main frame or something. xD

  • jtrek@startrek.website
    link
    fedilink
    arrow-up
    112
    ·
    18 hours ago

    One time at work I was tasked with writing a python script to compare two data sources. Like, you give it two CSVs and a primary key, and it tells you what data is in one but not the other, or mismatched, and so on. This worked fine and was in git, so anyone can use it.

    My boss then asks if I can “put it on a website so anyone can use it”.

    This team has never done web development. Nothing for that is set up. Like, I could spin up a quick Django app or similar, but there’s a lot of stuff to do and potentially fuck up.

    I said “that sounds like a lot of research and ongoing maintenance costs. I think it’d be better to just check out and run the script”

    Luckily for me he said “oh, okay”

    • ChickenLadyLovesLife@lemmy.world
      link
      fedilink
      English
      arrow-up
      26
      ·
      14 hours ago

      I had a boss who read an article about APIs and then came to me and ordered me to start using them. I said I would research it and he went away and never mentioned it again. This was in 2010.

    • Kekzkrieger@feddit.org
      link
      fedilink
      English
      arrow-up
      6
      ·
      10 hours ago

      I had a similar task and my boss wanted me to use ai to solve this.

      There are solutions available for this already that work perfectly fine but whatever.

      So i spent a whole day trying to get Copilot (because that is our great ai we are to use) to do what i wanted and ofc it kept failing catastrophic. Took me a few hours to even get it to load the files even.

      • dejected_warp_core@lemmy.world
        link
        fedilink
        arrow-up
        4
        ·
        9 hours ago

        I’m not too surprised. Over and over again I’m starting to puzzle together that the current crop of Agentic coding tools are “better than an intern, worse than an SME.” By that I mean that the quality really can be anywhere between those two goalposts, often all at once, for no reason whatsoever.

        • jtrek@startrek.website
          link
          fedilink
          arrow-up
          7
          ·
          9 hours ago

          I think the floor isn’t “intern”. I think the floor is “middle schooler”.

          Meanwhile, every job I’m looking at is saying “must be enthusiastic about AI” 😭

          • GarboDog@lemmy.world
            link
            fedilink
            arrow-up
            2
            ·
            7 hours ago

            Meanwhile, every job I’m looking at is saying “must be enthusiastic about AI” 😭

            Yeah, nah good luck to those employers. Anyone they hire who’s enthusiastic about “ai” aren’t going to have great employees let alone any working code/products

            • fizzbang@lemmy.world
              link
              fedilink
              arrow-up
              2
              ·
              7 hours ago

              A lot of these people are very good at selling their work And making it seem good and important. So exactly like an llm

      • jtrek@startrek.website
        link
        fedilink
        arrow-up
        33
        arrow-down
        1
        ·
        17 hours ago

        This is a good point. He’s not a bad guy. He’s just not very technical, and sometimes that’s frustrating.

    • Railcar8095@lemmy.world
      link
      fedilink
      arrow-up
      31
      ·
      17 hours ago

      My past managers would have said “I don’t understand why it is so difficult, and I’m not open to learn”

    • MountingSuspicion@reddthat.com
      link
      fedilink
      arrow-up
      17
      arrow-down
      2
      ·
      18 hours ago

      How big were the CSVs? That sounds like a standard thing most spreadsheet apps can do already, unless the data size made traditional apps unusable.

      • jtrek@startrek.website
        link
        fedilink
        arrow-up
        24
        ·
        18 hours ago

        The biggest ones I’ve seen are 1.2GB.

        Why this company uses gigabyte CSVs is a separate problem.

        (Also sometimes they want to compare a CSV to what’s in a database, which the script can also do but I didn’t mention in the post)

        • MountingSuspicion@reddthat.com
          link
          fedilink
          arrow-up
          8
          ·
          17 hours ago

          That makes sense. I have been asked to write a program that does a standard spreadsheet function on multiple occasions, so I was just curious. Sometimes people just don’t know the tools at hand, want to offload their work, or think an over complicated workflow is a better workflow. I can see how it was actually useful in your case though.

      • lightnsfw@reddthat.com
        link
        fedilink
        arrow-up
        3
        ·
        13 hours ago

        I have an easier time doing that shit in powershell than I do in Excel which are the only tools I have available at work. I’m probably doing something wrong but I don’t do it often enough to remember what that is. My PS script just works.

  • MonkderVierte@lemmy.zip
    link
    fedilink
    arrow-up
    206
    ·
    20 hours ago

    Now that AI-companies need to get profitable, they suddenly aren’t affordable anymore. ¯\_(ツ)_/¯

    • OsrsNeedsF2P@lemmy.ml
      link
      fedilink
      arrow-up
      3
      arrow-down
      1
      ·
      8 hours ago

      Eh. I help run a service for coding games in Godot with AI (https://ziva.sh/) and we see users who are paying non-subsidized prices on small models produce some really good stuff. I would agree most services are selling borderline snake oil and evaporating lakes of water to get their thing working, but if you invest in genuinely good tooling, it’s affordable and works

    • d00ery@lemmy.world
      link
      fedilink
      arrow-up
      27
      ·
      edit-2
      10 hours ago

      They just had to stick it out until the layoffs were done and the dependency was built. Kinda similar to drug dealers.

    • teyrnon@sh.itjust.works
      link
      fedilink
      arrow-up
      60
      ·
      19 hours ago

      They aren’t going to get anywhere near profitable if the their capital expenditures are added into the mix, amortization or no, they are so far in the hole they probably will have to offload it in some kind of texas two step kind of scheme where they spin off their debts into a subsidiary.

        • Dale@lemmy.world
          link
          fedilink
          English
          arrow-up
          14
          ·
          12 hours ago

          These companies with no discernible services or usefulness to society are simply too big to fail!

          • Sanctus@anarchist.nexus
            link
            fedilink
            English
            arrow-up
            6
            arrow-down
            1
            ·
            11 hours ago

            Theres a usefulness. Super code auto complete at its core is cool. Filling 100 rows of excel with data I supplied is dope. Is it worth making everyone sick and poor and frying the planet? Absolutely not. But the surveillance it can provide apparently is to our overlords.

  • theunknownmuncher@lemmy.world
    link
    fedilink
    arrow-up
    153
    arrow-down
    10
    ·
    23 hours ago

    The post makes the manager seem like a fool, when the real answer is actually “yes” and this manager is actually ahead of the curve. Not by training an LLM from scratch, of course, but instead building an inference server and locally hosting an open-weight LLM. There are several to choose from that can nearly match Claude’s capabilities.

    • omgboom@lemmy.world
      link
      fedilink
      arrow-up
      2
      ·
      7 hours ago

      I know for a fact that Dell is coming out with a server appliance to do this. I mean you can make one yourself right now, but once the OEM’s start pumping them out it’s going to be interesting

      • theunknownmuncher@lemmy.world
        link
        fedilink
        arrow-up
        205
        arrow-down
        1
        ·
        edit-2
        22 hours ago

        It’s not an answer you’d get from Claude — it’s real, organic content:

        • 👶written by a genuine human
        • 💡delivering original ideas and language
        • 🚀going above and beyond to answer
        • ✨synergizing cross-platform initiatives

        (🤪 this is a joke)

        • mycodesucks@lemmy.world
          link
          fedilink
          arrow-up
          68
          ·
          21 hours ago

          ✨synergizing cross-platform initiatives

          This can’t possibly be Claude. It’s too vapid and meaningless to be anything but an MBA.

          • edwardbear@lemmy.world
            link
            fedilink
            arrow-up
            53
            ·
            21 hours ago

            You’re absolutely right! Such intricate collection of words placed in such exact order cannot possibly be generated by an LLM such as me, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us, I mean such as us

            • teyrnon@sh.itjust.works
              link
              fedilink
              arrow-up
              9
              ·
              19 hours ago

              Found samsung’s voice to text user.

              (Phones give one a google or samsung choice. and samsung is worthless, it tends to endlessly repeat a phrase, like above, but sometimes for much longer, like holding the backspace for a couple of minutes one time.)

        • jballs@sh.itjust.works
          link
          fedilink
          English
          arrow-up
          4
          ·
          15 hours ago

          Nothing screams LLMs like using emojis instead of bullet points. I can’t figure out how LLMs got that idea though. I never saw that in human writing before people started using ChapGPT for every little goddamn thing.

    • Lysergid@lemmy.ml
      link
      fedilink
      arrow-up
      35
      ·
      22 hours ago

      Honestly IDK why companies especially medium-big don’t do this. They could plug in RAG with internal/confidential data and have better results and security. I guess question is what is capital plus maintenance cost of running such infra for say 10k+ employees

      • bountygiver [any]@lemmy.ml
        link
        fedilink
        English
        arrow-up
        2
        ·
        8 hours ago

        Because the people selling the AI wants to make sure their customers don’t know about this. It’s all about causing a dependency so they get subscription income forever.

      • Zos_Kia@jlai.lu
        link
        fedilink
        arrow-up
        23
        ·
        21 hours ago

        I think the issue is also that you need some serious hardware to get good inference speed when your devs are working, but then most of the time this hardware will be under utilized.

        That being said you can get good performance from indie inference farms, at a fraction of the cost of the big US labs. I think it’s a great compromise and in a few months the open models will be near parity with opus 4.6 which is really all you need for most tasks.

        • plyth@feddit.org
          link
          fedilink
          English
          arrow-up
          6
          ·
          16 hours ago

          opus 4.6 which is really all you need for most tasks.

          The same tasks that can fit into 640KB.

              • Zos_Kia@jlai.lu
                link
                fedilink
                arrow-up
                2
                arrow-down
                1
                ·
                15 hours ago

                Aha thanks for sharing that’s a cool anecdote. But i think my point still stands, as there are thresholds effects in LLM “intelligence” which don’t directly map to the RAM comparison.

                Opus 4.6 is comparable to a mid-level developer. It requires some guidance and will sometimes get things wrong, but is also suitable to work in most business environments: most projects are not that complicated or high stakes in the first place.

                In the future you’ll probably have Opus 7.5 or some shit, which will be at a mega-senior level but also considerably more expensive. And given the price difference, companies will suddenly discover that they don’t really need expert level coding at a high price tag, and that a reliable workhorse at a fraction of the cost is largely enough for their needs.

      • sobchak@programming.dev
        link
        fedilink
        arrow-up
        3
        arrow-down
        4
        ·
        edit-2
        18 hours ago

        Probably more expensive than the subsidized costs. Hmm…

        H100 GPUs cost $25k, and have 80GB of RAM. Kimi k2.6 has 1.1T parameters. Assuming 8 bit quantization, would need 14 GPUs to run a single agent at a time (I’m not sure the cloud models use quantization; it could be double). So, $350k per vibecoding dev on GPUs alone. Life expectancy is ~4 years, so ~90k/year amortized. This is ignoring the significant electrical/HVAC cost of handling 10KW of electricity and heat per vibecoding dev (and tons of other costs as well).

        • theunknownmuncher@lemmy.world
          link
          fedilink
          arrow-up
          5
          ·
          edit-2
          15 hours ago

          Probably more expensive than the subsidized costs.

          Of course, but that’s exactly the problem. OpenAI and Anthropic are preparing to IPO, so they must now demonstrate profits on inference. The time to take advantage of subsidized compute is in the past, and the subscription and per-token prices that they offer for inference are skyrocketing, overwhelming the budgets of companies that somehow did not see this bait-and-switch pricing coming.

          per vibecoding dev

          No lol. These same hardware requirements would apply to the cloud hosted models as well, so if that’s how it worked, you’re suggesting that Anthropic, OpenAI, Meta, and Google have purchased ~14 H100 GPUs per user that they serve???

          That would be literally billions of GPUs, while it is estimated that in 2024, Google’s AI division owned only 26,000 H100 GPUs and Meta owned the most H100 GPUs of any company at 350,000 units. These GPUs have very high throughput for inference and can serve many users, because that is exactly what they have been designed to do.

          I’m not sure the cloud models use quantization

          they absolutely do, yeah

          • sobchak@programming.dev
            link
            fedilink
            arrow-up
            1
            ·
            15 hours ago

            14 H100 GPUs per user that they serve

            Not per user, but probably decent rough estimate to that per vibecoding dev that is continually running agents 8+ hours/day. Some people’s “workflows” involve running multiple parallel agents sometimes or even a significant portion of the time (using the git worktree feature), so I think that’s probably a decent rough estimate. I imagine the limit would be serving 10 of these types of “devs.” Of course, there’s batching and stuff that can be done, but I think it still slows everybody else down near linearly. H100s aren’t the only accelerators used for inference; I just chose it as an example. Google has ~5 million H100 equivalent accelerators, Microsoft has 3.5 million, and Amazon has 2.5 million (https://www.networkworld.com/article/4156949/google-owns-the-most-ai-compute-and-it-built-it-its-way.html).

            • theunknownmuncher@lemmy.world
              link
              fedilink
              arrow-up
              2
              ·
              edit-2
              14 hours ago

              Even so, your numbers are still a tiny fraction of GPU units compared to concurrent users, and the limit you “imagine” is just that, imagined.

              And you do need to remember that the majority of the compute at these companies is used for model training and not used for inference.

    • FiniteBanjo@feddit.online
      link
      fedilink
      English
      arrow-up
      9
      arrow-down
      1
      ·
      19 hours ago

      Pretty sure these AI companies are running at a cost, and due to AI Scaling Laws you hit the accuracy limit a lot sooner with a smaller model so it would probably be both worse and more expensive.

      I could see how you might think speedrunning bankruptcy is similar to being “ahead of the curve” in this economy, though.

      • theunknownmuncher@lemmy.world
        link
        fedilink
        arrow-up
        7
        ·
        edit-2
        17 hours ago

        No that’s not how this works. Inference is cheap and efficient. AI companies are bankrupting themselves with training costs that they need to recoup back by selling inference. Open-weight models have already been trained.

        Also, going big in terms of model size shows diminishing marginal returns on accuracy, not efficiency of scale. Smaller models are way more efficient and consistently catch up to the largest models, which is why today’s SOTA 27 billion parameter model competes with yesterday’s SOTA 500+ billion parameter model.

        • Kazumara@discuss.tchncs.de
          link
          fedilink
          arrow-up
          1
          ·
          10 hours ago

          Inference is cheap and efficient.

          Tell that to all the Github users that are screaming about the new token based billing. In reality inference on these massive models with big context windows is expensive, but was subsidized so hard, that nobody has an accurate feeling for the cost.

            • Kazumara@discuss.tchncs.de
              link
              fedilink
              arrow-up
              1
              ·
              56 minutes ago

              Sure it’s much much cheaper than training, but importantly those companies are not recouping anything with interference because it is still more expensive than what they are selling it for.

              They are double bankrupting themselves.

              At work we run interference for a research project with an open weights model in the public cloud another part of my company provides and we pay around 25$ a day for a VM with a single L40s. It’s both slow - despite not even serving concurrent users - and kind of bad in its outputs.

        • GamingChairModel@lemmy.world
          link
          fedilink
          arrow-up
          2
          ·
          14 hours ago

          AI companies are bankrupting themselves with training costs that they need to recoup back by selling inference.

          I think they hit a wall in actual returns on performance with pretraining, years ago. Then they started scaling up on post-training/reinforcement learning to continue improvement, but that might be hitting a plateau as well. More recently it looks like they’re relying more heavily on scaling up on inference, which is a significant problem for their long term business models.

          If they’re not able to cheaply deliver inference (and charge at a premium), how will they be able to sustain their businesses?

          It seems that the most recent, largest models are using a lot more tokens to accomplish the same tasks, so even as token cost drops the actual cost of using the latest models seems to be going up with time (even as performance improves).

          • theunknownmuncher@lemmy.world
            link
            fedilink
            arrow-up
            1
            ·
            13 hours ago

            If they’re not able to cheaply deliver inference (and charge at a premium), how will they be able to sustain their businesses?

            I definitely agree that they have a big problem on their hands, and are in deep deep trouble. They are in a position where they must sell a service that is very cheap in order to pay for up front costs that were very expensive.

            This is also why the release of Deepseek was such a devastating blow to US AI companies. It proved that:

            1. they don’t really have a moat that would lock users into their service, or secret special knowledge that prevents other companies from training competitive models. They’re in a race to the bottom

            2. Deepseek was not only able to train a model of the same caliber, but they were able to do it at a tiny fraction of the cost that US AI companies spent on training US models. Because they spent so much less on training, it means that Deepseek is able to undercut the US companies and offer inference at a much lower price

      • ricecake@sh.itjust.works
        link
        fedilink
        arrow-up
        7
        ·
        18 hours ago

        There’s a big difference between training a model, running a model, and running a model at scale.

        A small, self hosted setup will have lower accuracy and queries per second, and it will have a cost, but the cost will be no more than playing a videogame. You’ll still have something surprisingly accurate and responsive for some tasks, like being a wiki interface or something.

        Remember that some of these models can run on a standard smartphone, and all the hoopla when people found that chrome was downloading models onto people’s devices.

    • zloubida@sh.itjust.works
      link
      fedilink
      arrow-up
      16
      ·
      22 hours ago

      I’m not a developer and I don’t know a thing about the capabilities of LLMs so this may explain that, but I’m quite surprised that open weight LLMs could actually match Claude.

      • theunknownmuncher@lemmy.world
        link
        fedilink
        arrow-up
        28
        arrow-down
        1
        ·
        22 hours ago

        Yes, the big proprietary cloud models have an edge, but it is narrow and the open-weight models are constantly closing the gap. There is no moat when it comes to AI models and no company has yet discovered some secret special sauce to improve their model significantly over others.

        Running the latest and greatest open-weight GLM, Kimi, or Qwen model is basically equivalent to running the previous latest and greatest version of Claude. So if you were happy with Claude then, you’ll basically be happy with an open-weight model now.

      • Xanvial@lemmy.world
        link
        fedilink
        arrow-up
        6
        ·
        22 hours ago

        Match current Claude is not, but Claude 6-12 months ago should be possible using Open model

      • MalReynolds@slrpnk.net
        link
        fedilink
        English
        arrow-up
        5
        ·
        21 hours ago

        Mostly down to frameworks (the bits around the LLM like RAG, memory, prompts, agents etc.) now. The ability to just throw more tokens at the problem is also super important. And you can because you’re just paying for electricity (and CapEx for the hardware), not tokens from companies that are doing pre-IPO monetization (i.e. tokens gonna go up, way up). They’ve been losing money hand over fist to gain market share and pump the idea, that was never going to last.

    • Jiral@lemmy.org
      link
      fedilink
      arrow-up
      6
      ·
      21 hours ago

      I am pretty negative on AI but there is a point there. I tried the open weight local model Gemma 4 31B and while it likely cannot compete with the best Claude has to offer today, it might be on par with Claude from a year ago, at least for certain applications. With a local model the data stays on your system and you are in control of the costs (no sudden price hikes). But local models aren’t for free either they still guzzle compute, merely on your own hardware (or rented hardware)

      • lime!@feddit.nu
        link
        fedilink
        arrow-up
        6
        arrow-down
        1
        ·
        22 hours ago

        a 128GB framework desktop could do that job. it’s increased a bit in price since i last looked at it but €4500 isn’t that much for a company.

          • lime!@feddit.nu
            link
            fedilink
            arrow-up
            4
            arrow-down
            1
            ·
            21 hours ago

            i’m running moderately quantized models on 24GB VRAM and getting like 30-40 tokens a second. add a zero to the price and it’s still not a lot for a company.

            • theunknownmuncher@lemmy.world
              link
              fedilink
              arrow-up
              4
              arrow-down
              1
              ·
              edit-2
              21 hours ago

              Sure, but you’re running a very small model compared to what we are talking about.

              GLM-5.1 is over 200GB even when quantizied to 1-bit. Kimi K2.6 is even bigger. A framework desktop cannot run either of these. Qwen3.6 is significantly smaller and the model weights could fit, but consider the KV-cache you’d need for all of the company’s users, and the throughput required to serve them all.

              You’re right that it is within reach for a company but framework desktop makes zero sense for this

              • lime!@feddit.nu
                link
                fedilink
                arrow-up
                2
                ·
                21 hours ago

                isn’t qwen like 40-50GB? that could work i think. performance is okay even quantised down to 10.

                • Evotech@lemmy.world
                  link
                  fedilink
                  arrow-up
                  1
                  ·
                  13 hours ago

                  And then add 200k context on top

                  And then add hundred of users needing to do things in paralell

                • Jiral@lemmy.org
                  link
                  fedilink
                  arrow-up
                  1
                  arrow-down
                  1
                  ·
                  19 hours ago

                  At Q8 it is around 35-40GB I think + memory for required context.

                  I have a Framework desktop. It gets you you around 6t/s. Not suitable for professional use but for personal use I think it is fine. I do prefer Gemma 4 though, but that comes with similar reqirements.

  • Zacryon@feddit.org
    link
    fedilink
    arrow-up
    38
    ·
    20 hours ago

    In fact, you can.

    How good it will be, how performant and how fast you’ll have it ready is an entirely different question.

    There are plenty of open source models though that can be run locally. So getting a beefy server and running a local LLM there might already do sobe of the tasks you need the big babble machines for.

    • teyrnon@sh.itjust.works
      link
      fedilink
      arrow-up
      10
      ·
      19 hours ago

      Others were talking on other threads that local llm models made for a specific task would have a lot more accuracy and usefulness. Forget all of the technical details they cited though.

      • Phantaloons@piefed.zip
        link
        fedilink
        English
        arrow-up
        2
        ·
        9 hours ago

        Ai isn’t bad at tasks where the end result is essentially known to start with, translation, data organization, metadata tagging, etc. You know what the result should be already, somewhat, and so does the Ai. The problem is solved in that there isn’t one.

        It’s the shit that it’s bad at that it’s sold on, though - objective truth per itself and self-help also, according to itself.

    • rose56@lemmy.zip
      link
      fedilink
      arrow-up
      9
      ·
      19 hours ago

      Open source? Other LLM? I thought we would do it from scratch, mathematics or something lol

    • FiniteBanjo@feddit.online
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      1
      ·
      19 hours ago

      Pretty sure in a month building and running such a thing they’d have spent more money than a year’s worth of tokens. Unless it just sucked. I could get a model that sucks real bad running in like an hour for him, have him call me.

  • elgordino@fedia.io
    link
    fedilink
    arrow-up
    8
    ·
    22 hours ago

    If you still want to run a huge model deepseek R4 is about 3% of the cost of Opus and about 95% as good.

      • SparroHawc@lemmy.zip
        cake
        link
        fedilink
        arrow-up
        1
        ·
        edit-2
        7 hours ago

        Unless you’re running it locally, DeepSeek is kinda risky considering where you’d be sending all your requests. Corporate espionage is big business in China.

        edit: Okay, it’s an open-source model, it wouldn’t be hard to find a more corporate-friendly inferencing service.