CEO Jensen Huang claims his AI chip business is diversified, but his own company’s filings suggest just a few whales accounted for every second dollar of revenue in the past quarter.
I hope this bubble bursts soon; I’m getting real tired of it.
So yeah, the scale of it can get wild. But from what I can tell, there are clearly diminishing returns on throwing more processing power at model training, and more architectural breakthroughs are needed to get meaningfully further on general model “competence.” The main problem seems to be that you need a ridiculous amount of decent data to make scaling up worth it. Not only in terms of the model actually getting “better”, but in terms of the cost to run inference every time a user actually uses the model. Quantization can somewhat reduce that inference cost, but in exchange for reducing overall model competence.
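To make that quantization tradeoff concrete, here’s a toy sketch of symmetric per-tensor int8 weight quantization. The matrix size and the crude per-tensor scheme are just assumptions for illustration; real LLM quantization methods (GPTQ, AWQ, etc.) are a lot fancier, but the memory-vs-error tradeoff is the same shape:

```python
import numpy as np

# Toy sketch: symmetric per-tensor int8 quantization of a fake layer's weights.
# Shapes and scales here are made up for illustration, not from any real model.
rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(4096, 4096)).astype(np.float32)

scale = np.abs(w).max() / 127.0          # one scale for the whole tensor (coarsest scheme)
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dq = w_q.astype(np.float32) * scale    # dequantize to compare against the original

mem_fp32 = w.nbytes / 2**20
mem_int8 = w_q.nbytes / 2**20
err = np.abs(w - w_dq).mean()

print(f"fp32: {mem_fp32:.0f} MiB, int8: {mem_int8:.0f} MiB (4x smaller)")
print(f"mean abs rounding error: {err:.2e}")  # the "competence" cost shows up here
```

The 4x memory saving (and the cheaper int8 math that comes with it) is the whole appeal; the rounding error that never goes away is where the lost competence comes from.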
Right now, my general impression is that the heavy-hitter companies are still trying to figure out where the boundaries of the transformer architecture are. But I’m skeptical they can push it much further than brute-force scaling already has. I think LLMs are going to need a breakthrough along the lines of “learning more from less” to make substantial strides beyond where they are now.
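For a rough feel of why brute-force scaling flattens out, you can plug numbers into the Chinchilla-style loss fit from Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β. The constants below are their published fits; the specific model sizes are made up, so treat the outputs as order-of-magnitude intuition, not predictions:

```python
# Chinchilla-style parametric loss fit (Hoffmann et al., 2022):
# L(N, D) = E + A / N**alpha + B / D**beta, with their published constants.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

# Hypothetical model sizes, chosen only to show the trend.
for n, d in [(70e9, 1.4e12), (405e9, 15e12), (4e12, 150e12)]:
    print(f"N={n:.0e} params, D={d:.0e} tokens -> loss ~{loss(n, d):.3f}")

# Each ~10x jump in parameters and tokens shaves off less loss than the
# last one, and the irreducible floor E never moves no matter how far you scale.
```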
I guess these whales do benefit from more efficient llms as well, it’s not like their choice is “expand compute power” XOR “use more efficient llm”. Worst case they can rent spare compute power to other companies.
Meta trained the Llama 3.1 405B model on 16 thousand H100s: https://ai.meta.com/blog/meta-llama-3-1/
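For a sense of scale, here’s a back-of-envelope estimate using the standard C ≈ 6·N·D training-FLOPs rule of thumb and the roughly 15T training tokens Meta reported. The 40% utilization figure is my assumption, so take the result as a rough order-of-magnitude check, not Meta’s actual number:

```python
# Back-of-envelope training compute for Llama 3.1 405B, assuming the
# common C ~= 6 * N * D FLOPs rule of thumb. MFU is a guess.
n_params = 405e9
n_tokens = 15e12
flops = 6 * n_params * n_tokens            # ~3.6e25 FLOPs total

h100_peak = 989e12                         # H100 dense BF16 peak, FLOPs/s
mfu = 0.40                                 # assumed model FLOPs utilization
gpus = 16_000
seconds = flops / (gpus * h100_peak * mfu)

print(f"total compute: {flops:.1e} FLOPs")
print(f"~{seconds / 86_400:.0f} days on {gpus} H100s at {mfu:.0%} MFU")
```

That lands around two months of wall-clock time on the whole cluster, which is roughly in line with what public reporting suggested.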
Maybe. Idk
Yeah, this is true. For companies that operate at the scale of FAANG, even seemingly insignificant savings add up to become significant.