• 0 Posts
  • 44 Comments
Joined 4 months ago
Cake day: January 30th, 2026

  • Yes, you do know the boundaries of AI. It is, at its core, matrix multiplication: its output distribution is just as intelligible as the distribution of rolls of a die. We receive a probability distribution for the next token given a sequence of tokens. This is demonstrable; search for softmax online.

    To fairly equate a die roll event to a model prompt event we must understand the technicalities. Saying you have a 20-sided die is equivalent to saying you have a specific model’s architecture and the value of every parameter, as far as the determinism of the event is concerned.

    If you can assume your die is fair and 20-sided, that is the equivalent of assuming a model is llama-3.1-8B-instruct. That is, you do know the specific model weights, which correspond to a deterministic functional relationship between input and output. If you know the model weights, which is equivalent to knowing that a die is fair and n-sided, you can in principle deterministically predict the output of a model, just as you can in principle deterministically predict which number a die will land on.

    You’re making specific, technical errors about the mathematical basis of language modeling, and equating things fallaciously to a similar deterministic event.

    Despite this, your intuition is right: we can’t perceptually predict the output of a model, just as we can’t perceptually predict what number will result from a die roll.
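
    For anyone who wants the softmax part spelled out, here is a minimal sketch. The vocabulary and logits are made up for illustration and don't come from any real model; the point is only that a fixed set of scores always maps to the same distribution, and that a fair 20-sided die is the special case where every score is equal.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a toy 4-token vocabulary (not from a real model).
vocab = ["cat", "dog", "fish", "bird"]
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)

# A fair 20-sided die: all logits equal, so every face gets probability 1/20.
die_probs = softmax([0.0] * 20)

for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
print(f"each die face: {die_probs[0]:.3f}")
```

    Same logits in, same distribution out, every time. The only randomness left is in how you draw from that distribution.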


  • Language modeling is equivalent to a dice roll (given a perfect random number generator). Setting the temperature to 0 removes all randomness from the output, meaning the model always selects the highest-probability next token, and the model becomes 100% deterministic. That is, the output of a model is entirely predictable given temperature = 0, known model weights, and a known seed and prompt.

    These technicalities aside, it’s true for both a dice roll event and a specific model/prompt event that, practically speaking, the outputs are treated as probabilistic despite being mathematically/technically deterministic: a human can’t predict with 100% accuracy the outcome of a die roll despite the theory (classical mechanics of die positioning, force, velocity, friction, …) proving determinism.
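
    A rough sketch of what that means in code, with made-up logits and a hypothetical `next_token` helper (not any library's API): temperature 0 is treated as greedy argmax, and for temperature > 0 the draw is repeatable once the seed is fixed.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, temperature, rng):
    """Pick the index of the next token (hypothetical helper for illustration)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise scale the logits and draw from the resulting distribution.
    probs = softmax([x / temperature for x in logits])
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy logits, assumed for illustration only.
logits = [2.0, 1.0, 0.5, -1.0]

def sample_run(seed, n=10):
    rng = random.Random(seed)
    return [next_token(logits, 1.0, rng) for _ in range(n)]

greedy = [next_token(logits, 0, random.Random()) for _ in range(5)]
print(greedy)                            # same index every time
print(sample_run(42) == sample_run(42))  # same seed, same sequence
```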


  • As it currently exists, yes, in most cases it is trained on stolen cognitive labor. Do you think this is inherent to the technology itself, however? Consider a model trained entirely on public domain data, or on data under a non-copyleft license that doesn’t require attribution. E.g., talkie

    Totally agree that we need strict regulation.

    If only we lived in a society where people were free to produce cognitive labor while also being guaranteed a dignified life with universal basic services and income, regardless of what they produce. Then, like with piracy, LLMs, in my opinion, could be trained on anything without harming original authors.


  • i honestly believe it isn’t that everyone here is only pitchforks and cheerleading. i agree that “fuck AI”, taken literally, is a gross oversimplification without nuance; but rhetorically it really means “fuck AI corporations and their cronies”.

    this community isn’t strictly fuck AI from a technology standpoint, but from the environmental and socioeconomic standpoint.

    “fanboys” refers to supporters of the massive corporations pushing their slop and enshittification, which i hope you despise as much as the rest of us




  • The first study cited in the article, a meta-study on cognition, Alzheimer’s, sleep deprivation, traumatic brain injury, and depression, notes:

    DC has conducted industry-sponsored research involving creatine supplementation and received creatine donations for scientific studies and travel support and speaking honoraria for presentations involving creatine supplementation at scientific conferences and on social media. In addition, DC serves on the Scientific Advisory Board for Alzchem and Create (companies that manufacture creatine products) and as an expert witness/consultant in legal cases involving creatine supplementation. NF declares no conflicts of interest


  • I don’t have any familiarity with using this kind of software, but I looked through the git repo of SavaPage. It looks like it has been actively developed for the past few years, which is a great sign, but almost all commits are by one user. The issue tracker is also a little meager, with just one open issue, potentially pointing to a very small user base. Whether it’s worth adopting depends heavily on that one person continuing to maintain the project.


  • Honestly, you’re a few months late to the whole buying GPUs for local llms party, so expect exorbitant prices even for older cards

    The name of the game is vram. For the most part, more is better. If you can get your hands on multiple matching (same model) 24gb or higher cards (within price range), you’re golden.

    Going for more than 2 gpus can become challenging with motherboard pcie slot heights, so make sure either your cards aren’t too tall or you have widely spaced out pcie slots.

    For inference, speed (tokens/second) is limited by memory bandwidth. Go for cards with higher memory bandwidth if you can afford it (e.g. GDDR6 will be faster than GDDR5).

    Also with multi gpus you will need an adequate power supply, and a large enough case.

    If you want to be a bit eccentric and load huge models, you can also go the CPU route and fill up a motherboard with 256 GB of RAM, because then you’re in the several hundred B param model territory, which could, depending on your use case, be better than having faster inference on smaller/quantized models. Even then, high-clocked DDR5 is still way slower than GPU memory.
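
    To put rough numbers on the bandwidth and capacity points, here is a back-of-the-envelope sketch. It assumes a dense model at batch size 1 where every generated token reads all the weights once, and it ignores the KV cache and compute entirely, so treat the result as an upper bound. The bandwidth figures are round illustrative numbers, not specs of any particular card or RAM kit.

```python
def model_size_gb(params_billion, bytes_per_param):
    """Approximate weight size in GB (1e9 params * bytes each / 1e9 bytes per GB)."""
    return params_billion * bytes_per_param

def max_tokens_per_second(bandwidth_gb_s, size_gb):
    """Upper bound on decode speed if each token streams all weights once."""
    return bandwidth_gb_s / size_gb

# 8B parameters at 16-bit precision (2 bytes per parameter).
size_8b = model_size_gb(8, 2)

# Illustrative bandwidths, assumed for the example only.
gpu_tps = max_tokens_per_second(900, size_8b)  # GPU-class memory
cpu_tps = max_tokens_per_second(80, size_8b)   # desktop DDR5-class memory

# Does a 400B model quantized to 4 bits (0.5 bytes per parameter) fit in 256 GB?
size_400b_q4 = model_size_gb(400, 0.5)
fits_in_256gb = size_400b_q4 <= 256

print(f"8B fp16 weights: {size_8b:.0f} GB")
print(f"GPU bound: {gpu_tps:.1f} tok/s, CPU bound: {cpu_tps:.1f} tok/s")
print(f"400B at 4-bit: {size_400b_q4:.0f} GB, fits in 256 GB: {fits_in_256gb}")
```

    Real throughput will land below these bounds, but the ratio between the GPU and CPU cases is the part that matters when you’re deciding where to spend.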


  • yea there’s still honestly some downsides to Qobuz, including:

    • Artist profiles: lack of consistency on details like images, descriptions
    • Generated recommendations: magazine articles and album reviews (sometimes) written by humans are top notch; the tradeoff is that recommendations based on specific playlists are often far less “close” musically and I often get random and unexpected auto plays; there is no “daily mix” or “similar artists” or good recommendations for adding new tracks to a longer playlist
    • Library: across the many diverse genres I listen to, frequently newer releases are delayed on Qobuz. Older music library is outstanding, extremely few of my 10s of thousands of total tracks of jazz records were unavailable








  • Honestly it heavily depends on the use case, in terms of making the model better and choosing between RAG/FT. The most important thing to consider is what sort of changes you want to make to the model. FT is still a good choice if you’re looking for strict output formatting (json/yaml/…) or refinement for highly specific, narrow-domain tasks. RAG is better for knowledge freshness and source citations, and it greatly lowers hallucinations.

    RAG will inflate your context windows (more tokens) at inference time, which means slower responses and more energy per request, whereas fine-tuning takes a ton of gpu compute up front (but keeps token counts smaller at inference). If you’re doing 100,000 prompts a day, and only need to train once, FT makes more sense; if you’re doing 100 prompts a day and your knowledge database is constantly changing, RAG makes the most sense.

    It’s hard to give a formalized estimate on energy efficiency: fine-tuning to a certain training accuracy can take an indeterminate amount of time (and money on rented GPU compute), but could be the better choice if you expect that up-front cost to pay off over time because you use the model very frequently and only fine-tune once. On the other hand, the RAG route has essentially no up-front training compute (energy) cost, but costs slightly more at inference time due to the extra tokens.

    What specific task are you considering FT for? That’s the most important factor in the choice.
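
    If it helps, the trade-off above can be sketched as a simple break-even calculation. All the costs below are hypothetical numbers in arbitrary units (think GPU-seconds or joules), chosen only to show the shape of the comparison, not measured from anything.

```python
def total_cost(upfront, per_prompt, prompts):
    """Total compute cost after a given number of prompts."""
    return upfront + per_prompt * prompts

def break_even_prompts(ft_upfront, ft_per_prompt, rag_per_prompt):
    """Number of prompts after which fine-tuning becomes cheaper than RAG."""
    extra = rag_per_prompt - ft_per_prompt
    if extra <= 0:
        return None  # RAG is never more expensive per prompt, so FT never pays off
    return ft_upfront / extra

# Hypothetical costs in arbitrary units (assumptions for illustration).
FT_UPFRONT = 1_000_000   # one-time fine-tuning run
FT_PER_PROMPT = 1        # short prompt, no retrieved context
RAG_PER_PROMPT = 3       # longer prompt because of retrieved context

n = break_even_prompts(FT_UPFRONT, FT_PER_PROMPT, RAG_PER_PROMPT)
days_heavy = n / 100_000  # at 100,000 prompts a day
days_light = n / 100      # at 100 prompts a day

print(f"break-even after {n:.0f} prompts")
print(f"heavy use: {days_heavy:.0f} days, light use: {days_light:.0f} days")
```

    With these made-up numbers the heavy user recoups the fine-tuning cost in days, while the light user would need years, and that is before counting any re-training when the knowledge base changes.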


  • I do AI research for school. I’m specifically interested in safety alignment. I have studied the original papers for different fine-tuning methods: LoRA is typically the baseline and there exist many variants, notably QLoRA.

    In general, fine tuning is not practically beneficial for hobby level foundation models. It in fact comes with many disadvantages. Primarily, it is difficult to maintain the intelligence of the model and avoid overfitting.

    If you are trying to adapt a model to a specific task, you are generally going to find more success with using RAG and just adding more context to the model that way. Don’t waste time and compute $$ on training.
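
    For context on why LoRA is the usual baseline: instead of updating a full weight matrix W (d × k), it trains a low-rank update B·A with B of shape d × r and A of shape r × k, and uses W + B·A. A minimal sketch with toy matrices and an assumed layer size of 4096 × 4096 at rank 8 (illustrative values, not taken from any particular model; the scaling factor used in practice is left out):

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def full_params(d, k):
    """Trainable parameters when updating the whole d x k matrix."""
    return d * k

def lora_params(d, k, r):
    """Trainable parameters for the low-rank factors B (d x r) and A (r x k)."""
    return r * (d + k)

# Toy example: a 2 x 2 weight matrix with a rank-1 update.
W = [[1, 0], [0, 1]]
B = [[1], [2]]   # d x r
A = [[3, 4]]     # r x k
delta = matmul(B, A)
adapted = [[w + dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Assumed layer size and rank, for illustration only.
d, k, r = 4096, 4096, 8
ratio = lora_params(d, k, r) / full_params(d, k)

print(adapted)
print(f"full: {full_params(d, k)}, LoRA: {lora_params(d, k, r)}, ratio: {ratio:.4%}")
```

    That small trainable footprint is what makes it cheap, but it doesn’t change the overfitting concerns above.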


  • Has anyone compiled a list of where projects are moving to? I know many linux desktop applications are self-hosting on gitlab, but i’ve also seen gitea and codeberg. If anyone has opinions about a preference, do comment. I have been enjoying self-hosting gitea for my simple personal projects and for deploying simple web apps, all on a $5 VPS.