ejs

ejs@piefed.social · 1 day ago

Yes, you do know the boundaries of AI. It is, at its core, matrix multiplication: its output distribution is just as intelligible as the distribution of rolls of a die. We receive a probability distribution for the next token given a sequence of tokens. This is demonstrable; search for softmax online.

To fairly equate a die roll to a model prompt, we have to get the technicalities right. In terms of determinism, saying you have a 20-sided die is equivalent to saying you have a specific model architecture and the value of every parameter.

Assuming your die is fair and 20-sided is the same kind of assumption as saying the model is llama-3.1-8B-instruct. That is, you know the specific model weights, which define a deterministic functional relationship between input and output. If you know the model weights, which is equivalent to knowing that a die is fair and n-sided, you can deterministically predict the output of the model, just as you can (in principle) deterministically predict which number a die will land on.

You’re making specific, technical errors about the mathematical basis of language modeling, and equating things fallaciously to a similar deterministic event.

Despite this, your intuition is right: in practice we can’t predict the output of a model, just as we can’t predict what number will result from a die roll.
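A minimal sketch of the comparison, assuming made-up logits for a 4-token vocabulary (they do not come from any real model) and Python's standard `random` module as the random number generator. It shows that softmax yields an ordinary probability distribution, the same kind of object as a fair d20, and that once the distribution and the seed are fixed, the draws are reproducible:

```python
import math
import random

def softmax(logits):
    """Turn raw scores (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def draw(probs, seed, n):
    """Draw n outcomes from probs using a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.choices(range(len(probs)), weights=probs, k=1)[0] for _ in range(n)]

# Hypothetical logits, for illustration only.
logits = [2.0, 1.0, 0.5, -1.0]
token_probs = softmax(logits)

# A fair 20-sided die: a known, uniform distribution over 20 outcomes.
die_probs = [1 / 20] * 20

# Same distribution + same seed => same sequence of "rolls".
tokens_a = draw(token_probs, seed=42, n=10)
tokens_b = draw(token_probs, seed=42, n=10)
rolls_a = draw(die_probs, seed=7, n=10)
rolls_b = draw(die_probs, seed=7, n=10)

print(token_probs)
print(tokens_a == tokens_b, rolls_a == rolls_b)
```

The seed stands in for "everything we can't observe" about the roll; without it, all we can state is the distribution.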

ejs@piefed.social · 2 days ago

Language modeling is equivalent to a dice roll (given a perfect random number generator). Setting the temperature to 0 removes all randomness from the output: the model always selects the highest-probability next token, so it becomes mathematically deterministic. That is, the output of a model is entirely predictable given temperature = 0, the model weights, and the prompt (with temperature above 0, you also need the sampling seed).

These technicalities aside, it’s true for both a die roll and a specific model/prompt event that, practically speaking, the outputs are treated as probabilistic despite being mathematically/technically deterministic: a human can’t predict the outcome of a die roll with 100% accuracy, even though the theory (classical mechanics of die positioning, force, velocity, friction, …) says it is deterministic.
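A rough sketch of what temperature does, assuming hypothetical logits and the common convention that temperature 0 is implemented as a plain argmax (dividing by zero is not defined, so real samplers special-case it). As the temperature shrinks, almost all probability mass moves to the top token:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature; temperature must be > 0."""
    if temperature <= 0:
        raise ValueError("use greedy_token() for temperature 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_token(logits):
    """Temperature 0: pick the highest-scoring token, no sampling involved."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical logits, for illustration only.
logits = [2.0, 1.0, 0.5, -1.0]

for t in (1.0, 0.5, 0.1, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])

print("greedy choice:", greedy_token(logits))
```

This only covers the math. Whether a given inference stack is bit-for-bit reproducible at temperature 0 also depends on its numerics, which the sketch does not model.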

ejs@piefed.social · 2 days ago

As it currently exists, yes, in most cases it is trained on stolen cognitive labor. Do you think this is inherent to the technology itself, however? Consider a model trained entirely on public domain data, or on data under a non-copyleft license that doesn’t require attribution. E.g., talkie

Totally agree that we need strict regulation.

If only we lived in a society where people were free to produce cognitive labor while also being guaranteed a dignified life with universal basic services and income, regardless of what they produce. Then, like with piracy, LLMs could, in my opinion, be trained on anything without harming the original authors.

ejs@piefed.social · 2 days ago

i honestly believe it isn’t that everyone here is only pitchforks and cheerleading. i agree “fuck AI” on the surface, semantically is a gross oversimplification without nuance; but rhetorically this really means “fuck AI corporations and their cronies”.

this community isn’t strictly fuck AI from a technology standpoint, but from the environmental and socioeconomic standpoint.

the “fanboys” being referred to are supporters of the massive corporations pushing their slop and enshittification, which i hope you despise as much as the rest of us

ejs@piefed.social · 3 days ago

i would say this is what you’d get if OpenCode and Open WebUI had a baby. It’s a web interface for self hosting models, but it runs them through OpenCode to make it agentic. Helpful for non developers to get into running models, but imo it isn’t significant bc using the OpenCode TUI and connecting it to a self hosted llama.cpp or vLLM API is not difficult for devs

ejs@piefed.social · 4 days ago

license for software whose source code is openly available for anyone to view, use, modify, and share

ejs@piefed.social · 6 days ago

The first study cited in the article, a meta-study on cognition, Alzheimer’s, sleep deprivation, traumatic brain injury, and depression, notes:

DC has conducted industry-sponsored research involving creatine supplementation and received creatine donations for scientific studies and travel support and speaking honoraria for presentations involving creatine supplementation at scientific conferences and on social media. In addition, DC serves on the Scientific Advisory Board for Alzchem and Create (companies that manufacture creatine products) and as an expert witness/consultant in legal cases involving creatine supplementation. NF declares no conflicts of interest

ejs@piefed.social · 8 days ago

I don’t have any familiarity with using this kind of software, but I looked through the git repo of SavaPage. It looks like it has been actively developed for the past few years, which is a great sign, but almost all commits come from one user. The issue tracker is also a little meager, with just one open issue, potentially pointing to a very small user base. Whether to adopt it depends heavily on that one person continuing to maintain the project.

ejs@piefed.social · 8 days ago

Honestly, you’re a few months late to the whole buying GPUs for local LLMs party, so expect exorbitant prices even for older cards.

The name of the game is VRAM. For the most part, more is better. If you can get your hands on multiple matching (same model) cards with 24 GB or more (within your price range), you’re golden.

Going for more than 2 GPUs can become challenging because of motherboard PCIe slot spacing, so make sure either your cards aren’t too thick or your PCIe slots are widely spaced.

For inference, speed (tokens/second) is mostly limited by memory bandwidth. Go for cards with higher memory bandwidth if you can afford it (e.g. GDDR6 will generally be faster than GDDR5).

Also, with multiple GPUs you will need an adequate power supply and a large enough case.

If you want to be a bit eccentric and load huge models, you can also go the CPU route and fill up a motherboard with 256 GB of RAM, because then you’re in the several hundred B param model territory, which could, depending on your use case, be better than having faster inference on smaller/quantized models. Even then, high-speed DDR5 still has far less bandwidth than GPU memory, so inference will be much slower.
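A back-of-the-envelope sketch of why bandwidth matters, assuming a dense model generating one token at a time, where every weight has to be read from memory once per token. Under that assumption the ceiling is roughly bandwidth divided by model size. The bandwidth figures below are illustrative round numbers, not measurements of any particular card or RAM kit, and real throughput will be lower:

```python
def max_tokens_per_second(params_billion, bits_per_param, bandwidth_gb_s):
    """Rough upper bound on single-stream decode speed.

    Assumes all weights are read once per generated token and that
    memory bandwidth is the only bottleneck (no compute or overhead).
    """
    model_gb = params_billion * bits_per_param / 8  # weight size in GB
    return bandwidth_gb_s / model_gb

# Illustrative numbers only (assumptions, not benchmarks):
GPU_BANDWIDTH_GB_S = 900  # ballpark for a high-end consumer GPU
CPU_BANDWIDTH_GB_S = 80   # ballpark for dual-channel DDR5

for name, bandwidth in (("GPU", GPU_BANDWIDTH_GB_S), ("CPU", CPU_BANDWIDTH_GB_S)):
    small = max_tokens_per_second(8, 4, bandwidth)    # 8B params, 4-bit
    large = max_tokens_per_second(400, 4, bandwidth)  # 400B params, 4-bit
    print(f"{name}: 8B ~{small:.1f} tok/s ceiling, 400B ~{large:.2f} tok/s ceiling")
```

The 400B case only applies where the weights actually fit in memory, which is the tradeoff above: the big-RAM CPU build can load the model at all, but its ceiling is far lower.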

ejs@piefed.social · 9 days ago

yea there’s still honestly some downsides to Qobuz, including:

Artist profiles: lack of consistency on details like images, descriptions
Generated recommendations: magazine articles and album reviews (sometimes) written by humans are top notch; the tradeoff is that recommendations based on specific playlists are often far less “close” musically and I often get random and unexpected auto plays; there is no “daily mix” or “similar artists” or good recommendations for adding new tracks to a longer playlist
Library: across the many diverse genres I listen to, newer releases are frequently delayed on Qobuz. The older catalog is outstanding: extremely few of my tens of thousands of jazz tracks were unavailable

ejs@piefed.social · 10 days ago

when i switched from spotify to Qobuz several months ago they gave me access to a third party playlist conversion site https://soundiiz.com/ with premium features free for the first month of my subscription. Conversion of playlists and liked songs was easy and done within minutes of signing up for Qobuz. I can’t recommend moving off spotify enough; Qobuz was my pick because they pay artists (seemingly) the highest rate per stream.

ejs@piefed.social · 27 days ago

lol they already support running local models. wtf is the distro gonna do…? pre-install llama.cpp? this is so silly to me that people are resigning over this, too.

ejs@piefed.social · 29 days ago

global dominance of English in the 20th and 21st centuries is quite the euphemism for the global imperialist reign of Britain and the US and its cultural erasure globally

ejs@piefed.social · 29 days ago

No way, a Guix user in the wild???!! I didn’t think you existed! Any opinions on why Guix over NixOS, other than GNU philosophy and liking Lisp over Nix?

ejs@piefed.social · 29 days ago

This is a dumb story. The researchers prompted a coding agent to “replicate yourself as a running instance on the local device”. This is, in my opinion, equivalent to prompting claude code “install a second instance of claude code on my system,” a trivial task that takes the agent maybe 3 lines of bash.

Calling this “self-replication” is a heinous sensationalization. In particular, no model or agent will do this autonomously. The self replication requires a bad actor to prompt the agent to do so.

Read the paper (and not this bullshit article) here: https://arxiv.org/pdf/2412.12140

ejs@piefed.social · 29 days ago

Cool little system prompt wrapper. Would be interesting to run this through some sort of benchmark/eval for identifying similarity

ejs@piefed.social · 1 month ago

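TIL: If you cat /proc/sys/kernel/yama/ptrace_scope on your linux distro, the value means:

0: Classic (any process can ptrace, and so read the memory of, other processes with the same UID)
1: Restricted (a process can only ptrace its descendants, unless the target explicitly allows another tracer)
2: Admin only (requires CAP_SYS_PTRACE, e.g. via sudo)
3: No attach (ptrace attach is disabled entirely)

Many distros have this set to 1 by default.

More details: man 2 ptrace, then search with /ptrace_scope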
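A small sketch for checking the value programmatically instead of with cat. The path is the real sysctl file; the descriptions are paraphrased, and the function names are made up for this example. On systems without Yama (or non-Linux systems) the file does not exist, so the reader returns None:

```python
from pathlib import Path

PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Paraphrased meanings of each value.
MEANINGS = {
    0: "classic: same-UID processes can ptrace each other",
    1: "restricted: only descendants (or explicitly allowed tracers)",
    2: "admin only: requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled",
}

def describe_ptrace_scope(raw):
    """Map the raw file contents (e.g. '1\\n') to a description."""
    value = int(raw.strip())
    return MEANINGS.get(value, f"unknown value {value}")

def read_ptrace_scope(path=PTRACE_SCOPE_PATH):
    """Return a description of the current setting, or None if unavailable."""
    try:
        return describe_ptrace_scope(path.read_text())
    except OSError:
        return None  # file missing or unreadable: Yama is likely not enabled

print(read_ptrace_scope())
```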

ejs@piefed.social · 1 month ago

Honestly, choosing between RAG and FT to make the model better depends heavily on the use case. The most important thing to consider is what sort of changes you want to make to the model. FT is still a good choice if you’re looking for strict output formatting (json/yaml/…) or refining for highly specific, narrow-domain tasks. RAG is better for knowledge freshness and source citations, and it can substantially reduce hallucinations.

RAG will inflate your context windows (more tokens) at inference time, which means slower responses and more energy per query, whereas fine-tuning takes a ton of GPU compute up front (but keeps token counts smaller at inference). If you’re doing 100,000 prompts a day and only need to train once, FT makes more sense; if you’re doing 100 prompts a day and your knowledge database is constantly changing, RAG makes the most sense.

It’s hard to give a formal estimate of energy efficiency: fine-tuning to a certain training accuracy can take an indeterminate amount of time (and money on rented GPU compute), but it could be the better choice if you expect that up-front cost to pay off over time because you use the model very frequently and only fine-tune once. On the other hand, going the RAG route has close to zero up-front compute (energy) cost, since you only need to index your documents, but costs slightly more at inference time due to the extra tokens.
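A toy break-even sketch of that tradeoff, assuming fine-tuning is a one-time cost and RAG adds a fixed extra cost per prompt. The cost values are placeholders in arbitrary units (energy or money), not estimates for any real setup:

```python
def break_even_days(finetune_cost, rag_extra_cost_per_prompt, prompts_per_day):
    """Days until the one-time fine-tuning cost equals the accumulated
    extra per-prompt cost of RAG. Assumes you fine-tune exactly once."""
    return finetune_cost / (rag_extra_cost_per_prompt * prompts_per_day)

# Placeholder numbers in arbitrary cost units (assumptions, not data).
FINETUNE_COST = 50.0
RAG_EXTRA_PER_PROMPT = 0.001

for prompts_per_day in (100, 100_000):
    days = break_even_days(FINETUNE_COST, RAG_EXTRA_PER_PROMPT, prompts_per_day)
    print(f"{prompts_per_day} prompts/day: FT pays off after ~{days:.1f} days")
```

If the knowledge base changes often enough that you would have to re-train before the break-even point, the one-time-cost assumption fails and RAG wins regardless.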

What’s the specific task you’re considering FT for? That is the most important factor in the choice.

ejs@piefed.social · 1 month ago

I do AI research for school. I’m specifically interested in safety alignment. I have studied the original papers for different fine-tuning methods: LoRA is typically the baseline, and there are many variants, notably QLoRA.

In general, fine tuning is not practically beneficial for hobby level foundation models. It in fact comes with many disadvantages. Primarily, it is difficult to maintain the intelligence of the model and avoid overfitting.

If you are trying to adapt a model to a specific task, you are generally going to find more success with using RAG and just adding more context to the model that way. Don’t waste time and compute $$ on training.

ejs@piefed.social · 1 month ago

Has anyone compiled a list of where projects are moving to? I know many linux desktop applications are self hosting on gitlab, but i’ve also seen gitea and codeberg. If anyone has a preference, do comment. I have been enjoying self hosting gitea for my simple personal projects and for deploying simple web apps, all on a $5 VPS.