Recent DeepSeek, Qwen, and GLM models have posted impressive benchmark results. Do you use them through their own chatbots? Do you have any concerns about what happens to the data you put in there? If so, what do you do about it?

I am not trying to start a flame war around the China subject. It just so happens that these models are developed in China. My concerns with using the frontends, which are also developed in China, stem from:

  • A pattern in which many Chinese apps have been found to have minimal security
  • I don’t think any of the three chatbots listed above lets you opt out of having your prompts used for model training

I am also not claiming that non-China-based chatbots don’t have privacy concerns, or that simply opting out of training gets you much on the privacy front.

  • greplinux@programming.dev · 1 point · 4 hours ago

    Absolutely. China’s models are very advanced - especially Qwen. I don’t code professionally - just for personal projects. If I used these tools as an employee in a company, I’d defer to that company’s wishes. Regardless of whether it’s the U.S. or China, these models are using and probably storing my data for further training. As they should! I don’t care - they’re providing me with powerful tools. I use free tiers and jump between them all: ChatGPT, Perplexity (general search), DeepSeek, Qwen (advanced programming), Google AI Studio, and others. I’m grateful, not fearful.

  • MTK@lemmy.world · 8 points · 2 days ago

    Everyone is stealing your data; the US is doing so in the most intrusive and harmful way by far. If you don’t mind using ChatGPT, you shouldn’t mind DeepSeek or Qwen.

    But really, you should avoid all of them as much as possible.

  • Sabata@ani.social · 8 points · 2 days ago

    I’ve been running DeepSeek and Gemma locally recently, but I like to hop to whatever tops the charts that I can manage to run at home. If I need to do anything with a big model, I have a few API credits for Mistral, but I try to avoid using them.

    I don’t like using anything online, especially anything out of China or the US. If I have to use one online, I treat everything I enter as publicly posted, because we have no privacy.

  • xodoh74984@lemmy.world · 9 points · 2 days ago · edited

    I use open-source 32B Chinese models almost exclusively, because I can run them on my own machine without being a data cow for the US tech oligarchs or the CCP.

    I only use the larger models for little hobby projects, and I don’t care too much about who gets that data. But if I wanted to use the large models for something sensitive, the open source Chinese models are the more secure option IMO. Rather than get a “trust me bro” pinky promise from Closed AI or Anthropic, I can run Qwen or Kimi on a cloud GPU provider that offers raw compute by the hour without any data harvesting.

      • mapumbaa@lemmy.zip · 1 point · 4 hours ago

        I believe the full-size DeepSeek-R1 requires about 1200 GB of VRAM, but there are many configurations that require much less: quantization, MoE, and other hacks. I don’t have much experience with MoE, but I find that quantization tends to decrease performance significantly, at least with models from Mistral.
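        For a rough sense of where a number like that comes from: the full DeepSeek-R1 has about 671B parameters, and weight memory scales with parameter count times bytes per weight. A back-of-the-envelope sketch in Python, ignoring KV cache and runtime overhead, which add more on top:

```python
# Rough weight-only memory footprint of a ~671B-parameter model.
PARAMS = 671e9  # approximate parameter count of full DeepSeek-R1

for name, bytes_per_weight in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    gb = PARAMS * bytes_per_weight / 2**30
    print(f"{name}: ~{gb:,.0f} GB for the weights alone")

# FP16: ~1,250 GB -- in line with the ~1200 GB figure above
# INT4: ~312 GB  -- why quantized variants fit on far smaller setups
```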

      • greplinux@programming.dev · 1 point · 4 hours ago · edited

        VRAM vs RAM:

        VRAM (video RAM): dedicated memory on your graphics card/GPU
          • Used specifically for graphics processing and AI model computations
          • Much faster for GPU operations
          • Critical for running LLMs locally

        RAM (system memory): main memory used by the CPU for general operations
          • Slower access for GPU computations
          • Can be used as a fallback, but with a performance penalty

        So, for a basic 7B-parameter LLM locally, you typically need:

        Minimum: 8-12 GB VRAM
          • Can run basic inference/tasks
          • May require quantization (4-bit/8-bit)

        Recommended: 16+ GB VRAM
          • Smoother performance
          • Handles larger context windows
          • Runs without heavy quantization

        Quantization means reducing the precision of the model’s weights and calculations to use less memory. For example, instead of storing numbers with full 32-bit precision, they’re compressed to 4-bit or 8-bit representations. This significantly reduces VRAM requirements but can slightly reduce model quality and accuracy.
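        To make the memory math concrete, here is a toy sketch of symmetric 8-bit quantization in Python/NumPy. It is illustrative only, not any particular library’s scheme; real 4-bit/8-bit formats add per-block scales, zero-points, and so on:

```python
import numpy as np

# Toy symmetric int8 quantization of one weight matrix (illustrative only).
weights = np.random.randn(4096, 4096).astype(np.float32)

scale = np.abs(weights).max() / 127.0    # map the largest weight to +/-127
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale   # approximate reconstruction

print(f"fp32 size: {weights.nbytes / 2**20:.0f} MiB")  # ~64 MiB
print(f"int8 size: {q.nbytes / 2**20:.0f} MiB")        # ~16 MiB
print(f"mean abs error: {np.abs(weights - dequant).mean():.4f}")
```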

        Options if you have less VRAM:
          • CPU-only inference (very slow)
          • Offloading part of the model to system RAM
          • Smaller models (3B/4B parameters)

      • MTK@lemmy.world · 6 points · 2 days ago

        Generally, the VRAM needed is slightly more than the model’s file size, so the file size is an easy way to estimate VRAM requirements.
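        A sketch of that rule of thumb in Python (the file name and the overhead margin are illustrative assumptions; the real overhead grows with context length):

```python
import os

def estimate_vram_gb(gguf_path: str, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: weight file size plus headroom for the
    KV cache and runtime buffers."""
    file_gb = os.path.getsize(gguf_path) / 2**30
    return file_gb + overhead_gb

# e.g. a ~19 GB 32B Q4 GGUF -> roughly 21 GB of VRAM with modest context
print(estimate_vram_gb("qwen2.5-coder-32b-q4_k_m.gguf"))  # hypothetical file
```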

          • xodoh74984@lemmy.world · 1 point · 10 hours ago

            Sorry for the slow reply, but I’ll piggyback on this thread to say that I tend to target models a little bit smaller than my total VRAM to leave room for a larger context window, without any offloading to RAM.

            As an example, with 24 GB VRAM (Nvidia 4090) I can typically get a 32b parameter model with 4-bit quantization to run with 40,000 tokens of context all on GPU at around 40 tokens/sec.
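            For anyone wanting to reproduce a setup like that, a minimal sketch with llama-cpp-python (the GGUF file name is a placeholder; any 4-bit 32B model would do):

```python
from llama_cpp import Llama

# Load a 4-bit 32B GGUF entirely on the GPU with a 40k-token context.
# n_gpu_layers=-1 offloads every layer; lower it if VRAM runs out.
llm = Llama(
    model_path="qwen2.5-32b-instruct-q4_k_m.gguf",  # placeholder file name
    n_ctx=40_000,
    n_gpu_layers=-1,
)

out = llm("Explain KV-cache memory growth in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```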

      • TheLeadenSea@sh.itjust.works · 5 points · 2 days ago

        I ran some 7B models fine on my old laptop with 6 GB of VRAM. My new laptop has 16 GB of VRAM and can run 14B models fast. My phone with 8 GB of normal RAM can even run many 2B or 3B models, albeit slowly.

  • itsame@lemmy.world · 4 points · 2 days ago

    I run AI locally for privacy reasons and because I sometimes work without an Internet connection. The main purpose is to correct my English for formal letters and emails. I found the Gemma 3 model works best on the limited resources of my laptop. DeepSeek-R1 does not work at all; it sits at ‘Thinking…’ for a long time before giving an answer (if it gives one at all). The text that Qwen produces isn’t good enough. So I chose a non-Chinese model for practical reasons.
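    For a similar correction workflow, here is a minimal sketch using the ollama Python client (it assumes the Ollama server is running and a Gemma 3 model has been pulled; the model tag is an assumption):

```python
import ollama  # pip install ollama

def correct_english(text: str) -> str:
    """Ask a locally served Gemma 3 model to fix grammar and spelling."""
    response = ollama.chat(
        model="gemma3:4b",  # assumed tag; pick the size your laptop handles
        messages=[
            {"role": "system",
             "content": "Correct the grammar and spelling of the user's text "
                        "for a formal letter. Return only the corrected text."},
            {"role": "user", "content": text},
        ],
    )
    return response["message"]["content"]

print(correct_english("I write you regarding of the invoice from last month."))
```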

  • fitgse@sh.itjust.works · 4 points · 2 days ago

    I use them heavily, but through deepinfra.com.

    They work great.

    I personally would not use them through a Chinese provider, but I also wouldn’t use Gemini through Google either.

  • hendrik@palaver.p3x.de · 1 point · 2 days ago

    I guess the average unpaid AI Studio, ChatGPT, or Grok tier records all your interactions and private data as well, and might use them for future purposes. The Chinese might be way more relaxed with people’s data, though. I’ve tried various AI services. I try not to put in personal data in general, and I’m more careful with Chinese apps and services.