Thread 'Running local AI inference without tying up my GPU 24/7 – any better options?'

Message boards : GPUs : Running local AI inference without tying up my GPU 24/7 – any better options?

nzt2048
New member
Joined: 15 Jun 26
Posts: 1
United States
Message 119403 - Posted: 15 Jun 2026, 19:05:03 UTC

Hey everyone,

I've been crunching BOINC on a rig with a decent RTX card (keeping it dedicated mostly to GPU projects like Einstein or whatever's available), but I've also been experimenting with local LLMs for some personal/research stuff — things like coding assistance, data analysis, and creative writing tasks. The problem is obvious: running even moderately sized models for inference locally eats GPU memory and power pretty quickly, especially if you want decent speed. I don't have the budget or space for multiple high-end cards right now, and I hate throttling my BOINC contribution just to chat with a model.

Has anyone found good ways to offload AI inference without self-hosting everything? Quantization helps a bit (I've tried GGUF with llama.cpp and Ollama), but for bigger or uncensored models it's still a pain on consumer hardware.
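For a rough sense of why quantization only helps "a bit," here is a back-of-envelope sketch. It counts the weights alone (parameters × bits per weight ÷ 8) and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function name `weight_memory_gib` is made up for illustration, and the bit widths are nominal; actual GGUF quant formats typically average somewhat more bits per weight than their name suggests.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Assumption: every parameter is stored at the same nominal bit width.
    KV cache, activations, and framework overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Compare a hypothetical 7B-parameter model at common nominal bit widths.
    for bits in (16, 8, 4):
        print(f"7B model at {bits}-bit: ~{weight_memory_gib(7, bits):.1f} GiB")
```

Going from 16-bit to 4-bit cuts the weight footprint to a quarter, but a model several times larger needs several times more memory at any bit width, which is why bigger models stay painful on a single consumer card that is also running BOINC work.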

**Update:** Just found icelake.io and have been using them (along with some SAM tools) for inference. It's private, no heavy local GPU load, and handles the heavy lifting remotely while giving me full control/flexibility without the usual corporate filters. Pretty handy when I just need quick results without dedicating my rig.

Curious what others in the GPU/BOINC crowd are doing these days — cloud options, remote inference services, or clever local tricks that play nice with volunteer computing? Any recommendations or gotchas I should watch for (power draw, latency, privacy, etc.)?

Thanks!

Copyright © 2026 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.