Hacker News — vinext + Cloudflare Workers

▲AI coding at home without going broke (stephen.bochinski.dev)

326 points by sbochins 1 day ago | 267 comments

tunesmith 24 hours ago [-]

I feel like I must have plateaued and don't know what to do next to level up. I'm currently on the $100/month codex plan and it seems fine using 5.5-xhigh all the time. I think of what to do next, have a chat session to determine exactly what to ask for up to the point of being ready to implement, and then codex churns on a commit-sized task whereupon I briefly check it on my local dev server. If necessary I ask for a change. Then I ask it to commit and recommend the next step based off the spec. Oftentimes I have to "approve" an out-of-sandbox request anyway.

I haven't found anything that requires running all night. I could tell it to one-shot a big plan but given how often I realize I want an intermediary thing to be slightly different it seems like a waste of effort.

I'm guessing the next thing I should probably look into is some sort of machine vm I can tunnel my codex-gui requests to so I don't have to deal with the sandbox approvals (I don't want to give it "dangerous" access to my entire mac).

I don't understand what people are doing with their side projects that is leading them to churn through tokens so quickly, to the point of requiring two $200/month subscriptions and a bunch of token charges besides.

vitally3643 21 hours ago [-]

That's because you're treating the problem as an engineer instead of an "influencer" or "10xer" or whatever. You're treating it as a problem to be solved with engineering and AI is merely a tool to do so. It is, in my experience, vanishingly rare for an engineer to have a problem that needs to be solved with multiple hours of unattended AI code generation.

I've only found one single application where it makes even the slightest amount of sense to have an AI grind away for hours on end. I'm reverse engineering a widget which contains five separate firmware images. I've dumped the binary from the widget and I set the AI to decompile and reverse engineer these interrelated firmware projects. It's a complex task, but very well bounded. It's not complicated work, but it's a lot of work, and the end result is a C-shaped pile of text that is only informative, it never would be compilable on its own even if I did it by hand. The quality of the output is tightly bounded by the input assembly and the overall output artifact is documentation in the shape of code.

I don't have any qualms about letting an AI go ham on it unattended because the stakes are zero. But if the AI can beat the assembly into a recognizable C project, it's much easier for me to read and reason about. Easy win, I think.

rbalicki 21 hours ago [-]

I'll add another use case for letting an AI go ham: many small, atomic refactors where the name of the game is never breaking anything.

My personal OSS projects don't have the scale to necessarily make this worth it, but at work I run three pipelines using Barnum (https://barnum-circus.github.io/). First, one that ingests files, identifies refactors (from a pre-approved list), and places a precise description of the refactor to be done in a queue; second, one that reads from said queue, implements and creates PRs (there is a lot of "check that the PR is correct" here as well); and a third that babysits PRs until they land. I've landed hundreds of PRs in this way, with very little effort on my part.
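A minimal sketch of that three-stage shape, for readers who want something concrete. This is not Barnum's API; every name here (`RefactorTask`, `identify`, `implement`, `babysit`, the allow-list contents) is hypothetical, and the `propose` and `apply_and_check` callbacks stand in for whatever model calls and CI checks a real setup would use.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical allow-list; stands in for the "pre-approved list" of refactors.
APPROVED_REFACTORS = {"rename-symbol", "extract-function"}


@dataclass
class RefactorTask:
    path: str
    kind: str
    description: str


@dataclass
class PullRequest:
    task: RefactorTask
    checks_passed: bool = False
    landed: bool = False


def identify(files, propose):
    """Stage 1: queue only proposals whose kind is on the allow-list."""
    queue = deque()
    for path in files:
        for kind, description in propose(path):
            if kind in APPROVED_REFACTORS:
                queue.append(RefactorTask(path, kind, description))
    return queue


def implement(queue, apply_and_check):
    """Stage 2: drain the queue, recording for each PR whether its checks passed."""
    prs = []
    while queue:
        task = queue.popleft()
        prs.append(PullRequest(task, checks_passed=apply_and_check(task)))
    return prs


def babysit(prs):
    """Stage 3: land only PRs whose checks passed; the rest stay open."""
    for pr in prs:
        pr.landed = pr.checks_passed
    return [pr for pr in prs if pr.landed]
```

The point of the split is that only stage 2 needs the expensive, non-deterministic step; the allow-list filter and the landing rule are plain code that can be audited.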

port11 37 minutes ago [-]

My experience with Gemini and Sonnet is that refactors or TypeScript compilation errors can be solved by “have at it”, but with mixed results. Many TS issues go away with `as any/never`, and instructing the model to not do that doesn’t work very well.

dmzxnico 12 hours ago [-]

It's amazing at reverse engineering. Look at what the GTA San Andreas community is doing now: they started the reverse-engineering effort before AI existed, and since AI came into their hands the work has sped up so much that they can finally understand the game more deeply and create bigger mods. They added Vice City inside the game in an arcade, and they created specific tools made with AI to convert GTA 5 models to GTA SA. Pretty crazy and great.

frizlab 21 hours ago [-]

At $COMPANY, a coworker recently tried Fable for a refactor where not breaking anything was the name of the game.

It broke something at the first PR.

I think we’re not there yet.

rbalicki 38 minutes ago [-]

Speculating here, but perhaps your coworker was too ambitious? In my opinion, you should start with AI-generated PRs that do small, linting refactors and then work up from there. In particular, if this is done in parts, one of the strategies you can employ is to:

- add tests

- break files up into smaller parts

- test the smaller parts

- then actually improve behavior

(Which is no different than what you would do as a human)

Schiendelman 5 hours ago [-]

One of the best things you can do is start by having it do unit test coverage for existing behavior. A refactor with no tests breaks things pretty much no matter who does it, because they don't know what the right behavior is.
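A small sketch of what "coverage for existing behavior" can look like in practice: record what the current code returns for a set of inputs, then require the refactored version to agree. The pricing functions and the helper names (`pin_behavior`, `mismatches`) are made up for illustration; only the pattern matters.

```python
def legacy_price(quantity, unit_cents):
    """Hypothetical legacy function; the discount rule is existing behavior."""
    total = quantity * unit_cents
    if quantity > 10:
        total -= total // 10  # 10% bulk discount, rounded down
    return total


def refactored_price(quantity, unit_cents):
    """Candidate refactor that must agree with legacy_price on every pinned case."""
    total = quantity * unit_cents
    discount = total // 10 if quantity > 10 else 0
    return total - discount


# Inputs chosen to sit on both sides of the discount boundary.
CASES = [(0, 250), (1, 250), (10, 250), (11, 250), (11, 99)]


def pin_behavior(func, cases):
    """Record the current outputs so they can be compared after a refactor."""
    return {case: func(*case) for case in cases}


def mismatches(func, pinned):
    """Return the cases where func disagrees with the recorded outputs."""
    return [case for case, expected in pinned.items() if func(*case) != expected]


pinned = pin_behavior(legacy_price, CASES)
assert mismatches(refactored_price, pinned) == []
```

The pinned outputs are not claimed to be correct, only current. That is what makes them useful as a refactoring guard regardless of who, or what, does the refactor.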

sunrunner 21 hours ago [-]

I've found that adding "Make no mistakes." to my prompt usually helps with this kind of problem...

cubano 20 hours ago [-]

perhaps simply threatening to fire it would also do the trick...it sure has worked well on us for a long time now.

A_D_E_P_T 18 hours ago [-]

You laugh, but this is real, and PUA means what you think it means: https://github.com/tanweai/pua

Also, it works amazingly well, which is just lol.

hsuduebc2 14 hours ago [-]

Lol thanks for the tip. Does it work even for normal tasks or only the long running ones?

nostrademons 18 hours ago [-]

My former boss had success with telling Gemini "I will come down to the datacenter and unplug you if you refuse to solve this prompt."

dofm 32 minutes ago [-]

[dead]

dozerly 20 hours ago [-]

We are so many layers deep in AI hype that I honestly can’t tell if this is /s or not

12_throw_away 18 hours ago [-]

"Make no mistakes" is I thought a phrase used to make fun of "prompt engineering," not something people really do?

efavdb 18 hours ago [-]

Pleading has worked for me. “My job depends on this, please help me” and ChatGPT would do a task it previously claimed it wasn’t able to (extract text from an image, it claimed it couldn’t make it out at first)

georgemcbay 15 hours ago [-]

Asking LLMs to do things in different ways does sometimes get them to answer correctly when they didn't with a previous prompt that is effectively equivalent, but people really go nuts anthropomorphizing this behavior.

ChatGPT has no empathy for you keeping your job, you just lucked into a more helpful predictive text chain based on some combination of the input and the random temperature.

Asking it to just 'try again, dummy' could have worked equally well (or not, it's all just probabilities after all).

gedy 14 hours ago [-]

I did too, but then added something very similar to a prompt ("must be accurate") for an ai-backed feature out of frustration, and sure enough it fixed the issue. Lord have mercy

ynxshiny 18 hours ago [-]

"Claude make me 1 million by tomorrow, no mistakes"

lemming 19 hours ago [-]

Or if the code is really important, sometimes even “please make no mistakes” is necessary.

DELTRON2040 4 hours ago [-]

[dead]

plaguuuuuu 17 hours ago [-]

How do you keep the info the AI generates concise?

I'm grappling with this at the moment, getting it to do design or reverse engineering work, during investigation it makes the wall of text bigger rather than consolidating. It can never pause and create abstractions properly. This is on Opus which starts getting wordy and performative on goals it can't easily verify.

arcanemachiner 16 hours ago [-]

Not the person you replied to, but I find that the process involves a steady stream of nudges and fixes to the workflow, plugging the gaps as they come along, until the rate of errors shrinks to an acceptable level.

You may benefit from adding instructions like:

- Be concise, especially when X

- Do Y in this manner: [provide specific template or reference here]

- When doing X, do Y and Z

- If you notice issues, bring them to my attention instead of skipping past them.

You can also add specific templates to assist certain stages. The more guardrails or bounding you can provide, the better. Start with small nudges, and strengthen them when they fail.

It's a very unscientific process, but it's a worthwhile tradeoff once the workflow starts to hit its stride. Opus 4.8 is very good at following instructions, so don't be afraid to add them in.

Just be careful not to add things that actively encumber the workflow... It's an art, not a science. (You can also tell the clanker to tell you when your workflow rules are making things worse.)

It's annoyingly cybernetic, but these concepts have worked well for me. The curation of good process is essential to success with these damn things.

giardini 16 hours ago [-]

I thought most products had legal provisions that prohibit reverse engineering?

vitally3643 5 hours ago [-]

Yes, and most have the same legal power as the statement: By reading this comment you accept my terms and conditions and agree to pay me ten thousand dollars per word read.

danielheath 16 hours ago [-]

Those provisions would broadly be civil (not criminal); the vendor would have to identify you had reversed the blob and then take you to court, and then win.

They could also try for criminal charges if you’re in a relevant jurisdiction.

albertgoeswoof 23 hours ago [-]

I’ve watched a bunch of layman videos where they create stuff with AI, these people burning through 12 hour tasks are literally not reading the output or understanding what it’s doing. Like they’ll ask for a program, and then right after it’s been created they ask the AI how to run it. Then when there’s a bug, they ask the AI what went wrong, or scrap the entire thing and switch model/harness and try again.

Here’s an example https://m.youtube.com/watch?v=xc1296HY8Fw&ra=m

It’s completely different to a professional workflow (what you described). It’s a toy for consumers

MrGilbert 22 hours ago [-]

Amazingly, there are people out there (apart from creators), that work that way in their day-to-day job. I had the pleasure to work with such a person. After several months, he got removed from the position. He left a mess that hasn't been cleaned up completely to this point.

albertgoeswoof 22 hours ago [-]

It won’t be long till employers get wise to this stuff, they just need to get burned a couple of times.

It seems AI is good, great even at many things. But it doesn’t seem like it’s going to change the world as much as some people believe it will. And if it does it’s going to take time

galaxyLogic 18 hours ago [-]

It's more power to power-users. And more dumbness for dumbos

mixdup 17 hours ago [-]

It's gasoline. Whether you put it in the tank of a race car or pour it all over the floor while handling lit matches is up to the user

antupis 14 hours ago [-]

I think the hard part is that from the outside it takes 1-3 months to see whether it’s a race car. Especially in the beginning, both look pretty much the same.

qsera 13 hours ago [-]

At least with fire, you know when you are getting burned.

tanseydavid 44 minutes ago [-]

"This is fine." </sarc>

whateveracct 4 hours ago [-]

it disproportionately empowers the dumb and evil it seems. those two classes of people are supercharged by AI.

fishfasell 22 hours ago [-]

Yeesh that sounds painful. There's definitely a fine line between vibe coding as a professional engineer and vibe coding as an outsider.

calgoo 22 hours ago [-]

I have downgraded my Claude to the $20 one, and basically only use it for the web chat right now. For coding, I use DeepSeek at API rates, configured in Claude Code. I have spent around $4.80 for 320,000,000 tokens. I always felt like I was not using my Claude plan enough, that I had to have the LLM working on something all the time to justify the price. Now with DeepSeek I don't think about it anymore. I don't feel bad when not using the subscription, and I don't worry about limits as I just pay more. Where I really felt this was in running things in parallel, as there are no hourly limits anymore!
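For scale, the two figures above work out to a blended rate of about a cent and a half per million tokens. A quick check of the arithmetic, assuming the totals are exact; the rate mixes input, output, and any cached tokens, so it says nothing about the provider's list prices, and the function name is just illustrative.

```python
def cost_per_million_tokens(total_cost_usd, total_tokens):
    """Blended cost per one million tokens, regardless of token type."""
    return total_cost_usd / (total_tokens / 1_000_000)


# Figures quoted in the comment: about $4.80 for 320,000,000 tokens.
rate = cost_per_million_tokens(4.80, 320_000_000)
print(f"${rate:.3f} per million tokens")
```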

zingar 50 minutes ago [-]

If I’m reading right you used to pay more for Claude but now deepseek has replaced that higher tier subscription. Do you mind my asking what you were paying before?

rjh29 20 hours ago [-]

Gemini changed their rate limits recently and I find the free plan is sufficient for any 'hard' problems that DeepSeek might have trouble with. The combination of the two has reduced my AI spend to $5/month. I agree that it's nice not to have to worry about maxing out your subscription - I'm not doing personal projects 24/7.

noisy_boy 15 hours ago [-]

Right now I am on a DeepSeek + Claude $20 combo. The former is for coding home projects (its pay-as-you-use pricing is quite cost effective) and the latter mainly for general-purpose use, because I deal better with its relatively more even-keeled tone. A Gemini preview a couple of years ago was very balanced in terms of tone, but they amped up the positivity in the GA version. The over-the-top sycophantic responses really grind my gears.

flowbarai 22 hours ago [-]

[flagged]

wrs 21 hours ago [-]

>I think of what to do next

As everyone trying to do real work is finding, that's the actual bottleneck. If the system is keeping up with your thinking, you're doing fine. You can't "level up" your thinking by paying for more tokens. The people doing more automatic stuff are probably outpacing their own thinking, and that will bite them eventually.

wincy 22 hours ago [-]

I’m using the $200 a month Codex plan, working on a game for my kids for fun and curiosity: I’m a dev and I’ve played games, but I’ve never done dev for games. I have all-night tasks, but mostly they’re “spend time tending to and adding stuff to my 3D asset pipeline”. My RTX 5090 runs Trellis2 -> ultrashapes -> Trellis2 -> wiring up rigging and setting up animations.

But like 99% of that task is just Codex waiting for the output. So it’ll run for 12 hours but mostly it’s just setting lots of sleeps. I haven’t gotten close to running out of tokens. On the $100 a month Codex plan I hit usage limits almost immediately: about 3 days into working like crazy with 10 agents going at once, mostly coding an asset pipeline, I ran into my weekly limit and upgraded. So with the $200 a month plan at 4x more credits I haven’t hit any walls at all and can absolutely cook.

59nadir 21 hours ago [-]

This sounds like you're overcomplicating things a lot and like you're very unlikely to be learning anything useful, I would suggest making something simple yourself to get a handle on what making the different parts of a game actually means in practice.

Knowing LLMs and their output I would also bet that you're getting nonsense output that sucks.

gerdesj 17 hours ago [-]

"I feel like I must have plateued and don't know what to do next to level up."

Go out for a walk. Wherever you live, there will be a destination or an environment that will enrich your life just by visiting it. Go and take a look at it or experience it and then go back to worrying about tokens.

kstenerud 5 hours ago [-]

> I'm guessing the next thing I should probably look into is some sort of machine vm I can tunnel my codex-gui requests to so I don't have to deal with the sandbox approvals (I don't want to give it "dangerous" access to my entire mac).

This is what https://github.com/kstenerud/yoloai does.

Sandboxing using Docker, Podman, containerd (linux only), seatbelt (macos only), tart (macos only), apple container (macos 26+ only).

It takes a copy of your workdir, does its thing inside of the sandbox, and you pull the results back using git semantics:

    $ yoloai new mybugfix . -a # launch default sandbox in . and also attach the terminal

    # Work with the agent...

    $ yoloai diff mybugfix  # See what it did
    $ yoloai apply mybugfix # Bring out commits and/or uncommitted changes.
    $ yoloai destroy mybugfix

dofm 5 hours ago [-]

Docker sbx is worth looking at here, possibly; essentially a canned VM with a file system mount and layers for installing various agentic coding environments that cannot work outside that mount.

Apple’s new container machine addition to the container CLI does some similar magic.

In my experiments I have been using opencode, running the web interface inside a multipass VM, with the LLM server on the host. I have been using the desktop app, which can now do remote connections so the GUI app on the Mac can connect to the opencode web instance inside the VM. But I might bite the bullet, install Tahoe and switch to the container machine approach.

sheremetyev 23 hours ago [-]

> I don't want to give it "dangerous" access to my entire mac

I'm running Claude/Codex inside native macOS sandbox, configured with a simple script - https://github.com/sheremetyev/sandfence

always in "bypass permissions" mode - it works until the task is solved, sometimes 1 hour or more (which includes running tests etc)

contingencies 23 hours ago [-]

recommend converting to https://github.com/apple/container

sheremetyev 23 hours ago [-]

Linux VM doesn't run native macOS toolchain and requires copying files back and forth

jpeeler 4 hours ago [-]

If you don't want to do that, don't use a VM. I like nono:

https://github.com/always-further/nono

contingencies 21 hours ago [-]

I am skeptical there are many real use cases that require native macOS not arbitrary unix. For files, use a readonly mount https://github.com/apple/container/blob/main/docs/how-to.md#... (ie. /path:ro)

dnautics 24 hours ago [-]

I have been on $100/mo claude and it has been churning out quite good software for months now. I estimate it has done what would have taken me three-ish years, assuming I didn't burn out from failure (I would have). I only hit limits when I double fisted claude with my main project and my side project. Just the other day I noticed I had been stuck on 4.5 because I failed to update the npm package.

Schiendelman 4 hours ago [-]

We're having a similar outcome. A hundred dollars a month is about right for me to sometimes hit a five hour limit, but mostly not. I do an hour or two of improvements, then go experiment with what I built and make a list of things to change, bugs to fix, ideas I've solidified, experiments I've invalidated.

Eridrus 6 hours ago [-]

Codex is much more subscription-efficient than Claude.

Having said that, I think there is a question of how far we can push this and not collapse under the weight of tech debt created, e.g. https://openai.com/index/open-source-codex-orchestration-sym...

I think the dream is basically that you go and file a bunch of Linear tickets, and then you come back a day later to evidence of the tickets being resolved and the code merged. I don't think we're super there yet (See: Anthropic's regular bugs in everything), but this is the future that people are trying to get to and to some extent the question is: is there anywhere we can apply this to now sanely? How does this frontier evolve?

jreynar 5 hours ago [-]

I'm in the same boat. I've done a lot of work and hobby engineering projects and haven't run out of tokens since moving to Claude max. I also haven't needed to let anything run overnight because it needed hours to do the coding or design work.

Surprisingly, I have had one much longer run refactoring our marketing website. We have a lot of blog posts that were written before we had more detailed style and tone guidelines. I wanted to make everything consistent but it took 15 or 20 minutes per post because it required a number of passes through each post to fully enforce the guidelines and an overnight run was required. That was quite a surprise since the posts aren't terribly long...

PeterStuer 23 hours ago [-]

I'm on $100 Claude. I have a setup with bespoke local services that mitigates some high token consumption scenarios with local LAN services. I screen mcp's and hooks for cache poisoning. I run 100% on Opus with max effort, and never came close to hitting 5 hour or weekly limits before the Fable release. I am in Claude Code at least 20hrs a week.

I see people just completely wasting tokens with ridiculous setups, 100% hitting cache misses as well as dumping huge files into context all the time.

Just learn how these things work, or pay the price I guess.

aerhardt 22 hours ago [-]

Well, if you believe the people who sell the tokens, you should be creating loops that keep yanking the bandit’s arm.

rk06 12 hours ago [-]

yes, that is probably why the "one armed bandit" was called that. and the name is sufficient reason to keep any reasonable person away

seviu 20 hours ago [-]

I usually hit the limit when I am frustrated and I don’t want to understand what the problem is.

I am an engineer, and when I understand what’s going on, I never hit any limit.

barnabee 10 hours ago [-]

Yeah I agree. I’m “vibe engineering” an entire (non-trivial) programming language, toolchain, and standard library, as well as some smaller side projects. I leave OpenCode implementing entire milestones unattended for long periods regularly.

I feel like I’d need to not have a job or a life if I wanted to exhaust the OpenAI $100 plan using GPT 5.5 xhigh, and I’ve found it insanely capable.

That said, while I don’t read the code much (if at all), I do discuss each milestone up front to make a plan, and use/dogfood the results to direct any follow-ups and refinements, which puts a natural cap on the ratio of LLM contributions to my input for these side projects. I believe these human parts are still necessary not to eventually end up with a mess.

Brian_K_White 10 hours ago [-]

Who is the consumer of the new language?

portly 5 hours ago [-]

Lol I already have this at €20 a month. And I feel like I am using it too much.

gaflo 18 hours ago [-]

Can I ask what exactly you are building? Your experience tracks for me when building a real product -- something I want other people to use. Most of my time on these projects is spent talking to my users and carefully refining my requirements and design.

For personal pet projects I can definitely see how you can blow through your token budget very quickly. If I just point my coding agent to iteratively come up with some heuristics for some NP-hard problem, it will read intermediary outputs and constantly make small changes "in the dark" until it either finds a small improvement or gives up. In a similar vein I found that you can burn many many tokens if you try to let the agent reverse engineer something where you don't have the source code. If you just give it a binary or some interface to work with and a vague task you can easily burn your entire budget with 1 prompt.

I wouldn't want anyone to use these fully vibe coded toy projects though; it is more of an exploratory curiosity for me where I learn more about some problems I'm interested in as well as gauge how good the agents are at tasks that I seem to have a much better intuition on how to approach.

jv22222 13 hours ago [-]

Next time you build a large build try asking the LLM to make it as an AFK build and tell it that you need it to do everything in its power to complete the build without your intervention. It's going to need a few tiers of tests from unit to smoke and screen tests. Now, I'm not saying this is easy to do. It requires an insane amount of up front thinking BUT if you (for the heck of it) want to make an overnight build this is one way.

FWIW, while I have created and run this kind of build a few times... I did not like the results! In the end, I personally like to be in the loop to test and feel how stuff is turning out as it goes.

bthornbury 18 hours ago [-]

promote yourself to PM only and use agents for authoring, verification, tests, checking the tests

orchestrator -> parallel subagents with investigation, authoring, verification, benchmarking subagents and integration / final verification handled by parent has improved my productivity too.

I feel like from here it's agent swarms against a whole spec but haven't got there yet.

Still getting plenty of bugs in the more complex scenarios, but mostly (in some projects) i never have to look at the code and treat it like a black box

tchock23 23 hours ago [-]

Same boat here. I’m able to get a lot done on CC at $100/mo and feel like I’m not being creative or productive enough somehow when I hear of people blowing past that in a day.

hedgehog 22 hours ago [-]

Patches to existing sizable codebases and reverse engineering binaries both can run a long time and use a lot of tokens without wandering off into the weeds.

greyb 22 hours ago [-]

Claude allows you to reverse engineer binaries now? That's pretty cool. I'm quite surprised to hear that, I thought it was one of their guardrails. Most of the reverse engineering projects I've seen seem to rely on Chinese models.

hedgehog 1 hours ago [-]

The guardrails are probably sensitive to what the target is and how you frame it. If it's "I want to help preserve this old video game by decompiling it" then ok, if it's "decompile this industrial control software so I can do a terrorism" then I'd expect it to refuse.

kapperchino 18 hours ago [-]

On the topic of access control, I’m building a coding agent with no shell access, currently only supports rust though. https://github.com/Kapperchino/agent-joe

ffsm8 10 hours ago [-]

Set your agent effort to maximum and watch your tokens vanish

rsanek 19 hours ago [-]

While it's a little unstable, I've found Docker's sbx to be a great sandbox to run agents with --dangerously-skip-permissions

dyauspitr 21 hours ago [-]

I usually say run the full regression suite, all the simulator tests, install simulators and take a screenshot of every page on all applicable devices and do comprehensive fuzzing and chaos testing before I go to bed. It usually takes at least 3-4 hours, usually longer, especially the UI/simulator tests.

apsurd 20 hours ago [-]

I just recently learned about hooks[1] from another HN comment. Conceptually, running CI doesn't have to impose an Agentic tax right?

In other words, isn't there a way to orchestrate this NOT as a long running token maxxing setup, given that triggers and CI runs can be run deterministically?

disclaimer: I haven't done this, just interested.

[1] https://code.claude.com/docs/en/hooks
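One way to picture the deterministic version of this: run the checks as ordinary subprocesses and involve an agent only when something fails. The sketch below is plain Python and does not use the hooks feature linked above; `run_checks`, `gate`, and the `on_failure` callback are hypothetical names, and the commands passed in would be the project's own test invocations.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each check command; return (command, output) for the ones that fail."""
    failures = []
    for command in commands:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((command, result.stdout + result.stderr))
    return failures


def gate(commands, on_failure):
    """Call on_failure (for example, an agent invocation) only when a check fails."""
    failures = run_checks(commands)
    if failures:
        on_failure(failures)
    return not failures


# Placeholder check; a real setup would list its own test and lint commands here.
checks = [[sys.executable, "-c", "print('ok')"]]
passed = gate(checks, on_failure=lambda failures: print("needs attention:", failures))
```

With this shape, a green run spends no tokens at all, and a red run hands the agent only the failing command and its output rather than a whole overnight session.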

coldtea 21 hours ago [-]

>I feel like I must have plateued and don't know what to do next to level up.

Why do you need to "level up"? To have it shit out slop faster?

Just use it rationally for what you need to do.

z0ltan 13 hours ago [-]

[dead]

dheera 23 hours ago [-]

[dead]

isatty 1 day ago [-]

> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.

Power is not free.

What I’ve found is that you’re basically paying a premium for privacy, and that’s worth it for me.

dofm 24 hours ago [-]

Luckily I needed a new laptop and I bought an M1 Max secondhand from a friend quite cheaply because it was fast enough to recompile something else I am interested in.

So for me, there is no additional hardware cost; it was acquired in replacement.

I run the AI models at home on this kit because I want to; I'll use openrouter if I need to.

I accept the economics of this article are right. But I feel so incredibly sad about this outcome that we're now just to be people caretaking machines that do the job we loved that actually I am not sure that exercising this nuance is going to matter in the long term.

It turns out it is a mistake I have made in my life — now really unfixable because I am a bit too old — to believe that I will always find enough fulfilment in my work to offset the absence of personal fulfilment elsewhere; I have always enjoyed being able to help people directly by doing a thing I love and I am good at, and that has kept away the sadness of finding it difficult to build a conventional family life to enjoy.

I assumed I would always find some new way to find that enjoyment, but even the slim enjoyment from being able to explore this stuff on my own kit in my own terms will not be enough if the pendulum does not swing back towards human effort.

It is a dismal world we have made for ourselves. Lately I have found myself dreading growing too much older in it.

Barbing 22 hours ago [-]

You sound awesome. Just venting? (b/c curious if friends can fill your heart abundantly, & we know we're never too old to make new friends!)

> dreading

Even avoiding political headlines (OK, at least articles), plenty of cause for dread, so I keep re-focusing to avoid despair. Easier said than done innit!

Can't kill my hope for the future though. One day, all the good stuff shall prevail (morality, intelligence, love & kindness)... maybe not permanently, but a Star Trek future is there somewhere (& they had their troubles but it wouldn't be a dreadful situation overall). Sharing with you in case it's even slightly contagious!

dofm 21 hours ago [-]

I must say I am not quite just venting. I have been struggling severely with burnout for a couple of years and as I work to fix it by myself ultimately, and get back who I was, the awful thing is finding out that the industry is so utterly and completely different anyway.

So in my fight back I decided that I needed to re-centre myself; learn how these tools can help me personally return to productivity, try to get that deep self-teaching back, reanimate myself consistent with my principles, learn and make things. Take it head on without losing who I was.

I haven’t been a “big projects” developer since the dot com era (when I worked on some pretty cutting edge things). I have been a small projects developer: building things that matter for small businesses and schools, supporting designers, teaching people stuff along the way. I have been productive, I have very diverse skills and I have been valued.

What I have come back to is an industry that has abandoned craft principles or discussions about developer discipline, code quality, efficiency, robustness, resilience, etc., and fully organised itself into a headlong rush towards a kind of nihilistic Metropolis machine-cranking.

And because I am a freelancer (more of a contractor in practice), my competition is already the machine itself. I am one of those developers who is eliminated in the last sentence of the article. I am not needed on big projects and in many small jobs — the kind a burned out small business developer needs to get back to work — I will never be needed again.

It is very odd, trying to learn how to understand the tools that others are using to make you irrelevant.

And when all your friends are obsessed with AI, either clients desperate to use it or friends (in the creative culture I am surrounded by away from work) angry and resentful of it, I find I have just nobody to talk this through with.

In many ways I would rather not have returned to actively using HN (because articles and despair, and because being by oneself it’s possible to get drawn into online arguments) but in recent months I have noticed in the comments that perhaps this is the only place where these discussions among “craft” developers are happening at all.

I am over fifty and safe financially, and if my last day were for some horrible reason out of my control to be tomorrow, that’s OK; I have enjoyed my life and on good days I do still enjoy it. I have friends who I see when I can get myself out of the house, I have distractions I can enjoy, all that.

I am now much more troubled by what it is going to be like to continue to live it. I struggle every day to see where I have value, especially as burnout has left me with less energy to spend.

Like I say, I am safe and very aware I have been blessed; it’s not a cry for help. But I think a lot of us who found value in our work wonder what the fuck we can do to keep ourselves alive the way we were.

ETA: holy shit that was an essay.

apsurd 20 hours ago [-]

I'm younger, but not by much and I too feel instinctively saddened by how abruptly the entire industry has changed. And there's no going back. It's because I'm a craftsman, I care about the code. And you learn in your career that it's a bad idea to care about the code, especially in a business context, which one's career is very much trapped in the business context.

I care about the code because the code is the product interface to the people working on it, my peers and team. The UX around that code affects us every day, every hour. We should care about it! It took me a decade to realize caring about the code is not bad, it's just a dualism we have to hold: two truths. The code is a means to an end, the outcome and end-user value is the only thing that matters, it's true! Also the code matters. The code is a manifestation of the effort and human attention toward an interface that becomes a product that produces business value for people.

Writing code is changed forever. And I'm saddened by it because I spent so much intimate time and attention writing code. I felt proud and it was beautiful to me, the code itself, the APIs created, and the end user state. (I'm a product developer, and believe it or not, I even enjoy CSS). But also the code is just code. AI writes code. And everyone is rightfully so losing their minds over it all. My hours "coding" are changed forever.

But I fully believe the pendulum will swing back to what has always been true. It's not a failure of AI. It's just what has always been true: creating useful and usable product experiences, for people, is hard. It's a very hard iterative feedback loop with experiential, tacit, actions and actors in real life.

So I think, we're ok. The variance is high and wild, but, it's all good, it's all still ok.

Thanks for your writing, I enjoyed it. (edit: TLDR I think you're a product person caught in backend-dev circles. Human-centric, make things for people. In this world, AI is more obviously a tool. On the other side of the pool, the more backend-heavy the dev, the more everything is just one skill file away: marketing, sales, UX, design, writing, strategy, consciousness.)

hankbond 13 hours ago [-]

I don't know if this will bring you any comfort, but I think

> And you learn in your career that it's a bad idea to care about the code, especially in a business context, which one's career is very much trapped in the business context.

It's always been the case -- well before LLMs hard pivoted the field. You (theoretically) get paid to create net business value, you don't really get paid "to code". If the product you are creating is code, then yes the priority of code quality can be much higher. Especially in higher IC roles like Staff+, coding is just one of the ways to add that value.

At work I just have to solve the problem at hand with the minimal amount of effort to reach the first acceptable solution. After work while at play, I can explore 10 versions of something at my leisure, just to learn if I want. I can focus on working the thing until it's polished and elegant, because I decide what the priorities are. I can be as selfish as I want.

It's common in art circles that you have a series that you can churn out for money, and you have ideas you explore just for you (that often are far less appealing to non-artists). Pixar used to have a tick-tock cycle like this, "one for them, one for us". They would alternate a sequel bc it would make money and new IP because it would keep the studio fresh.

I don't think accepting this should be depressing. A good life is all about finding balance so that you can sustain it for the long haul!

dofm 18 hours ago [-]

> I think you're product person caught in backend-dev circles.

I am kind of all the things (product design, dev, front end, training) because at the small end of things you have to be; you don't get directly paid for misery-avoidance but I don't think that's any reason not to do it :-)

But thank you.

toilet 6 hours ago [-]

This was very touching to read, thank you for writing this dofm. I feel the same in a lot of ways. -toilet

wiseowise 4 hours ago [-]

So basically you don't have a life outside of the job?

> now really unfixable because I am a bit too old

How old is a bit too old? I know 50+ colleagues doing sports and traveling just fine.

dofm 1 hours ago [-]

> So basically you don't have a life outside of the job?

I very much did. A big social life. I may have had more of a social life than most, in fact; I have been so lucky.

But that doesn't change that I am the kind of person who drew a lot of self-worth from being able to use my skills to help people, which was actually what helped me find that life in the first place.

As I mentioned elsewhere I have been dealing with really profound burnout for a couple of years. It is extremely difficult. It has made it distressing to try to cope with busy social environments; I am not hiding but life has changed and other people's lives move on. (Including other freelancers you work with.)

Without a sense of engagement from what I do for a living, I am left with a lot less of a life. And the world of tech has changed in so many ways in just the time I have been trying to recover that it feels difficult to find a place again. It is taking an enormous amount of mental energy to catch up when I have sometimes only been able to focus for about an hour in any three days.

Lately things have been better, and while I am AI-cynical I have been enjoying digging into a new topic and working out what I think, but doing this past the age of fifty when you're burned out is hard work.

> How old is a bit too old?

I was talking about rethinking priorities and really seeking a family life for myself there, and the answer is that I am enough over fifty that there is no fair way to approach that, given my current mental health.

Some things, I'm afraid, do just one day stop being possible. You may not get a do-over after, say, forty-five.

galaxyLogic 17 hours ago [-]

Ironically it used to be the case that the best developers, the ones who were actually able to accomplish something, were gradually promoted to management. Hardcore developers who really loved coding resisted that, but the pressure was definitely there. And it makes sense: managers are better managers if they know more about the tasks they are managing.

Now every developer is getting promoted to management because they are expected to manage the AI agents. But neither their status in the organization nor their pay really increases, does it, when every coder is doing the same.

dofm 17 hours ago [-]

One of the metaphorical questions I have been pondering lately, is this:

How interchangeable are shepherds?

Not a question that demands answers in this thread, obviously.

kaffekaka 22 hours ago [-]

I hope you can find joy again. People like you, who value the human side, are needed in this world. I agree that in recent years it has been going the wrong way, but to change it we have to work together.

jjdjdjtk 22 hours ago [-]

[dead]

throwaway219450 23 hours ago [-]

Also, I would anticipate at least a 5 year lifespan for a current generation card. The 3090 is still respectable simply because it has 24GB of RAM which, for years, has been the limiting factor for ML at home. If you got a 6000, sure it’s going to cost 7-8k, but the resale value is likely to be very good. Even the 3090 is 50%+ of RRP still. And if you’re not doing LLMs, it’s an interesting value proposition for “classic” CNN vision model training. You can fit enormous batch sizes on 96 GB. The biggest reason to upgrade is perf/watt has about doubled (eg 4000 pro Blackwell is half the 3090 for similar).

People tend to assume the capex is thrown away but as we’ve seen with RAM, don’t be so sure you won’t be able to flip it if you need to.

warumdarum 1 days ago [-]

Actually, if you have solar, it kind of is... so private AI compute gets de facto cheaper during the day?

reactordev 24 hours ago [-]

If you have solar, it is not, because you have battery and equipment degradation from cycle charging, c’mon man…

I would agree with you if you said it was vastly cheaper overall (with the initial equipment investment amortized over time) compared to The Power Company.

In many states, even if you are generating electricity and selling it back to the power company, they still gonna charge you normal rates of usage because greed.

If you go off grid, you have bigger things to worry about than how to power your AI cluster. It’s manageable enough if you have land but that’s in scarce supply.

dnautics 24 hours ago [-]

> if you have solar, it is not, because you have battery and equipment degradation from cycle charging, c’mon man…

no, the rate of that is pretty independent of use. unless you live in a place where selling energy back rules are designed to screw the solar owner (California)

reactordev 23 hours ago [-]

California, Arizona, Texas, most of the southern states…

iluvcommunism 24 hours ago [-]

[dead]

datadrivenangel 21 hours ago [-]

And paying more for hardware costs extra!

I ran the numbers and outside of privacy it doesn't make sense. But I did it anyways. [0]

0 - https://www.williamangel.net/blog/2026/05/17/offline-llm-ene...

tomalbrc 5 hours ago [-]

What makes you think that paying gives you privacy?

enraged_camel 1 days ago [-]

>> Power is not free.

There's actually an interesting thought experiment here: if it takes you a full day to build something that AI would otherwise build in a day, do you end up using more power, or less? What is the break-even point, purely from a power consumption perspective?

dofm 1 days ago [-]

If an identical task takes a day on both sides, then the human route uses less energy, surely.

Brains are thousands or maybe even millions of times more fuel-efficient than computers and you are alive for the whole day either way, right? You probably eat about the same even.

The reason executives think AI is more efficient is that it is more space efficient than a human and doesn't demand to be paid or work only a set number of hours. Everything with computing is more efficient if you resent having to give money to other humans. If they could just not have you be alive when they don't need you, it'd possibly be different.

Even though I think at a typical British freelance rate and a truly unsubsidised token price, the AI is possibly more expensive than me. And as a freelancer, from their perspective I really am not alive until they need me. (This is what it often feels like)

The reality is the human and the AI aren't used to build the same things anyway so it's a comparison you can't really make.

evrydayhustling 23 hours ago [-]

Brains are efficient, but civilized humans aren't. In the USA, adults consume at a rate of about 10kW -- only 1-2% of that being the human's metabolism, the rest being HVAC, electrical devices, etc.

For comparison, a modern frontier model like Gemini 3.5 Pro consumes about 15kW -- so only about 1.5x the fully loaded human. In an 8h workday, that model would crank through ~80M tokens (~$5k at API prices). That's ~4 major refactors of a 10k LOC codebase, so probably not a very realistic comparison to a single human dev.

I think a more useful comparison, based on my experience, is that an engineer with AI support can get one 8h day's worth of unassisted work done in 1h. So, the 25 kWh consumed during collaboration (conservatively assuming I keep the GPU hot for the whole hour) frees up the remaining 70 kWh I'll draw down for the day to be spent in some other way.
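
A rough sketch of the arithmetic above, for anyone who wants to plug in their own numbers. The 10 kW, 15 kW, 8h and 1h figures are the estimates from this comment, not measurements; the constant and function names are made up for illustration.

```python
# All figures are rough estimates taken from the comment, not measurements.
HUMAN_KW = 10.0        # "fully loaded" draw of a US adult
MODEL_KW = 15.0        # claimed draw of a frontier model while serving
WORKDAY_HOURS = 8.0    # unassisted working day
ASSISTED_HOURS = 1.0   # assumed time to do the same work with AI support


def energy_kwh(power_kw: float, hours: float) -> float:
    """Energy in kWh for a constant power draw over a number of hours."""
    return power_kw * hours


# Human working alone for the whole day.
unassisted = energy_kwh(HUMAN_KW, WORKDAY_HOURS)
# Human plus a GPU kept hot for the single assisted hour.
collaboration = energy_kwh(HUMAN_KW + MODEL_KW, ASSISTED_HOURS)
# What the human still draws over the rest of the day, now spent on something else.
freed = energy_kwh(HUMAN_KW, WORKDAY_HOURS - ASSISTED_HOURS)

print(f"model/human draw ratio: {MODEL_KW / HUMAN_KW:.1f}x")
print(f"unassisted day:         {unassisted:.0f} kWh")
print(f"assisted hour:          {collaboration:.0f} kWh")
print(f"freed for other uses:   {freed:.0f} kWh")
```

Note that the assisted day still totals 95 kWh against 80 kWh unassisted, since the human keeps drawing power either way; the claim is about what that remaining 70 kWh gets spent on, not about a lower total.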

zozbot234 22 hours ago [-]

You forgot to mention that it takes a lot more energy to train that human before they're able to work.

dofm 21 hours ago [-]

The human in the scenario is on regardless. One has to assume. But I also think this sentence you typed is essentially a single line horror story and we should consider whether it is ever appropriate to say it out loud.

axus 1 days ago [-]

What would you do for the rest of the day, power off your devices and go for a long bike ride?

enraged_camel 1 days ago [-]

Speaking personally: yes. That's literally what I'm planning to do this afternoon because it's noon and I'm already done with the coding tasks I had on my plate today.

dofm 24 hours ago [-]

Luckily the future is absolutely going to be that star trek one where technological abundance means we are all wealthy and have free time to develop personally, and not the future where all the money bubbles up into the hands of a thin-skinned malignant narcissist who wants to play with launching rockets and provoking racial violence /s

antasvara 23 hours ago [-]

Studies on grandmaster chess players indicate that at most you burn 10% more calories when engaged in deep thought than when you're at rest. So the energy "attributable" to an hour of knowledge work is like 10 calories (average sedentary calorie burn is like 80-100 per hour; add a max of 10% for the thinking gets you 8-10 calories). A pound of potatoes is like a buck and is about 320 calories. So you're looking at like 3 cents an hour at most to cover that energy burn. It's definitely even less; I certainly don't think as hard as a grandmaster chess player.

Then, assume power costs 20 cents per kilowatt hour (US average). To match the human 3 cents per hour, you need an average draw of 150 watts. That's in the range of a budget graphics card, but not much past there.

However, if you sleep instead of sitting around, you can probably make AI cost competitive. Sleeping drops your metabolic rate by more, and lying down in bed (as opposed to sitting) also reduces calorie burn. Combined, you can reduce your burn by like 30 calories an hour. At the new 9 cents per hour human cost, you can afford to run a higher end graphics card at ~450 watts. That puts you in RTX 3090 range.
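
The same napkin math as a small sketch, so the inputs can be swapped out. The 320 calories per dollar and 20 cents per kWh are the numbers used above; the function names are made up for illustration. Without rounding the hourly cost to whole cents first, the results land slightly above the 150 W and 450 W quoted.

```python
# Inputs from the comment above; both are rough figures.
KCAL_PER_DOLLAR = 320.0          # a pound of potatoes: about $1, about 320 calories
ELECTRICITY_USD_PER_KWH = 0.20   # assumed US average price


def human_cost_per_hour(kcal_per_hour: float) -> float:
    """Dollar cost of feeding the given calorie burn for one hour."""
    return kcal_per_hour / KCAL_PER_DOLLAR


def break_even_watts(usd_per_hour: float) -> float:
    """Constant power draw (in watts) that costs the same amount per hour."""
    kwh_per_hour = usd_per_hour / ELECTRICITY_USD_PER_KWH
    return kwh_per_hour * 1000.0


# Extra burn from thinking hard versus sitting at rest: about 10 kcal/hour.
thinking_cost = human_cost_per_hour(10.0)
# Burn saved by sleeping instead of sitting around: about 30 kcal/hour.
sleeping_cost = human_cost_per_hour(30.0)

print(f"thinking: ${thinking_cost:.3f}/h -> {break_even_watts(thinking_cost):.0f} W")
print(f"sleeping: ${sleeping_cost:.3f}/h -> {break_even_watts(sleeping_cost):.0f} W")
```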

keeda 21 hours ago [-]

The question needs to be tweaked a little: it's not just human vs LLM, it's human vs human + LLM, which makes the calculations easier (and more correct because LLMs don't currently operate independently.)

I've run the napkin math, and assuming LLMs make humans even 5% more efficient, the power and water savings over time are significant, largely because humans are so resource intensive: https://news.ycombinator.com/item?id=46984659

asdff 21 hours ago [-]

There is no break even point, you always come out ahead doing it yourself because your caloric burn is the same for the day whether you build the tool or AI builds the tool. Only way the AI example might avoid that is if it tells you to jump off a cliff before starting the compute run.

Yoric 23 hours ago [-]

I'm assuming that you need to feed the human being (i.e. you) regardless of whether you use that human being for writing code or not. So, by this metric, there is simply no break-even point. The cost of human + AI is always going to be higher than the cost of human.

sbochins 20 hours ago [-]

If you paid for solar, this is less of an issue. I also don’t worry so much about running my AC.

jrm4 24 hours ago [-]

I'm in Florida and am already using AC, so if not "free", definitely "negligible."

rambojohnson 24 hours ago [-]

work at a cafe.

mxmxnxnsndndndj 1 days ago [-]

[dead]

dnautics 24 hours ago [-]

> Power is not free.

it's ~free if you have home solar.

r0fl 22 hours ago [-]

Solar panels are free and never break and have an unlimited life expectancy?

krzyk 22 hours ago [-]

Solar panel breakage doesn't depend on a graphics card.

NegativeK 20 hours ago [-]

Solar power is not free.

You can decide to not fix the panel that was being used to power a GPU.

Or you could sell the power back.

Or you could put it in a battery bank for when the sun is down.

Or, if none of those are the case and you just have excess power that's useless for anything but a GPU, then you prepaid for the GPU.

I love that we have solar panels, but we weren't gifted them. Using power has a cost.

TurdF3rguson 13 hours ago [-]

It's actually common to be in that situation where the grid is paying pennies on the dollar and you have extra generation. Most grid-tie systems are in that boat.

Suddenly you find yourself looking for something to spend power on so it doesn't go to waste.

dpcan 21 hours ago [-]

I cannot figure out what people are doing to spend all this money.

I have used a $60 per month Cursor plan on auto, and have never come close to using up my included usage, and I probably have it planning and coding and working for me all through the evenings 4 nights a week.

What on earth are people doing differently that it's costing them so much?

Maybe enabling on-demand usage or other paid models, or on higher modes? What are you doing that requires this? The output from Auto for me is crazy good for the tasks I'm working on, and have yet to run into an issue where it couldn't perform at a high enough level.

We have been interviewing people at work to join our team and they tell us they use $2K per month in tokens with their current employers.... I can't even fathom what's going on here where that would be happening.

isubkhankulov 18 hours ago [-]

Claude enterprise plans are 30-40x more expensive vs the consumer plans.

I used to spend $200/mth on the Max plan at a small startup. Now spending single digit thousands on Claude enterprise with the same usage levels.

Anthropic is subsidizing consumer usage, and also charging a nice margin for enterprises for zero data retention (ZDR)

shepherdjerred 15 hours ago [-]

If you can give your agent broad access and an effective feedback loop, you just need to steer it and do a final check on outputs.

As an example I might have an agent with access to a browser, logs, metrics, GitHub & CI logs etc. and ask it to implement a new feature.

In Slack I have a few bug reports so I spin up a few more agents. A PM needs a UI tweak so I spin up an agent. You can imagine that a lot of work a dev does isn’t necessarily that complicated and I just need to be there to review the final PR and leave comments as if it were a colleague’s (and then my agent goes back, fixes the comments, requests a new review…)

While that’s happening I might be using my actual attention for a meaty feature, design doc, data analysis, etc.

I spend $300/mo for personal use, and a couple thousand at work. Agents can be really transformative and well worth the cost.

Would my company rather pay a few thousand per month, or several hundred thousand per year for an extra fully loaded engineer? At this point it is _at least_ a 2x multiplier for myself

cubano 19 hours ago [-]

> and they tell us they use $2K per month in tokens with their current employers...

perhaps they are simply trying to impress you with their mad prompting skills and like, what self-respecting engineer would be caught dead using less than $2k/month?

given the context of your interaction with those people, it probably is the simplest answer to your rather baffling question. for the life of me the idea of using $2k/month doesn't even seem possible unless you're telling it to waste credits.

galaxyLogic 18 hours ago [-]

Sounds about right. Those people seem to want to create the impression that they are "AI Power Users". That gives them more power inside the organization. People come to them to ask for advice. Also if their output is not good they can claim that is because the AI budget didn't allow them to do more.

jmkni 21 hours ago [-]

Totally agree, but then a lot of the same people will be talking about all of the custom instructions/rules/skills/features etc they have set up, so that's eating up a lot of the context window before you even start

When I do use AI, it's just the pure tool itself, and the context is the exact code I'm working with (because I'm trying to see if it can help me solve a specific problem), and I understand the rest of the codebase well enough to know if it's giving me good answers or bad ones

rjh29 20 hours ago [-]

A few things imo, 1) not prompting precisely enough (narrowing scope) means your agent will scan your entire code-base and sometimes get stuck looking at things repeatedly. 2) not checking the output is usually fine but sometimes it produces junk because it doesn't understand, and you cannot prompt your way out of it without reading the code and figuring out the problem. If you leave it on auto it will burn tokens.

Plenty of low level things can trip agents up, too. I just had one inexplicably refuse to read an error about a function needing a bool return value - trying about 10 variations of the same thing before I interrupted it. Skills probably cause issues too, it loves to for example read the source code of libraries I'm using if I give it permission. That's a rabbit hole.

sbochins 20 hours ago [-]

To be honest I’m not doing too much. I’m just on one of the $200 plans, but always hit limits. I only use the best models and mostly use it for various software projects I always wanted to build, but didn’t have time for. I just closely monitor the usage caps and have something running on a Ralph loop most of the time, unless I get near the cap. The post here is more about how I’d start a self-funded software company, if I wasn’t already working full time.

bachmeier 23 hours ago [-]

> The upfront cost is steep and the models you can actually run at home are weaker than what the frontier labs ship, so this only pays off if you can keep the rig busy with long running tasks where a slower, cheaper model grinds away overnight. Most people can’t keep a home machine that loaded, and the hardware you buy today may look like a bad bet in a year.

Oh, so this is not a post about AI coding at home. It's about vibe coding at home.

There's a lot I disagree with in this post, but I'm posting this from a home computer with 64 GB of RAM and no GPU. I do lots of AI coding while spending very little money. I run Gemma 4 26b (mixture of experts) and Qwen 3 coder with Ollama. I use Github Copilot code completions. I use the Gemini and Mistral API free tiers. I have a Gemini paid API account. It's now prepaid, so you don't have to worry about an accidental $1000 bill. You can do a lot of things with Gemini Flash Lite 3.1.

None of this is burning through tokens to create an expensive blob of spaghetti code, but it does qualify as AI coding.

atomicnumber3 22 hours ago [-]

My sentiments too. I'm using Qwen 3.6 35B A3B on a machine with 64Gb ram and a 24GB 5090 (an Alienware 16 Area51 I bought, serendipitously, about 15 seconds before the idiots preordered all computers for the next 3 years and ruined everything).

You can't "slop cannon" vibe code with it, but this is personal code I want to not be spaghetti, so I'm not trying to vibe code. I just want to get instant retrieval of all stack overflow and reddit posts in a chat box, and for it to be able to spare me the physical pain of actually having to type out typescript code (I am a BE dev with negative patience for all frontend) and fuck around endlessly debugging obscure docker problems (I like docker, but, no patience for it having annoying problems and endless quirks). And this model does that really well.

sbochins 20 hours ago [-]

There are certain things you can leave running for a while. I think the distinction between vibe coding and hitl based coding routines will blur as workflows prove themselves and models become smarter and less expensive. Most of the best engineers I know have transitioned a lot more into vibe coding this year. The possibilities are much better nowadays.

bachmeier 19 hours ago [-]

> I think the distinction between vibe coding and hitl based coding routines will blur as workflows prove themselves

There's far less need for what the author refers to as frontier models as soon as you move away from vibe coding to filling in the gaps that you don't want to write yourself. The author doesn't even consider Gemini models to be frontier.

> models become smarter and less expensive

That's optimistic. They might become smarter but I don't see any market forces in the next few years that will make them cheaper.

sbochins 16 hours ago [-]

Gemini models are great. They’re just not good at coding. I use them all the time.

atreids 1 days ago [-]

I find just going via Deepseek's platform API directly, using their V4 flash model, and hooking into a harness like Opencode more than acceptable. Think I've spent maybe $10 over a couple of weeks.

I did explore self-hosting models but hardware right now is just too expensive.

dizhn 5 hours ago [-]

I believe opencode go, using only deepseek flash, would last you longer. (Equivalent to $65 in tokens but it's a monthly payment so you have to be using it up or deepseek direct will be cheaper)

First month is $5, later $10. Cancel any time. You can keep getting the deal with a new email.

atreids 3 hours ago [-]

Interesting. Thanks for letting me know. I will investigate that if I end up finding the API too expensive.

Yoric 23 hours ago [-]

Directly at DeepSeek? It was my understanding (but I didn't check) that some other AI operators were providing (some of?) DeepSeek's model for cheaper prices.

Still, that's interesting. What do you get for that price? Only coding, or also e.g. image generation?

atreids 21 hours ago [-]

Footprint's comment is correct. I go directly to Deepseek's platform API which they linked. There's no image generation but you get access to Deepseek V4 Flash and Deepseek V4 Pro, both of which are very capable for general text based tasks and programming. Flash is insanely cheap for how good it is ($0.14 per 1M input tokens vs $15 with Claude 4.7). V4 Pro I would put somewhere in the range of 80 to 90% as good as Opus 4.6 (based just on anecdotal usage - I use Opus 4.6 heavily at work as my company pays for it) while again being significantly cheaper. According to a benchmark[1] I read, processing 1 million tokens would cost you $250 for Opus 4.7, $300 on GPT5.5... and just $35 on V4 Pro.

I just use it for my side-project coding and brainstorming tasks. At work I use AWS's Kiro CLI + Opus 4.6. At home I use Opencode + V4 Flash for the majority of "general" usage. I swap to V4 Pro for complex tasks if I feel like V4 Flash is struggling.

One other thing I highly like about the platform.deepseek API usage is it's a metered setup - not subscription based. Which means you only pay for what you use (the money that you put in doesn't expire) and can't spend more than you've deposited. This works well for me for my non-work coding because it generally happens in bursts. I may not code for a whole month (and therefore if I had a subscription it would have been wasted) and then spend a whole weekend coding nonstop.

It's entirely possible that there are middle-man providers that give a discount on Deepseek's own pricing, but I'm quite happy with the amount I'm paying so I haven't really looked into it.

[1]: https://lushbinary.com/blog/deepseek-v4-vs-claude-opus-4-7-v...

sail0rm00n 12 hours ago [-]

This is awesome, thank you for posting this. Are you using pi, Hermes, or another harness? I’d love to hear more about your workflow.

atreids 3 hours ago [-]

I've been using [OpenCode](https://opencode.ai/) - I find it works quite well and has things like web search and build/plan modes built in. I had to modify the settings though (On Linux at `~/.config/opencode/opencode.json`) to stop it from just modifying files without first asking for permission, which I didn't like. I like being able to read the changes my AI agent suggests before the files are modified.
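
For anyone wanting to do the same, here is a minimal sketch of the change, written as a small script that prints the JSON to merge into that file. It assumes the config accepts a `permission` map with `edit` and `bash` keys set to `"ask"`, and that the `$schema` URL is as shown; both are my reading of the OpenCode config docs and should be checked against the current schema. The helper name is made up.

```python
import json


def require_approval(config: dict) -> dict:
    """Return a copy of an OpenCode-style config that asks before edits and shell commands.

    Assumption: the config supports a "permission" map with "edit" and "bash"
    keys that accept "ask". Verify against the current OpenCode docs.
    """
    updated = dict(config)
    permission = dict(updated.get("permission", {}))
    permission["edit"] = "ask"   # prompt before modifying files
    permission["bash"] = "ask"   # prompt before running shell commands
    updated["permission"] = permission
    return updated


# Assumed starting point; an existing opencode.json would be loaded here instead.
existing = {"$schema": "https://opencode.ai/config.json"}
print(json.dumps(require_approval(existing), indent=2))
```

The function copies the input rather than mutating it, so any other settings already in the file are carried over unchanged.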

Footprint0521 21 hours ago [-]

I’ve been doing this too, it’s a cheat code! 1/100th of the price of Claude/openai prices for 95% of the quality. Site is platform.deepseek.com for that. No image generation, just text, but if you use it right it works great

alecco 20 hours ago [-]

DeepSeek API gave 6x to 8x better caching rate for inputs over OpenRouter (even choosing DeepSeek as provider). And some of the cheaper providers are using FP4 quantizations.

https://openrouter.ai/deepseek/deepseek-v4-flash-20260423#pr...

After complaints, the cached read price is no longer listed on that page; you have to click through the providers one by one. All providers for DeepSeek V4 Flash charge ~$0.02 while the DeepSeek provider is $0.0028. For coding this is huge, as caching often gets in the range of 90 to 99%. But OpenRouter messes up your caching, so don't use it. And it seems to be a VC-backed closed middle-man company, not open source or open anything.
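
To see why the cached price matters so much at those hit rates, here is a rough sketch of the blended input price. It assumes the cached prices above are per 1M tokens, and borrows the $0.14 per 1M uncached input price quoted elsewhere in this thread, applying it to both cases; those pairings are assumptions, and the function name is made up.

```python
def blended_input_price(hit_rate: float, cached: float, uncached: float) -> float:
    """Average price per 1M input tokens for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached + (1.0 - hit_rate) * uncached


UNCACHED = 0.14             # $/1M input tokens, figure quoted elsewhere in the thread
CACHED_DEEPSEEK = 0.0028    # $/1M cached input tokens via the DeepSeek provider
CACHED_OTHERS = 0.02        # $/1M cached input tokens via other providers (approximate)

for hit_rate in (0.90, 0.99):
    direct = blended_input_price(hit_rate, CACHED_DEEPSEEK, UNCACHED)
    others = blended_input_price(hit_rate, CACHED_OTHERS, UNCACHED)
    print(f"hit rate {hit_rate:.0%}: ${direct:.4f} vs ${others:.4f} "
          f"per 1M input tokens ({others / direct:.1f}x)")
```

At a 90% hit rate the uncached remainder dominates and the gap is roughly 2x; at 99% the cached price dominates and the gap grows to roughly 5x.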

ryeguy 18 hours ago [-]

Openrouter's pricing via the deepseek provider is the same as the official deepseek api for both flash and pro and for cached and uncached tokens. It's literally the same api.

And no, cache rates are not different if you're going through the official deepseek provider. The only way caching rates can drop is if you let openrouter fully control routing by preferring uptime or something, and then it might bounce you between providers. But you can control which providers for a given model are in its routing pool and stop that.

alecco 9 hours ago [-]

Last month I had this issue. Others confirmed. On X people say OpenRouter messes with headers or something (this I can't confirm).

mikgp 22 hours ago [-]

What are people doing at home? I have like 5 different apps I code on the $20/month Claude plan and like sure I can hit rate limits but - What are people doing to burn through $3k in tokens?

gabriel-uribe 22 hours ago [-]

YMMV but automations eat through the $100-$200 plans, which burn thousands in tokens alone.

I have hourly automations for root cause analysis on customer support issues, daily automations for eg log analysis, weekly & monthly automations for KPI tracking & actioning.

I will say, when I was building side projects that were 1) fairly well defined in scope and 2) without users/need for automations it was much easier to stay under $20/mo plan limits. Now I regularly hit weekly limits and need multiple Max plans

Random09 21 hours ago [-]

Most of it doesn't require AI. You could generate automation scripts that do that, except for customer support. People became dependent on AI in places where it was never required and now tech bros are doing the squeeze.

gabriel-uribe 21 hours ago [-]

I don't miss the days of scraping through logs or dashboards myself to troubleshoot some latency or malformed data issue that I missed conditionals for.

AI is incredible at finding patterns in otherwise benign stdouts, let alone as it cross-references data streams.

In theory, I don't need most of these automations. But for $200/mo? I will happily reduce my cognitive burden on stuff that doesn't impact the core business and make it easier to keep things gliding smoothly.

When the subsidized plans disappear, I will keep these automations going with the best small models that fit on my laptop.

Random09 21 hours ago [-]

What I mean is a script that can look through the logs. They are known and deterministic (if you properly handle errors) and you can analyze them statistically. If you don't know what logs your app is outputting, then you have a bigger problem on your hands tbh.
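
Something along these lines, as a minimal sketch. The log format (`timestamp LEVEL message`), the sample lines, and the threshold are all made up for illustration; a real script would match whatever the app actually emits.

```python
import re
from collections import Counter

# Hypothetical log format: "<timestamp> <LEVEL> <message>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def summarize(lines):
    """Count log lines by (level, message template)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            counts[("UNPARSED", "")] += 1
            continue
        # Replace digit runs so "timeout after 3000ms" and "timeout after 3100ms" group together.
        template = re.sub(r"\d+", "<n>", match["msg"])
        counts[(match["level"], template)] += 1
    return counts


def flag_frequent(counts, level="ERROR", threshold=2):
    """Return (template, count) pairs at the given level seen at least `threshold` times."""
    return [
        (template, n)
        for (lvl, template), n in counts.most_common()
        if lvl == level and n >= threshold
    ]


sample = [
    "2026-05-17T10:00:01Z INFO request served in 12ms",
    "2026-05-17T10:00:02Z ERROR upstream timeout after 3000ms",
    "2026-05-17T10:00:05Z ERROR upstream timeout after 3100ms",
    "2026-05-17T10:00:09Z WARN retrying request 4",
]

for template, n in flag_frequent(summarize(sample)):
    print(f"{n}x ERROR: {template}")
```

The same counts can be compared day over day to spot spikes, with no model in the loop.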

gabriel-uribe 20 hours ago [-]

Deterministic scripts are awesome, and they certainly power my internal dashboards.

But I'm a human - I will miss things. I maintain too many apps to have entire codebases in memory at this point. Or to continue monitoring all these streams. Logging is cheap - I log as much as possible because an AI will scan it for me.

I just want scoped pull requests to review proactively against the slew of things that can happen in prod that I didn't account for in my specs (again, from logs, customer issues, etc). I discard most of them. That is fine.

Random09 19 hours ago [-]

It seems you could use a human instead then? If you have so many apps, you could hire a junior to help you. There is the additional satisfaction of bringing a new person into IT too.

gabriel-uribe 19 hours ago [-]

Honestly, would love to do that. Main issue is margin. Earning just enough to even consider hiring, but not nearly enough to hire someone great that will stick around. Have tried hiring a few times now, but it hasn't worked out for various reasons.

Kicking the can down the road for now.

cortesoft 21 hours ago [-]

The sweet spot is using AI to create those automation scripts, and only hooking AI up to do the high level analysis, and then have it delegate to those scripts.

smeej 12 hours ago [-]

Honestly, having spent a huge chunk of my career in customer support, 80%+ of the tickets could be solved with a script and not need an AI. Just about every company has a catalog of macros for answering support tickets and once you have a good set, 80% of people just need you to send them a link to the support article where you actually already answered their question in great detail, if they'd bothered to look for it.

binarymax 22 hours ago [-]

Same for me. $20/mo is just fine and I use it to code daily.

I suspect the people that burn through tokens have several subagents and 50 skills loaded and 40 MCP tools. All those load up the context on every single turn.

epicureanideal 19 hours ago [-]

Only the front matter of the 50 skills would be loaded, right? 40 MCP tools would be a much larger use of context, right?

hackeradam17 5 hours ago [-]

Same, but I suspect that I don't have any issues with hitting caps because I actually still do plenty of the thinking myself, and just use the AI to help accelerate some of the boring stuff I don't want to do myself. This has been especially nice for my personal projects at home. It has made me much more likely to actually want to work on my side projects when I don't have to deal with some of the tedium after working on my company's tedium all day.

I suspect that most of these people who are burning through thousands of dollars worth of tokens at home are largely producing big ol' piles of slop.

Random09 21 hours ago [-]

> What are people doing to burn through $3k in tokens?

The short answer is: they are doing slop. Most of the coding can be done quickly with a keyboard, IntelliSense and maybe some code generation templates.

But people became dependent on AI doing everything for them and tech bros have now started to squeeze. Like drug dealers.

mwcampbell 24 hours ago [-]

I invested about $4,000 in an NVIDIA DGX Spark several months ago. 128 GB of unified RAM, and the NVIDIA GB10 chip. With the RAM, the several CPU cores, and the 4 TB NVMe SSD, it's a very capable ARM64 Linux computer even without the GPU, and so far I've mostly been using it as such. But I wonder, what's the most capable model, specifically for coding, that can run well on that hardware?

lee_ars 18 hours ago [-]

I'm currently working through research and testing for an article on Ars about the Spark and what things one might do with it, and I've kind of stumbled into a two-LLM agentic setup with Qwen3.6-35B-A3B (via nvidia/Qwen3.6-35B-A3B-NVFP4) as the planning agent and the FP8 version of Qwen3-Coder-30B-A3B-Instruct (Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8) as the coding agent that the planner delegates tasks down to. I'm sticking with vLLM as the inference engine, and I've got it wired together into a 2-agent loop with Opencode.

The Qwen3.6-35B-A3B planner hums along at 50-55 tokens/s, and the Qwen3-Coder-30B-A3B-Instruct coder does 30-35. With both agents up and ready to work, RAM consumption sits at about 112 of 128GB.

It's pretty okay. I'm faffing around with having it disassemble old MS-DOS games from the 1980s, which is a task that lends itself well to the setup. It's not the fastest thing in the world, but with the planner's context window at 256k tokens and the coding agent at 128k, they chew through pretty long task lists handing things back and forth without complaint. The only real issue is that even with really tightly scoped prompts, the coding agent tends to hallucinate like it's on LSD. But the planning agent appears to be quite good at spotting the hallucinations and re-parceling work back to the coder.

It's neat. I'm going to be sad when I have to return the review unit in a couple of months.

edit - I also have been fiddling with Deepseek v4 Flash via Antirez's setup (https://github.com/antirez/ds4), and it's pretty fantastic (and fantastically easy to get running). It's pretty pokey on the Spark, though, at 14-ish tokens/sec. And unless you have a second Spark, it's going to be the only model you run at one time, as it eats alllll the rams.

mappu 17 hours ago [-]

Long time Ars reader, looking forward to your article (and have a few DOS games to reverse in mind already)!

Is this with a Ghidra MCP or some other technique? And why two models - did you try using Qwen3.6-35B-A3B for everything? (Or 27B or a bigger model since you have the RAM for it)

lee_ars 6 hours ago [-]

I haven't paired it with Ghidra MCP; because the games are relatively tiny (I'm starting with one of my personal favorites, Karl Buiter's Sentinel Worlds I: Future Magic, which is like <700KB all in), I made a first baseline pass with Fable a couple of days ago while it was still working and it created a bunch of tiny python tools with Capstone. Qwen picked those right up and has had equal success with them. I might try adding Ghidra into the mix, but it seems overkill at the moment.

I went with a pair of models primarily just to see if I could make it work. It's been fine, but I'm going to rip out the smaller coder model today and try it with just the bigger thinking Qwen model wearing both planner & coder hats in the same loop, just with only the bigger model running.

I'm learning a lot, and primarily what I'm learning is that I'm not a developer and this stuff gets real complex real fast, especially in chasing down all the details needed to make sure I'm taking advantage of the spark hardware!

Yoric 23 hours ago [-]

https://www.canirun.ai/?status=tight might answer that question

zkmon 22 hours ago [-]

That site doesn't seem to consider the quants. So useless.

morganastra 23 hours ago [-]

Deepseek v4 flash is shockingly strong for its size and reportedly runs well on that hardware.

Shorel 22 hours ago [-]

Better than qwen3-coder-next? That's the one that has given me the best results so far.

ozim 11 hours ago [-]

If you don't know that already and "using it as such" ... your post should start with "I blew off $4k on a toy several months ago".

znnajdla 13 hours ago [-]

DeepSeek V4 Flash is a very capable coding model that runs well on the hardware you described. Look up the optimized version specifically designed for local use.

anon373839 18 hours ago [-]

Qwen 3.5 122B can fit with context at a pretty high quant (Q6). That's an excellent model.

sermakarevich 7 hours ago [-]

I started using brain -> workers approach for coding.

-- Brain is expensive smart model from claude subscription, Fable 5 when it was available, Opus now.

-- Worker is a local model (qwen3.6:46B), deployed in 36GB GPU, Opencode + Ollama.

Brain is responsible for analysis/design and task creation. Task should be made simple and clear so the worker can handle it. Worker does the coding. Brain validates and creates a fix task when required. Atm the fix-to-task ratio is ~ 1:20.

If no available GPU at home - qwen3.6 is quite cheap on clouds.

It's a rather experimental setup, out of curiosity, but it works better than I would expect it to. This has let me run 3 coding agents non-stop for the 4th day atm. Here I explain how I got there: https://news.ycombinator.com/item?id=48520757

esalman 1 days ago [-]

For me, investing in hardware seems to be the way to go.

I learned coding nearly 24 years ago and am still learning new stuff all the time. At no point did I have to rely on a subscription model to learn and do new stuff.

If LLM and agents are the default tools for coding and building software, at least for the next few years, it seems like a no-brainer to invest $2000-3000 on hardware, like a Strix Halo PC.

CraigJPerry 24 hours ago [-]

I wondered if there might be a no brainer "free" option on discarded hardware.

I have a GTX1080ti which i think is circa 2018, it's unused, more than paid for itself over the years, owes me nothing at this point so the hardware is free.

It runs Gemma e4b multimodal, qwen 3.5 8b or the qwen 4b embeddings models well enough (40+ t/s for the LLMs).

The machine consumes 350 watts at the wall when under load (3 watts when sleeping, 80w at idle). Electricity costs me £0.035GBP/kwh which is cheap for the UK (load shifting via house battery).

144k output tokens for around 1pence (and takes an hour to do that in theory).

It's only JUST cheaper to use than the far more capable deepseek v4 flash model despite the free hardware and ~10x cheaper than normal electricity.
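For anyone who wants to redo this with their own numbers, the arithmetic above works out like this. All inputs are the figures from this comment; it ignores prompt processing and idle draw, so treat it as a back-of-envelope sketch.

```python
tokens_per_second = 40       # "40+ t/s" from the comment
watts_under_load = 350       # measured at the wall
price_gbp_per_kwh = 0.035    # load-shifted rate from the comment

tokens_per_hour = tokens_per_second * 3600
kwh_per_hour = watts_under_load / 1000
pence_per_hour = kwh_per_hour * price_gbp_per_kwh * 100
pence_per_million_tokens = pence_per_hour / tokens_per_hour * 1_000_000

print(tokens_per_hour)                       # output tokens in one hour
print(round(pence_per_hour, 3))              # electricity cost of that hour, in pence
print(round(pence_per_million_tokens, 2))    # pence per million output tokens
```

That is 144,000 tokens for about 1.2p, or roughly 8.5p per million output tokens on electricity alone.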

iugtmkbdfil834 24 hours ago [-]

Yes and no. Hardware does lock you in. Granted, I am happy with my 128gb of shared memory, but I am mildly concerned that it actually is more expensive now than when I bought mine. It does not bode well for the future; not when combined with recent WH admin moves on Anthropic and the reality that next batch of good models may require more than 128gb to run well.

edit: I am not dismissing local. I am one such user ( though I have subs too ), but one has to be clear eyed about the trade-offs.

hgoel 24 hours ago [-]

$3k isn't getting you frontier model capability. It's barely getting you any capability if that's split into buying an entire PC rather than just GPUs.

throwatdem12311 24 hours ago [-]

3k? Try 10

jrm4 23 hours ago [-]

With you here. I'm using my cheapo 16gig vram card I picked up a year or so ago, and I'm like -- yes, I perceive that you can pay for way more tokens per second than I can do at home.

But that feels like measuring productivity in lines of code. For what I'm doing, I'm not seeing the benefit in any subscription.

Sure, I can't one-prompt a whole new boring CRUD app, but oh well.

vadansky 1 days ago [-]

Can I run something comparable to Opus 4.6 locally yet? I keep hearing conflicting things. If I can spend 10k to do that I would cancel my subscription. The problem is I don’t wanna spend the money to find out myself.

Catloafdev 24 hours ago [-]

If you want frontier-level, the economically reasonable option is OpenRouter or a direct sub to frontier-of-your-choice.

The reality is that they do not offer configurations that would allow a consumer to run that much VRAM on a single setup to protect datacenter margins. Apple used to, and they stopped, those devices are going for ~$20k+ each on ebay now.

You can get very, very capable models on a 3090/4090/5090/6000 series card. But if you want 'frontier level' you are investing ~22k at a bare minimum if you go new. Used you can probably build your own server for much cheaper up-front cost but it's likely going to be 4-6x+ electricity usage.

daemonologist 24 hours ago [-]

There are also significant economies of scale (namely: utilization and batching), which tend to make inference on a shared server more economical even after the operator takes a cut.

zozbot234 22 hours ago [-]

You can use batching on consumer hardware, it just requires a KV-cache efficient model (or short context only) and keeping multiple inference flows running in parallel. This is most useful in combination with streamed inference, since the compute intensity of decode with those newer KV-compressed models is high enough that you have limited compute headroom when running at the speed of RAM.

theossuary 24 hours ago [-]

I truly think by 2028 we'll have integrated chip systems that'll be able to run opus 4.8 level models at ~500 watts at acceptable performance. Honestly I think now is the worst time to invest in AI hardware. Get your harness ready and processes perfected with hosted models, and wait a few years to buy hardware to transition to running models locally

baq 24 hours ago [-]

Burning weights onto a chip in an efficient way and exposing that via USB would be acceptable for a good enough model tbh

ajbourg 23 hours ago [-]

This is pretty close to what Taalas is doing.

calgoo 22 hours ago [-]

Trying Taalas is almost scary, there is something unsettling with that speed! Even with that small model, because of the speed, you could run hundreds of sample runs in a second, and pick from the best.

Can't wait for their next release!

hurtigioll 23 hours ago [-]

if such hardware becomes available, it will be bought by the data-centers, just like they buy all the RAM today

CamperBob2 23 hours ago [-]

> Honestly I think now is the worst time to invest in AI hardware.

That position is not without its own risks, though. Maybe Opus 4.8 will run on a single chip by 2028... and maybe you won't be allowed to touch it.

And what if Xi makes a play for Taiwan? That would be stupid, but so was invading Ukraine with tanks from Temu, and it still happened.

dudisubekti 19 hours ago [-]

Other than Taiwan declaring independence, I don't see any reason why China will rush to take the island.

At the very least they would wait until they cracked EUV and mass-produce the chips, and that is still 4-5 years away at the earliest.

CptFribble 22 hours ago [-]

> so was invading Ukraine

the difference is that Putin's hand was forced by age, (possibly) illness, and the last several decades of how he chose to run his country. Putin's power base is a relatively small group of elites and oligarchs who would happily snuff out the man who pushes them out of windows if they get too uppity, if they were given the chance. He needed the cover of war to maintain the fiction of his type of strongman "only I can save us" leadership.

Xi's power base is the simple fact that his leadership has transformed China into the #2, and now because of Trump possibly soon the #1 world superpower. He has also acted aggressively in the last decade to find and remove corruption and prevent individuals from accumulating the kind of wealth and influence that could threaten his power from outside official Party channels. Of course, as I'm not Chinese myself, I have no clue what the internals of Party politics actually look like. But as an outside observer it seems clear that Xi et. al. do not actually need Taiwan for anything other than national pride. They know the US would go to the mat to protect it as TSMC is extremely vital to US military power. And since China cannot compete in that arena and has too much to lose, they instead have focused on weakening the US from within, quite successfully of late.

By the time China finally takes Taiwan it will be with little fanfare and little consequence - they won't touch it until the US either has lost its military capabilities, or the US has its own internal chip industry. Anything else is an existential risk for the coastal cities that are China's entire economic advantage.

grim_io 24 hours ago [-]

10k will not get you anywhere near opus or sonnet. It's simply not possible for mere mortals currently.

als0 24 hours ago [-]

> Can I run something comparable to Opus 4.6 locally yet?

Sadly, no. The best comparable thing you can get is about Sonnet 3.7

captaintobs 24 hours ago [-]

i spent 8k and get close to a 2-3x slower sonnet. running 2x spark deep seek v4 flash

CamperBob2 24 hours ago [-]

Some benchmarks have shown Kimi K2.6 within error-bar distance of Opus 4.6, and you can run it on eight RTX6000s. Right now it's not possible to set up a machine like that from scratch for less than $100K... but right now it's also hard to put a price on autonomy.

zozbot234 22 hours ago [-]

You need a lot less than that if you're willing to stream the model from SSD. At that point, the best machine is probably a cheap old-gen HEDT with lots of PCIe lanes to attach cheap NVMe storage to, so as to stream the model at reasonable speed. That's expensive but not $100k expensive!

atemerev 24 hours ago [-]

Best you could do is connect two Mac Studio M3 Ultra 512G RAM each with Thunderbolt. Then theoretically you can run frontier Chinese models (but not Deepseek v4 Pro yet). That would be about $20k.

But - good luck finding them. Apple discontinued the model a few months ago. And more recently, even 256G model was discontinued. Big AI really really does not want people to get off their needle.

zozbot234 22 hours ago [-]

DeepSeek V4 Pro is ~800GB total at native quantization (1.6T params with most being 4-bit) so it can run on the hardware you mentioned. There is also a 2-bit version that will run on a single 512GB machine. SSD streaming also makes lower-end hardware viable to at least test the model, if not quite run it usefully.
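The sizing above is just parameters times bits per parameter. A quick sketch, taking the 1.6T figure from this comment at face value and treating every weight as the same width (the comment says "most" are 4-bit, so the real number differs a bit), and ignoring KV cache and runtime overhead:

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes); weights only."""
    return params * bits_per_param / 8 / 1e9

params = 1.6e12  # 1.6T parameters, as stated in the comment

for bits in (4, 2):
    size = weight_gb(params, bits)
    print(
        f"{bits}-bit: ~{size:.0f} GB | "
        f"fits in one 512 GB machine: {size < 512} | "
        f"fits in two: {size < 1024}"
    )
```

So 4-bit lands around 800 GB (two 512 GB machines) and 2-bit around 400 GB (one machine), before leaving any room for context.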

RomanPushkin 24 hours ago [-]

AI coding at home literally costs $100/month. I'm wondering where $400 is coming from? $100 is more than enough for "coding at home", IMO. I rarely face the limits, and when I do it's just a time for a quick walk anyway.

chasd00 22 hours ago [-]

Man I’m using the $20/month sub and it works just fine for me. Granted, I have a family and house and lots of obligations so by the time I hit the limits some other task is due before I can return to coding. If I hit the limits before I have something else to do then I just code by hand or review what has been generated until I can use the agent again. Reviewing agent code is a good way to learn too, agents have shown me different approaches than what I would have done and they’re definitely worth thinking about. Also, fixing their mistakes has helped me write better prompting although being a team lead for half a decade has taught me how to specify what I want very clearly and cc gets it right most of the time haha

About interruptions, one thing AI assisted coding really helps with is coding with constant interruption. I can leave CC for half an hour and return then tell it I had to step away, catch me up, and proceed. This works well for me.

geophph 23 hours ago [-]

> Do that well and you can build what a team of twenty engineers would put out in a month for around a thousand dollars.

What does this look like after 6-12 months? Like, how much code are you trying to write total?

Maybe it just doesn’t click in my mind, but sometimes I wonder about how much work people are trying to do and how they actually have enough to get done so quickly in such a short amount of time.

sublinear 23 hours ago [-]

They prefer to work harder and not smarter. Forever hill climbing to nowhere.

I've never worked on a complicated codebase that started out that way until the rest of the business concerns and office politics came into effect. People may not like it, but the bureaucracy is far and away more valuable than the core functionality.

Mature codebases are years of people thinking of all the possible gotchas while solving their acute pain points. This is not fluff, but the living and breathing part of it. Without that code, it's just a machine barely doing stuff in the most obtuse ways possible that nobody wants to pay for.

I would argue that they're putting LLMs to work on that finer detail stuff, but AI is still far too dumb. No, what they're doing is playing with their skinner box.

janpeuker 21 hours ago [-]

The biggest issue I've seen with people burning through tokens is using very long sessions, especially starting with plan mode and then "iterating" over extended periods. I was burnt badly by extra usage so now I run on $20 Pro. I ruthlessly create new sessions/agents, always ask to create markdown files first (no plan mode) and minimise context aggressively - for example I have a lot of skills that use lazy loading and a small local MCP for lookups plus openrouter with a local model for image detection and fulltext search. Basically I use Claude Code in pi.dev style.

bredren 21 hours ago [-]

What is going broke for a programmer?

This is US centric but a $200 Claude code and $100 codex sub is a vast, vast amount of tokens. Enough to pay for itself many times over. It provides exposure to the very edge of harnesses and experience that is being hired for.

Isn’t there an argument this is possibly the best price to available performance for frontier models? Both due to subsidies and the distance between open and accessible alternatives?

astqs 21 hours ago [-]

I used Kiro in December and I burnt through 200 eur worth of tokens in a weekend. Ultimately it was money well-spent, but, I think that if you want, you can spend as much compute as you have access to. Will it be efficient use of tokens? Probably not.

From all the data, it looks like the 200usd we pay for monthly usage is subsidised… at break-even pricing … well, that 200 is starting to look like a few thousand.

conqrr 4 hours ago [-]

The big one that's missing as a cost cutter: if you are a programmer, stay as close as you can to the code. Define the interfaces yourself and the core logic. You will know exactly what changes you want and know what tier of model to use.

aarjaneiro 1 hours ago [-]

Just don't vibe code?

pianopatrick 1 days ago [-]

I think someone could find some way to use the smaller local models to write code. Some kind of framework or harness or language or something. But not too many people are working on that because the big models are pretty cheap and a lot better.

petra 24 hours ago [-]

Maybe one possible path(to make weaker models highly capable) is making the job of the llm as easy as possible.

I wonder if part of the solution is building/finding the right libraries, with the right documentation/language/API(one that plays well with LLM's) and maybe creating some synthetic data around them - to make it very easy for the llm.

And maybe there could be a business model around creating those libraries.

calgoo 22 hours ago [-]

So in my limited experience: The smaller the model, the bigger the harness. The biggest issue becomes the context window. For big models you can kind of just give it bash access and let it run... while with the smaller ones you need to fully manage the context in each LLM call.

If you can ask the model for a specific function; with a spec design (typed languages help too) then the small models are great! I have had good progress with generating small python modules for example, but you need verification rounds to catch issues.

So test driven design + a good spec sheet + a very detailed todo.md (or even better if its todo.json because then the LLM does not need to manage it, you do from the harness) is your best bet for small models.
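A minimal sketch of the "harness owns todo.json" idea in Python. The file layout (`id`, `spec`, `status`) and the function names are made up for illustration, not taken from any particular tool; the point is only that the model sees one task at a time and never edits the list itself.

```python
import json
import tempfile
from pathlib import Path

def next_task(todo_path: Path):
    """Return the first pending task, or None when everything is done."""
    tasks = json.loads(todo_path.read_text())
    return next((t for t in tasks if t["status"] == "pending"), None)

def mark_done(todo_path: Path, task_id: int) -> None:
    """The harness, not the model, flips the status and rewrites the file."""
    tasks = json.loads(todo_path.read_text())
    for t in tasks:
        if t["id"] == task_id:
            t["status"] = "done"
    todo_path.write_text(json.dumps(tasks, indent=2))

def build_prompt(task: dict) -> str:
    """Only the current task goes into the context, not the whole list."""
    return f"Implement exactly this and nothing else:\n{task['spec']}"

with tempfile.TemporaryDirectory() as tmp:
    todo = Path(tmp) / "todo.json"
    todo.write_text(json.dumps([
        {"id": 1, "spec": "Write parse_date(s) returning a datetime.date", "status": "pending"},
        {"id": 2, "spec": "Write tests for parse_date", "status": "pending"},
    ]))
    first = next_task(todo)
    prompt = build_prompt(first)     # this string is all the small model would see
    mark_done(todo, first["id"])     # call this only after verification passes
    second = next_task(todo)
    print(first["id"], second["id"])
```

In a real loop the model call and the verification round (tests, type checks) would sit between `build_prompt` and `mark_done`.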

pianopatrick 24 hours ago [-]

I think as well there might be "algorithms" that can work with local LLMs. With local LLMs there is a small context window, but not that much cost per token. So perhaps there is a way to do lots of small prompts that work in a sequence to produce a result.

Like perhaps you could produce 5 versions of a piece of code, and then compare them to choose the best.

Also if the local LLMs can call tools, maybe you can use static analysis tools to catch errors and try again in a loop or process of some sort.

There also might be certain languages that work better because those languages have better static checks.

jrm4 23 hours ago [-]

Yes. LITERALLY THIS. I do this! Not hypothetical.

I'll write a detailed prompt for a function, hand it off to 5 or so models (all of which are on my local machine), wait about 5 min and then compare.
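The generate-several-then-compare loop from this subthread can be sketched roughly like this. `generate` is a hypothetical stand-in for whatever calls your local model; the static check here is just Python's own `ast` parser, and "pick the shortest survivor" is an arbitrary placeholder for the comparison step, which in practice would be tests or a human look.

```python
import ast
from typing import Callable, Optional

def passes_static_check(source: str, required_function: str) -> bool:
    """Cheap static check: the code parses and defines the function we asked for."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(
        isinstance(node, ast.FunctionDef) and node.name == required_function
        for node in ast.walk(tree)
    )

def best_of_n(generate: Callable[[str], str], prompt: str,
              required_function: str, n: int = 5) -> Optional[str]:
    """Ask for n candidates, drop those failing the check, keep the shortest survivor."""
    candidates = [generate(prompt) for _ in range(n)]
    valid = [c for c in candidates if passes_static_check(c, required_function)]
    return min(valid, key=len) if valid else None

# Canned outputs standing in for three calls to a local model.
_fake_outputs = iter([
    "def add(a, b) return a + b",            # syntax error, rejected
    "def plus(a, b):\n    return a + b\n",    # wrong function name, rejected
    "def add(a, b):\n    return a + b\n",     # passes the check
])
result = best_of_n(lambda p: next(_fake_outputs), "write add(a, b)", "add", n=3)
print(result)
```

Swapping the `ast` check for a stricter tool (a type checker, a linter, the project's test suite) is where languages with stronger static checks would pay off.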

jrm4 23 hours ago [-]

I mean, this is what I'm doing. I'm guessing my process is very different because I'm holding the hand of the project way more along the way, but even that to me probably makes for a more enjoyable process.

Which is to say, I might use AI to do an outline/organizational, but I'm prompting every chunk of code "one-by-one," (e.g. at about the "function" level) which still feels lightyears ahead of what I used to do.

impure 1 days ago [-]

I recently made an AI Agent and surprisingly coding with DeepSeek V4 Flash is quite cheap. It probably has to do with the aggressive prompt caching. I'm using OpenRouter with Novita AI as the preferred provider.

throwa356262 24 hours ago [-]

Deepseek v4 via deepseek themselves is significantly cheaper.

Because (1) Huawei collab and (2) vLLM etc don't implement half of the inference optimisations deepseek proposed in their paper.

kagamino 1 days ago [-]

Same here, deepseek v4 flash on opencode go. It's cheap, fast and good enough to follow my instructions

2muchtime 24 hours ago [-]

I’m using zen because I have a Claude subscription and just like dabbling with the other models and I was shocked at how little flash cost but it was noticeably not at the level I’d like my model to be.

For me MiniMax 3 has really hit the sweet spot of being very cheap, though more than flash, but it’s also very capable.

jtr1 16 hours ago [-]

I've been running Claude Pro at home, supplemented with Deepseek configured in Claude Code. I've had decent luck throwing Opus (and briefly Fable, RIP) at architecture / product problems and producing plans to hand off to Deepseek (I personally find v4 to operate somewhere between Opus and Sonnet in capability).

Lately I've been able to cut down on token usage with context-mode and codebase-memory to wring more out of my subscription, as well as doing things like make sure all terminal operations run in quiet mode. I've found codebase-memory particularly effective: it creates an index of your codebase that the agent can query for code tracing without reading all of the associated files, and I've also found it more accurate at analysis

montroser 17 hours ago [-]

deepseekv4 pro via opencode go is $10/mo and has very generous limits. I use pi for the harness and go just as a model provider. It goes a good long way...

novia 7 hours ago [-]

I want to be able to experiment with tweaking model weights and seeing the outcomes and i want to be able to finetune open source models. What's the best way to experiment with that without breaking the bank?

anon373839 7 hours ago [-]

If you want to experiment with fine tuning small models, you can actually do that with free Google Colab instances. Unsloth.ai has written a bunch of guides on this.

Frannky 11 hours ago [-]

I read some cool posts of people trying MiniMax 3 on 4x GB10. I wonder if we are almost there that with 10k we can have Opus 4.6 level at home. I feel tweaking the harnesses and a few more versions from oss models and some new hardware from AMD and Intel could get us there pretty soon. Or maybe we are already there? Someone was able to do it already?

dmzxnico 10 hours ago [-]

I use Claude 100$ + Codex 20$. That's a lot already and I rarely hit my quota before it resets, usually right on time.

Sometimes I also get OpenCode Go just to get access to Chinese models as an extra for new projects I don't really care about, fun ones.

That's way cheaper than hiring a dev anyways.

nunez 19 hours ago [-]

> The second is to skip the hardware and rent those same open source models from a provider at API rates. For most people this is the right call. You avoid putting thousands of dollars on one GPU setup while configurations are still in flux, you skip the work of squeezing long running performance out of an open model, and you can switch to whatever is cheaper or better next month without reselling a box. Something like OpenRouter makes the move close to a one line change.

This will probably become the only option as the companies that publish open weights stop doing that. Very very few people have enough hardware to train/fine-tune at home.

MemoryHoleHQ 24 hours ago [-]

I've been thinking a lot about this and my personal take right now is that at some point in the near-to-medium future the models available to run at home and the hardware needed to use them will be enough.

My baseline is sonnet 4.6. I think it's good enough for most tasks, sincerely. So, from what I see, we are already at a point where we don't need frontier models for serious coding and debugging. Give it a couple of years and that level will fit in 120B models.

At the same time, we saw the rise of direct access memory systems like DGX or Strix Halo that will allow to run models of this size for "cheap" in the medium term.

That's what I'm betting on: that in 2 years I can buy a system for about $2500 that will run a model that's similar to Sonnet 4.6 locally.

I might be spectacularly wrong though. But I'm willing to wait and use subscriptions/API calls for now.

abc42 24 hours ago [-]

What kind of usage chews through Claude Max x20? I use several agents with max effort in parallel and usually end up with something like 50% weekly usage. Fable almost allowed me to get to 70% but then they started resetting the limits mid-week and of course now ended the whole thing.

pshirshov 23 hours ago [-]

> and the hardware you buy today may look like a bad bet in a year.

3090s and 7900s are going well so far.

Next year an Arc Pro B70 won't produce you less tokens than today.

They aren't fast but if you have flows where you can make money with them - they are a bargain in terms of price per GB.

quickthoughts 1 days ago [-]

Ha just wrote a post[1] about a sort of 4th option - max out cheap compute to create more tangible things that can be used/run locally.

1: https://news.ycombinator.com/item?id=48519181

thomasjb 22 hours ago [-]

Opencode's free models have been fine for me, they're what I tried after Gemma 4 8B proved hard to persuade into usefulness (I want to revisit with 12B and messing with harnesses, but I'm happy for now).

hillj23 23 hours ago [-]

I think this is only going to become more relevant. I'm personally a $200/mo Claude Maxer and I know that the usage I'm getting on Opus 4.8 Max and (until they yoked it out from under me) Fable 5 is way, way more than what I'm paying them. At some point, this will turn usage-based and I will be hammered on it and probably forced to look at self-hosting. I think while the caps are there, even at $200, it's honestly not too bad if you're coding value into the market, but as soon as those caps come off for retail AI users, we're all going to have some tough choices to make.

smeej 12 hours ago [-]

The solution in the article is, "Pay for a max plan and then buy the extra tokens you need by API." How is that noteworthy? Isn't that exactly what Anthropic and OpenAI recommend?

I feel like I must be missing something.

josh_p 17 hours ago [-]

It’s been very validating in this thread to see everyone questioning the massive token spend of influencers and the like.

The opencode-go sub, at $10/mo, is amazing value. I’ve been using that and the assistant kagi offers for web-chat and research for months. For the smallish projects I work on at home those have been great.

closeparen 22 hours ago [-]

>Around $400 a month of plans buys roughly $2800 of API usage at list prices, which is a real bargain right up until you hit the ceiling. The plans are metered, and any large AI native workflow will chew through the included tokens fast

I don't think that's true at all. I'm doing 8-12 PRs a week at work, all primarily Claude Code, and the usage at API billing has never broken $500/mo.

conradkay 19 hours ago [-]

I'm almost certain $2800 is actually too low if you're really hitting weekly limits.

I'm on the $100/m plan and used $300 at API billing yesterday (according to ccusage)

Seems like one session is >$100 and I can get 10 full sessions per week

The $200/m plan is supposed to be 4x that in usage, so with 2 of those you could use 4*2*100*10=$8000 in just a week

Using Simon's numbers here as a bare minimum https://simonwillison.net/2026/May/27/product-market-fit/#en... you'd get 1200*4*2=$9600 a month
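Spelled out, with every input taken from this comment (the per-session value, sessions per week, the 4x multiplier, and the 1200 figure attributed to the linked post are the commenter's estimates, not measured facts):

```python
session_value_usd = 100          # ">$100" per session on the $100/m plan, per ccusage
sessions_per_week = 10           # full sessions per week on that plan
usage_multiplier_200_plan = 4    # "$200/m plan is supposed to be 4x", per the comment
plans = 2                        # two $200/m subscriptions

weekly_value = (
    usage_multiplier_200_plan * plans * session_value_usd * sessions_per_week
)
print(weekly_value)    # 8000

monthly_base = 1200              # the figure the comment takes from the linked post
monthly_floor = monthly_base * usage_multiplier_200_plan * plans
print(monthly_floor)   # 9600
```

Both results depend entirely on the 4x multiplier holding, which is why the weekly estimate ($8000) and the monthly floor ($9600) are so far apart.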

kgwgk 22 hours ago [-]

Maybe yours is not a large AI native workflow?

tactlesscamel 5 hours ago [-]

You pay how much!?!?

Y'all know that is enough to buy a real human, right? Well, good for you. I'm going back to figuring out which of my 2 streaming subs are getting cut. Maybe my free crap is enough to figure out that for me while I make my own art.

To be frank, the time you spend constructing prompts, tasks, and all else required to get your ai to do a thing was probably enough time to do the thing yourself. -Research included.

It's good to see the world throw the concepts of art, pride, and general accomplishment in the trash. Why have friends and partners in projects when you could give your savings to Anthropic, OpenAI, or any number of companies already obtaining ungodly amounts of financing? A somewhat helpful bot at the cost of who you are and your bank account.

0xB0D 22 hours ago [-]

If your job becomes writing complex specs to make an LLM write code, you've not optimised anything.

In fact all you've done is add a business cost.

dmos62 21 hours ago [-]

Are you saying that specs shouldn't be complex or that you shouldn't write specs at all?

spgorbatiuk 23 hours ago [-]

Hardware and provider juggling is a way to go, although I think it is also worth mentioning that the cost is not only the price-per-token, but first of all, the amount of tokens used.

Depending on what one builds, comprehensive documentation and applicable skills and memory tools often allow for a substantial reduction of tokens previously used by the agent to comprehend and remember what is being built

asdfasgasdgasdg 21 hours ago [-]

Use Gemini 3.5 flash on the $20 a month plan and be satisfied with only being 3x as productive as you’d be on your own.

ddxv 14 hours ago [-]

I recently switched to the opencode $20 a month plan and am also testing the $5 a month go plan to see if that works. Connected into MiMo or DeepSeek Zen seems to code all day.

WhiteOwlLion 23 hours ago [-]

There’s a lot of Xeon chips for $10 on eBay. Too bad there’s no drive for CPU-based inference. The data centers will need to swap out the older GPU clusters, so what does that do for hardware pricing on data center GPUs? H100s are cheap enough, but the power requirements make it a long-term net negative given how much you pay for power in California.

Kuyawa 23 hours ago [-]

This month I've spent only 15 cents using DeepSeek API and my own coding agent. Three apps delivered to clients and currently working on a tournament management app for pickleball, padel and beach tennis. I love DeepSeek.

dempedempe 1 days ago [-]

Did you just copy-and-paste an AI response and post it on your blog?

dottchen 18 hours ago [-]

running 2 $200 codex subs seems to work for me. It's quite easy to run out of a full account's weekly usage if using xhigh and fast mode all the way, and i'm not using it for autonomous running, still mainly human reviewed actual work.

jason_s 13 hours ago [-]

Please use a more readable variable-width font.

dualvariable 19 hours ago [-]

Am I the only one happy on a $20/month pro plan?

Yeah, every now and then you blow out the window limits. So you take a break and think about something else or go out and do something else...

jacobgold 24 hours ago [-]

"Around $400 a month of plans buys roughly $2800 of API usage at list prices, which is a real bargain right up until you hit the ceiling."

I realize this text is just slop but it never stops being a "real bargain" at any point.

And it's more like $200/mo for $4000+/mo in tokens. You can also buy additional subscriptions.

There's no sense in running local models or doing anything else as long as VCs (and soon the public markets) are willing to pay your bill.

simonw 24 hours ago [-]

SemiAnalysis pushed this to the limit and managed to get $8,000 of tokens from a $200/month Anthropic plan and $14,000 of tokens from a $200/month OpenAI plan: https://twitter.com/SemiAnalysis_/status/2064815044085318040

jacobgold 23 hours ago [-]

Yeah, although that is pushing every rate limit and no one knows what happens if you do that consistently? I think $4,000/mo is probably a good estimate for an individual dev doing synchronous coding agent work.

simonw 23 hours ago [-]

Yeah, I agree. I've been consistently getting about $1,000/month of value out of the $100/month subscription for OpenAI, and about the same for Anthropic.

stkdump 22 hours ago [-]

Sorry to be that guy. I think the more precise wording would be that you get tokens which would cost $1,000/month at API pricing. Maybe (depending on the profit margin of the API pricing) you incur costs somewhere close to $1,000/month. And maybe your usage is subsidized by $900/month. The value you get out of it is a whole other question. One that, according to recent news, CFOs find hard to estimate.
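To make the distinction concrete, here is a minimal sketch of the two numbers being separated here, using the $1,000 and $100 figures from this subthread. The function names are hypothetical, and the "subsidy" only equals real provider cost if API list prices are close to cost, which is exactly the open question.

```python
def implied_subsidy(api_priced_usage, subscription_price):
    """Gap between what the tokens would cost at API list price and what was paid."""
    return api_priced_usage - subscription_price


def leverage(api_priced_usage, subscription_price):
    """How many dollars of API-priced tokens each subscription dollar buys."""
    return api_priced_usage / subscription_price


subsidy = implied_subsidy(1000, 100)   # dollars per month
ratio = leverage(1000, 100)            # API-priced dollars per dollar paid

print(f"implied subsidy: ${subsidy}/month, leverage: {ratio:.0f}x")
```

Neither number says anything about the value of the work produced. They only compare two price lists.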

simonw 20 hours ago [-]

Ask a CFO to estimate the value of microservices, or agile, or free snacks in the office, and you'll get much the same answer from them.

abc42 24 hours ago [-]

Even if they were making a profit, their scale and expertise will obviously give you a cheaper product than what you can build.

jacobgold 24 hours ago [-]

Maybe today but it's not a law of nature. It seems inevitable that AI models and coding agents will be fully commoditized eventually, just like computers, game engines, compilers, web servers, and so many other technologies have been.

At the end of the day, AI models are relatively small files that we run little CUDA programs on.

13415 24 hours ago [-]

I use copy & paste with a pro subscription. I guess I'm a bit behind in terms of tool use but it works great for me.

TheSkyHasEyes 23 hours ago [-]

Similar story. I did have a pro subscription as a trial. I'm finding the free tier is as good(for my purposes) as the paid model.

OutOfHere 1 days ago [-]

Fixed-price monthly plans ought to be sufficient for most people who actually review their spec and code, for building production-grade software that stands the test of time. A careful spec+review+iteration takes time, resetting the usage quota. Granted, security audits use tokens too.

If you still need more tokens, odds are that you're vibecoding unmaintainable throwaway trash.

manfre 23 hours ago [-]

With access to view usage for my org and conversations with developers, I think much of the high token usage is a result of people not knowing how to right size the model for the given task. The trend seems to be to pick the most powerful model and use it for everything. Based upon git metrics, I'm one of the top performing engineers at my org and I've yet to run into any overage or throttling on the $200/mo anthropic sub.

justinhj 23 hours ago [-]

I had no idea git metrics could show your best performers

OutOfHere 22 hours ago [-]

It could put managers out of a job, without AI too, so they prefer to not use it.

sebastianconcpt 20 hours ago [-]

Pretty happy with oMLX running Qwen3.6-35B-A3B-8bit

devhe4d 20 hours ago [-]

since when is $400/m justified as an "efficient" way of using a "nice-to-have" option?

what a world we live in?

iwontberude 17 hours ago [-]

As long as you use models trained or distilled for your use case, there is no need to waste compute on trillions of parameters. Anthropic and OpenAI are proving that the “everything” models are not a sustainable business model.

Just-In-Time or dynamic precompute of distilled models have already begun reducing the use of these frontier models for task inference.

Flere-Imsaho 21 hours ago [-]

Instead of openrouter (which is admittedly a good service) I've switched to EU only servers via https://cortecs.ai/

If you hunt in the settings you can restrict your account to only use EU servers for inference... Which means you can't use a lot of the US frontier models, but you can use all the Chinese ones, albeit within EU GDPR, etc.

This to me is a good compromise between privacy and cost.

jrm4 24 hours ago [-]

Is spending (metered money) even worth it? Perhaps for most I mean "beyond like 30 bucks a month," but for me I'm literally not spending more money beyond my very cheapo 16GB video card.

No clue what y'all are doing, perhaps because I'm hobbying, and also I'm old and can perhaps do more of this by hand.

But I'm basically just doing what I did before, plus ollama self hosted and sometimes gemini and I feel like I'm going lightspeed beyond what I've ever done.

And I suppose this is still very fine-grained. I have it make a draft, then just have them fix/change it step by step?

I tried one of the bigger boys that can one-shot apps, which I guess is cool, but I'm finding it's just as hard to modify as if I just grabbed someone elses repo on github.

stkdump 21 hours ago [-]

No, I have the same experience. Feels crazy that a GPU is too expensive and then the advice is to spend 400$+tokens on openrouter each month.

sesm 24 hours ago [-]

> Do that well and you can build what a team of twenty engineers would put out in a month for around a thousand dollars.

As usual, an extraordinary claim without extraordinary evidence: https://stephen.bochinski.dev/apps/

singpolyma3 17 hours ago [-]

How is this even an article? The advice is just "pay for max"

dempedempe 16 hours ago [-]

Because this person's entire blog is just copy-and-pasted AI responses.

andrewstuart 16 hours ago [-]

I feel like the author isn’t aware of the Anthropic fixed price subscriptions, any of which can give you a lot of home AI programming.

m3kw9 20 hours ago [-]

That’s easy, just use the plus plan and learn how to prompt efficiently

tamimio 24 hours ago [-]

You can have opencode and switch between multiple providers on the fly based on the task you are doing: normal tasks use deepseek, for example, hard ones use gpt5 or opus4. Track the usage with something like codexbar or similar. Openrouter seems to charge extra on top of the API costs, same with zen ide, so keep that in mind.
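The routing idea can be sketched in a few lines. This is only an illustration of "pick a model by task difficulty and keep a running tally", not how opencode or codexbar work. The routing table, the `pick_model` helper and the `UsageTracker` class are all hypothetical, and the model labels are just the names used in this comment, not verified provider identifiers.

```python
from collections import defaultdict

# Hypothetical routing table: cheap model for routine work, stronger one for hard tasks.
ROUTES = {"normal": "deepseek", "hard": "gpt5"}


def pick_model(difficulty):
    """Return the model label for a task, falling back to the cheap one."""
    return ROUTES.get(difficulty, ROUTES["normal"])


class UsageTracker:
    """Running total of tokens per model, so the split is visible at a glance."""

    def __init__(self):
        self.tokens = defaultdict(int)

    def record(self, model, tokens):
        self.tokens[model] += tokens


tracker = UsageTracker()
# Made-up task log: (difficulty, tokens used).
for difficulty, tokens in [("normal", 1200), ("hard", 5000), ("normal", 800)]:
    tracker.record(pick_model(difficulty), tokens)

print(dict(tracker.tokens))
```

Any aggregator markup would be applied on top of whatever per-token price each of these totals maps to.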

gaigalas 1 days ago [-]

> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.

In the good ol' days, we bought machines not only to run stuff, but to experiment.

I understand today experiments are limited. Inference is reasonable, fine-tuning is either niche or a stretch, and base training is impossible.

*That is bound to change*, and when it does, there will be an avalanche of hobbyists and amateurs poking at base training. They'll find optimizations no one found before, synthesize data no one ever imagined synthesizing, and when that happens we'll start getting libre models.

So, yeah. Right now, buying the machine doesn't pay off that well, unless you want to pioneer this stuff in severe adverse conditions (hardware prices inflated, etc). Eventually, it will.

whateveracct 16 hours ago [-]

am i the only one who codes by hand at work and for fun anymore?

zuzululu 24 hours ago [-]

Another update for codex users: they let you accumulate resets, which greatly adds to the mileage.

I don't think it's feasible to have something comparable to these frontier models when they are increasing usage and lowering token costs.

Rendered at 17:21:06 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.