tmanchester 15 hours ago [-]
Okay this is actually pretty cool. Gemma 4 is a nice little model and I've really enjoyed playing around with it. At 1800 tok/s turns are essentially instant, it's a bit of a trip
simianwords 7 hours ago [-]
I just tried it on their website and it is extremely fast. I wonder what the value prop of this is, though. Where would I want:
1. a smaller model
2. also non local, hosted on cloud
I can't think of any case.
johntash 4 hours ago [-]
OCR is a decent use case for smaller models. I've had good experience using Gemma to OCR handwritten stuff that Tesseract doesn't do so well on.
But for 2, probably only useful if you have a huge batch workload you want to get done quicker and don't want the local hardware for it?
jamesponddotco 4 hours ago [-]
A voice assistant comes to mind. Ideally, it'd be local, but if you don't have the hardware you'll go with the cloud, in which case, the faster, the better.
anthonypasq 7 hours ago [-]
speed is always better. if you have ever used a coding agent at 1000 tps, going back to 50 feels like walking through sludge. for a simple question i hate waiting 2 minutes for opus to loop 50 times just to read some files and answer.
it's not necessarily about gemma 4 specifically, but in a year or 2 when we have opus class models at 2000 tps, imagine the productivity.
simianwords 7 hours ago [-]
Of course I think speed is preferable but I don’t see myself paying for a fast Gemma
anthonypasq 7 hours ago [-]
i mean, i can imagine a million different apps that use ai and want cheap multimodal capabilities with low latency.
simianwords 7 hours ago [-]
Answering myself: fancy autocomplete in my IDE?
Text autocorrect on my phone? Like give it all the context about me and so on.