New Haven › Forums › Announcements › Downtime
This topic has 2 replies, 3 voices, and was last updated 1 day, 1 hour ago by Evalina.
Is it really letting anyone post here? Well, it’s empty, no rules are posted for how to use it, and this seems important.
Anyway, for those who do not know, the game is currently experiencing frequent downtime.
It is freezing once or twice a day for an hour or more before crashing, at which point it remains down for something like 6-12 hours. The latest outage was 16 hours.
This has been going on for about a week, and it almost exclusively happens in the afternoon or evening game time. Mostly evening.
Putting this here in hopes Nova sees it sooner than the petitions, and maybe people can reply with any other details they’ve noticed to help nail down the problem.
My current running theory is that cloud hosting (possibly Google’s content hosting) is being used for Haven. That in and of itself is not an issue. However, as the LLM usage has expanded, it may be that the cloud hosting service being used is not robust enough for it. These recent issues have come into play since the addition of the podcast. All the text-based LLM content is fairly straightforward. Still a bit beefy, but not nearly as much as generating audio that emulates real people.
If you listened to the entirety of the 2nd podcast, you likely noticed a degradation in the last 15-20 minutes: names weren’t said properly, it forgot things… general hallucinations. I think it builds the episode up as it goes: as events happen, it processes the audio into the pile, so that once the two weeks are up the podcast is ready to go (otherwise it would take a long time to process it all together). But as we get into week 6 and the 3rd episode, I think the data and processing usage might be too much for the hosting Haven is currently using.
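If my guess about the pipeline is right, the shape of it would be something like this (a toy sketch of the theory only; the class, function names, and the TTS stand-in are all made up by me, not anything Nova has shown):

```python
# Toy sketch of the "build the podcast up as events happen" theory.
# Everything here is hypothetical; Haven's real pipeline is unknown to me.

def generate_audio(event_text: str) -> bytes:
    """Stand-in for the expensive voice-generation step (whatever
    TTS/LLM audio service is actually being used)."""
    return event_text.encode("utf-8")  # placeholder bytes, not real audio

class PodcastBuilder:
    def __init__(self) -> None:
        self.segments: list[bytes] = []

    def on_event(self, event_text: str) -> None:
        # The heavy work happens here, spread out over the two weeks,
        # which is where a resource-starved host would start to choke.
        self.segments.append(generate_audio(event_text))

    def publish(self) -> bytes:
        # At the two-week mark this part is cheap: just stitch together
        # what has already been generated.
        return b"".join(self.segments)
```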
Now, again, this is all a theory based on what I can tell and what I know of LLMs in general. It may not be the issue, but it would track with what we’ve seen and when it happens (during primetime, when events are ending; I’ve also seen ‘Supernatural Rumors’ that fail because the LLM failed to process them, etc.)
Hopefully the extended downtime this time meant that a fix went in. Otherwise my guess would be that a more robust hosting service that can handle what the LLM has become will be needed.
These issues came into play weeks after the first podcast, so I’m going to have to put the nix on that theory. It’s unlikely Nova is running the model on the same server anyway, even if he does run the model himself. A server used to run models like this ideally has a GPU; Haven needs no such thing, and adding even a basic GPU adds far, far more to the pricing than doing the minor things Haven does via an API, I think.
I think roughly 8,000 emotes written by the AI would be the likely point where it’d be profitable to run a GPU for it. More likely Nova is using an external API (and we know, from past things, he’s used at least Gemini and probably others, so hey), and perhaps a server shared with a more constant cloud workload. Unless you all are RPing with the AI a /lot/, it just doesn’t make sense.
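To put some made-up numbers on that breakeven claim (both prices below are placeholders I picked to illustrate the shape of the math, not real quotes):

```python
# Rough breakeven: at what daily emote volume does renting a GPU beat
# paying per API call? Both prices are assumptions for illustration only.
gpu_cost_per_day = 8.00       # assumed rental for a modest GPU instance, $/day
api_cost_per_emote = 0.001    # assumed ~$0.001 of API spend per emote

breakeven = gpu_cost_per_day / api_cost_per_emote
print(f"breakeven around {breakeven:,.0f} emotes per day")  # 8,000 with these guesses
```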
Processing one player’s logs takes about 30 seconds to a minute at worst and isn’t that heavy for Gemini. I let it process all of mine a few times to test some things. We have about 700 players if we’re being generous. That’s about 12 hours of processing per week. And that’s being incredibly safe for us – it’s probably way, way less.
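Spelling that estimate out (the player count and per-player time are my own rough guesses, as above):

```python
# Back-of-envelope weekly processing time. All inputs are guesses:
# ~700 players, worst case ~1 minute of processing per player's logs,
# each processed once per week.
players = 700
minutes_per_player = 1.0   # worst case; usually more like 30 seconds

total_hours = players * minutes_per_player / 60
print(f"~{total_hours:.0f} hours of processing per week")  # ~12 hours
```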
Presuming he’s being clever and using batch processing on multiple players at once, and capping the total RP processed per player at something reasonable, he’s probably paying about 20 cents per active player. If he’s processing all of our RP, he’s paying a lot more. But a local model that could reasonably handle that at a decent level of quality would probably cost him a ton more money, because he still wouldn’t need that GPU 24/7. He’d need it for at most a few processing hours a day; the rest of the time it’s just costing him money. And plenty of us donate, but that doesn’t mean he’s likely to throw that away so easily.
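The 20-cents-per-player figure is the same kind of guesswork; here’s the shape of it with a hypothetical per-player token cap and a hypothetical batch rate (neither number is real pricing, just values that land near my estimate):

```python
# Hypothetical per-player cost with a cap on how much RP gets processed.
# Both numbers are assumptions chosen for illustration, not real pricing.
token_cap_per_player = 200_000      # assumed cap on RP text per player
price_per_million_tokens = 1.00     # assumed blended batch rate, $/1M tokens

cost = token_cap_per_player / 1_000_000 * price_per_million_tokens
print(f"~${cost:.2f} per active player")  # ~$0.20 with these guesses
```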
If he’s using the podcast service I think he is, he’s paying about $22 for that. Gemini can create the entire script in minutes – no need for any constant real-time processing that would cause this.
I suspect he just did an oopsie in some of the code he wrote to handle petitions a week ago and is hitting a resource limit. We’ve had two manual copyovers since, so I suspect he’s aware of that oopsie and has maybe already fixed it.
Who knows; that’s pure speculation. Just wanted to counter-argue the blame-the-LLM idea before it becomes a widespread thing…