New Haven › Forums › Announcements › Downtime
This topic has 2 replies, 3 voices, and was last updated 1 day, 1 hour ago by Evalina.
Is it really letting anyone post here? Well, it’s empty, no rules are posted for how to use it, and this seems important.
Anyway, for those who do not know, the game is currently experiencing frequent downtime.
It is freezing once or twice a day for an hour or more before crashing, at which point it remains down for something like 6-12 hours. The latest outage was 16 hours.
This has been going on for about a week, and it almost exclusively happens in the afternoon or evening game time. Mostly evening.
Putting this here in hopes Nova sees it sooner than the petitions, and maybe people can reply with any other details they’ve noticed to help nail down the problem.
My current running theory is that cloud hosting (possibly Google’s content hosting) is being used for Haven. That in and of itself is not an issue. However, as the LLM usage has expanded, it may be that the cloud hosting service being used is not robust enough for it. These recent issues have come into play since the addition of the podcast. All the text-based LLM content is fairly straightforward. Still a bit beefy, but not nearly as much as generating audio that emulates real people.
If you listened to the entirety of the 2nd podcast, you likely noticed a degradation in the last 15-20 minutes: names weren’t said properly, it forgot things… general hallucinations. I think it builds the episode up as it goes: as events happen, it processes the audio into the pile, so that once the two weeks are up the podcast is ready to go (otherwise it would take a long time to process it all together). But as we get into week 6 and the 3rd episode, I think the data and processing usage might be too much for the hosting Haven is currently using.
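If my guess about the pipeline is right, the shape of it would be something like this (a toy sketch of the theory only; the class, function names, and the TTS stand-in are all made up by me, not anything Nova has shown):

```python
# Toy sketch of the "build the podcast up as events happen" theory.
# Everything here is hypothetical; Haven's real pipeline is unknown to me.

def generate_audio(event_text: str) -> bytes:
    """Stand-in for the expensive voice-generation step (whatever
    TTS/LLM audio service is actually being used)."""
    return event_text.encode("utf-8")  # placeholder bytes, not real audio

class PodcastBuilder:
    def __init__(self) -> None:
        self.segments: list[bytes] = []

    def on_event(self, event_text: str) -> None:
        # The heavy work happens here, spread out over the two weeks,
        # which is where a resource-starved host would start to choke.
        self.segments.append(generate_audio(event_text))

    def publish(self) -> bytes:
        # At the two-week mark this part is cheap: just stitch together
        # what has already been generated.
        return b"".join(self.segments)
```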
Now, again, this is all a theory based on what I can tell and what I know of LLMs in general. It may not be the issue, but it would track with what we’ve seen and when it happens (during primetime, when events are ending; I’ve also seen ‘Supernatural Rumors’ that fail because the LLM failed to process them, etc.)
Hopefully the extended downtime this time meant that a fix went in. Otherwise my guess would be that a more robust hosting service that can handle what the LLM has become will be needed.
These issues came into play weeks after the first podcast, so I’m going to have to put the nix on that theory. It’s unlikely Nova is running the model on the same server anyway, even if he does run the model himself. A server used to run models like this ideally has a GPU; Haven needs no such thing, and adding even a basic GPU adds far, far more to the pricing than doing the minor things Haven does via an API, I think.
I think roughly 8,000 emotes written by the AI would be the likely point where it’d be profitable to run a GPU for it. More likely Nova is using an external API (and we know, from past things, he’s used at least Gemini and probably others, so hey), and perhaps a server shared with a more constant cloud workload. Unless you all are RPing with the AI a /lot/, it just doesn’t make sense.
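To put some made-up numbers on that breakeven claim (both prices below are placeholders I picked to illustrate the shape of the math, not real quotes):

```python
# Rough breakeven: at what daily emote volume does renting a GPU beat
# paying per API call? Both prices are assumptions for illustration only.
gpu_cost_per_day = 8.00       # assumed rental for a modest GPU instance, $/day
api_cost_per_emote = 0.001    # assumed ~$0.001 of API spend per emote

breakeven = gpu_cost_per_day / api_cost_per_emote
print(f"breakeven around {breakeven:,.0f} emotes per day")  # 8,000 with these guesses
```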
Processing one player’s logs takes about 30 seconds to a minute at worst and isn’t that heavy for Gemini. I let it process all of mine a few times to test some things. We have about 700 players if we’re being generous. That’s about 12 hours of processing per week. And that’s being incredibly safe for us – it’s probably way, way less.
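Spelling that estimate out (the player count and per-player time are my own rough guesses, as above):

```python
# Back-of-envelope weekly processing time. All inputs are guesses:
# ~700 players, worst case ~1 minute of processing per player's logs,
# each processed once per week.
players = 700
minutes_per_player = 1.0   # worst case; usually more like 30 seconds

total_hours = players * minutes_per_player / 60
print(f"~{total_hours:.0f} hours of processing per week")  # ~12 hours
```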
Presuming he’s being clever and using batch processing on multiple players at once, and capping the total RP processed per player at something reasonable, he’s probably paying about 20 cents per active player. If he’s processing all of our RP, he’s paying a lot more. But a local model that could reasonably handle that at a decent level of quality would probably cost him a ton more money, because he still wouldn’t need that GPU 24/7. He’d need it for at most a few processing hours a day; the rest of the time it’s just costing him money. And plenty of us donate, but that doesn’t mean he’s likely to throw that away so easily.
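The 20-cents-per-player figure is the same kind of guesswork; here’s the shape of it with a hypothetical per-player token cap and a hypothetical batch rate (neither number is real pricing, just values that land near my estimate):

```python
# Hypothetical per-player cost with a cap on how much RP gets processed.
# Both numbers are assumptions chosen for illustration, not real pricing.
token_cap_per_player = 200_000      # assumed cap on RP text per player
price_per_million_tokens = 1.00     # assumed blended batch rate, $/1M tokens

cost = token_cap_per_player / 1_000_000 * price_per_million_tokens
print(f"~${cost:.2f} per active player")  # ~$0.20 with these guesses
```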
If he’s using the podcast service I think he is, he’s paying about $22 for that. Gemini can create the entire script in minutes – no need for any constant real-time processing that would cause this.
I suspect he just did an oopsie in some of the code he wrote to handle petitions a week ago and is hitting a resource limit. We’ve had two manual copyovers since, so I suspect he’s aware of that oopsie and has maybe already fixed it.
Who knows; that’s pure speculation. Just wanted to counter-argue the blame-the-LLM idea before it becomes a widespread thing…