Archived:February 2023 lag incident

From A2wiki
February 2023 lag incident
Feb2023incident.png
A screenshot of the unplayably low TPS during the incident.
Duration: 3 days
Type: Server lag
Cause: High resource usage on server host's end
Outcome:
  • World rolled back 5 days
  • User homes lost
  • Major decline in user activity, leading to the Great Reset

The February 2023 lag incident began when Attempt2 suddenly started experiencing extreme TPS lag, rendering the server entirely unplayable. Because every attempted fix, up to and including wiping the entire server, failed to resolve the issue, the server administration believed that the cause was a fault on the host's end. This was confirmed on February 7th, 2023, when the server host's technical support team reported that the server was running on a node experiencing high resource usage and moved the server to a different node, upon which the issue was immediately resolved.

The failed attempts at fixing the issue resulted in the rollback of the world by several days, the loss of several plugin configuration files, and several days of downtime. However, the affected plugins were successfully reconfigured, and compensation was provided to all affected players.

History

PixelPrinter and server performance

Attempt2 has used the PixelPrinter server plugin since its inception for detailed graphics in and around the world. A main function of this plugin is CreateFrame, which allows the user to display downloaded images using item frames. When used excessively, this can generate hundreds to thousands of item frames, causing heavy entity-based lag. Although PixelPrinter is not accessible to non-staff, staff have occasionally used it as a method of griefing.

Attempt2 used 5 GB of server RAM before the lag incident. Server owner nc77812 stated multiple times that he planned to upgrade the RAM, but never followed through. The server had not experienced major performance issues before the incident, although its predecessor server, The PLA Network, did experience a major lag incident after its August 2021 public opening. That incident was remediated by an upgrade from the Spigot API to the Paper API.

Incident

PixelPrinter Grief

On the evening of February 3rd, 2023, the server had been fairly active. Mod Serenity7321 decided to cover player dwrr_'s attic with hundreds of PixelPrinter-generated item frame pictures of Chinese communist revolutionary Mao Zedong, an image that had previously been downloaded to the server by Admin RandomUser34. She had carried out the same picture-based grief a few days earlier in the Mount Xavier Compound storage room, totaling 855 item frames. This time, however, the grief involved a significantly larger number of item frames, causing the server to experience item-frame-based entity lag. Serenity removed all of the item frames shortly afterwards using the /killall server command, and the server recovered immediately.

Decision to upgrade server memory

Shortly thereafter, server owner nc77812 decided to upgrade the server RAM from 5 GB to 8 GB, effective immediately, and announced his decision in the Attempt2 Discord server's #voice-general chat. He cited future performance concerns as the reason for the upgrade. The server then restarted at approximately 2:00 AM EST.

Beginning of Incident (2:00-4:00 AM EST)

After the server restarted, lag issues immediately became apparent. As people joined, chunks around them failed to load. Server TPS (ticks per second) dropped from its usual average of 20 to an average of roughly 1 to 5, rendering the server virtually unplayable. nc77812 and RandomUser34, with feedback from dwrr_, attempted to troubleshoot the problem by mass killing entities and by installing the Spark server diagnostics plugin to look for a root cause of the lag. As it became clear that these actions were leading nowhere, several other fixes were suggested. RandomUser34 and dwrr_ suggested switching the server to the Fabric API, a move that was swiftly rejected by nc77812, who insisted that the issue instead involved the server world, citing previous issues with PixelPrinter-derived entity lag.
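To put the TPS figures above in perspective, the following is a minimal sketch of how tick rate relates to the average time a server spends on each tick. It assumes only the standard Minecraft target of 20 ticks per second (a 50 ms budget per tick); the function names are hypothetical and this is not how Spark or the server itself measures TPS.

```python
TARGET_TPS = 20.0                      # Minecraft's nominal tick rate
TICK_BUDGET_MS = 1000.0 / TARGET_TPS   # 50 ms available per tick


def effective_tps(mspt: float) -> float:
    """Tick rate achieved when an average tick takes `mspt` milliseconds."""
    if mspt <= 0:
        raise ValueError("mspt must be positive")
    # The server never ticks faster than the target; slow ticks lower TPS.
    return min(TARGET_TPS, 1000.0 / mspt)


def mspt_for_tps(tps: float) -> float:
    """Average tick duration implied by an observed TPS.

    At exactly 20 TPS the result (50 ms) is only an upper bound,
    because ticks that finish early are padded out to the budget.
    """
    if not 0 < tps <= TARGET_TPS:
        raise ValueError("tps must be in the range (0, 20]")
    return 1000.0 / tps


if __name__ == "__main__":
    for tps in (20.0, 5.0, 1.0):
        print(f"{tps:>4} TPS -> about {mspt_for_tps(tps):.0f} ms per tick")
```

Under these assumptions, the 1 to 5 TPS range reported during the incident corresponds to average ticks taking roughly 200 to 1000 ms, or 4 to 20 times the 50 ms budget.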

Eventually, nc77812 made the executive decision to delete the current server world and revert the server's world to its last backup of January 30th, 2023, explaining that it would be easier than switching the entire server and its plugins over to Fabric, which used a completely different plugin/addon system than Paper/Spigot. This action generated immediate controversy from the playerbase, but was nonetheless executed.

Continuation of Incident (Feb 4-5)

Server Admin RandomUser34 continued investigating the server lag issue. After he had tried many things, including running the Spigot, Paper, and Fabric APIs and even running the server as a clean-slate Vanilla server with a new world, it became clear that the issue was the fault of the server host, and he immediately contacted host support. After the incident, it was discovered that he had accidentally wiped both the server's End and Nether worlds, for which there was no backup.

Support resolves issue (Feb 5-7)

Support was relatively slow to respond because the incident occurred over a weekend. After about 2 days, support acknowledged that the server had been running on a node overwhelmed by high resource usage, which had essentially throttled the server's performance to the point of unplayability. The host then transferred the server to another node free of charge, and performance immediately returned to normal. The server reopened later in the day on February 7th.

Aftermath

Multiple builds constructed between January 31st and February 4th, including all of the server's End and Nether progress, were lost in the incident. Owner nc77812, who faced backlash for his mishandling of the incident, pledged that everyone impacted would be "reimbursed" by "any means necessary".

The incident coincided with a vote on server referendum Proposition Two. Due to the unusual circumstances surrounding the poll's timing, its results were declared null and void, and a new poll is currently being conducted.

By February 9th, a new public Enderman farm had been built to replace the one lost in the incident, as well as a new spawn-end gateway.