The ramblings of a developer
Over the weekend, some users may have noticed their sites returning 403 errors. I'd like to take a minute to explain what happened, how it was fixed, and what lessons were learned in the process.
Recently, a user complained that they were unable to add email addresses to their account. It turns out that they had deleted their entire home directory. This is a very bad idea. All of the account-specific files for email, password key files, etc. were in there, and the control panel wasn't able to recreate or recover them.
Of course, this had to be fixed. The first attempt at fixing the user's home directory was to restore it from a backup. Unfortunately, some research indicated that this would not be sufficient to restore the user's account. We can confirm that the control panel software does not like to restore accounts that do not have the standard folder structure.
The second attempt was to simply back up their public_html files, delete their account (and all of the related files and folders), and recreate it from scratch. After recreating the account, we restored all files and folders inside their public_html directory. This left one glaring issue: ownership and permissions. Files and folders must be owned by the user's account (so they can read and write these files) and have specific permissions (so PHP and Apache can access them). This led us to run a convenient script called chownpublichtmls.
Chownpublichtmls is a control panel script that is designed to correct the ownership problems described above. Running this script should have had no adverse effects on users, since their files and folders were already set up correctly. It did, however, manage to change the permissions such that PHP and Apache were unable to access, execute, and serve files. As a result, users found that their sites started throwing 403 errors (access denied).
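To make the ownership and permission requirements above concrete, here is a minimal sketch of a read-only audit that could be run before and after any bulk fix. The function name find_permission_problems is hypothetical, and the expected modes (755 for directories, 644 for files) are an assumption based on common shared-hosting conventions, not the exact values our server requires.

```python
import os
import stat


def find_permission_problems(root, dir_mode=0o755, file_mode=0o644, owner_uid=None):
    """Report entries below root whose mode or owner differs from what is expected.

    The default modes are assumptions; pass the values your server actually needs.
    The root directory itself is not checked, only what lies beneath it.
    """
    problems = []
    for current, dirnames, filenames in os.walk(root):
        candidates = [(os.path.join(current, d), dir_mode) for d in dirnames]
        candidates += [(os.path.join(current, f), file_mode) for f in filenames]
        for path, expected in candidates:
            info = os.lstat(path)
            if stat.S_ISLNK(info.st_mode):
                continue  # symlink modes are not meaningful, so skip them
            actual = stat.S_IMODE(info.st_mode)
            if actual != expected:
                problems.append((path, f"mode {actual:o}, expected {expected:o}"))
            if owner_uid is not None and info.st_uid != owner_uid:
                problems.append((path, f"owner uid {info.st_uid}, expected {owner_uid}"))
    return problems
```

Because the sketch only reads metadata and never changes anything, it is safe to run against every account and compare the results before and after a fix.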
By examining the Apache error logs, we found and corrected the sites that were throwing 403 errors. This didn't settle our suspicions, however, because there could easily have been more affected users whose sites had not yet shown up in the error log.
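Since the error log only records sites that someone has already visited, one way to catch the rest is to request each site directly and look for a 403. The following is a rough sketch of that idea, not what we ran at the time; the function names http_status and find_forbidden_sites are hypothetical, and the list of URLs is assumed to come from wherever the account list is kept.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the site is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, etc.


def find_forbidden_sites(urls, fetch_status=http_status):
    """Return the URLs that answer with 403 (access denied)."""
    return [url for url in urls if fetch_status(url) == 403]
```

The fetch_status parameter exists so the check can be tried with a stand-in function before pointing it at live sites.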
In order to make sure that no more users on the system were affected, we requested that an extra set of eyes examine the problem. During this examination, the systems gurus changed the ownership and permissions on users' files to something that conflicted with what was necessary for proper operation.
When we got wind of what scripts were run, we knew that it wasn't right. We were immediately on the phone to find out specifically what actions were performed and how to fix the permissions problem. We started randomly testing and applying fixes to sites that we noticed were down and began exploring a system-wide fix.
Thankfully, we came across this script. This script is designed to fix the permissions for all users back to their proper settings. We began by testing this on a small scale and, once we verified that it worked as advertised, ran it for everyone.
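The "small scale first, then everyone" approach described above can be sketched as follows. This is only an illustration of the process under stated assumptions: staged_rollout, apply_fix, and verify are hypothetical names standing in for whatever fix and check are actually used, and the pilot size of 3 is arbitrary.

```python
def staged_rollout(accounts, apply_fix, verify, pilot_size=3):
    """Apply a fix to a few pilot accounts, verify them, then apply it to the rest.

    apply_fix and verify are caller-supplied callables (hypothetical here).
    If any pilot account fails verification, the remaining accounts are left alone.
    """
    accounts = list(accounts)
    pilot, remainder = accounts[:pilot_size], accounts[pilot_size:]

    for account in pilot:
        apply_fix(account)

    failed = [account for account in pilot if not verify(account)]
    if failed:
        raise RuntimeError(
            f"pilot failed for {failed}; not touching the remaining "
            f"{len(remainder)} accounts"
        )

    for account in remainder:
        apply_fix(account)
    return pilot + remainder
```

Stopping at the pilot stage when verification fails keeps a bad fix from spreading to every account, which is exactly the situation we were trying to recover from.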
First and foremost, we learned to never run the chownpublichtmls script again. Should we need to properly correct the permissions on an account, we will use the file permissions script linked above.
Going forward, we will invest time and energy into developing scripts that make recovery from issues like this easier. We will also invest in (purchase) applications for the iPhone and iPad that make it easy to address issues while on the road. In the event that there is a status problem with the server, we will post about it (as we did in this event) to Twitter with the hashtag #status. Please follow us on Twitter for service updates.
Thanks to everyone who emailed in and tweeted us. We appreciate your continued business and support.
Your Barista, Greg