Monday, Monday…

23 Nov 2004  in the wee hours  Matt Winckler

“Life is like a sandwich. Sometimes you eat the sandwich, and sometimes the sandwich eats you.”

“What kind of sandwich are you talking about, anyway?”

“Pit-bull salad sandwich.”


Today was a Monday in the stereotypical sense of the word. Had I known what evil was forthcoming, I might have had a greater sense of foreboding yesterday afternoon when I received two cryptic emails from the CVS/Bugzilla server at work, indicating that a power outage had occurred and the server had rebooted itself. As it was, I rolled into work this morning, went through the normal routine, ran a rootkit checker on the server to make sure all was secure, and found no problems.

Instead, someone else found the problems for me. At 0841, my efforts to recall what I had been doing at 1650 on Friday were interrupted by a coworker asking nonchalantly if I was doing maintenance on Bugzilla, because it was down. At that moment, someone carelessly dropped a small pouch of #10 shot into one of the lower chambers of my heart, and I fired up the trusty web browser to find that indeed, Bugzilla was thoroughly broken. I speculated from the error messages that the MySQL server had failed to start during yesterday’s reboot for some reason. I quickly confirmed this and started the server normally. All was well.

Moments later, the careless individual dropped what felt more like a wagonload of #4 shot into my heart, and I was feeling pretty much like cold lead, except that cold lead under pressure doesn’t sweat as much as I do. Somewhere along the line, the database had misplaced the last three and a half months’ worth of bug tracking data. Still, I felt reasonably confident that restoring from backups would minimize the loss. Then the aforementioned invisible man dropped a few cannonballs into my stomach, when I realized that for some strange reason the backups were restoring the same August 12 version of the data that was already present. I racked my brain for possible solutions. I had been backing up the database regularly; why none of the changes were present in those backups was utterly beyond my comprehension. The situation looked bad.

After lunch, the most heavily affected parties met to evaluate the situation. They came to me asking for the raw files from my backups, wondering whether any data could be recovered from them. I disappointed them by opening one of the files in vim and pointing out that it was binary, and that picking through tens of thousands of rows in search of data that in all probability wasn’t even there would not be much fun. Then something prompted me to search the hard drive for MySQL table files, just to see whether any were more recent than my backups. A few moments later, the search turned up several different places where similarly named files resided. Investigating, I was overjoyed to discover the most recent files themselves, sitting off in some obscure directory all alone. I set aside my burning questions long enough to restore the files to their proper place, and got the server working again. (Mostly.)

I then set about discovering why the database had been writing to /usr/local/mysql-standard-4.0.20-pc-linux-i686/data instead of /var/lib/mysql. After all, the configuration file in /etc clearly specified that the data should be stored in /var/lib/mysql. I was at a loss for some time, until my mind reached back to the events surrounding August 12. I had needed to drop in a replacement for the existing MySQL binary, because a bug in Gentoo’s portage tree caused ODBC connections to MySQL to crash the database server for some reason. Following advice found on the MySQL mailing list, I installed a version directly from MySQL alongside my Gentoo version, tested it with real data, found that it worked, then updated the configuration files and put it in place as the production server. What I apparently didn’t do was restart the new MySQL server so that it would read the updated configuration files. So it kept using the copy of the data I’d been testing with, located in /usr/local/… (per its default configuration), while the real data directory, /var/lib/mysql, sat untouched in its pre-August 12 state. Yesterday’s forced reboot finally made the server read the new configuration files, so it looked for data in /var/lib/mysql, which made me think something catastrophic had happened. (That’s right, the server hadn’t been rebooted in the past hundred days. It’s Linux. It’s stable. It needs a UPS. I think the record uptime on the previous server, before I took it down to install this one, was somewhere over two hundred days.)

Lessons learned: backups will not save you if they don’t include what you will later need to recover; always, always, always restart any service you’ve been tinkering with so that it reloads the pertinent configuration files; and don’t try to run two versions of MySQL side by side. It’s just plain confusing, and it leads to problems later.

Which in turn leads to why I am still at work, trying to emerge a new version of MySQL. It seems that in all the hubbub, MySQL, Bugzilla, and ODBC have all come to disagree about where the MySQL socket ought to be….
