Posted on 29th Oct, 2021 in Production
Previously on Servergate, after being the first responder to a catastrophic hardware failure, we spent 5 days rebuilding our production systems from scratch. Now the task of recovering missing data and building out development and testing environments remains. Personally, I think it's kind of weird to be back here writing this series after a nearly 6-month break. However, I feel compelled to finally finish this story.
Finance and the Machine
At 4:18 AM on a Monday, a day before we would bring our systems back from the brink of death and at an hour when any sane person would be asleep, the gears of IT's political machine were set in motion. It was at this exact moment that our boss, who has a long history of burning bridges and doing whatever they please to achieve their goals, sent out a meeting invite for "Short Term Solutions for Hardware Monitoring and Data Replications". Unbeknownst to me and the rest of our team, this would be the beginning of a comeuppance over a decade in the making. That meeting was scheduled for the following week and included: our boss, M. Bison; our lead firefighters, Ken, Ryu, Rashid, and myself; and the top 2 ranking members of IT's Infrastructure team, Bert and Ernie.
Due to the duration and scope of our service outage, the rest of IT had to be informed as to what was happening. While I was off fighting fires on the ground, M. Bison had to report to the Senior Leadership Team and the head of IT. After years of our boss antagonizing them as they tried to reorganize IT, and well over a decade of antagonizing them before our department was even a part of IT, Leadership finally had the opportunity to return the favor, and you best believe they did. For starters, IT's vendor and contract management team began to scrutinize our every move. They went so far as to contact Dell to find out why we had contacted them and then proceeded to interrogate them about our hard drive order (despite it not having been fulfilled by Dell). Our account representative did not deserve to get dragged into this, but soon 3 different people from Dell were also ensnared in the situation. When they failed to find fault there, they moved to obstruct any other expenditures we made as part of recovering our systems.
Meanwhile, since we were still mid-crisis, we also sent out requests for technical help to other departments that specialized in managing hardware. One by one their representatives came back with responses akin to "We don't have the resources to help you right now". When we approached Ernie's team for assistance, he told us "Ooh? Why didn't you have better backups? This isn't our problem." I guess over a decade of having such a fraught relationship with M. Bison will do that to you, but that didn't stop me from being furious with him. We are allegedly a single organization set up to support the mission of the University, so denying your coworkers help in their hour of need because of a personal grudge with their boss is unacceptable behavior.
With all this happening in the background, we finally restored our systems and the meeting rolled around. What one would expect to be a meeting about building short-term resiliency for our systems was actually a firing squad session. A number of uninvited parties showed up, and M. Bison presented a Root Cause Analysis document that they had ordered someone to draft. (It's still a mystery to me how M. Bison knew this would happen and was prepared for it.) Although the document successfully papered over the disaster, our guests made sure they got their shots in. Personally, I was still angry with Ernie, and I made sure to let him know it as we exchanged carefully coded barbs across the conference room table. Once all was said and done, we never met again. Both Bert and Ernie would leave for greener pastures 3 months later.
So now we've covered everything except for the conclusion of this mess. Our systems were back up and running, but we were missing a couple of years of financial data because of a gap in our backups. The now inoperable SAN was sitting in a midtown data recovery office, awaiting analysis. The analysis came back on the same day our political troubles began. They were fairly confident that they could recover some or all of the data on the SAN. However, this came at a steep five-figure cost. In order to pay them, we'd need to get approval from up the food chain. Our first big obstacle was IT's vendor and contract management team, which was already acting in a fully obstructionist capacity. As they dragged their feet on the formal vetting of the data recovery vendor, M. Bison worked on getting the expense approved. Weeks of meetings and circular negotiations ensued while the vendor sat with a pile of our hard disks in their offices.
Every time I'd inquire as to how that was going, I was told that we were still seeking approval. Approval would never come, though. After the matter escalated all the way to the head of IT, they refused to approve the expense. We truly were on our own. The only favor they would afford us was to reimburse Ken for the expenses he incurred in getting our SAN to the vendor in the first place. At some point in June we had to retrieve the SAN, which had by then been sitting in the data recovery vendor's office for over a month. We brought it back by subway. As for our missing financial data, we gave up and recreated the bare bones of it using data from other systems.
For reasons, all IT staff went through a period of about a year in which we were forced to record how we spent every minute of our days. We suspected that the highest levels of management were using the data to figure out which projects they wanted to ax, but that's neither here nor there. Servergate happened to fall in the middle of this truly cursed year, and so at the end of every week I was required to submit data about where my time was going. I viewed this process with great contempt, often submitting my reports late and with guesstimated numbers.
In the wake of Servergate, I figured I might as well count how many hours I spent herding cats and putting out fires. In the two-week period that followed, I had put in a whopping 151 hours. Considering my work week is supposed to be 35 hours long, I expected, nay, demanded some sort of compensation for this. M. Bison had promised Ryu, Ken, Rashid, and myself comp time. The prospect of receiving vacation time for every hour we had worked outside of our regular schedule was quite enticing. We already had a ton of vacation time naturally, but this was easily two weeks' worth of time off on top of that. As the chaos subsided, my thoughts turned toward taking an extended vacation in a foreign country.
Once things were back to normal, we inquired about the time that was promised to us. Our boss assured us they were working on it. Weeks passed, and we continued to ask and be met with the same answer. As summer rolled in, M. Bison got a single day of comp time for some of us and nothing for everyone else. They cited a lack of political capital, which was plausible given everything that had just transpired, but it also fit their pattern of making long-term promises they had no ability or intention to keep in exchange for short-term results. To say I was livid would be an understatement. Fortunately, other matters would soon take my mind off it. There were conventions to attend, flights to Japan to book, and interesting people sliding into my inbox. Those are all stories for another time, though.
This is officially the end of the Servergate saga. Thank you for sticking with me as I slowly unwound this disaster. It's kind of hard to believe that after all that I still work there. If there's one lesson to take away, it's to have no loyalty to your employer, because they have none to you. If I have some free time I might write about what happened next. But for now, I think I will keep procrastinating on my problems by finding other distractions.