SQL Server upgrade shenanigans

It’s late at night, I’m watching Yours, Mine and Ours (the 1968 version), and it’s time to upgrade one of our SQL2014 dev instances to SQL2017. We’d already gone through the upgrade advisor, and the issues it raised (mostly CLR) were not a big deal, so let’s go ahead and get this done. It’s all smooth sailing until I get this error:

Very odd! After some quick googling, I find this page by Pinal Dave –> https://blog.sqlauthority.com/2017/01/27/sql-server-sql-installation-fails-error-code-0x851a001a-wait-database-engine-recovery-handle-failed/

His suggestion is that there’s an issue with the AD service account, and the fix is to switch to Network Service. This is dev anyway, and I’m feeling a little careless, so why not? Unfortunately, it didn’t work! Not only that, but when I tried switching service accounts, I got even weirder errors:


I absolutely love the %1 to logoff.

After a restart and another failed attempt at the upgrade, I decided to ditch the upgrade summary and take a peek at the SQL Server log itself. This is the actual service error log, saved under MSSQL12.MSSQLSERVER\MSSQL\Log. In there, I’m able to verify a few things:

  • The server starts
  • It tries to run the upgrade scripts

So that’s good, right? Then I find the real culprit (database names hidden to protect the innocent):

Based on this message, the issue appears to be that the upgrade script is trying to upgrade some system objects (you can tell they’re system objects because they begin with MS%), and when that fails, it fails the entire upgrade. Interesting! I know these particular objects are left over from some weird replication issues that were never properly cleaned up, so it’s safe to drop them. But how? I can’t even get the service to come up, but I know I have a few options:

  • I can look up the trace flags that skip upgrade scripts (this very handy link lists all the trace flags –> http://www.sqlservercentral.com/articles/trace-flag/152989/ ). The correct flag is 902: at startup, it makes the server skip the upgrade scripts against the master database. Once the server comes up, I can fix the issue, remove the flag, then retry the upgrade.
  • I can be lazy and rename the data and log files in the filesystem, then restart the upgrade. In theory, SQL should fail to attach the database (because I renamed the files!), apply the upgrade scripts to the databases that did mount, and then complete the upgrade.

I’m feeling adventurous, so I try that, and it works! I have a working instance again. Now I have to clean up after my mess: I switch the service account back to the proper AD account, stop SQL, and rename the data and log files back to their original names. Once I start SQL again, the database mounts properly… I find it odd that the upgrade scripts didn’t fail this time, so I can only assume they did not run again. I’m not too worried at this point, because next up is fixing the issue by dropping the orphaned replication objects, and since I have to apply CU11 afterwards, I figure the upgrade scripts will be applied then. Ten minutes later, I was done.
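For the record, the trace-flag route would have looked roughly like this. This is a sketch of the approach rather than what I actually ran, and the startup syntax assumes a default instance named MSSQLSERVER:

```sql
-- First, from an elevated command prompt, start the instance with the flag:
--   net start MSSQLSERVER /T902
-- Then connect and confirm the flag is active:
DBCC TRACESTATUS (902);
-- Fix the offending objects, then stop SQL and restart it *without*
-- the flag (or re-run setup) so the upgrade scripts can finish normally.
```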
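As for the cleanup itself, it mostly amounts to dropping the leftover replication metadata. A sketch, with a made-up database name (the real ones are hidden above); depending on what was left behind, sp_removedbreplication can do most of the heavy lifting:

```sql
-- Hypothetical database name; substitute the affected database.
-- sp_removedbreplication removes all replication objects and
-- settings from the database, without needing a reachable distributor.
EXEC sp_removedbreplication N'SomeUserDatabase';
```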

What are the lessons learned? First of all, before you run an upgrade, make sure your databases are clean, with no weird system objects lying around. Second, if you have a failed upgrade, take a look at the SQL Server log itself; there’s usually a ton of useful information in there to help you get up and running again.
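For the first lesson, a quick pre-upgrade sanity check might look like this. A sketch: sp_MSforeachdb is undocumented but handy, and the LIKE pattern matches the MS% naming convention mentioned above:

```sql
-- List user-created objects whose names start with MS in every database.
-- These are often replication leftovers worth investigating before an
-- upgrade (is_ms_shipped = 0 filters out genuine system objects).
EXEC sp_MSforeachdb N'
  SELECT ''?'' AS database_name, name, type_desc
  FROM [?].sys.objects
  WHERE is_ms_shipped = 0 AND name LIKE ''MS%'';';
```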
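And for the second lesson, once an instance is at least partially up, the same error log can be read from T-SQL as well; a sketch:

```sql
-- Find where the error log lives on disk (handy when the service
-- won't stay up and you need to read the files directly):
SELECT SERVERPROPERTY('ErrorLogFileLocation') AS error_log_path;

-- Search the current error log (log 0) for upgrade-related messages:
EXEC sys.xp_readerrorlog 0, 1, N'upgrade';
```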