Renewing the STS cert in vCenter

The STS (Security Token Service) cert is used by vCenter SSO to sign the tokens it issues. Logins through external identity sources such as Okta or Active Directory depend on it, and so do various other services, like the Web UI!

The fun part about the STS cert is that it’s autogenerated at some point (probably during install or upgrade) and expires 2 years after that, but vCenter won’t warn you that the expiration is coming up! It’ll just gloriously fail (see https://kb.vmware.com/s/article/82332 and https://kb.vmware.com/s/article/79248), and you’ll get either an HTTP 500 or a “No healthy upstream” error.
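Since vCenter won’t warn you, it’s worth scheduling your own check. Here’s a minimal bash sketch of one (my own, not from the KBs). It assumes GNU date and that you’ve already exported the STS cert to a PEM file; the function names and the /tmp/sts.pem path are hypothetical. It turns the notAfter date that openssl prints into days remaining and warns below a threshold.

```shell
#!/usr/bin/env bash
# Minimal sketch: warn when a certificate is close to expiring.
# Assumes GNU date (for -d parsing) and a notAfter string in the format printed by
# `openssl x509 -enddate -noout`, e.g. "notAfter=Mar  5 12:00:00 2031 GMT".

# days_until_expiry NOTAFTER: print whole days from now until NOTAFTER
# (negative for dates more than a day in the past); returns 2 if unparseable.
days_until_expiry() {
    local not_after=${1#notAfter=}   # strip openssl's "notAfter=" prefix, if present
    local expiry_epoch now_epoch
    expiry_epoch=$(date -u -d "$not_after" +%s) || return 2
    now_epoch=$(date -u +%s)
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# check_expiry NOTAFTER [THRESHOLD_DAYS]: return 1 if fewer than THRESHOLD_DAYS
# (default 30) days remain, 0 otherwise, 2 on a parse error.
check_expiry() {
    local days threshold=${2:-30}
    days=$(days_until_expiry "$1") || return 2
    if (( days < threshold )); then
        echo "WARNING: certificate expires in $days day(s)"
        return 1
    fi
    echo "OK: certificate expires in $days day(s)"
}

# Example (hypothetical path to an exported STS cert):
# check_expiry "$(openssl x509 -enddate -noout -in /tmp/sts.pem)" 60
```

Running it from cron with a generous threshold (60 days or so) should leave plenty of time to renew before anything breaks.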

To renew the cert, follow the steps on https://kb.vmware.com/s/article/76719, namely:

1 – SSH into vCenter

ssh -l root <your vCenter FQDN>

2 – Move into /tmp

cd /tmp

3 – Create a bash script and paste the script below

vim fixsts.sh
#!/bin/bash
# Copyright (c) 2020-2021 VMware, Inc. All rights reserved.
# VMware Confidential
#
# Run this from the affected PSC/VC
#
# NOTE: This works on external and embedded PSCs
# This script will do the following
# 1: Regenerate STS certificate
#
# What is needed?
# 1: Offline snapshots of VCs/PSCs
# 2: SSO Admin Password

NODETYPE=$(cat /etc/vmware/deployment.node.type)
if [ "$NODETYPE" = "management" ]; then
    echo "Detected this node is a vCenter server with external PSC."
    echo "Please run this script from a vCenter with embedded PSC, or an external PSC"
    exit 1
fi

if [ "$NODETYPE" = "embedded" ]  &&  [ ! -f  /usr/lib/vmware-vmdir/sbin/vmdird ]; then
    echo "Detected this node is a vCenter gateway"
    echo "Please run this script from a vCenter with embedded PSC, or an external PSC"
    exit 1
fi

echo "NOTE: This works on external and embedded PSCs"
echo "This script will do the following"
echo "1: Regenerate STS certificate"
echo "What is needed?"
echo "1: Offline snapshots of VCs/PSCs"
echo "2: SSO Admin Password"
echo "IMPORTANT: This script should only be run on a single PSC per SSO domain"

mkdir -p /tmp/vmware-fixsts
SCRIPTPATH="/tmp/vmware-fixsts"
LOGFILE="$SCRIPTPATH/fix_sts_cert.log"

echo "==================================" | tee -a $LOGFILE
echo "Resetting STS certificate for $HOSTNAME started on $(date)" | tee -a $LOGFILE
echo ""| tee -a $LOGFILE
echo ""
DN=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmdir]' | grep dcAccountDN | awk '{$1=$2=$3="";print $0}'|tr -d '"'|sed -e 's/^[ \t]*//')
echo "Detected DN: $DN" | tee -a $LOGFILE
PNID=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep PNID | awk '{print $4}'|tr -d '"')
echo "Detected PNID: $PNID" | tee -a $LOGFILE
PSC=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep DCName | awk '{print $4}'|tr -d '"')
echo "Detected PSC: $PSC" | tee -a $LOGFILE
DOMAIN=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep DomainName | awk '{print $4}'|tr -d '"')
echo "Detected SSO domain name: $DOMAIN" | tee -a $LOGFILE
SITE=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep SiteName | awk '{print $4}'|tr -d '"')
MACHINEID=$(/usr/lib/vmware-vmafd/bin/vmafd-cli get-machine-id --server-name localhost)
echo "Detected Machine ID: $MACHINEID" | tee -a $LOGFILE
IPADDRESS=$(ifconfig | grep eth0 -A1 | grep "inet addr" | awk -F ':' '{print $2}' | awk -F ' ' '{print $1}')
echo "Detected IP Address: $IPADDRESS" | tee -a $LOGFILE
DOMAINCN="dc=$(echo "$DOMAIN" | sed 's/\./,dc=/g')"
echo "Domain CN: $DOMAINCN"
ADMIN="cn=administrator,cn=users,$DOMAINCN"
USERNAME="administrator@${DOMAIN^^}"
ROOTCERTDATE=$(openssl x509  -in /var/lib/vmware/vmca/root.cer -text | grep "Not After" | awk -F ' ' '{print $7,$4,$5}')
TODAYSDATE=$(date +"%Y %b %d")

echo "#" > $SCRIPTPATH/certool.cfg
echo "# Template file for a CSR request" >> $SCRIPTPATH/certool.cfg
echo "#" >> $SCRIPTPATH/certool.cfg
echo "# Country is needed and has to be 2 characters" >> $SCRIPTPATH/certool.cfg
echo "Country = DS" >> $SCRIPTPATH/certool.cfg
echo "Name = $PNID" >> $SCRIPTPATH/certool.cfg
echo "Organization = VMware" >> $SCRIPTPATH/certool.cfg
echo "OrgUnit = VMware" >> $SCRIPTPATH/certool.cfg
echo "State = VMware" >> $SCRIPTPATH/certool.cfg
echo "Locality = VMware" >> $SCRIPTPATH/certool.cfg
echo "IPAddress = $IPADDRESS" >> $SCRIPTPATH/certool.cfg
echo "Email = email@acme.com" >> $SCRIPTPATH/certool.cfg
echo "Hostname = $PNID" >> $SCRIPTPATH/certool.cfg

echo "==================================" | tee -a $LOGFILE
echo "==================================" | tee -a $LOGFILE
echo ""
echo "Detected Root's certificate expiration date: $ROOTCERTDATE" | tee -a $LOGFILE
echo "Detected today's date: $TODAYSDATE" | tee -a $LOGFILE

echo "==================================" | tee -a $LOGFILE

flag=0
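# openssl -checkend 0 exits non-zero once the certificate has expired; comparing
# "YYYY Mon DD" strings with > is lexicographic and misorders months (e.g. Mar > Dec)
if ! openssl x509 -checkend 0 -noout -in /var/lib/vmware/vmca/root.cer > /dev/null;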
then
    echo "IMPORTANT: Root certificate is expired, so it will be replaced" | tee -a $LOGFILE
    flag=1
    mkdir /certs && cd /certs
    cp $SCRIPTPATH/certool.cfg /certs/vmca.cfg
    /usr/lib/vmware-vmca/bin/certool --genselfcacert --outprivkey /certs/vmcacert.key  --outcert /certs/vmcacert.crt --config /certs/vmca.cfg
    /usr/lib/vmware-vmca/bin/certool --rootca --cert /certs/vmcacert.crt --privkey /certs/vmcacert.key
fi

echo "#" > $SCRIPTPATH/certool.cfg
echo "# Template file for a CSR request" >> $SCRIPTPATH/certool.cfg
echo "#" >> $SCRIPTPATH/certool.cfg
echo "# Country is needed and has to be 2 characters" >> $SCRIPTPATH/certool.cfg
echo "Country = DS" >> $SCRIPTPATH/certool.cfg
echo "Name = STS" >> $SCRIPTPATH/certool.cfg
echo "Organization = VMware" >> $SCRIPTPATH/certool.cfg
echo "OrgUnit = VMware" >> $SCRIPTPATH/certool.cfg
echo "State = VMware" >> $SCRIPTPATH/certool.cfg
echo "Locality = VMware" >> $SCRIPTPATH/certool.cfg
echo "IPAddress = $IPADDRESS" >> $SCRIPTPATH/certool.cfg
echo "Email = email@acme.com" >> $SCRIPTPATH/certool.cfg
echo "Hostname = $PNID" >> $SCRIPTPATH/certool.cfg

echo ""
echo "Exporting and generating STS certificate" | tee -a $LOGFILE
echo ""

cd $SCRIPTPATH

/usr/lib/vmware-vmca/bin/certool --server localhost --genkey --privkey=sts.key --pubkey=sts.pub
/usr/lib/vmware-vmca/bin/certool --gencert --cert=sts.cer --privkey=sts.key --config=$SCRIPTPATH/certool.cfg

openssl x509 -outform der -in sts.cer -out sts.der
CERTS=$(csplit -f root /var/lib/vmware/vmca/root.cer '/-----BEGIN CERTIFICATE-----/' '{*}' | wc -l)
openssl pkcs8 -topk8 -inform pem -outform der -in sts.key -out sts.key.der -nocrypt
i=1
until [ $i -eq $CERTS ]
do
    openssl x509 -outform der -in root0$i -out vmca0$i.der
    ((i++))
done

echo ""
echo ""
read -s -p "Enter password for administrator@$DOMAIN: " DOMAINPASSWORD
echo ""

# Find the highest tenant credentials index
MAXCREDINDEX=1
while read -r line
do
    INDEX=$(echo "$line" | tr -dc '0-9')
    if [ $INDEX -gt $MAXCREDINDEX ]
    then
        MAXCREDINDEX=$INDEX
    fi
done < <(/opt/likewise/bin/ldapsearch -h localhost -p 389 -b "cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" "(objectclass=vmwSTSTenantCredential)" cn | grep cn:)

# Sequentially search for tenant credentials up to max index  and delete if found
echo "Highest tenant credentials index : $MAXCREDINDEX" | tee -a $LOGFILE
i=1
if [ ! -z $MAXCREDINDEX ]
then
    until [ $i -gt $MAXCREDINDEX ]
    do
        echo "Exporting tenant $i to $SCRIPTPATH" | tee -a $LOGFILE
        echo ""
        ldapsearch -h localhost -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" -b "cn=TenantCredential-$i,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" > $SCRIPTPATH/tenantcredential-$i.ldif
                if [ $? -eq 0 ]
                then
                    echo "Deleting tenant $i" | tee -a $LOGFILE
                        ldapdelete -h localhost -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" "cn=TenantCredential-$i,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" | tee -a $LOGFILE
                else
                    echo "Tenant $i not found" | tee -a $LOGFILE
                    echo ""
                fi
                ((i++))
                done
fi
echo ""

# Find the highest trusted cert chains index
MAXCERTCHAINSINDEX=1
while read -r line
do
    INDEX=$(echo "$line" | tr -dc '0-9')
    if [ $INDEX -gt $MAXCERTCHAINSINDEX ]
    then
        MAXCERTCHAINSINDEX=$INDEX
    fi
done < <(/opt/likewise/bin/ldapsearch -h localhost -p 389 -b "cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" "(objectclass=vmwSTSTenantTrustedCertificateChain)" cn | grep cn:)

# Sequentially search for trusted cert chains up to max index  and delete if found
echo "Highest trusted cert chains index: $MAXCERTCHAINSINDEX" | tee -a $LOGFILE
i=1
if [ ! -z $MAXCERTCHAINSINDEX ]
then
    until [ $i -gt $MAXCERTCHAINSINDEX ]
    do
            echo "Exporting trustedcertchain $i to $SCRIPTPATH" | tee -a $LOGFILE
            echo ""
                ldapsearch -h localhost -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" -b "cn=TrustedCertChain-$i,cn=TrustedCertificateChains,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" > $SCRIPTPATH/trustedcertchain-$i.ldif
            if [ $? -eq 0 ]
            then
                echo "Deleting trustedcertchain $i" | tee -a $LOGFILE
                ldapdelete -h localhost -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" "cn=TrustedCertChain-$i,cn=TrustedCertificateChains,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" | tee -a $LOGFILE
            else
                echo "Trusted cert chain $i not found" | tee -a $LOGFILE
            fi
            echo ""
                ((i++))
                done
fi
echo ""

i=1
echo "dn: cn=TenantCredential-1,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" > sso-sts.ldif
echo "changetype: add" >> sso-sts.ldif
echo "objectClass: vmwSTSTenantCredential" >> sso-sts.ldif
echo "objectClass: top" >> sso-sts.ldif
echo "cn: TenantCredential-1" >> sso-sts.ldif
echo "userCertificate:< file:sts.der" >> sso-sts.ldif
until [ $i -eq $CERTS ]
do
    echo "userCertificate:< file:vmca0$i.der" >> sso-sts.ldif
    ((i++))
done
echo "vmwSTSPrivateKey:< file:sts.key.der" >> sso-sts.ldif
echo "" >> sso-sts.ldif
echo "dn: cn=TrustedCertChain-1,cn=TrustedCertificateChains,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" >> sso-sts.ldif
echo "changetype: add" >> sso-sts.ldif
echo "objectClass: vmwSTSTenantTrustedCertificateChain" >> sso-sts.ldif
echo "objectClass: top" >> sso-sts.ldif
echo "cn: TrustedCertChain-1" >> sso-sts.ldif
echo "userCertificate:< file:sts.der" >> sso-sts.ldif
i=1
until [ $i -eq $CERTS ]
do
    echo "userCertificate:< file:vmca0$i.der" >> sso-sts.ldif
    ((i++))
done
echo ""
echo "Applying newly generated STS certificate to SSO domain" | tee -a $LOGFILE

/opt/likewise/bin/ldapmodify -x -h localhost -p 389 -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" -f sso-sts.ldif | tee -a $LOGFILE
echo ""
echo "Replacement finished - Please restart services on all vCenters and PSCs in your SSO domain" | tee -a $LOGFILE
echo "==================================" | tee -a $LOGFILE
echo "IMPORTANT: In case you're using HLM (Hybrid Linked Mode) without a gateway, you would need to re-sync the certs from Cloud to On-Prem after following this procedure" | tee -a $LOGFILE
echo "==================================" | tee -a $LOGFILE
echo "==================================" | tee -a $LOGFILE
if [ $flag == 1 ]
then
    echo "Since your Root certificate was expired and was replaced, you will need to replace your MachineSSL and Solution User certificates" | tee -a $LOGFILE
    echo "You can do so following this KB: https://kb.vmware.com/s/article/2097936" | tee -a $LOGFILE
fi

4 – chmod the script

chmod a+x fixsts.sh

5 – Run the script! Note that the first authentication prompt requires the ROOT password (the same one you used for SSH). The second prompt requires the password for the ADMINISTRATOR@VSPHERE.LOCAL user (or administrator@ your own SSO domain, if it isn’t vsphere.local). Don’t be like me and enter the root user’s password there, then wonder why it didn’t work.

./fixsts.sh

6 – Restart the vCenter services; no reboot required.

service-control --stop --all && service-control --start --all

If all goes well, the script finishes without any errors in its output, and vCenter should come back online shortly after the service restart completes.
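While the services come back up, the Web UI will keep answering with “No healthy upstream” for a bit. Rather than mashing refresh, you could poll it. Here’s a minimal bash sketch; wait_for is a hypothetical helper, and the check in the usage line (curl against https://localhost/ui/, the vSphere Client path) is my assumption rather than something from the KB.

```shell
#!/usr/bin/env bash
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, up to ATTEMPTS times,
# sleeping DELAY_SECONDS between tries. Returns 0 on success, 1 if it never succeeds.
wait_for() {
    local attempts=$1 delay=$2 i
    shift 2
    for (( i = 1; i <= attempts; i++ )); do
        if "$@"; then
            echo "Ready after $i attempt(s)"
            return 0
        fi
        # don't sleep after the final failed attempt
        (( i < attempts )) && sleep "$delay"
    done
    echo "Still not ready after $attempts attempt(s)" >&2
    return 1
}

# Example: poll the vSphere Client every 10 seconds for up to 5 minutes.
# -k skips TLS verification (self-signed cert); -f makes curl fail on HTTP errors
# such as the 500/503 responses returned while services are still starting.
# wait_for 30 10 curl -ksf -o /dev/null https://localhost/ui/
```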

Installing SQL Server Unattended

If your environment looks anything like mine, you’ll be spinning up new SQL instances on a pretty regular basis. The SQL install process has gotten much better over the years, but it’s still a slog to go through the same prompts over and over again. My favorite solution for this conundrum is to install SQL Server using an unattended script. It does take a bit of work to get the config file just right, but it will save you a lot of time in the long run.

The heart of the unattended process is the configuration file. During a regular SQL install, the options, services, paths, etc. that you select are saved to this file, and once you’re ready to start, the installer reads through it and sets everything up. The key, then, is to save the configuration file before you actually kick off the install. As is, the file is not yet usable, but once we’ve made a few changes to it, you’ll be able to use the same file over and over again.

1st Step – Start a new SQL install

Here you’ll just start a server install, like you might have done many, many times already.


Select all the options you normally would. For my servers, I always install the engine itself, Integration Services and Replication, among a few others. You can always change the config file afterwards to add or remove features, or even keep multiple files.


Note that in the service settings, you’ll be prompted to enter the service account and password. The password is not actually saved in the config file, but you can’t progress until you enter one, so go ahead and fill it in.


In the “Ready to Install” window, you’ll get a chance to review all your options before clicking the “Install” button, which actually starts the install. DO NOT click Install! Instead, note the location of the configuration file (its path is shown at the bottom of that window), open that file, and save a copy somewhere safe! I support multiple SQL versions, so I keep each file in its respective ISO folder.


2nd Step – Edit the config file

And here’s the file! For a smooth and easy server install, you’ll need to modify the items below:

  • IACCEPTSQLSERVERLICENSETERMS – Set it to True. This accepts the SQL Server license terms, and you have to accept them if you want to use it!
  • UIMODE – On SQL 2016 and older, I’ve always just set this to Normal, and QUIETSIMPLE to True. SQL 2017 changed things a bit, so now I set QUIETSIMPLE to False, pass QUIET via the command line (with /Q), and comment out UIMODE altogether.
  • UPDATEENABLED – Set it to True. This is really handy if you want to install and patch in one go. At work, our install folders are organized by SQL version, and in each version folder I keep both the ISO and the extracted image. You can’t do an unattended install from the ISO file alone, but I keep the ISO around just in case.

In the Updates folder, I keep the last few CU and SP files for each SQL version. When UPDATEENABLED is set to True and a proper path is set (you’ll see that below), SQL will scan that directory and install the latest versions of whatever update files you have in there.

  • FEATURES – The specific services and features you’d like installed. You can always change this later.
  • UPDATESOURCE – This is the location where the installer will look for the patch files, as mentioned above. You’ll probably want to use a network share for this.
  • AGTSVCACCOUNT, ISSVCACCOUNT, SQLSVCACCOUNT – This is where you specify the service accounts for SQL Agent, SSIS and SQL Server. You can use the same account for all 3, a different account for each, or however your organization prefers it.
  • SECURITYMODE – Setting this to SQL enables SQL auth alongside Windows auth. If you do, you’ll also need to specify an SA password on the command line. Generally you don’t want to do that, as each of your SQL servers should have a different SA password. For my lab, I’m just going to set this to SQL. At work, we leave this on Windows auth, then after the machine is up and running, I go back in and generate a unique SA password as part of our configuration checklist (you do have one of those, right?)
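If you maintain several of these files (one per SQL version), scripting the edits keeps them consistent. This is a minimal bash sketch, not the author’s method: it assumes you edit a copy of ConfigurationFile.ini from WSL or Git Bash, that the file is plain text with KEY="VALUE" lines, and the helper names, share path and service account below are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for editing a copy of a SQL Server ConfigurationFile.ini.
# Values reach awk through the environment (ENVIRON) instead of awk -v, so
# backslashes in UNC paths and DOMAIN\account names are kept literally.

# set_ini_option FILE KEY VALUE: replace an uncommented KEY= line with
# KEY="VALUE", or append one if the key isn't there yet.
set_ini_option() {
    local file=$1 tmp rc
    tmp=$(mktemp) || return 1
    KEY=$2 VALUE=$3 awk '
        BEGIN { k = ENVIRON["KEY"]; line = k "=\"" ENVIRON["VALUE"] "\"" }
        index($0, k "=") == 1 { print line; found = 1; next }
        { print }
        END { if (!found) print line }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# comment_ini_option FILE KEY: prefix an uncommented KEY= line with "; ".
comment_ini_option() {
    local file=$1 tmp rc
    tmp=$(mktemp) || return 1
    KEY=$2 awk 'index($0, ENVIRON["KEY"] "=") == 1 { $0 = "; " $0 } { print }' \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example edits matching the list above (hypothetical share path and account):
# cfg=ConfigurationFile.ini
# set_ini_option "$cfg" IACCEPTSQLSERVERLICENSETERMS True
# set_ini_option "$cfg" QUIETSIMPLE False
# comment_ini_option "$cfg" UIMODE
# set_ini_option "$cfg" UPDATEENABLED True
# set_ini_option "$cfg" UPDATESOURCE '\\fileserver\SQLInstall\Updates'
# set_ini_option "$cfg" SQLSVCACCOUNT 'CONTOSO\svc-sql'
```

Lines these helpers rewrite end in a plain LF; I haven’t checked whether setup cares about mixed line endings, so open the result in an editor before using it.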

A lot of the other details, such as install location, were taken from when you went through the wizard, so they don’t need to be reentered here. Once you’re satisfied with your changes, go ahead and save the file. We’re ready to test it!

3rd Step – SQL install, for reals now

Launch the command prompt as Administrator, navigate into your installation media directory, then type

setup /?

So helpful! For SQL 2017, you’ll be greeted with all the parameters that can be passed into setup, including a full example of an unattended install command. We just have to fill in the blanks, more or less. Do note that you can also run setup from a network share; just use the full UNC path with the server’s FQDN.


Another thing to note is that the passwords are only entered on the command line. I’m using the same password for all the services here, but again, you should use different accounts, and your service account passwords should be different from your SA password. Think security!

 D:\setup.exe /Q ^
 /SQLSVCPASSWORD="@C0mpl3xP@ssw0rd" ^
 /AGTSVCPASSWORD="@C0mpl3xP@ssw0rd" ^
 /ISSVCPASSWORD="@C0mpl3xP@ssw0rd" ^
 /SAPWD="@C0mpl3xP@ssw0rd" ^
 /CONFIGURATIONFILE=C:\SQLInstall\ConfigurationFile.ini
 

Once you’ve entered all that, go ahead and hit Enter. After a few minutes, it should be all done! You’ll get an error message if there are any issues; otherwise you’ll just be returned to the prompt.


And that’s it! You’re now the owner of a brand new SQL instance. I keep the install string saved without the passwords, so next time I need to install a new server, I copy the string, enter the passwords, then hit Enter and go about my day. Ten minutes later, it’s up and ready for final touch-ups (like MAXDOP, memory settings, etc.). Happy installing!
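If you launch setup from WSL or Git Bash rather than cmd (my assumption here, not how the author runs it), you can keep the saved command password-free and prompt for the secrets at run time. A minimal sketch: build_setup_args and the *_PASSWORD variable names are hypothetical, and the switches are the same ones used in the command above. I haven’t verified how every shell quotes arguments it hands to a Windows executable, so treat the final line as untested.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build the setup.exe argument list from passwords held in
# shell variables, so the saved command line never contains them.
# build_setup_args CONFIG_PATH  -> fills the global SETUP_ARGS array
build_setup_args() {
    local config=$1 var
    for var in SQLSVC_PASSWORD AGTSVC_PASSWORD ISSVC_PASSWORD; do
        if [[ -z ${!var:-} ]]; then
            echo "build_setup_args: $var is not set" >&2
            return 1
        fi
    done
    SETUP_ARGS=(
        /Q
        "/SQLSVCPASSWORD=$SQLSVC_PASSWORD"
        "/AGTSVCPASSWORD=$AGTSVC_PASSWORD"
        "/ISSVCPASSWORD=$ISSVC_PASSWORD"
        "/CONFIGURATIONFILE=$config"
    )
    # /SAPWD is only needed when SECURITYMODE=SQL in the config file
    if [[ -n ${SA_PASSWORD:-} ]]; then
        SETUP_ARGS+=("/SAPWD=$SA_PASSWORD")
    fi
}

# Example run (hypothetical WSL mount point for the installation media):
# read -rsp 'SQL Server service password: ' SQLSVC_PASSWORD; echo
# read -rsp 'SQL Agent service password: ' AGTSVC_PASSWORD; echo
# read -rsp 'SSIS service password: ' ISSVC_PASSWORD; echo
# build_setup_args 'C:\SQLInstall\ConfigurationFile.ini' && /mnt/d/setup.exe "${SETUP_ARGS[@]}"
```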



Recover from disk full errors

We’ve all been there at some point: you set up a new QA server, but you’re a busy guy and put off setting up alerts for later. The server gets a ton of usage, and all of a sudden it runs out of space before you had a chance to do something about it. This post is about one way I recover from disk full errors.

In this particular case, we had created a new SSIS catalog. The SSISDB database is created by the system, so you don’t get to select where its files are located. Sure, you could’ve modified the database default locations post-setup, but you didn’t do that either! Now the log file is on the data volume, and the volume is all filled up. You’d like to move the log file, but you can’t detach SSISDB because, again, the volume is full and nothing works right. So what do you do?

Whenever SQL Server restarts, it reads the entries from sys.master_files and sys.databases to figure out where the databases are. When you alter any of the database properties, those changes are recorded in those catalog views. So what we need to do here is update those entries (through ALTER DATABASE, not directly, please!) and then restart the service. Since this particular server is non-prod, restarts are OK! Here’s the syntax:

-- Run this first to get the current logical file names; you'll need them for the next step
SELECT DB_NAME(database_id) AS database_name,
       name AS logical_name,
       physical_name
FROM sys.master_files
WHERE database_id = DB_ID('SSISDB');
-- Now the actual trick: NAME is the logical name, FILENAME is the new physical path of the file
ALTER DATABASE SSISDB
MODIFY FILE
(
    NAME = 'log',
    FILENAME = 'L:\sqllogs\ssisdb.ldf'
);

After this, stop SQL Server and manually move the file to the new location defined in your script (SQL will not move the files for you). When you’re done, start SQL again. Your database should come right up!

Now, let’s say that in your hurry to get things back up, you restarted the service but forgot to actually move the files. Despair not! As long as SQL hasn’t acquired a filesystem lock on the files, you can still move them to the proper places. Once everything is where it belongs, the following commands will bring the database back online:

ALTER DATABASE SSISDB SET OFFLINE;
ALTER DATABASE SSISDB SET ONLINE;
