My Azure Hosting Hiccups, or "How to Shoot Yourself in Your Own Foot"

As you might have read, I moved my website onto Azure a couple of weeks ago and I have not looked back at all. Well, okay, two events made me rethink my strategy around hosting on Azure. One was my own doing and the other was a conflict between DotNetNuke and the Azure SQL model. Both were resolved and I am again 100% on hosting via Azure, at least until the next problem rears its ugly head.

Let's review how I got to today. First, I started out on Azure trying to deploy my DotNetNuke instance using the DotNetNuke Azure Accelerator. That was a miserable failure and left me floundering; I also had other issues going on that night with various technologies, so I decided to skip it. Then I found how easy it was to set up my Azure-hosted DotNetNuke CMS. Success!

Let's move on to last Saturday, March 2nd. I decided to do some reconfiguring of the website on Azure. First, I reviewed my account: my bandwidth and processing needs were pushing the limits of the free account, so I had to change from the "Free" web hosting instance to the "Shared" model. On top of that change, I wanted the URL to be my own website's URL and not the one created when you set up a website on Azure. Lastly, I wanted a publishing system so I could upload changes to my site when updates came out. The only one I had some experience with (and not very much, as I found out) was Git, but I did not want to tie my Azure site to GitHub, so I selected a local Git repository on my desktop. With all of these actions, I pulled out the gun, filled and loaded the magazine, chambered a round, and pointed it at my foot.
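For anyone curious, the local-Git flow is roughly the following. This is a minimal sketch with placeholder names; after you enable Git publishing, the Azure portal shows you the real remote URL for your site.

```shell
# Sketch of wiring a local Git repo to an Azure website's Git endpoint.
# The azure remote URL below is a placeholder, not a real endpoint.
rm -rf /tmp/azure-site-demo
mkdir -p /tmp/azure-site-demo && cd /tmp/azure-site-demo
git init
git config user.email "demo@example.com"   # local identity just for this sketch
git config user.name "Demo"
echo "<h1>Hello</h1>" > index.html
git add -A
git commit -m "Initial site snapshot"
# git remote add azure https://user@mysite.scm.azurewebsites.net/mysite.git
# git push azure master   # Azure deploys the pushed content
```

The push itself is where the deployment happens; everything before it is an ordinary local repository.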

Sunday morning rolls around and I get a text message page at 6:30 am: my Azure website is offline. HUH? How can it be offline? Did Azure have another one of their illustrious outages? Looking at the site on my phone, I got a 502 error. Ummmm … "Bad Gateway"??? Thinking my DNS was having issues, I went to the default Azure website URL and got slapped with another 502 error. My site was down! Jumping out of bed, I fumbled to my computer and started to look at the issue. I pulled up the Azure Portal, my site, my monitoring services and my VM-hosted mail server to get an external perspective on the issue. No matter how many times I pressed SHIFT-F5, the site was down. I checked all browsers; still the same. I had the monitoring service check from all of its servers; still down. Looking through the Azure portal, nothing seemed to be misconfigured. Checking the Azure DB, no issues were seen there either. My last check was the webserver logs from Azure; the logs did not show anyone visiting the site. Huh? How could my attempts from my phone, home computer and hosted VM not register in the logs? I restarted the website services and still nothing in the logs. One more SHIFT-F5 and "Ta da!", the website was functional. HUH? BLAM! That hurt.

I don't like having mysteries. One of the toughest things for me in my IT world is to have something fix itself without knowing the root cause. Many of you might remember IBM's commercials about "Self-Healing Server Pixie Dust". I mock those commercials because some parts of servers can fix themselves but others cannot. System admins are still a necessary group of people no matter what technologies you add to hardware or software, and giving those professionals the information they need to perform good root cause analysis is more important than self-healing. Yet this is what I was looking at: nothing in the logs, in the stats, nor in the code told me what was wrong, and nothing like this had happened in the 7 days I hosted the site on the "Free" model. Being a good IT Operations person, I started rolling back my changes. Doing the easy stuff first, I reversed the DNS work and then went to breakfast. During my meal, I got 10 pages that my site was up, then down, then up, then … well, you get the idea. After breakfast, I went home and switched the site back to the "Free" model. I waited for any change and was met with more of the same pages as my site flipped between non-responsive and responsive. My final thought was that the problem must be in the Git deployment system.

The story turns very interesting at this point. Reviewing the settings for Azure, there is no way for an Azure administrator to remove a deployment system from a website; no mechanism exists in the Azure Portal to change it once a deployment system is selected. I was stuck with an unstable site and no way to revert what I had done. It seems Azure's method is to just recreate the site. So I copied the code from my Azure website to my local computer, deleted the Azure website, created a new one in Azure, and copied the code back from my desktop. Thanks to many factors, the file copying seemed to take hours, though in reality it took 35 minutes for both the download and the upload. I clicked on the link for the new site and ".NET ERROR". A huge sigh and facepalm later, I delved into what was going on. DotNetNuke was missing key files; my copy from the internet had not included them. Instead of trying to figure out where I went wrong, I reviewed what I had: an Azure website with broken code and an Azure SQL DB with my data. To make things easy on myself, I decided to build a new DotNetNuke installation from scratch with a new DB and then copy my blog data back in to complete the work. Approximately two hours later, my site was back up and running on the Azure URL. Success!

Going over all of the changes I wanted to make, I decided to separate them out and leave each in place for 24 hours to verify it would not affect my site. The critical change was moving from the "Free" mode to the "Shared" mode for the website; Azure would block the site if I did not do this because I was over my resource limits. This was a "no brainer", so it was my first change. I re-enabled the redirect from the server that previously hosted this site and all was working again. Monday night rolled around and all had been stable, so my next change, pointing the URL to my own domain name, was prepped and executed. My site was stable for the rest of the night and into the next day. My analysis was correct: the configuration of Git as a "publishing" system was the cause of my outages on Sunday. Tuesday night led to a lot of review of Azure web publishing, and all of the information I found led me to my final conclusion: I am not developing my own code and do not need a publishing system. None of them would help me; they only looked to make things more difficult. In its current mode, I can FTP files up and down from the site, which is good enough for me.

Let's move on to Wednesday. I received a notice from DotNetNuke that they had released version 7.0.4 of their system; my site was running 7.0.3. I should upgrade to make sure I am safe, secure and stable, right? As I started to download the code for the update, I got the gun back out, filled and loaded that magazine, chambered a round, and aimed it right next to the hole I put through my foot on Sunday. Using FTP, I uploaded the update code and pulled up the upgrade installation page. I waited for the upgrade to complete while working through my e-mail. When it finished, I turned and saw "Completed with errors". BLAM! I have got to stop shooting myself like this.

One of the modern advantages of DotNetNuke is the logging that upgrades and installs now produce. I was able to pull up the installation log and get the exact error messages from the upgrade: three SQL errors thrown while it was processing the SQL upgrade statements. The messages were confusing to me at first. In two of the errors, the upgrade tried to determine whether an index was in place and then remove that index to replace it with a new one. Yet, when this ran against my Azure DB, it threw the error "DROP INDEX with two-part name is not supported in this version of SQL Server". How was I going to fix this? For those of you who don't know, my start in IT was as a SQL DBA and programmer. I dug out my rusty SQL skills and worked through the database alongside the MSDN documentation for Azure SQL. In no time, I figured out what I needed to do to modify the DotNetNuke code and ran the corrected SQL statements against my Azure SQL DB. The third error was even more interesting. The DotNetNuke code wanted to verify that a default value was set for a column in one of the tables. The way this is normally done in SQL Server is to query the sys.sysconstraints system view. The problem is that Azure SQL DB has no sysconstraints view, so the SQL statement returned "Invalid object name 'sysconstraints'". More digging and I found my answer: Azure SQL offers the newer catalog views sys.check_constraints, sys.default_constraints, and sys.key_constraints. A quick change to the default_constraints view showed that the desired default was in place. My upgrade was now complete and a success.
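To illustrate, the two fixes looked roughly like this. The table, index and column names below are hypothetical stand-ins, not the actual DotNetNuke objects:

```sql
-- Fix 1: Azure SQL rejects the legacy two-part DROP INDEX syntax.
-- Fails on Azure SQL:  DROP INDEX dbo.MyTable.IX_MyIndex
-- Works: name the index and qualify the table with ON.
DROP INDEX IX_MyIndex ON dbo.MyTable;

-- Fix 2: sysconstraints does not exist in Azure SQL; query the
-- sys.default_constraints catalog view to see if a default is in place.
SELECT dc.name, dc.definition
FROM sys.default_constraints AS dc
JOIN sys.columns AS c
  ON c.object_id = dc.parent_object_id
 AND c.column_id = dc.parent_column_id
WHERE dc.parent_object_id = OBJECT_ID('dbo.MyTable')
  AND c.name = 'MyColumn';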

As you can see, I did all of the damage myself; I cannot blame Azure for it. My impatience, not reading all the way through and just trying to get things going, caused my own downtime. I have no doubt my thrifty behavior will also be my downfall when Azure has any sort of outage in the US West Websites or SQL DB layers. If I want a website that will not go down, I need to create and pay for the Azure infrastructure to do that. For now, I am super happy with my decision. To the cloud!

Are you thinking about moving your website into a cloud provider? If not, what is stopping you from doing that? Post your questions and comments below.