Jun 22 2008

Hosting Woes

Tag: Computers and Web Design | Alex @ 8:22 pm

Tom and I have had a hosting account with StartLogic for a few years now. It was very competitively priced and we wanted a cheap VPS. We haven’t had many problems until now: the uptime is good, and the only issues so far have been slow responses to online support tickets and email.

This post gets into the technical details of DNS and what went wrong, so unless you’re interested in those, you may want to skip down to the end of this post.

For DNS, our server runs named (BIND) so that it can act as the nameserver for any domains we host. This really isn’t a good option, for many reasons. Two of them are:

  1. You should use two separate name servers which are geographically dispersed.

    Using a single server for a nameserver means there is a single point of failure.

  2. Downtime on the VPS account will mean that after a while, the DNS records cease to exist on the internet.

    One effect of this is that if the VPS account goes down or needs to be rebuilt, and it stays down for longer than the Expiry time on our domains, mail sent to those domains will bounce and won’t be resent automatically.

There are plenty of references for this information on the internet.

StartLogic set us up by putting our primary domain name’s DNS record on their geographically dispersed name servers, which made it quite resilient to network failures.

They then told us that if we wanted additional domains pointed at our server, we could just email them. This is what we did for about 2 years.

Tom actually forgot this and pointed a few of our clients’ domains at ns1.yigg.net and ns2.yigg.net (as StartLogic now suggests).

Anyway, at some stage they accidentally signed us up for an additional account, a standard hosting account, under our primary domain name. This didn’t cause any problems at the time because they didn’t touch the DNS records.

What happened was that they recently upgraded all standard hosting accounts to a new platform. This meant the erroneous account was moved onto a new server. Last week they completed the migration and changed the DNS record.

Suddenly, I couldn’t get to our control panel*. Yigg.net was resolving to a completely different IP. So I figured there had been a problem and phoned them.

I should mention that earlier this year we received an email from a representative at IPowerWeb, StartLogic’s parent company (I think). He said that he was our point of contact and that we could call him if we needed anything.

So I rang through, discussed a few things and was put through to a technician. Once I had explained the situation, he worked out what had gone wrong and found our VPS account. He said he’d get the DNS records corrected and pointed back at our VPS. He said he would also delete the erroneous standard hosting account.

Only the latter got done, probably by mistake, because the guy I spoke to seemed really helpful. The extra hosting account got deleted quick smart, and the domain switched to a holding page instead of the “VDeck Default” page it had been showing. So I waited to see whether their servers would update and start announcing the correct IP again.

You may need to wait up to 48 hours for a DNS change to reach everyone in the world (worst case), but there is a trick for finding out whether the change has started propagating. On Mac OS X (and most *nix machines) you can use dig to query a DNS server directly and see what IP address it holds for a domain. I knew the name of the authoritative server holding the record, so I could ask it directly whether it was announcing the correct IP.


# dig yigg.net @ns1.startlogic.com

; <<>> DiG 9.4.1-P1 <<>> yigg.net @ns1.startlogic.com
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53630
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;yigg.net.			IN	A

;; ANSWER SECTION:
yigg.net.		3600	IN	A	66.96.134.55

;; Query time: 291 msec
;; SERVER: 66.96.142.100#53(66.96.142.100)
;; WHEN: Sat Jun 21 23:35:01 2008
;; MSG SIZE  rcvd: 42

It wasn’t.
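If you would rather script that check than eyeball dig’s output each time, here is a minimal sketch in JavaScript (Node). It pulls the records out of the ANSWER SECTION and compares the A records with the address you expect. It assumes dig’s default output layout. The function names are hypothetical, and 203.0.113.10 is a documentation-range placeholder, not our real VPS address. For a one-off check, `dig +short yigg.net @ns1.startlogic.com` prints just the answer.

```javascript
// Sketch: extract records from the ANSWER SECTION of dig's default output.
// Assumes one record per line: name, TTL, class, type, data.
function parseDigAnswers(output) {
  const records = [];
  let inAnswer = false;
  for (const rawLine of output.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith(";; ANSWER SECTION:")) {
      inAnswer = true;
      continue;
    }
    if (!inAnswer) continue;
    // The section ends at the first blank line or the next ";;" header.
    if (line === "" || line.startsWith(";;")) break;
    const [name, ttl, cls, type, ...data] = line.split(/\s+/);
    records.push({ name, ttl: Number(ttl), cls, type, data: data.join(" ") });
  }
  return records;
}

// True only if there is at least one A record and every A record
// points at the expected address.
function announcesIp(output, expectedIp) {
  const aRecords = parseDigAnswers(output).filter((r) => r.type === "A");
  return aRecords.length > 0 && aRecords.every((r) => r.data === expectedIp);
}

// Usage with the answer line from the dig run above; 203.0.113.10 is a
// hypothetical stand-in for the VPS address.
const answer = [
  ";; ANSWER SECTION:",
  "yigg.net.\t\t3600\tIN\tA\t66.96.134.55",
  "",
].join("\n");
console.log(announcesIp(answer, "203.0.113.10")); // false: still the wrong IP
```

You could run something like this in a loop every few minutes and stop when it flips to true.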

I actually forgot about it until lunchtime the next day, when Tom got an email from a client saying his email was no longer coming through. It turned out his name servers were set to ns1.yigg.net and ns2.yigg.net. Those now resolved to StartLogic’s standard hosting account, not to BIND running on our VPS.

So I wrote a support ticket explaining the situation and then phoned America to get them to sort it out as soon as possible. The support representative said a technician was looking at it right then. So I hung up and waited… and waited. No change.

The next day I updated the ticket again, trying to clarify what needed to be done: point yigg.net back to our VPS’s IP address and away from the IP it had incorrectly been set to.

Later today I rang them again to see what was going on. The representative told me that what I wanted wasn’t possible any more and that I needed to set up name servers myself. I got quite confused about what she was asking me to do. She said I wouldn’t need to sign up with a DNS provider and could do it all through the registrar’s control panel.

The main problem was that I didn’t know where that control panel was. We registered the domain through StartLogic, and they had taken care of registering it, setting up the contact information and adding their name servers to the records. When she gave me the URL, I realised I had been there before, to update our contact details when I was told I had to. She couldn’t tell me the username and password, so I had to work those out myself, which I did without too much trouble.

What I couldn’t figure out was what she was asking me to do. She said I needed to “create a name server” for my domain name. Now, in normal domain name management talk, that means setting up a box running BIND and telling the registrar to hold a glue record pointing to this box. It turns out that she was simply asking me to add glue records at the registry level for our VPS (which is running BIND already).

There is a difference between creating a name server and simply registering glue (A) records for an existing one at the registry level.

If anyone from StartLogic reads this post, please reconsider this policy. The way it worked for the last two years was much more reliable. It also didn’t require users to go into an unbranded OpenSRS management console and try to decipher fairly non-standard terms.

In terms of customer service, they get a low score. I got it resolved in the end. If I hadn’t, I would have called my contact at IPowerWeb and asked to be put through to someone who might have been more helpful, though I’m not sure that would have worked.


* Actually, I could still get to our control panel. I just needed to access it at our IP address, since it doesn’t require a hostname. I think I was the only one who knew that, though (I don’t know whether Tom did, and I knew James didn’t).

** If anyone has a problem with this post (including StartLogic), I will take it down and replace it with one about not using a single domain name server in a VPS situation, which is still relevant but applies to a few more hosting providers.


Jun 18 2008

Firefox 3

Tag: Computers and Web Design | Alex @ 12:09 pm

Firefox 3 was launched this morning (5am NZ time). The Mozilla Foundation has made a huge marketing effort, to the point of attempting a world record for the most software downloads in 24 hours. With a market share just shy of 20% of the global web browser market and that kind of marketing, you’d expect them to need some serious distribution technology.

Mozilla.com web traffic is pushing well over 2 Gigabits a second of just pure HTTP traffic. That is in addition to the 13 Gigabits a second or so of download traffic. We are still at around 14,000 download/minute and mozilla.com is responding well! Go Mozilla community and IT team! schrep – June 17th, 2008

Given the numerous reports of the Mozilla.com website being down for at least the first hour of the record attempt, it looks like something in the middle couldn’t quite handle the load. 15 Gb/s is close enough to 2 GB/s (15/8 ≈ 1.9), which is fairly large.

The official counter at SpreadFirefox.com shows 2,135,640 as at midday here, only 7 hours in. Either their counter is off or things have calmed considerably. If they were still blasting data out at 2GB/s the counter would be closer to 7 million by now.
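These numbers are easy to sanity-check. Below is a back-of-envelope sketch in JavaScript. It assumes the ~13 Gb/s of downloads plus ~2 Gb/s of web traffic schrep quoted, and a steady 14,000 downloads a minute from the 5am launch to midday. Both rates are assumptions taken from the quote, not measurements.

```javascript
// Back-of-envelope check of the figures quoted above (assumed steady rates).
const BITS_PER_BYTE = 8;

function gigabitsToGigabytes(gbps) {
  return gbps / BITS_PER_BYTE;
}

function projectedDownloads(perMinute, hours) {
  return perMinute * 60 * hours;
}

const totalGbps = 13 + 2; // download traffic + mozilla.com HTTP traffic
console.log(gigabitsToGigabytes(totalGbps)); // 1.875 (GB/s)
console.log(projectedDownloads(14000, 7));   // 5880000 (5am to midday)
```

At the quoted rate the counter would read roughly 5.9 million by midday. That is nearly three times the 2,135,640 it shows, so either the rate dropped off or the counter is lagging.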

I’m not going to go into the features of Firefox 3 too much, but for Mac Firefox junkies it is a massive update. The interface now uses native widgets and is much faster. It also brings Firefox back into the running with Safari and Opera in terms of standards support.

If you download it today from http://getfirefox.com, you’ll help Mozilla set a world record. The servers seem to be coping now, so it downloads quite quickly. However, the Mac version weighs in at just over 17 MB, which is on the steep side.


Jun 10 2008

deleteCell may have side-effects

Tag: Computers and Web Design | Alex @ 12:13 am
UPDATE:

I have created an interactive test case and an automated test case.

Turns out the behaviour is reproducible. Now, is this a bug or am I just missing something about the way that method should work?

I was debugging a client-side-heavy application in IE7 today and ran up against a very weird bug. Even with the Microsoft Script Debugger, it took me a couple of hours to track down its cause.

The system has a page which works much like the Finder’s Column View in Mac OS X. In IE7, all the children of a node inside a table cell were being removed from the DOM when I removed the table cell itself. If you think this sounds normal, let me explain.

In JavaScript, DOM elements exist as JavaScript objects and can be attached to a document at any point. They do not have to be in a document; they can exist on their own as detached fragments. The normal DOM ‘removeChild’ method relies on exactly this. It detaches the child from its parentNode, but as long as there are references to the element in JavaScript-land, it is not deleted (or garbage collected, to be more precise). This means you can reattach the node somewhere else after some processing.

Safari, Firefox and Opera appear to behave the same way with the deleteCell method of a table row (‘tr’), but Internet Explorer appears to do something different. There is a little bit of speculation here, as I haven’t created a minimal test case of this bug yet. It appears that IE detaches all the DOM nodes recursively beneath the cell it deleted. This isn’t mentioned in the MSDN documentation, so I’m not quite sure what is happening. Maybe I’m seeing a bug caused by something else.

How would that be useful!? I can’t see a purpose for it but whatever.

My fix was to remove all the child nodes of the ‘td’ element using removeChild first, then delete the cell. That meant the elements that had been in the cell beforehand still existed if there were any references to them, which there were.
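That workaround might look roughly like the following sketch. It is written in ES3-style JavaScript (var, no arrow functions) so it would also run in IE7. It assumes `row` is the `tr` that owns the cell and that the caller wants the detached children back. The function name is hypothetical.

```javascript
// Workaround sketch: empty the cell with removeChild first, so deleteCell
// only ever deletes an empty cell. removeChild detaches nodes without
// destroying them, so existing script references stay usable.
// row: the <tr> that owns the cell (deleteCell is a table-row method).
function deleteCellKeepingChildren(row, cellIndex) {
  var cell = row.cells[cellIndex];
  var detached = [];
  while (cell.firstChild) {
    detached.push(cell.removeChild(cell.firstChild));
  }
  row.deleteCell(cellIndex);
  return detached; // re-attach these elsewhere, e.g. with appendChild
}
```

The returned nodes can then be appended to whichever cell they are moving to, e.g. `otherCell.appendChild(nodes[0])`, where `otherCell` stands for whatever the calling code has in hand.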


Jul 30 2007

Millions of domains - Single Sign-in

Tag: Web Design | Alex @ 10:40 pm

Alright, “millions of domains” is a bit of an exaggeration; hundreds is more accurate. 613, to be exact. That’s how many domains one project I’ve been working on has. What makes this situation a little stranger is that all these domains point to the same hosting account. I’ll say a bit more about the project shortly.

One feature of this project required a bit of extra thinking: single sign-on from any of the domains. A user needed to be able to log in on any one of the 600-odd domains and still be logged in when they accessed the site from another. The problem only gets complicated once you add two constraints. First, the address bar had to keep showing the domain the user typed in rather than a single central site. Second, frame-based “masking” was out. IE6 SP2 (it might have been IE7) blocks third-party cookies by default, and with frame-based “masking” the login cookies would have been exactly that.

My solution to the problem probably isn’t the most elegant but it does the job for now.

When a user enters one of the domains and there isn’t an authentication cookie in their browser for that domain, a redirect is performed. The user is sent briefly to a central domain, which checks whether they are logged in there. If they are, it sends the session_id back in the URL. Because all these domains ultimately lead to one hosting account on a single server, the session_id sent back works perfectly and logs them in. At this point, the system also sets a cookie on the current domain, so that if they come back to this one, another redirect won’t be required.

When a user goes to one of the domains but isn’t logged in, they are still briefly sent to the central domain. It comes back saying they aren’t logged in there either, and the login page is shown. Once filled in, the login form is posted to the central domain, which authenticates the details. It then redirects the user back to the domain they typed in, along with the new session_id. This means the authentication cookie is now set on the central domain, and it will also be set on the domain they typed in.
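To make the redirect dance easier to follow, here is a minimal sketch of the decisions as pure JavaScript functions. It is not the project’s code. The central host name, the /check path, the return_to and checked parameters and the request shape are all hypothetical; only session_id comes from the description above. Handling the login form itself is left out.

```javascript
// Hypothetical central domain that owns the "real" login cookie.
const CENTRAL_HOST = "login.example.com";

// req: { host, path, cookies: {...}, query: {...} } (a made-up, simplified shape).
// Returns what the server should do next.
function handleRequest(req) {
  if (req.host === CENTRAL_HOST) {
    // Only the /check hop is sketched; the login form is out of scope.
    return req.query.return_to ? handleCentralCheck(req) : { action: "serve" };
  }
  // 1. Auth cookie already present on this domain: no redirect needed.
  if (req.cookies.session_id) {
    return { action: "serve", sessionId: req.cookies.session_id };
  }
  // 2. Back from the central domain with a session in the URL. This works
  //    because every domain shares one server (and one session store).
  //    Set a cookie here so the next visit to this domain skips the hop.
  if (req.query.session_id) {
    return {
      action: "serve",
      sessionId: req.query.session_id,
      setCookie: { session_id: req.query.session_id },
    };
  }
  // 3. Back from the central domain, which reported "not logged in".
  if (req.query.checked === "1") {
    return { action: "showLogin" };
  }
  // 4. No cookie and no answer yet: bounce via the central domain.
  const returnTo = `http://${req.host}${req.path}`;
  return {
    action: "redirect",
    location: `http://${CENTRAL_HOST}/check?return_to=${encodeURIComponent(returnTo)}`,
  };
}

// The central domain checks its own cookie and sends the user back.
function handleCentralCheck(req) {
  const back = new URL(req.query.return_to);
  if (req.cookies.session_id) {
    back.searchParams.set("session_id", req.cookies.session_id);
  } else {
    back.searchParams.set("checked", "1");
  }
  return { action: "redirect", location: back.toString() };
}
```

The "checked" flag is what stops a logged-out user from bouncing between the two domains forever. Passing session_id in the query string avoids third-party cookies entirely. The trade-off is that the ID can leak through Referer headers and server logs. A short-lived one-time token, exchanged for the session on the server, would be a safer variant of the same idea.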


Jul 30 2007

Internet Explorer Friendly Error Pages

Tag: Web Design | Alex @ 10:15 pm

In the past I’ve read hundreds of articles about the failings, “features” and related idiosyncrasies of Internet Explorer, but until last week I hadn’t heard of this one. When we tested the site in Internet Explorer, we found that nearly every page would load correctly and then, right at the last second, switch to the standard 404 error page saying that the page could not be found. If the user had turned off the “friendly” error pages in Internet Explorer, the site performed fine.

I don’t know about you, but I wasn’t too sure where to start.

After thinking for a while, I checked the headers being sent by the server; they were all fine and dandy. Next, I started ripping code haphazardly from the page until the quirk stopped showing up. It turned out that the whole problem was in the stylesheet. To confirm this, I emptied the stylesheet and reloaded the page, and it was fine.

After chopping back, chopping back again and then slowly adding things back in, I discovered that our designer had used the csshover.htc behaviour file to allow :hover on all elements in Internet Explorer 6. He had used it successfully many, many times before, so I knew that something else was amiss.

Anyway, 5 minutes later I discovered that it was being referenced incorrectly from the stylesheet: the URL was given relative to the main page[1]. The 404 was triggered when the page’s onload event fired and IE couldn’t find the .htc to include.
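Why would the front page be fine and every other page break? Suppose the relative behaviour URL ends up resolved against the page’s address, which is how IE is widely reported to treat behavior URLs. Then the same string points somewhere different on every page below the root. Here is a small illustration using the standard URL class; the host and file location are made up.

```javascript
// Illustration only: how one relative URL resolves against different pages.
// Host, paths and the .htc location are hypothetical.
function resolveAgainst(ref, base) {
  return new URL(ref, base).href;
}

const pages = [
  "http://www.example.com/",                // front page
  "http://www.example.com/about/team.html", // any page below the root
];

for (const page of pages) {
  console.log(page);
  console.log("  csshover.htc  ->", resolveAgainst("csshover.htc", page));
  console.log("  /csshover.htc ->", resolveAgainst("/csshover.htc", page));
}
```

The leading slash makes the path root-relative, so it resolves to the same file from every page.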

Moral of the story: check your damn code, people. A missing forward slash in a stylesheet turned into a 45-minute debugging session.

Another one to chalk up as to why Internet Explorer should die.

[1] For whatever reason, I think IE was looking for the behaviour in the folder relative to the CSS (which is correct), not finding it and then looking relative to the document it was included from. I say this because the front page didn’t give the 404 whereas every other page did.


Feb 01 2007

Internet Explorer 6 Performance

Tag: Computers and Web Design | Alex @ 5:18 pm

I spent a huge part of today trying to determine the cause of a major Internet Explorer performance and rendering regression in a web application I'm developing. It was such a bitch to find because so much code had changed since I last did a proper test in IE6. The problem was in code I didn't expect to cause rendering issues; performance issues maybe, but those could have been dealt with.

So I went through carefully ripping apart the project (disabling stylesheets, removing images, removing individual styles) to figure out what was going on. About an hour later I hadn't got very far. I had determined that every[1] browser was fine except IE6.

In frustration, I turned to Google. Searching for pages describing huge black areas rendering over the page while using Scriptaculous Sortables got me nowhere, so I tried searching for causes of huge black rendering artefacts in IE in general... nothing.

I turned back to trying to make a minimal test case. I ended up removing an image that was inside each Sortable item, and suddenly the rendering problem went away and performance jumped up significantly. After some more dilly-dallying around, it turned out it wasn't the JavaScript I'd written causing problems at all. It was the IE6 CSS alpha PNG hack causing all the problems (and it had been working fine, without performance issues, the last time I tested this site in IE6).

It is indeed common knowledge among web developers that IE6 does not support transparency on 24-bit+Alpha PNG images. The alpha channel is rendered [usually] in a baby blue colour instead of being see-through.

To remedy this I was using the fastest hack around the issue that I have found: paste five lines into a site's CSS file and bingo. I didn't write this hack, but here's how it works. It uses the proprietary CSS filter property with DirectX's AlphaImageLoader transform to draw the PNG. It uses the equally proprietary CSS expression() extension, which computes values with JScript, to target only PNGs. I haven't had a problem with this hack yet. It doesn't handle CSS background images, but I haven't needed it to.

CSS:
    img {
        height:expression((this.complete&&(String(this.src).substr(String(this.src).length-4,4)==".png"))?"1px":"");
        width:expression((this.complete&&(String(this.src).substr(String(this.src).length-4,4)==".png"))?"1px":"");
        filter:expression((this.complete&&(String(this.src).substr(String(this.src).length-4,4)==".png"))?("progid:DXImageTransform.Microsoft.AlphaImageLoader(src="+this.src+")"):"");
    }
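Unpacked from the expression() calls, all three rules run the same test. Here it is restated as a plain JavaScript function; the function name is hypothetical, and the logic mirrors the CSS. Once the image has loaded and its src ends in ".png", the box shrinks to 1px and the AlphaImageLoader filter paints the PNG. Presumably this is so the non-transparent bitmap is hidden behind it. Anything else gets empty values. Note that the check is case-sensitive, so ".PNG" files are skipped. IE also re-evaluates expression() values very often (on resize, scroll and even mouse movement), so this test runs far more than once per image.

```javascript
// Plain-JavaScript restatement of the test inside each expression() above.
// img: anything with `complete` and `src` (an <img> element in IE).
function pngHackValues(img) {
  var src = String(img.src);
  // Same check as the CSS: loaded, and src ends in ".png" (case-sensitive).
  var isLoadedPng = img.complete && src.substr(src.length - 4, 4) === ".png";
  if (!isLoadedPng) {
    return { height: "", width: "", filter: "" };
  }
  return {
    height: "1px",
    width: "1px",
    filter: "progid:DXImageTransform.Microsoft.AlphaImageLoader(src=" + src + ")"
  };
}

console.log(pngHackValues({ complete: true, src: "http://example.com/logo.png" }).filter);
// progid:DXImageTransform.Microsoft.AlphaImageLoader(src=http://example.com/logo.png)
```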

The hack, combined with a PNG on each individual Sortable item, caused the poor DirectX transform to basically crap itself and render parts of the window black. Very black.

Blah.

What a waste of many hours of my day.

Honestly though, this rendering bug caused a few 'wow' moments from people in my office when they saw what IE was displaying on the screen and how it took roughly 20 seconds and 100% CPU to drag one single item between two lists. I'll have a screen-shot of this up in the next few days.


[1] "Every" means "Safari, Firefox 1.5, Firefox 2.0, IE7, WebKit nightlies and Opera 9"


Jan 25 2007

Productivity++

Tag: Apple and Web Design | Alex @ 9:22 pm

This afternoon, Tom got me to check out a SIMBL plug-in named megazoomer. After upgrading my version of SIMBL and dropping the megazoomer bundle into its place in my Library folder, each Cocoa program I started from then on had fullscreen capability.

The default key combination seems borrowed from Windows (it is similar to Alt + Enter, the full-screen/windowed toggle for many Windows apps), and it sends the application full screen.

The big deal for me was that it worked flawlessly with multiple monitors. It even allowed two windows to be full screen on different monitors, creating an uninterrupted workflow between them. Now I can set up a text editor on one screen and Safari on the other at work and not have any distractions.


Jan 15 2007

PHP4, Constructors, $this and Output Buffering

Tag: Web Design | Alex @ 1:56 pm

Just spent half an hour trying to figure out a very random bug in a PHP4 application I'm developing at the moment. I'm using an MVC design pattern, and in the constructor for the View object I wanted the class to set itself up to capture all output using output buffering. I've used this style on a couple of sites now, and once it was all figured out it worked fine, but this time I was doing one subtle thing differently.

This time around I was using the following code:

PHP:
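    class viewXHTML
    {
        var $model; // PHP4-style instance variable declaration

        function viewXHTML($model)
        {
            if ($model === null)
                return;

            $this->model = $model;

            ob_start(array(&$this, 'output'));
        }

        function output($text)
        {
            // [...] processing elided in the original post
            return $text; // placeholder: pass the buffer through unchanged
        }

        // [...] rest of the class elided
    }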

This stores the model in an instance variable and gives the output buffer a callback to this object's output function. This works fine and dandy. When the page is done, the output function is called with all the output that was sent using echo and print statements. The problem only shows up when you want to access instance variables of your object within that function.

Theory One:

The output function was being called on a different copy of the object. Instance variables set in the constructor were there, but the ones set after the constructor returned weren't. It turns out that in PHP4 the object you get from new is a copy of the one $this referred to inside the constructor (unless you assign it with =& new). So the reference to $this handed to ob_start pointed at an object the rest of my code never touched.

This was all fixed by calling a bind method after the constructor returned so the code is now similar to:

PHP:
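    class viewXHTML
    {
        var $model; // PHP4-style instance variable declaration

        function viewXHTML($model)
        {
            if ($model === null)
                return;

            $this->model = $model;
        }

        // Called by the bootstrap code straight after the constructor,
        // so &$this refers to the object the caller actually holds.
        function bindOutput()
        {
            ob_start(array(&$this, 'output'));
        }

        function output($text)
        {
            // [...] processing elided in the original post
            return $text; // placeholder: pass the buffer through unchanged
        }

        // [...] rest of the class elided
    }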

The bindOutput function is now called straight after the constructor in my bootstrap code.

Anyway, this mildly reinforces for me that I really shouldn't be pushing PHP4 to do things I should be using Ruby, maybe Python, or at the very least PHP5 for.

This was in PHP v4.4.1.

edit: typos for the win
edit2: added syntax highlight plugin


Jul 03 2006

Backed by WordPress

Tag: Life and Web Design | Alex @ 2:25 pm

I installed WordPress 2 last week and decided to play around with its capabilities. I wanted to see whether it would be a feasible alternative to my custom CMS, which is missing a few features (mainly any blogging capability).

I decided to see if I could port my current site's visual style to a WordPress theme, and instead of starting from scratch, I decided to play around with K2. It turned out to be surprisingly easy, because K2's DOM tree was almost exactly the same as my site's. The differences were mainly that I'd used ID attributes for my main content divs where K2 used classes. That was simple to fix and took about 10 minutes, and now I have WordPress in my original theme.

I'll probably go back through later today, reinstall the K2 theme and just change the stylesheet. What I did instead was edit all the files to swap the classes and IDs around.

I've seen a couple of issues so far. One is that the comment box stretches right out past my container div. The other is that I'm not using any styles from K2, so none of the nice things K2 did with quotes, lists etc. are happening.

Something I noticed about themes in WP is that you can almost totally forget about the page content when you're messing around with the ordering of divs. Each file is quite independent, and they're finally brought together in the template files. Nice.

Anyway, I intend to actually blog on my site. I've also found a WordPress-integrated wiki which may serve as a replacement for my 'Resources' module; its search is a little better than mine was.