Is it just me, or does the current rash of “melted servers” on the Warrior Forum start to feel like it’s time internet marketing took IT seriously?
If you were anywhere near the launch of Ninja Curation Profits yesterday, you’ll have seen the emails flying about server-melting traffic and then, at one point, the rumours turned into a fully-fledged denial of service attack. Was it? Was it really?
Or was it just another example of a WSO being killed by the systems we have all grown up using? Don’t get me wrong, the creators seem to have tried to get the launch server right – certainly, the hardware was in the right ballpark, but when I finally got logged in I found just another WP setup awaiting me, running OptimizePress no less. (Review to follow :))
Lessons From My Past
I sat there for a while and then thought it was about time I shared my darkest secrets with everyone on here. It turns out I used to work on large internet system deployments. So, the truth is out... I helped to get the Inland Revenue (HMRC) working in the UK. Sorry for that!
Here’s an interesting thing about the HMRC. Lots of people do internet filing of their tax return nowadays – the online deadline is 31 January. How many do you think do their return in the previous 360 days (say)? It’s a ridiculously small number – everyone leaves it until the last week and a huge percentage do it on the last day. It used to be about 85% of all returns in the UK on one day!
I’m sure you can see where I’m headed with this. Strangely, they never seem to “melt their servers”. Tax happens, we all go home poorer, but no piles of server slag ever appear. Admittedly they have big servers on their side, but they also have some notion of software architecture too and that is where this is all leading.
Building a Better WordPress Architecture for Heavy Traffic
It occurs to me that much of the IM world sees WordPress as “good enough”. We love its functionality and the way we can get plugins for things. We love the fact that it is free and that it is available on almost every hosting system without us having to do much. What we never seem to do is question its suitability for running large Warrior Forum launches.
So, what can be done?
Fix Dynamic Page Generation
Up front, the absolute minimum for any WP launch platform should be the guarantee that it is serving up “static” content. If you don’t know the term, it describes pages that don’t change. They are fetched from somewhere and pushed out of the door without any intervention in between.
This is not the normal routine for WordPress at all. It gets the page (or post) and does clever stuff to all that PHP code, looking for places where you might have asked for a database lookup or a piece of content that changes based on the day of the week, or whatever! If you were to think “that sounds slow”, you’d be right, and in most cases it’s a terrible waste of time and energy because the content doesn’t change at all. Most posts get written once and stay that way forever. Certainly, by the time of a launch, the sales page should be pretty much locked in for 99.99% of all prospects.
The answer, of course, is to cache the pages. Users always have their own preference (sometimes dependent on their technical knowledge) but Hyper Cache comes out well for speed and friendliness and W3 Total Cache is super quick with many extra integration features.
What these plugins do is grab the content, using all the funky PHP interpretation and munging of database content, for the very first person to ask for that page. They bundle it all up into “the final output” and from that point onwards they only send that blob to anyone else asking for the same page. They never go back to the database or try to run all that code, they just serve up the same old stuff (until they hit some timeout that you may have set). This works great for pages that are genuinely not changing, but can cause mucho confusion if you expect to see the time updating on your page (for example).
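If it helps to see the idea in code, here is a rough sketch of that “render once, serve the blob” behaviour. It is written in Python purely for illustration and is not how Hyper Cache or W3 Total Cache are actually implemented; the names PageCache and slow_render, and the 300 second timeout, are all made up for the example.

```python
import time
from typing import Callable, Dict, List, Tuple


class PageCache:
    """Keep the finished HTML for each URL and reuse it until it expires."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render      # the slow path (PHP plus database, in WordPress terms)
        self._ttl = ttl_seconds    # how long a cached page stays valid (assumed value)
        self._clock = clock        # injectable so expiry can be exercised without waiting
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # cached blob: no rendering, no database
        html = self._render(url)   # first visitor (or an expired entry) pays the cost
        self._store[url] = (now, html)
        return html


rendered: List[str] = []  # records every time the slow path actually runs


def slow_render(url: str) -> str:
    """Hypothetical stand-in for WordPress building a page from scratch."""
    rendered.append(url)
    return f"<html><body>Sales page for {url}</body></html>"


cache = PageCache(slow_render, ttl_seconds=300.0)
for _ in range(1000):
    cache.get("/launch")

print(len(rendered))  # 1: a thousand visitors, one render
```

The timeout is the trade-off mentioned above: until it passes, nobody sees a change to the page, which is exactly what you want for a locked-in sales page and exactly what confuses you if the page shows a clock.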
Sort Out Database Access
The problem with the average launch is that there are different moving parts in the process. One of the biggest areas that gets hit hard is all the access to the database and at this point I need to digress into n-tier architecture design. Huh, what?
The majority of big systems deployed around the planet rely on an idea of separating out different bits of the functionality of the system into different blocks. These blocks are often called tiers and each tier does its own bit towards making everything run smoothly.
In a very standard layout, you might have a 3-tier architecture, with a Web Server tier on the front, an Application Server middle tier and then a third database tier for your data. It’s very possible that all 3 tiers might actually live on the same box, but the design gives you the flexibility to move the database somewhere else and add firepower to just that one tier if you need it.
How does this relate to WordPress?
Well actually, WP is something like the middle tier of that description above. Its job is to get data from the database and ship it to the customer (via Apache, which is the actual tier-one web server installed on most hosting). So, all of the problems faced by monster systems should apply to WordPress... and so should all their solutions.
The biggest problem with talking to a database is that it takes a small amount of time to start the conversation, to work through the request and then to close the connection. That start and stop time is added to every request that comes through and, what’s more, most databases get worse and worse at handling those conversations as the number of connections increases.
It seems counter-intuitive, but sometimes making 100 lots of ten connections is faster than just blasting all 1000 connections at the database at the same time. It spends so much time trying to book-keep the connections, that data access suffers.
Not only that, but we can also save time for each connection by keeping it available when we are done with it. So, we open 10 connections (and pay the price in time for each one) but when the requests are all satisfied, the connections are not shut down, but just left for the next 10 users to come along and all that opening and closing time disappears.
The way an enterprise application server does that is to build that pool at the “start of the day” and then dole out the connections to the next person in line – a connection manager, if you like. WordPress doesn’t have that, relying instead on the maximum number of connections figure which is set in the MySQL database. The problem comes when you hit the limit – instead of saying “hold on, there will be one along in a minute”, the database says “get thee hence and never return!” and so you see the connections exceeded message.
An obvious first fix is to up the limits, but you can only really do that if you have a dedicated server and it still leaves all that nasty opening and closing behaviour. Fortunately, there are “open and stay open” calls available (persistent connections) that would give the performance benefits of the pooling solution – but still with that limit in the database.
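For the curious, a connection manager of that kind can be sketched in a few lines. This is a minimal Python illustration under my own assumptions, not anything WordPress or MySQL ships: FakeConnection is a stand-in for a real (expensive to open) database connection, and the pool size of 10 and the 5 second wait are arbitrary numbers.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a database connection; counts how many get opened."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self) -> None:
        with FakeConnection._lock:
            FakeConnection.opened += 1

    def query(self, sql: str) -> str:
        return f"result of {sql}"


class ConnectionPool:
    """Open a fixed set of connections up front and lend them out one at a time."""

    def __init__(self, size: int) -> None:
        self._idle: "queue.Queue[FakeConnection]" = queue.Queue()
        for _ in range(size):              # pay the opening cost once, at the "start of the day"
            self._idle.put(FakeConnection())

    def run(self, sql: str, timeout: float = 5.0) -> str:
        # Blocks until a connection is free: "hold on, there will be one along in a minute".
        conn = self._idle.get(timeout=timeout)
        try:
            return conn.query(sql)
        finally:
            self._idle.put(conn)           # hand it back open, ready for the next request


pool = ConnectionPool(size=10)
with ThreadPoolExecutor(max_workers=100) as workers:
    results = list(workers.map(pool.run, [f"SELECT {i}" for i in range(1000)]))

print(len(results), FakeConnection.opened)  # 1000 10
```

The point to notice is that request number eleven waits in line rather than being refused, and that a thousand requests only ever cost ten connection set-ups.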
This may be an insurmountable problem as far as the current WP is concerned. Wherever there is a chance of a traffic flood, there is always a chance of hitting the max connections limit. The persistent connections will help a lot, but connection “max+1” will blow it up.
Proxy Serving for Fun and Profit
The final piece that I want to throw into this WordPress performance 101 is the idea of using a reverse proxy server on the front end of the web server – effectively giving us a fourth tier.
This server acts as another layer of caching in front of the web server, but can also be used to “spray” requests around a cluster of machines, all doing the same work but sharing the load. This falls into the realms of clustering and probably needs to wait for another post. Again, it is really only an option for servers where you can control the whole operating system environment. In this space, two main contenders have arisen, nginx and Varnish, but I’ll leave researching them to the interested reader.
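As a toy model of those two jobs (caching up front, and spraying the rest around a cluster), here is a small Python sketch. It is only an illustration of the idea and bears no relation to how nginx or Varnish are configured or built; the backend names web1 to web3 and the ReverseProxy class are invented for the example.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List


class ReverseProxy:
    """Answer from cache when possible, otherwise pass the request to the next backend in turn."""

    def __init__(self, backends: List[str]) -> None:
        self._next_backend = cycle(backends)   # simple round-robin over the cluster
        self._cache: Dict[str, str] = {}
        self.hits_per_backend: Counter = Counter()

    def _fetch(self, backend: str, path: str) -> str:
        # Stand-in for forwarding the request to a real web server.
        self.hits_per_backend[backend] += 1
        return f"{path} served by {backend}"

    def handle(self, path: str, cacheable: bool = True) -> str:
        if cacheable and path in self._cache:
            return self._cache[path]           # the web servers never see this request
        body = self._fetch(next(self._next_backend), path)
        if cacheable:
            self._cache[path] = body
        return body


proxy = ReverseProxy(["web1", "web2", "web3"])

# Nine order pages that must not be cached: spread evenly, three per machine.
for i in range(9):
    proxy.handle(f"/checkout?order={i}", cacheable=False)

# A thousand hits on the sales page: only the first one reaches a backend.
for _ in range(1000):
    proxy.handle("/launch")

print(sum(proxy.hits_per_backend.values()))  # 10
```

Real proxies make smarter choices than strict round-robin (health checks, weighting and so on), so treat this as the shape of the idea rather than a recipe.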
Third Party Speed-Ups for WordPress
There is, of course, one final tier that can be involved in all this and that is some kind of content delivery network (or CDN). The role of the CDN is to send the end-user things like movies and large images – of course you can also use it to send anything else that doesn’t change – a kind of super-cache that is out there on the internet.
The beauty of this kind of arrangement is that the CDN is often a global entity and that means that it is much more likely that they will have your content stored somewhere much closer to your customer than you are (or at least the main server is). Amazon S3 works well in this respect as a way of moving large files at small cost, and Amazon offers an add-on service called Amazon CloudFront, which is a true CDN with global reach for your files (whereas an S3 bucket lives in a single region).
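In practice, “using a CDN” mostly comes down to making your pages point at the CDN’s hostname for anything static. Here is a rough Python sketch of that rewrite, as an illustration only: the hostnames www.example.com and cdn.example.com, the list of file extensions and the function name rewrite_for_cdn are all my own assumptions, and caching plugins that offer CDN support will do this their own way.

```python
import re

# File types assumed to be safe to serve from the CDN (they do not change per visitor).
STATIC_EXTENSIONS = ("jpg", "jpeg", "png", "gif", "css", "js", "mp4")


def rewrite_for_cdn(html: str, origin: str, cdn: str) -> str:
    """Point static asset URLs at the CDN host and leave every other link alone."""
    extensions = "|".join(STATIC_EXTENSIONS)
    pattern = re.compile(
        r"(https?://)" + re.escape(origin)           # scheme, then the origin host
        + r"(/[^\s\"'>]+\.(?:" + extensions + r"))"  # a path ending in a static extension
        + r"(?=[\s\"'>?]|$)"                         # which must be the end of the path
    )
    return pattern.sub(lambda m: m.group(1) + cdn + m.group(2), html)


page = (
    '<img src="https://www.example.com/wp-content/uploads/hero.jpg">'
    '<a href="https://www.example.com/order">Buy now</a>'
)
print(rewrite_for_cdn(page, origin="www.example.com", cdn="cdn.example.com"))
```

The image now comes from the CDN while the order link still goes to your own server, which is the split you want: big unchanging files go out to the edge, anything that needs WordPress stays at home.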
Another big player in this space is Cloudflare, offering caching and CDN management. The interesting thing about this solution is that W3 Total Cache “knows about” Cloudflare and can be configured out of the box to use them as the next-tier CDN for all of your site’s caching.
This post has gone on far longer than I expected, but I wanted the purveyors of big-launch WSOs to know what they are getting into when using WordPress as the main host for their products. Hopefully this will help alleviate some of that server-melting.
To be honest, it’s tempting to build a WSO launch platform that would just cope with all the grief. Any interest in that?