Surveying the Fortune 1000 (pt. 2)

August 26th, 2005 by Tom Myer

We ran our spiders, two at a time, 24 hours a day for about 10 days. Our target: the Fortune 1000. We wanted to know what the world’s most prestigious companies are doing with their web sites. We not only wanted to know how big their web sites were, we also wanted to know how many bad links and how much stale content existed on each site, among other facts. We also wanted to know what kinds of problems are faced by those trying to administer sites of different sizes. By the time the last process wound down, we had detailed log files for 965 companies.

So what did we discover? Lots of interesting things, best summarized by these three takeaways:

Takeaway #1: Web sites hit critical mass somewhere between 1500 and 2500 content items, then balloon quickly to truly unmanageable sizes.

Takeaway #2: No matter what size a web site is, problems with bad links and stale content are pretty much universal.

Takeaway #3: Web site size and technology diversity are directly related to each other. The bigger the site, the more technologies used.

The Details

Let’s take a quick look at the data table that summarizes our results.

 
Number of Pages       up to 500   501-1000   1001-2500   2501+
Total Companies             640         86          76     163
> 10% Stale Content         291         59          47      56
> 10% 404                   295         23          26      73
Both                         61         14          11      23

This table’s horizontal axis categorizes web sites by number of pages. I’ve arbitrarily assigned size categories of up to 500 pages, 501-1000 pages, 1001-2500 pages, and 2501 or more pages. The table’s vertical axis lists the measures we found relevant for each category: how many companies had web sites in that size range, how many had more than 10% stale content on their sites, how many had more than 10% unreachable (404) pages, and how many had both stale content and 404s above the 10% mark.

When you look at that first data column, there are no big surprises. The data indicates that nearly half of the smaller web sites have a problem with stale content (291 of 640) and nearly half have a problem with bad links (295 of 640). This makes sense, as smaller web sites are generally managed without cutting-edge software or processes that would help prevent these problems. Notice that 61 of these company web sites have problems with both stale content and unreachable pages. Looking at the list of these companies reveals a wide range of industries (consulting, home mortgage, energy, accounting, etc.), employee counts, revenue, and geographic dispersion, so there doesn’t seem to be any obvious correlation between these factors and the care and feeding of a web site.

When we look at the bigger web sites, however, some things do stand out. Notice that more web sites exist at the 2501+ page size (163) than in the two previous size categories combined (162). This suggests that at some point, web sites hit some kind of critical mass and grow very large, very quickly (Takeaway #1).

In fact, when we break the 1001-2500 page group down into three groups of 500 each, and then keep going until we hit the 4000 page mark, we see this:

 
Number of Pages   1001-1500   1501-2000   2001-2500   2501-3000   3001-3500   3501-4000   4001+
Total Companies          40          18          18          11           9           7     136

This data suggests that once companies grow their web sites to somewhere between 1500 and 2500 pages, something (entropy? chaos? bad processes?) takes over and the sites start to balloon. Very few web sites appear in the 1501 to 4000 page range (63 in all) compared to those at 4001 pages and above (136). In fact, going by the bucket counts, companies are roughly 36 times as likely to have a site of up to 500 pages, and roughly 7.5 times as likely to have a 4001+ page site, as a site in the 1501-2000 page bucket (or, equally, the 2001-2500 page bucket).
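For anyone who wants to check the "times as likely" comparisons, here is a minimal Python sketch that recomputes them from the bucket counts in the two tables. The dictionary and the function name `times_as_likely` are my own illustration, not part of the survey tooling, and the choice of the 1501-2000 bucket as the reference point is an assumption about what a "1600 page site" stands for.

```python
# Company counts per page-count bucket, copied from the two tables in this post.
# The data structure itself is just an illustration.
BUCKET_COUNTS = {
    "up to 500": 640,
    "501-1000": 86,
    "1001-1500": 40,
    "1501-2000": 18,
    "2001-2500": 18,
    "2501-3000": 11,
    "3001-3500": 9,
    "3501-4000": 7,
    "4001+": 136,
}


def times_as_likely(bucket, reference):
    """How many companies fall in `bucket` for each company in `reference`."""
    return BUCKET_COUNTS[bucket] / BUCKET_COUNTS[reference]


if __name__ == "__main__":
    # Sanity check: the buckets should account for every surveyed company.
    print("total companies:", sum(BUCKET_COUNTS.values()))
    for bucket in ("up to 500", "4001+"):
        ratio = times_as_likely(bucket, "1501-2000")
        print(f"{bucket} vs 1501-2000: {ratio:.1f}x")
```

With the counts above, the buckets sum to 965 and the two ratios come out to roughly 35.6 and 7.6.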

At the high end of our survey, those sites with 2501 or more pages, problems still exist, but they take on a slightly different tone. Instead of the 45% staleness rate we saw for the smallest sites, we see a 34% rate. This suggests that more care is taken to keep content fresh, but about 45% of web sites in this size category (73 of 163) still had 404 error rates higher than 10%. For a 4000 page site, that means more than 400 pages or documents that were unreachable! These companies might have processes and tools in place, but essentially, bad links are a problem for web sites regardless of their size (Takeaway #2).
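The percentages quoted here are just the summary table's counts divided by the number of companies in each size category. A minimal Python sketch of that arithmetic follows; the list names and the helper functions `rate` and `bucket_rates` are hypothetical, chosen only for this example.

```python
# Counts copied from the summary table, one entry per size category.
SIZE_BUCKETS = ["up to 500", "501-1000", "1001-2500", "2501+"]
TOTAL = [640, 86, 76, 163]
STALE = [291, 59, 47, 56]        # more than 10% stale content
NOT_FOUND = [295, 23, 26, 73]    # more than 10% 404s
BOTH = [61, 14, 11, 23]          # both problems


def rate(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def bucket_rates():
    """Map each size category to its stale, 404, and combined rates."""
    rows = {}
    for i, name in enumerate(SIZE_BUCKETS):
        rows[name] = {
            "stale": rate(STALE[i], TOTAL[i]),
            "404": rate(NOT_FOUND[i], TOTAL[i]),
            "both": rate(BOTH[i], TOTAL[i]),
        }
    return rows


if __name__ == "__main__":
    for name, r in bucket_rates().items():
        print(f"{name:>10}: stale {r['stale']}%  404 {r['404']}%  both {r['both']}%")
```

Run against the table, this gives about 45.5% stale and 46.1% 404 for the smallest sites, and about 34.4% stale and 44.8% 404 for the 2501+ group.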

Although an overwhelming majority of Fortune 1000 companies (79% give or take) have web sites of 1500 pages or fewer, a significant number of them (136) weigh in at more than 4000 content items, including some of the most recognizable online brands in the world:

3m.com
adobe.com
aetna.com
amazon.com
barnesandnoble.com
cisco.com
costco.com
hasbro.com
hersheys.com
hollywoodvideo.com
ibm.com
intel.com
lizclaiborne.com
revlon.com
scholastic.com
siebel.com
sun.com
symbol.com
target.com
walgreens.com
walmart.com
wellsfargo.com
wholefoods.com
yahoo.com

What else jumps out? Not as many companies as you might think are using key organizational labels like “about us” or “contact us” for their sites. Only 430 surveyed sites are using “about” or “about us” in their linking, and 550 sites are using “contact” or “contact us”. Who’s running site maps? 269 web sites are, which means that nearly three-quarters of the list isn’t using any kind of site map to help orient visitors. These numbers suggest that companies have a long way to go to standardize on the common linking strategies that most people rely on to navigate a site.
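To make the idea of "using a label in their linking" concrete, here is a rough Python sketch of one way to check a page for these labels. It is an assumption about how such a check could work, not a description of our spiders: it collects the visible text of every link with the standard library's `html.parser` and looks for an exact match. The `LABELS` table (including the "sitemap" spelling), the class name, and `labels_used` are all made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical label variants; the exact strings to match are an assumption.
LABELS = {
    "about": ("about", "about us"),
    "contact": ("contact", "contact us"),
    "site map": ("site map", "sitemap"),
}


class LinkTextCollector(HTMLParser):
    """Collects the lowercased, whitespace-normalized text of each <a> element."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self._buffer = []
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_anchor:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            text = " ".join("".join(self._buffer).split()).lower()
            self.link_texts.append(text)
            self._in_anchor = False


def labels_used(html):
    """Return the set of label names whose variants appear as link text."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    texts = set(collector.link_texts)
    return {name for name, variants in LABELS.items() if texts & set(variants)}


if __name__ == "__main__":
    page = '<a href="/about">About Us</a> <a href="/c">Contact</a> <a href="/p">Products</a>'
    print(sorted(labels_used(page)))
```

Exact matching is deliberately strict. A link labeled "About Our Company" would not count here, so a looser rule would report more sites than this one does.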

Out of 965 companies surveyed, 109 are using JSP, 53 are using PHP, and 382 are using ASP. This matches another report I read, which said that Fortune 1000 companies tend to go with Microsoft environments because they can afford the licensing. Of these companies, 117 are using Vignette; the tell-tale commas in the URLs give the product away. Over 90 companies are using XML and XSLT on their web sites, which is encouraging but still far from the norm when you consider how these technologies can make things more efficient. A whopping 517 are using PDFs, so we can safely say that Adobe’s technology has penetrated the Fortune 1000 space to a great extent.
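As an illustration of how technologies can be spotted from URLs alone, here is a hedged Python sketch. It is not the spider's actual logic. The extension table (including lumping `.aspx` in with ASP) is an assumption, the regular expression is only my guess at what the "tell-tale commas" look like (a path segment of comma-separated numbers), and the example.com URLs are invented.

```python
import re
from urllib.parse import urlparse

# Assumed mapping from file extension to technology label.
EXTENSION_TECH = {
    ".jsp": "JSP",
    ".php": "PHP",
    ".asp": "ASP",
    ".aspx": "ASP",
    ".pdf": "PDF",
    ".xml": "XML",
    ".xsl": "XSLT",
    ".xslt": "XSLT",
}

# Assumed heuristic for Vignette-style URLs: a path segment made of three or
# more comma-separated numbers, e.g. /0,1299,4567,00.html
VIGNETTE_COMMAS = re.compile(r"/\d+(?:,\d+){2,}(?:\.\w+)?(?:/|$)")


def detect_technologies(url):
    """Return the set of technology labels suggested by a single URL."""
    path = urlparse(url).path.lower()
    found = {tech for ext, tech in EXTENSION_TECH.items() if path.endswith(ext)}
    if VIGNETTE_COMMAS.search(path):
        found.add("Vignette")
    return found


def site_technologies(urls):
    """Union of the technologies suggested by every URL crawled on one site."""
    found = set()
    for url in urls:
        found |= detect_technologies(url)
    return found


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/news/0,1299,4567,00.html",
        "http://www.example.com/products/index.jsp",
        "http://www.example.com/docs/annual-report.pdf",
        "http://www.example.com/about.html",
    ]
    print(sorted(site_technologies(crawled)))
```

Counting the size of the set returned by `site_technologies` for each site gives a crude measure of how much a site mixes and matches.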

The most interesting thing about the use of technologies? Companies with the biggest web sites mix and match. Cisco.com uses Vignette for some pages, XML for others, and ASP for still others. Adobe.com has Vignette-, JSP-, ASP-, PHP-, and XML-driven pages. At the other end of the spectrum, sites tend to be less heterogeneous. Landolakesinc.com, a web site with a page count in the low 100s, uses ASP on 95% of its pages. Swgas.com, another site with fewer than 150 pages, uses PHP on 90% of its pages. The bigger you are, the more likely you are to use lots of different technology approaches (Takeaway #3).
