I've been in an introspective mood lately.

At the beginning of this year (15 years after founding Distilled in 2005), we spun out a new company called SearchPilot to focus on our SEO A/B testing and meta-CMS technology (previously known as Distilled ODN), and we merged the consulting and conferences part of the business with Brainlabs.

I'm now CEO of SearchPilot (which is mostly owned by Distilled shareholders), and I'm also an SEO Partner at Brainlabs, so... I'm afraid you're not getting rid of me: I'm very much staying in the SEO industry.

As such, it feels a bit like the end of a chapter to me rather than the end of the book, but it still allowed me to look back on what has and hasn't changed in the last 15 years I've been in the industry.

I can't claim to be part of the first generation of SEO experts, but having built websites since 1996 or so and watched Google grow from the beginning, I feel like a member of the second generation, and I may have some interesting stories to share with people who are newer to the industry.

I've been racking my brain to remember what I thought was important at the time, and I've also reviewed the major trends that have emerged over my career, to come up with an interesting reading list that most people working on the web today would benefit from.

The great eras of search

At the start of a presentation I gave in 2018, I joked that the great eras of search have alternated between search engines telling webmasters what they wanted and search engines quickly backing away from that guidance once they saw what webmasters actually did with it:

Although the slide was a bit tongue-in-cheek, I think there's something to be gained from thinking about eras like these:

  1. Create websites: Do you have a website? Do you want a website? It's hard to believe today, but in the early days of the web, many people had to be persuaded to put their business online.
  2. Keywords: Basic information retrieval turned into adversarial information retrieval when webmasters realized they could game the system by stuffing keywords, hiding text, etc.
  3. Links: As the scale of the web grew beyond human-curated directories, link-based search algorithms began to dominate.
  4. Not these links: Link-based algorithms began to give way to adversarial link-based algorithms, as webmasters traded, bought, and manipulated links across the web graph.
  5. Content for the long tail: In parallel with this era, both webmasters and Google itself began to better understand just how long the long tail was, and it was in both parties' interest to create massive amounts of (often obscure) content and have it indexed for whenever it was needed.
  6. Not this content: Predictably (see the pattern here?), the average quality of content returned in search results dropped dramatically, and so we saw the first machine-learned ranking factors, in the form of attempts to assess "quality" (alongside relevance and authority).
  7. Machine Learning: Arguably, everything from this point forward has been an adventure in machine learning and artificial intelligence, and has also played out over the careers of most marketers working in SEO today. So, as much as I love writing about this topic, I'll come back to it another day.

History of SEO: the crucial moments

While I'm sure there are interesting stories to tell about the pre-Google era of SEO, I'm not the right person to tell them (if you have a good resource, please share it in the comments), so let's start early in the Google journey:

Google's core technology

Even if you're getting into SEO in 2020, in a world of machine-learned ranking factors, I recommend going back and reading the surprisingly accessible early academic work:

If you weren't using the web back then, it's probably hard to imagine how big an improvement Google's PageRank-based algorithm was over the state of the art (and it's hard to remember clearly, even for those of us who were using it):

Google's IPO

In the category of "things that are hard to remember clearly": at the time of Google's IPO in 2004, very few people expected Google to become one of the most profitable companies of all time. In the early days, the founders had expressed their disdain for advertising and had only reluctantly experimented with keyword-based ads. Because of this attitude, even within the company, most employees didn't realize what a rocket ship they were building.

From that period, I recommend reading the founders' IPO letter (see this excellent write-up from Danny Sullivan, who, ironically, is now @SearchLiaison at Google):

"Our search results are the best we know how to produce. They are unbiased and objective, and we do not accept payment for them or for inclusion or more frequent updating."

"Because we don't charge merchants for inclusion in Froogle [now Google Shopping], our users can browse product categories or perform product searches knowing that the results we provide are relevant and unbiased." - S-1 filing

Also worth reading is In the Plex, a 2011 book by Steven Levy. It tells the story of what then-CEO Eric Schmidt called (around the time of the IPO) "the hiding strategy":

"Those who knew the secret [...] were instructed quite firmly not to say anything about it."

"What Google was hiding was how it had cracked the code for making money on the Internet."

Fortunately for Google, for users, and even for organic search marketers, it turned out that this wasn't really inconsistent with their pure pre-IPO ideals because, as Levy recounts, "in repeated tests, searchers were happier with pages containing ads than those where they were removed." Whew!

Index everything

In April 2003, Google acquired a company called Applied Semantics and kicked off a chain of events that is, in my opinion, the most underrated part of Google's history.

Applied Semantics' technology was combined with Google's own contextual ad technology to form what became AdSense. While AdSense's revenue has always been overshadowed by AdWords (now simply "Google Ads"), its importance in the history of SEO is hard to overstate.

By democratizing the monetization of content on the web and allowing anyone to get paid for producing obscure content, it funded the creation of absurd amounts of exactly that kind of content.

Most of this content would never have been seen without a search engine that excelled at returning great results for long-tail searches, even ones that were incredibly infrequent or had never been seen before.

Google's search engine (and its search advertising business) therefore formed a powerful flywheel with its AdSense business, funding the content creation Google needed to differentiate itself by having the largest and most comprehensive index on the web.

However, as in many chapters of this history, it also created a monster in the form of low-quality and even auto-generated content, which eventually led to PR crises and a considerable effort to fix.

If you're interested in the "index everything" era, you can read more of my thoughts on it in slides 47+ of my presentation From the Horse's Mouth.

Web spam

The earliest spam on the internet came in the form of messages, most familiar to us today as email spam. In the early 2000s, Google started talking about a problem it eventually called "web spam" (the earliest mention of link spam I've seen was in a 2005 presentation by Amit Singhal entitled Challenges in running a Commercial Web Search Engine [PDF]).

I suspect that even people starting out in SEO today may have heard of Matt Cutts, the original head of webspam, as he is still often referenced even though he hasn't worked at Google since 2014. I enjoyed this 2015 presentation in which he talks about his career at Google.

The era of search quality

Over time, because of the adversarial relationship between webmasters trying to make money and Google (and others) trying to build the best possible search engine, outright web spam was not the only quality problem Google faced. The cat-and-mouse game of detecting manipulation (especially of page content, external links, and anchor text) would become a defining feature of the next decade of search.

It was after Singhal's presentation above that Eric Schmidt (then CEO of Google) said, "Brands are the solution, not the problem... Brands are how you sort out the cesspool."

People who are newer to the industry will have experienced some of Google's updates firsthand (such as the recent "core updates") and will probably have heard of a few notable older ones. But "Vince," which came after "Florida" (Google's first confirmed major update) and launched shortly after Schmidt's comments about brands, was particularly notable for favoring big brands. If you haven't been following the whole story, you can read up on past major updates here:

A real threat to Google's reputation

As I mentioned above in the AdSense section, webmasters had a strong incentive to create tons of content targeting the booming long tail of search. If your domain was powerful enough, Google would crawl and index a huge number of your pages, and for obscure enough queries, almost any matching content could rank. This triggered the rapid growth of so-called "content farms," which mined keyword data wherever they could find it and produced low-quality content to match. At the same time, websites were succeeding by getting large databases of content indexed, even in the form of very thin pages, or by getting large numbers of user-generated content pages indexed.

This was a real threat to Google's reputation, as the problem was breaking out of the search and SEO echo chamber. It had become such a scourge for communities like Hacker News and Stack Overflow that Matt Cutts posted a personal update to the Hacker News community when Google released an update to fix one specific symptom: scraper sites consistently outranking the original content they were copying.

Shortly afterward, Google released the update initially nicknamed the "Farmer update." After its launch, we learned that it had been made possible by a breakthrough from an engineer named Panda, which is why it was called the "big Panda" update internally at Google, and why the SEO community has mostly called it the Panda update ever since.

While we speculated that the update's inner workings were one of the first real uses of machine learning at the heart of Google's organic search algorithm, the features it modeled were more easily understood as human-centric quality factors, so we began recommending targeted SEO changes to our clients based on the results of human quality surveys.

Everything goes mobile first

I gave a presentation at SearchLove London in 2014 in which I talked about the incredible growth and scale of mobile, and how slow we had been to realize how seriously Google was taking it. I highlighted the surprise many felt on hearing that Google was designing mobile-first:

"Late last year, we launched some pretty significant design improvements for search on mobile and tablet devices. Today, we've translated many of those changes to the desktop experience." - Jon Wiley (lead designer for Google Search, speaking on Google+, which means there is no good link to the original quote, but it is referenced here as well as in my presentation).

This surprise came even though, by the time I gave this presentation in 2014, we knew that mobile search had begun to cannibalize desktop search (and we had seen the first declines in desktop search volume):

And it came even as people were starting to say that the first year Google made the majority of its revenue on mobile was less than two years away:

As I write this in 2020, it feels like we've fully internalized the importance of mobile, but it's interesting to remember that it took a while for this to become a reality.

Machine learning becomes the norm

Since the Panda update, machine learning has been mentioned more and more in Google's official communications about algorithm updates, and has been implicated in even more of them. We know that, historically, there was resistance in some quarters (including from Singhal) to using machine learning in the core algorithm because it made it harder for human engineers to explain the results. In 2015, Sundar Pichai took over as CEO, sidelined Singhal (although this may have been for other reasons), and installed AI/ML enthusiasts in key positions.

Coming full circle

Prior to the Florida update (in fact, until Google rolled out an update called Fritz in the summer of 2003), search results were routinely shuffled in a process nicknamed the Google dance:

Most things have moved to real time since then, but the recent "core updates" seem to have brought back this kind of dynamic, where changes happen on Google's schedule rather than on the timeline of changes to websites. I have hypothesized that this is because the "core updates" are actually Google retraining a massive deep learning model that is, to a large extent, fitted to the shape of the web at that point in time. Whatever the cause, our experience working with a wide range of clients is consistent with Google's official line:

Broad core updates tend to happen every few months. Content that was impacted by one might not recover - assuming improvements have been made - until the next broad core update is released.

Linking recent trends and discoveries like this to ancient history like the Google dance is just one way that knowledge of SEO history is "useful."

If you are interested in all this

I hope this trip down memory lane has been interesting. For those of you who also worked in the industry during those years, what did I miss? What are the big milestones you remember? Share them in the comments below or drop me a line on Twitter.

If you enjoyed this, you might also enjoy my presentation From the Horse's Mouth, in which I try to use Google's official and unofficial statements to understand what is really going on behind the scenes, and give some tips on how to do the same yourself:

To help us serve you better, please consider taking the 2020 Moz blog reader survey, which asks who you are, what challenges you face, and what you'd like to see more of on the Moz blog.