Using a search engine is easy: You open up a web page, type a few words into the search bar, and voilà—millions of results appear, in a fraction of a second. A Google search for “search engine,” for example, yields 1.43 billion results in 0.69 seconds. But how, exactly? Here’s how the wild, wild web became fully indexed, searchable, and ranked—in less than a decade.
What is a search engine?
A search engine is software designed to retrieve specific information. The type most of us are familiar with is the internet search engine, a web service that finds information on the World Wide Web (the part of the internet made up of linked web pages) based on a user’s query, which is typically a set of words.
Today, many people think search engines are synonymous with internet browsers—thanks in part to the Google Chrome browser building search engine functionality into the web address bar. But search engines are web services specifically built to retrieve information. They can be accessed easily from a browser, but they’re different technologies.
How do search engines work?
Although search engines have gotten more complex over the years, they still follow a pretty basic formula: Crawl and index as much of the web as possible so that when you search for something, they can present you with a set of results, ranked by relevance. Here’s how they do it.
Crawl. Web crawlers, also known as spiders, are programs that constantly browse the web, finding new sites and following new links. Crawlers send the text of the pages they visit to an index to be analyzed. They may also store a copy of all or part of a web page, called a cached copy. Webmasters (the people who run websites) can add a file called robots.txt to their sites, which tells crawlers which pages they may visit and which to ignore.
Index. The data that crawlers gather is analyzed, organized, and stored in an index so that the engine can find information quickly. Like the index found in the back of a book—but far more detailed—a search engine index includes an entry for every word on every indexed web page.
Search. When you query a search engine, it must first translate your words into terms that relate to its index. It does this with a host of techniques, including natural language processing (NLP), which uses machine learning to understand what you’re looking for. The output of this initial translation process is a rewritten query that identifies the important parts of your query, corrects misspellings, and adds synonyms. The search engine then consults its index to find web pages that match the rewritten query.
Rank. Search engines use algorithms to present you with a list of results prioritized by what they think will best answer your query. For vague searches, like “ramen,” a search engine may provide a range of answers to cover its bases, such as general information about what ramen is, along with other popular results like recipes, local ramen shops, and even a “people also ask” section to help you narrow down your search.
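The crawl, index, and search steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real search engine is built: the robots.txt rules, the example.com pages, and the function names (allowed_pages, build_index, search) are all made up for the example, and the "crawl" is just a dictionary of page text rather than real network requests.

```python
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: keep crawlers out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical "crawled" pages (URL -> page text), standing in for real fetches.
PAGES = {
    "https://example.com/ramen-recipe": "easy ramen recipe with noodles and broth",
    "https://example.com/ramen-shops": "best ramen shops near you",
    "https://example.com/private/notes": "secret ramen notes",
}


def allowed_pages(pages, robots_txt, user_agent="toy-crawler"):
    """Crawl step: keep only the pages that robots.txt lets this crawler visit."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        url: text
        for url, text in pages.items()
        if parser.can_fetch(user_agent, url)
    }


def build_index(pages):
    """Index step: map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Search step: return the URLs that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    crawlable = allowed_pages(PAGES, ROBOTS_TXT)
    index = build_index(crawlable)
    print(sorted(search(index, "ramen recipe")))
```

Looking up a word in the index is a single dictionary access, which is why a search engine can answer a query without rereading every page. A real engine would also rewrite the query and rank the matches, which this sketch leaves out.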
How do search engines rank results?
A single search may turn up billions of relevant web pages, so part of the job of a search engine is to sort these listings using ranking algorithms. And although these algorithms are designed to provide you with the best answers to your questions, they are biased towards certain factors. Search engines want to show you results that you’ll click on, and they use a variety of factors to rank results according to what they think you’ll engage with. These include but aren’t limited to:
Use of keywords. Your search results should match at least some of the words in your query. Search engines prioritize pages on which those keywords appear in a prominent position, such as the title of the page, or appear often throughout the page.
Page content. Search engines prioritize high-quality content by analyzing the length, depth, and breadth of web pages.
Backlinks. Backlinks, or links from one website to another, can be seen as votes for the linked site’s authority. Pioneered by Google PageRank, backlink ranking rates a page based on how many other pages link to it, and how highly those linking pages rank themselves.
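The backlink idea can be made concrete with a simplified, textbook-style PageRank calculation. This is a sketch under assumptions and not Google’s production algorithm: the function name pagerank, the damping factor of 0.85, the iteration count, and the four-page link graph are all chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page scores from a link graph given as {page: [pages it links to]}.

    Each page splits its current score evenly among the pages it links to,
    so a link from a high-scoring page is worth more than one from a
    low-scoring page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical link graph: pages a, b, and d all link to c; c links to a.
    graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

In this graph, page c ends up with the highest score because three pages link to it, and page a scores above b and d because its single backlink comes from the highly ranked c.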
Mainstream search engines like Google might be top of mind when we think about search engines, but there are other types of search engines that allow us to navigate the internet.
Mainstream search engines. Mainstream search engines like Google, Bing, and Yahoo! are all free to use and supported by online advertising. They all use variations of the same strategy (crawling, indexing, and ranking) to let you search across the public web.
Private search engines. Private search engines have risen in popularity recently due to privacy concerns raised by the data collection practices of mainstream search engines. These include anonymous, ad-supported search engines like DuckDuckGo and private, ad-free search engines like Neeva.
Vertical search engines. Vertical search, or specialized search, is a way of narrowing your search to one topic category, rather than the entirety of the web. Examples of vertical search engines include:
The search bar on shopping sites like eBay and Amazon
Google Scholar, which indexes scholarly literature across publications
Searchable social media sites and apps like Pinterest
Computational search engines. WolframAlpha is an example of a computational search engine, devoted to answering questions related to math and science.
Popular search engines
Search technology has changed a lot since the development of the first search engine in 1989. Here are the major players today.
Google. There’s only one search engine so popular it became a verb synonymous with “to search.” With 92.24 percent of the global search engine market share, Google is by far the world’s largest and most popular search engine. Google’s clean look and backlink-based ranking system earned users’ favor in the ’90s, and it has maintained its dominance with near-constant innovations and a slew of exclusive agreements with device manufacturers, wireless carriers, and browser developers that funnel about 60 percent of internet searches straight to Google.
Bing. Microsoft’s search engine, Bing, currently accounts for 2.29 percent of the global market share, making it the world’s second-largest search engine. Since its launch in 2009, Bing has featured photography on its homepage, a stark contrast to Google’s austere landing page.
Baidu. Baidu is a Chinese search engine accounting for 1.48 percent of the global search engine market. Like Google, Baidu started as a search engine, and is now one of China’s largest tech companies.
DuckDuckGo. DuckDuckGo is a private, ad-supported search engine that currently accounts for 0.58 percent of the global market share.
In 2020, Neeva announced that it was creating the world's first ad-free, private subscription search engine.
A brief history of search engines
During the early days of the web, there were so few web servers (basically, computers that host websites) that Tim Berners-Lee, creator of the World Wide Web, kept them all on one list. Using Berners-Lee’s list, you could easily access every single web page in existence, most of them informational sites run by universities or government organizations.
Today, there are billions of web pages and no central system for keeping track of them, which is why we rely on search engines to find information online.
1989: While a grad student at McGill University, Alan Emtage built the first search engine, ARCHIE (“archive” without the “V”), and launched it to the public a year later. Emtage’s program allowed him to more easily find files on FTP (File Transfer Protocol) sites, which predated the web.
1994: David Filo and Jerry Yang founded Yahoo! as a web directory of their favorite sites. By the late 1990s, Yahoo! operated as both a web portal—a landing page for accessing different features of the internet—and web search engine.
1995: AltaVista launched as the first natural language search engine, meaning that it accepted queries written in everyday language, not just keywords. At the time, the web was home to at least 30 million pages, about 20 million of which were indexed by AltaVista.
1996: Ask Jeeves launched, encouraging users to pose their queries as questions. Ask Jeeves used human editors to match results to the most popular queries. Today, about 8 percent of searches are written as questions, and Ask Jeeves (now Ask.com) is no longer considered a major search engine.
1998: Larry Page and Sergey Brin founded Google, based on their 1996 search engine, Backrub, which used backlinks as a way to rank search results. At the time, Google had a very simple, ad-free interface of blue links, each followed by a two-line description of the site. (Ads would come later, in 2000.)
2009: Microsoft Bing was launched as a rebrand of MSN/Live search, which originally launched in 1998. Shortly after its launch, Bing began powering the Yahoo! search engine.
Major innovations in search engine technology
Since the launch of the first search engines in the 1990s, the field’s leaders have innovated on search technology to serve more and more needs with a single interface. Now, we don’t necessarily have to leave the search engine results page to get the answers we’re looking for. Here are some of the major moments in the evolution of search engine technology.
Machine learning: Microsoft developed and launched RankNet in 2005, which used machine learning to rank relevant search results. A version of RankNet would later be used by Microsoft Bing. Google introduced its own machine learning component, RankBrain, in 2015.
Universal search: In 2007, Google launched Universal Search, which integrated some of its different vertical search tools (such as Images, News, Video, Maps, and Books) into one multimedia search engine results page (SERP). When you search “sunset images” on Google.com and see a collection of images at the top of the results page instead of a list of links, that’s Universal Search. Before Universal Search, you would have had to go to Google Images to find images.
Localized results: In 2012, Google began showing local results (based on a user’s IP address) for generic searches. This meant that a search for “T-shirts” might suggest a nearby T-shirt printer, whereas previously only a search like “T-shirts near Brooklyn” would trigger Maps integration. In 2016, Google started leveraging smartphone location services and Wi-Fi positioning (which uses the location of nearby hotspots to pinpoint your location) to give you local results based on your precise location.
Hummingbird: Google introduced its Hummingbird algorithm in 2013, which looked beyond a user’s search terms, using context to try to determine their intent. For example, a search like “what is the weather” will pull up local weather results, not an explanation of the concept of weather. A search for “weather” without the “what is” will list news stories from weather.com.
Knowledge Graph: In 2010, Google acquired Metaweb and its database, Freebase, which covered “over 12 million things.” This laid the foundation for Knowledge Graph, which launched in 2012. This technology allows users to get information from other websites without leaving the SERP. When you see a Wikipedia snippet to the right of your search results, that's the Knowledge Graph. This feature had far-reaching consequences: In 2020, about 65 percent of Google searches ended without the user clicking on any results, presumably because they found what they were looking for on the SERP. (Google argued that there are many reasons a search might end without any clicks, such as the user reformulating a question.)
Want to try a different kind of search engine, one that was built only for people and not advertising? Neeva is the world’s first private, ad-free search engine, committed to showing you the best results for every search. We will never sell or share your data with anyone, especially advertisers. Try Neeva for yourself, at neeva.com.