Thursday 13 February 2014

History of search engines

Before there was Yahoo! Before there was WebCrawler. Before there was AltaVista. There were Archie, Jughead, and Veronica (but no Betty). Before 1990, there was no way to search the Internet. At that time there were few websites. Most sites contained collections of files that you could download (by FTP) if you knew that they were there. The only way you could find out that a file was on a specific site was by word of mouth. Then came Archie. Created by Peter Deutsch, Alan Emtage and Bill Heelan, Archie was the first program to scour the Internet and catalog the contents of anonymous FTP sites all over the world. It was not a true search engine; like Yahoo!, it was a searchable list, in this case a list of files. You needed to know the exact name of the file that you were looking for. Armed with that information, Archie would tell you which FTP site you could download the file from.
If Archie was the grandfather of all search engines, then Veronica was the grandmother. Developed by the University of Nevada Computing Services, Veronica searched Gopher servers for files. A Gopher server stores plain-text documents, while an FTP server stores other kinds of files (images, programs, etc.). Jughead performed functions similar to Veronica's.
By 1993, the Web was beginning to change. Rather than being populated mainly by FTP sites, Gopher sites, and e-mail servers, it saw websites begin to proliferate. In response to this change, Matthew Gray introduced his World Wide Web Wanderer. The program was a series of robots that hunted down web URLs and listed them in a database called Wandex.
Also around 1993, ALIWEB was developed as the web page equivalent of Archie and Veronica. Instead of cataloging files or text documents itself, ALIWEB relied on webmasters to submit a special index file with information about their sites.
The next development in cataloging the web came late in 1993 with spiders. Like robots, spiders scoured the web for web page information. These early versions used the titles of web pages, their header information, and their URLs as sources of key words. The database techniques used by these early search engines were primitive. For example, a search would return hits in the order in which they happened to be stored in the database. Only one of these search engines made any attempt to rank the hits according to how closely each site related to the key words.
The first popular search engine, Excite, has its roots in these early days of web cataloging. The Excite project was begun by a group of Stanford undergraduates. It was released for general use in 1994.
Also in 1994, two Stanford Ph.D. students posted web pages with links on them. They called these pages Yahoo!. As the number of links began to grow, they developed a hierarchical listing. As the pages became more popular, they developed a way to search through all of the links. Yahoo! became the first popular searchable directory. It was not considered a search engine because all the links on the pages were maintained manually rather than collected automatically by a spider or robot, and its search feature searched only those links.
The first full-text search engine was WebCrawler. WebCrawler began as an undergraduate seminar project at the University of Washington. It became so popular that it virtually shut down the University of Washington's network because of the amount of traffic it generated. Eventually, AOL bought it and operated it on its own network. Later, Excite bought WebCrawler from AOL, but AOL still uses it in its NetFind feature. At Home Corp. currently owns WebCrawler (as well as Excite and Blue Mountain Cards).
The next search engine to appear on the web was Lycos. It was named for the wolf spider (Lycosidae lycosa) because the wolf spider actively pursues its prey. According to Michael Mauldin in "Lycos: Design choices in an Internet search service" (1997), by 1997 Lycos had indexed more than 60,000,000 web pages and ranked first on Netscape's list of search engines.
The next major player in what was becoming the search engine wars was Infoseek. The Infoseek search engine itself was unremarkable and showed little innovation beyond WebCrawler and Lycos. What made it stand out was its deal with Netscape to become the browser's default search engine, replacing Yahoo!.
In 1995, Digital Equipment Corporation (DEC) introduced AltaVista. This search engine contained several innovations that set it apart from the others. First, it ran on a group of DEC Alpha-based computers, which at the time used some of the most powerful processors in existence. This meant that the search engine could handle very high traffic while hardly slowing down. (The Alpha machines ran a version of UNIX, which from its inception had been designed for heavy multi-user loads.) AltaVista also let the user ask a question rather than enter key words, which made it easier for the average user to find the results needed. It was also the first to implement Boolean operators (AND, OR, NOT) to help refine searches, and it gave tips to help the user refine searches further.
Next came HotBot, a project from the University of California at Berkeley. It was designed to be the most powerful search engine, and its current owner, Wired Magazine, claims that it can index more than 10,000,000 pages a day. Wired also claims that HotBot should be able to update its entire index daily, giving it the most up-to-date information of any major search engine. (You'll have the opportunity to test that claim if you wish.)
In 1995, a new type of search engine was introduced: the metasearch engine. The concept was simple. The metasearch engine took key words from the user, who could type either key words or a question, and forwarded them to all of the major search engines. Those search engines sent their results back, and the metasearch engine formatted all of the hits on one page for concise viewing. The first of these was Metacrawler. Metacrawler initially ran afoul of the major search engines because it took their results but not the advertising banners their own users saw, reducing the search engine companies' advertising revenue. Metacrawler eventually relented and began including the banner ads with each set of search results.
Besides Metacrawler, other major metasearch engines include ProFusion, Dogpile, Ask Jeeves, and C-Net's Search.com. Ask Jeeves combines features such as natural-language queries with the ability to search several different search engines. C-Net's entry claims to use over 700 different search engines to obtain its results. Although the concept is very good, a metasearch engine's results are only as good as the underlying search engines and directories, and the question that the user asks.
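The metasearch idea described above (send the same key words to several engines, then put all the hits on one page) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how Metacrawler or any other real service actually worked: the "engines" are hypothetical functions that return ranked lists of URLs, and the round-robin merge with duplicate removal is just one simple way to combine the hits.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an underlying search engine: it takes a keyword
# query and returns a ranked list of result URLs.
Engine = Callable[[str], List[str]]


def metasearch(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Forward one query to every engine and merge the hits into one list.

    Results are interleaved round-robin (first hit from each engine, then
    the second hit from each, and so on), and duplicate URLs are dropped,
    keeping the first occurrence.
    """
    # Step 1: send the same key words to every underlying engine.
    result_lists = [engine(query) for engine in engines.values()]

    merged: List[str] = []
    seen = set()
    # Step 2: interleave the ranked lists so no single engine fills the top.
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                # Step 3: skip URLs that another engine already returned.
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged


if __name__ == "__main__":
    # Two made-up engines that share one result.
    engines = {
        "engine_a": lambda q: ["a.example/1", "shared.example", "a.example/2"],
        "engine_b": lambda q: ["shared.example", "b.example/1"],
    }
    print(metasearch("wolf spider", engines))
    # prints ['a.example/1', 'shared.example', 'b.example/1', 'a.example/2']
```

As the post notes, a merge like this can only rearrange what the underlying engines return; it cannot improve on their results.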
