[HOWTO] Searching The Deep Web

Category: Search-Engines

There are over 60 trillion individual Web pages as of this writing. But that's only the very tip of the iceberg. Beyond what popular search engines offer up, there is a universe of information that's online and discoverable -- if you have the right skills and tools. Here's how to gain access to the rest of the Web...

What's Out There on the Deep Web?

Sixty trillion is a big number. According to Google's "How Search Works" page, the number of Web pages has doubled since 2013, and in 2008 there were "only" one trillion Web pages. Clearly, none of us will ever lack for reading material. And that's only the beginning of what's available online.

General search engines like Google, Bing, and Yahoo! index only the "surface Web," pages that have unique URLs such as the one that appears in your browser's address box right now. They don't even index all of the surface Web. Some website owners prefer not to have their sites (or portions of them) indexed, so they use a file called robots.txt that tells search engines, "Don't index this."
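To make the robots.txt idea concrete, here is a minimal sketch of how a well-behaved crawler might check those rules before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt content, the "ExampleBot" crawler name, and the example.com addresses are all made up for illustration; real sites publish their own rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and example.com are assumed names, not real crawlers or sites.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html"))
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/report.html"))
```

Keep in mind that the file only states the owner's wishes. Nothing in it technically stops a crawler that chooses to ignore it.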

Search engines choose to exclude many other surface Web pages from their indexes for a variety of reasons including relevance, legality, and violations of search optimization policies. Other pages are locked behind passwords, intended only for those who are granted access.

Searching the Deep Web

Beneath the surface Web lies the "deep Web," a mass of information estimated to be 500 times greater than the 60 trillion surface pages discovered by Google. By their very nature, deep Web resources cannot be found by the Web-crawling software that search engines use to find and index pages.

Note that the "deep" Web is not the same as the "dark" Web where criminals lurk. On the dark Web, everyone tries to hide their identities as well as what they're doing. The deep Web consists of perfectly legitimate information, used by perfectly legitimate people.

Some of these deep Web pages can be accessed only by a user clicking or manually typing a link that has not been indexed by a search engine. Other deep Web pages can be accessed only by a user who directly enters a query in a search form. The desired data exists in a database, not on a Web page that a crawler can find by following links from other pages. The data retrieved in response to a user query is displayed as an ephemeral "dynamic" Web page that lasts only until the user moves on.
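Here is a rough sketch of what happens when you fill in a search form. The form fields are encoded into a query string and sent to the database, which builds the results page on the spot. The endpoint address and the field names (title, author) are hypothetical; every real database defines its own.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; not a real catalog address.
SEARCH_ENDPOINT = "https://catalog.example.org/search"


def build_query_url(endpoint: str, **fields: str) -> str:
    """Encode form fields into the query string a GET search form would submit."""
    return f"{endpoint}?{urlencode(fields)}"


if __name__ == "__main__":
    # The field names "title" and "author" are assumptions for illustration.
    print(build_query_url(SEARCH_ENDPOINT, title="moby dick", author="melville"))
    # prints https://catalog.example.org/search?title=moby+dick&author=melville
```

No other page links to an address like that one, so a crawler following links never gets there. It exists only because someone typed a query.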

Some Deep Web Search Tools

When Google or Bing doesn't serve up the results you need, try a specialized search tool. My article Free Online Research Tools will point you to dozens of them, categorized by subject matter. You may also want to try an alternative search engine. See my article The Other Search Engines for a list of those.

The Library of Congress Online Catalog is a good example of a deep Web resource. It's a database containing over 18 million records of books, periodicals, audio recordings, photographs, and more. None of its records can be retrieved directly through Google. You need to visit the LOC's Online Catalog page and enter search terms in the appropriate boxes.

Web archives such as The Wayback Machine store copies of Web sites that have been modified or deleted. Such archived pages are not indexed by search engines, which strive to index the current version. I've often found the Wayback Machine handy when I want to see what a website looked like in the past. (Want to see what Yahoo.com looked like in October of 1996? It's in there.)
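If you'd rather jump straight to an archived copy, the Wayback Machine's addresses follow a simple pattern: the archive's prefix, a date stamp, then the original address. The sketch below builds such an address. As far as I know, the archive redirects to the capture closest to the date you give, so treat the result as a starting point rather than an exact snapshot. The day of the month used here is an arbitrary pick.

```python
from datetime import date

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(site: str, when: date) -> str:
    """Build a Wayback Machine address for a capture near the given date."""
    timestamp = when.strftime("%Y%m%d")  # e.g. 19961015
    return f"{WAYBACK_PREFIX}/{timestamp}/{site}"


if __name__ == "__main__":
    # October 15 is an assumed date; any day in October 1996 would do.
    print(wayback_url("yahoo.com", date(1996, 10, 15)))
    # prints https://web.archive.org/web/19961015/yahoo.com
```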

To find deep Web material via Google et al., try adding the term "database" to your search query. "Plane crash database," "drug interaction database," "government grants database," and so on, will often lead to the home page of a database where you can enter search terms specific to that resource.
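If you run many searches like this, the trick is easy to automate. This small sketch appends "database" to a topic and turns the result into a Google search address. The helper names are my own invention, and the topics are the ones mentioned above.

```python
from urllib.parse import quote_plus

GOOGLE_SEARCH = "https://www.google.com/search?q="


def database_query(topic: str) -> str:
    """Append the word 'database' to a topic unless it already ends with it."""
    topic = topic.strip()
    if topic.lower().endswith("database"):
        return topic
    return f"{topic} database"


def google_search_url(query: str) -> str:
    """Turn a search query into a Google search address."""
    return GOOGLE_SEARCH + quote_plus(query)


if __name__ == "__main__":
    for topic in ("plane crash", "drug interaction", "government grants"):
        print(google_search_url(database_query(topic)))
```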

There are also paid tools such as LexisNexis and Factiva, which professional researchers use to find information about legal and business topics. Genealogy researchers can find a wealth of free information online, but often the best sources require payment. Ancestry.com is one such example. It's also becoming more common for online newspapers to limit free content and erect paywalls that require a subscription to view more than current headlines.

General search engines suffice for most needs. Scholars, journalists, and other serious researchers often must resort to the deep Web. Do you use any of these tools to access online information that's not available with a quick Google search? Got any search tips of your own to share? Your thoughts on this topic are welcome. Post your comment or question below...

This article was posted by on 8 Feb 2016

Most recent comments on "[HOWTO] Searching The Deep Web"

Posted by:

Marcus Zillman
08 Feb 2016

My Subject Tracer Deep Web Research and Discovery Resources is freely available and has been downloaded over three million times!!


Posted by:

Sarah L
08 Feb 2016

Finding newspaper archives seems an ever-changing task. Some years back, I got access to my local paper via ProQuest and my public library. Now ProQuest no longer provides that. Some archives I can get via my subscription for current newspapers, but others are given or sold to newspapers dot com. For a while, it was available through Ancestry, but I am not sure now. I look for family history in one city, but also for book reviews from various newspapers. It is a moving target. Not even sure what I can access via my public library. Will any order be restored to this archiving?

Posted by:

09 Feb 2016

I didn't realize it, yes I have done a Deep Web Search! I have used the Library of Congress, lots of times and had to search further, for the information I wanted.

But this article brought to mind some of the best Researchers I have ever seen!!! Back in 1982-1984 I was a Clinical Instructor for Surgical Technician Students. My location was a great place, with a wonderful Medical Library. That's because there were Interns and Residents working in the hospital.

The 3 Librarians were awesome and would always try to help my students with their written projects. Plus, all 3 would get all the medical information for both the Interns and Residents!!! I mean, there were reams of paper in piles, for each medical student, who had requested the research. Yes, I know that they were all Doctors and were called Doctor So-in-So, but, they were still students. They were not done with their schooling or residency.

Must admit, think back to 1982-1984 how the computers were in those days, yet, that didn't stop those awesome 3 Librarians!!!

Posted by:

09 Feb 2016

So: How could the tally of 60 trillion be accurate if robots.txt (at the root URL) prevents crawlers from counting the individual pages that belong to the root? But then again, robots.txt is nothing but a formality, a mere 'request' from the owner not to be cavity searched, and some engines are not so polite as to abide by the agreement.

Posted by:

10 Feb 2016

I was reading your usb-c cables article when I ran across this. I take notes for my desktop browsing from your articles and I seriously have increased my computer/internet IQ from reading the abundance of pertinent info. I have mentioned your site to more than a few people.

Posted by:

12 Feb 2016

I call this 'going down rabbit holes' and Marcus's information on the Deep Web is excellent for researchers. Been on his list for...years, and always get pointed in the right direction. But you really have to dig. I rarely use general search engines like Google or Bing or Yahoo.

If you really want to dig deeper, also check incoming links on the websites (not the "links" pages). Often many have incoming links from sources you'd not expect to find.

Copyright © 2005 - Bob Rankin - All Rights Reserved

Article information: AskBobRankin -- [HOWTO] Searching The Deep Web (Posted: 8 Feb 2016)
Source: https://askbobrankin.com/howto_searching_the_deep_web.html