Search Tools For the Deep (and Dark) Web
The “Dark Web” is sometimes portrayed as a place where criminals, terrorists, hackers, and spammers conspire to victimize unwitting Internet users. The reality is a bit more nuanced and not so scary. There is a “Deep Web” that you can't access with ordinary search engines, and a “Dark Web” where people lurk anonymously for both good and evil purposes. What's really out there? Read on...
What's Hiding in the Deep and Dark Webs?
Technically defined, the “Deep Web” is simply that vast portion of the Web that search engines don’t (or can't) index. Google, Bing, and other search engines can serve up billions of Web pages, more than you could view in a lifetime, yet that still leaves over 90% of Internet destinations unsearchable. (One source I found estimated that over 130 trillion individual Web pages exist.) If you don’t know the URL (web page address), you can’t just “google” it. You have to find the “secret” URL some other way.
The majority of the Deep Web is unindexed simply because it’s uninteresting. Most of the Internet of Things fits into this category. Who really wants to google a light bulb, doorbell, or toaster? (See my related article Things That Should NOT Be Connected To The Internet.)
There are also password-protected sites that are accessible only to those with memberships or subscriptions to the content stored there. If there's a lock on the door, search engines can't get in to index the pages stored at that location. That would apply to Facebook, Twitter and other social media accounts with billions of pages of user-created content. Online newspapers, professional journals, and research databases also fall into this category.
Many local government websites offer access to public records, such as real estate and legal filings. You can access them, often without a password, but these databases will not be indexed. That's because the search engine "spider" doesn't know what to do when it sees the search input box.
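If you're curious why a search box stops a spider cold, here is a minimal Python sketch of the idea. It is not how Google's crawler actually works; the page, the class name LinkAndFormFinder, and the URLs are all made up for illustration. The point is simply that ordinary links can be followed automatically, while a search form leads nowhere unless someone types in a query.

```python
from html.parser import HTMLParser


class LinkAndFormFinder(HTMLParser):
    """Collects what a very simple crawler could and could not follow."""

    def __init__(self):
        super().__init__()
        self.links = []  # URLs the crawler can queue up and visit
        self.forms = []  # search forms the crawler has no query for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))


# Hypothetical public-records page: two ordinary links and one search box.
PAGE = """
<html><body>
  <a href="/about.html">About the county clerk</a>
  <a href="/hours.html">Office hours</a>
  <form action="/records/search" method="get">
    <input type="text" name="owner_name">
    <input type="submit" value="Search">
  </form>
</body></html>
"""

finder = LinkAndFormFinder()
finder.feed(PAGE)
print("Can follow:", finder.links)
print("Stuck at:", finder.forms)
```

The two links go on the to-visit list, but the records behind /records/search stay hidden, because the spider has no idea what name to type into the box.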
And then there are websites that perform dynamic, real-time queries for things like travel. You can go to Expedia and find out how much it will cost for a ticket from NYC to Miami on July 23rd, but the results won't be indexed by search engines, because they can change from one minute to the next.
And of course there are legions of websites that just have no useful content. They may be spammy, scammy, ripoffs or duplicates that will never appear in search results, because search engines have gotten wise to many of the tricks that black-hats use to "game" the search results. There are also websites that have no inbound links (links from pages on other sites) so search engines will never find them.
The Library of Congress Online Catalog is a good example of a Deep Web resource. It's a database containing millions of records of books, periodicals, audio recordings, photographs, and more. None of its records can be retrieved directly through Google. You need to visit the LOC's Online Catalog page and enter search terms in the appropriate boxes.
Web archives such as The Wayback Machine store copies of Web sites that have been modified or deleted. Such archived pages are not indexed by search engines, which strive to index the current version. I've often found the Wayback Machine handy when I want to see what a website looked like in the past. (Want to see what Yahoo.com looked like in October of 1996? It's in there.)
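For the technically inclined, Wayback Machine addresses follow a simple pattern: web.archive.org/web/, then a timestamp, then the site you want. The short Python sketch below builds such an address. The function name wayback_url is my own invention, and I'm assuming the archive will redirect to the closest snapshot it has when there is no copy at the exact timestamp requested.

```python
def wayback_url(site, year, month):
    """Build a Wayback Machine address for a snapshot near the given month."""
    # Timestamp format is YYYYMMDDhhmmss; this asks for midnight on the
    # first day of the month.
    timestamp = f"{year:04d}{month:02d}01000000"
    return f"https://web.archive.org/web/{timestamp}/{site}"


print(wayback_url("yahoo.com", 1996, 10))
```

Paste the printed address into your browser to see what the archive has for that date.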
To find Deep Web material via Google et al., try adding the term "database" to your search query. Searches such as "plane crash database," "drug interaction database," and "government grants database" will often lead to the home page of a database where you can enter search terms specific to that resource.
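If you want to try a batch of these searches at once, a few lines of Python can build the links for you. This is just a convenience sketch; the function name database_search_url is hypothetical, and it does nothing more than tack the word "database" onto a topic and encode it into an ordinary Google search address.

```python
from urllib.parse import urlencode


def database_search_url(topic):
    """Return a Google search address for '<topic> database'."""
    query = f"{topic} database"
    return "https://www.google.com/search?" + urlencode({"q": query})


for topic in ["plane crash", "drug interaction", "government grants"]:
    print(database_search_url(topic))
```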
There are also paid tools such as LexisNexis and Factiva, which professional researchers use to find information about legal and business topics. Genealogy researchers can find a wealth of free information online, but often the best sources require payment. Ancestry.com is one such example. It's also becoming more common for online newspapers and magazines to limit free content and erect paywalls that require a subscription to view more than current headlines.
My article Free Online Research Tools will point you to dozens of specialized search tools, categorized by subject matter.
So the Deep Web isn't scary -- it's just a part of the Internet that can't be (or hasn't been) indexed by search engines, or pages that require human interaction to continue on to the desired content or search query.
The Dark Side of the Internet
But what’s out there in the dark parts of the Web? Yes, there are bad places, people, and activities; they’re part of what’s called the “Dark Web” or “Darknet” for dramatic effect. Getting into these places typically requires prior authorization and special software, either instead of or in addition to your everyday Web browser. (See discussion of “Tor” below.)
“The Silk Road” was one infamous criminal site where drugs, weapons, data, hacking services and all manner of illicit things were traded until the FBI arrested its owner back in October, 2013. Some referred to Silk Road as the Amazon.com of the underworld, because it made shopping for illegal goods so easy. Ross William Ulbricht, known by his hacker handle "Dread Pirate Roberts," was nailed on charges of narcotics trafficking conspiracy, computer hacking conspiracy and money laundering conspiracy. Court documents allege that over $1.7 million in illegal money changed hands each month on The Silk Road. Other black market websites exist, but Silk Road was the best known.
Private forums exist where cyber-criminals offer services such as hacking, denial of service attacks, ransomware and phishing scams. There are images on Dark Web pages that you would wish you could forget after seeing them. And certainly, terrorists use encrypted messaging channels on the Internet to communicate and collaborate.
Sometimes You Need to Hide
But there are also oases of light in the Dark Web that can’t be called dark by any means. They’re where the struggle for freedom rages. Dissidents, journalists, peace activists, and other good guys often need to hide their activities from oppressive governments and other institutions. Many citizens of totalitarian nations cannot freely access uncensored news or trade opinions and facts about politics or corruption. Some of these people turn to the Dark Web, to hidden forums, sites, and servers of information that protect their secrets and identities.
One of the most popular privacy tools is called Tor. Tor is, essentially, a network of Web proxy servers and browser software designed for them. When using the Tor browser, your identity and location are obscured and your connection to the Tor network is encrypted. Even your ISP doesn’t know where you’re really going because they can’t read the data stream that passes between you and the Tor proxy server. All anyone knows is that you accessed a Tor server.
Your requests for Web content go to a Tor server, which then reaches out to grab the requested content and relay it back to you over that encrypted connection. (In practice, Tor passes your traffic through a chain of several such servers, so that no single one knows both who you are and where you're going.) The destination site sees the Tor server’s location and ID but never yours. Theoretically, there is no way to tell what you accessed via a Tor server.
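To make the "who sees what" idea concrete, here is a toy Python model of the simplified picture above. It is not real Tor code: it performs no networking or encryption, it models a single relay (the real network chains several, with a layer of encryption for each), and every name and address in it is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sender: str       # address the observing party sees as the source
    destination: str  # where the request appears to be headed
    content: str      # what the observing party can read


def send_via_relay(user_addr, relay_addr, site, page):
    """Return what the ISP observes and what the destination site receives."""
    # The user-to-relay leg is encrypted, so the ISP only learns that the
    # user is talking to the relay. The real destination and page are hidden.
    isp_view = Request(sender=user_addr, destination=relay_addr, content="<encrypted>")
    # The relay fetches the page on the user's behalf, so the site sees
    # the relay's address as the sender, not the user's.
    site_view = Request(sender=relay_addr, destination=site, content=page)
    return isp_view, site_view


# All addresses below are made-up placeholders.
isp_view, site_view = send_via_relay(
    user_addr="user.example",
    relay_addr="relay.example",
    site="news.example",
    page="/uncensored-report",
)
print("ISP sees: ", isp_view)
print("Site sees:", site_view)
```

Notice that the ISP's view never mentions news.example, and the site's view never mentions user.example. That separation is the whole trick.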
Along those lines, some users employ a VPN (Virtual Private Network) to protect their privacy and hide their online activities. A VPN is a private network set up on the public Internet, using encryption to ensure that no uninvited parties can eavesdrop on information that flows over the network. My article PRIVACY: Do You Need a VPN? goes into more detail on the pros and cons.
To summarize, the Deep Web is just that portion of the Internet that can't be reached directly by search engines. The Dark Web is actually a mixture of light and dark, good and evil, benefit and harm. Its symbol might well be the Yin Yang, which illustrates how opposite forces can be interconnected and intermingled.
Your thoughts on this topic are welcome. Post your comment or question below…