Many consider the Internet and the World Wide Web to be synonymous; they are not. Rather, the web is one portion of the Internet and a medium through which information may be accessed. Most people also think of the Internet as a single, homogeneous entity, and that perception is incorrect as well. It actually consists of different types of content, with different access methods and degrees of risk and legality. Some of the content is, at best, unsettling and, at worst, criminal.
The most commonly known layer is the publicly available web that is familiar to most users and is generally accessed through browsers and search engines. In conceptualizing the web, some may view it as consisting solely of the websites accessible through a traditional search engine such as Google. However, this content, known as the “Surface Web,” is only one portion of the web. It is also known as the surface net, the visible web, or the clear net, and it is used to communicate with others, gather news and information, buy and sell goods, and promote products and services.
Perhaps not as well known is the Deep Web, which refers to “a class of content on the Internet that, for various technical reasons, is not indexed by search engines,” and thus is not accessible through a traditional search engine. It is accessible only to users who know how to get there, perhaps via a direct link, IP address, or Internet Relay Chat (IRC) room. Information on the Deep Web includes content on private intranets (internal networks such as those at corporations, government agencies, or universities), commercial databases like LexisNexis or Westlaw, and sites that produce content via search queries or forms.
Going even further into the web, the Dark Web (also called the dark net) is the segment of the Deep Web that has been intentionally hidden. The Dark Web is a general term that describes hidden Internet sites that users cannot access without using special software.
While the content of these sites may be accessed, the publishers of these sites are concealed. Users access the Dark Web with the expectation of being able to share information and/or files with little risk of detection. It is a world of content that exists on overlay networks (such as Tor) whose addresses are hidden. Tor sites on the dark net use the .onion domain, and their URLs are hard to remember or find because they consist of seemingly random numbers and letters and can change frequently. Accessing the dark net requires special software that routes traffic along a randomized path to its destination. This indirect path helps obscure users’ location and identity and protects their (and the publishers’) anonymity, an important feature given the nature of some of the activity that takes place there.
The layers of the Internet go far beyond the surface content that many can easily access in their daily searches. The other content is that of the Deep Web, content that has not been indexed by traditional search engines such as Google. The furthest corners of the Deep Web, segments known as the Dark Web, contain content that has been intentionally concealed. The Dark Web may be used for legitimate purposes as well as to conceal criminal or otherwise malicious activities. It is the exploitation of the Dark Web for illegal practices that has garnered the interest of officials and policymakers.
There is more to the internet than meets the eye: it has three distinct layers of depth. The Surface Web, which accounts for 10% of the internet, contains those websites whose contents are visible because search engines have indexed them (Beckstrom & Lund, 2019). These searchable, publicly available pages can be accessed from a standard web browser and connect to other pages using hyperlinks. However, much information that was never intended to be hidden is still overlooked (Devine, Egger-Sider, & Rojas, 2015). This information, invisible to regular search engines, requires persistence and specialized search tools to locate. Beyond the Surface Web exist the Deep Web and the Dark Web.
Several distinct characteristics identify the differing layers of the internet (Table 1). The difference between the Surface Web and the Deep Web is often misunderstood, but it can be stated simply. The Surface Web contains several billion websites and documents, in diverse subsets, which can be indexed by most search engines. The next layer, the Deep Web, includes millions of databases and dynamic web pages that often reside behind paywalls or require passwords. This layer contains higher-quality information than is usually found on the Surface Web (Prasad, 2018).
|Surface Web|Deep Web|Dark Web|
|---|---|---|
|Freely accessible|Not indexed|Specialized software/tools|
|Indexed by standard search engines|Proprietary databases|Intentionally hidden data|
|Mostly HTML files|Dynamically generated content|Encrypted and anonymous|
|Fixed content|Login authorization|Difficult to track|
The surface web is the indexed web that all of us know, the part reachable through search engines like Google. Estimates of its size vary: some say it is only 1 percent of the total content that is actually on the web, while others say that figure is wildly exaggerated. Current estimates that the surface web contains over 4.6 billion pages seem to be accurate. All can agree that it is growing rapidly every year. In 2005, the number of Internet users reached 1 billion worldwide. This number surpassed 2 billion in 2010 and crested over 3 billion in 2014. As of July 2016, more than 46% of the world population was connected to the Internet. While data exist on the number of Internet users, data on the number of users accessing the various layers of the web and on the breadth of these layers are less clear.
The magnitude of the web is growing. According to one estimate, there were 334.6 million Internet top-level domain names registered globally during the second quarter of 2016. This is a 12.9% increase from the number of domain names registered during the same period in 2015. As of February 2017, there were estimated to be more than 1.154 billion websites. As researchers have noted, however, these numbers “only hint at the size of the Web,” as numbers of users and websites are constantly fluctuating.
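The growth figure above can be turned around to see what it implies about the previous year. The short sketch below uses only the two numbers quoted in the text (334.6 million domain names and a 12.9% year-over-year increase); the variable names are illustrative, and the results are rounded approximations rather than reported statistics.

```python
# Figures quoted in the text above.
domains_q2_2016_millions = 334.6   # registered domain names, Q2 2016
year_over_year_growth = 0.129      # 12.9% increase over Q2 2015

# Work backwards from: current = previous * (1 + growth)
domains_q2_2015_millions = domains_q2_2016_millions / (1 + year_over_year_growth)
added_millions = domains_q2_2016_millions - domains_q2_2015_millions

print(f"Implied Q2 2015 total: {domains_q2_2015_millions:.1f} million")  # 296.4 million
print(f"Implied net additions: {added_millions:.1f} million")            # 38.2 million
```

In other words, a 12.9% increase corresponds to roughly 38 million more registered names than a year earlier.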
The Deep Web is also known as the hidden web or invisible web, and it comprises any content that traditional search engines cannot reach because it is not indexed. This definition thus includes dynamic web pages, blocked sites (like those that ask you to answer a CAPTCHA to access them), unlinked sites, private sites (like those that require login credentials), non-HTML, non-contextual, or non-scripted content, and limited-access networks. Information here is not “static and linked to other pages” as is information on the Surface Web.
Limited-access networks cover all those resources and services that wouldn’t normally be accessible with a standard network configuration and so offer interesting possibilities for malicious actors to act partially or totally undetected by law enforcement. These include sites with domain names that have been registered on Domain Name System (DNS) roots that aren’t managed by the Internet Corporation for Assigned Names and Numbers (ICANN) and, hence, feature URLs with nonstandard top-level domains (TLDs) that generally require a specific DNS server to resolve properly. Other examples are sites that registered their domain name on a completely different system from the standard DNS, like the .BIT domains we discussed in “Bitcoin Domains”. These systems not only escape the domain name regulations imposed by ICANN; the decentralized nature of alternative DNSs also makes it very hard to sinkhole these domains, if needed.
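As a rough illustration of how nonstandard TLDs stand out, the sketch below labels a hostname by its final suffix. The function name `classify_hostname` and the two-entry lookup table are assumptions made for this example; a real check would compare against the full list of ICANN-delegated TLDs rather than a hand-picked sample.

```python
# Illustrative only: a tiny, hand-picked sample of suffixes, not a complete list.
ALTERNATIVE_TLDS = {
    "onion": "Tor hidden service (resolved inside the Tor network)",
    "bit": "alternative DNS outside the ICANN root",
}


def classify_hostname(hostname: str) -> str:
    """Return a rough label for a hostname based on its last DNS label."""
    labels = hostname.strip().rstrip(".").lower().split(".")
    tld = labels[-1]
    # Anything not in the sample table is simply assumed to be ordinary DNS.
    return ALTERNATIVE_TLDS.get(tld, "assumed to be a standard DNS name")


for name in ["www.example.com", "abcdefexample.onion", "example.bit"]:
    print(f"{name}: {classify_hostname(name)}")
```

The lookup deliberately defaults to "standard" for anything it does not recognize, which keeps the sketch short but would misclassify other alternative roots.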
As researchers have noted, it’s almost impossible to measure the size of the Deep Web, although some early estimates put it at 4,000–5,000 times larger than the surface web. The deep web includes such content as proprietary corporate data, confidential public data protected by government regulation, commercial information accessible only to select groups of people such as subscribers, private email, and private social media content. Examples of deep web content include that held on university library sites and in NASA or LexisNexis databases. The changing dynamic of how information is accessed and presented means that the Deep Web is growing exponentially and at a rate that defies quantification.
The dark web forms a part of the deep web. It consists of content that is intentionally hidden and is not indexed by search engines like Google. It can only be accessed using special software, such as the Tor browser.
Within the Deep Web, the Dark Web is also growing as new tools make it easier to navigate. Because individuals may access the Dark Web assuming little risk of detection, they may use this arena for a variety of legal and illegal activities. It is unclear, however, how much of the Deep Web is taken up by Dark Web content and how much of the Dark Web is used for legal or illegal activities. A recent analysis of the dark web by Securelist found that many of these sites had a short lifespan. Most dark web sites were active for at least 200 days, but usually not more than 300 days. Some were online for less than two months. The dark web does take some level of technical sophistication to access. It’s not just a matter of downloading the Tor browser and typing in a URL, so people need to put in some effort in order to gain entry.
What is Tor?
Few Internet technologies have had more of an impact on anonymous Internet use than Tor. The name stands for “The Onion Router,” and it is a method for anonymizing data. It actually refers to two things: the network and the browser. The network consists of a large number of volunteer computers that run a specialized server application. The Tor browser is a modified version of the popular Firefox browser that sends traffic through this network, hiding the user’s originating Internet Protocol (IP) address when surfing websites or sending e-mail. With the true IP address of the user hidden, attempts to trace or identify the user are nearly impossible without the use of extraordinary methods. The browser also lets users access special services that are available only through Tor. Run by activists who are dedicated to privacy and anonymity, the Tor Project is a non-profit organization that supports the network and develops the software. The technology upon which Tor is based (onion routing) was developed back in the 1990s by the U.S. Navy in order to protect intelligence communications. Today, the U.S. government, specifically the U.S. Department of State Bureau of Democracy, Human Rights and Labor, remains a major funder of the Tor Project.
How does it work?
The technology behind Tor is open source, which means that programmers and experts can inspect the source code. Unlike some virtual private network (VPN) providers, the Tor Project does not have a commercial stake in collecting user data and is committed to remaining non-profit. More than 17.5 million downloads of Tor have been recorded to date. Despite its anonymizing mechanics, Tor is certainly not completely untraceable. For example, if you access sites on the surface web from the Tor browser, you can still be identified. And because of its layered, anonymizing technology, browsing with Tor is a little slower than over a normal connection.
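The layered, randomized routing described in this article can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the relay names, the per-relay keys, the helper names (`toy_cipher`, `build_onion`, `peel`), and the stand-in XOR cipher, which is not secure. It is not Tor's actual protocol; it only shows the general idea that the sender wraps a message in one layer per relay, and each relay removes a single layer and learns only the next hop.

```python
import hashlib
import random
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher: XOR with a SHA-256-derived keystream. NOT secure.

    Applying it twice with the same key restores the original bytes.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


def build_onion(message: bytes, path: list[str], keys: dict[str, bytes],
                destination: str) -> bytes:
    """Wrap the message in one layer per relay, innermost layer first."""
    next_hops = path[1:] + [destination]
    onion = message
    for relay, next_hop in reversed(list(zip(path, next_hops))):
        # Each layer names only the next hop, then carries the inner layers.
        onion = toy_cipher(keys[relay], next_hop.encode() + b"|" + onion)
    return onion


def peel(relay: str, onion: bytes, keys: dict[str, bytes]) -> tuple[str, bytes]:
    """A relay removes its own layer and learns only where to forward."""
    plain = toy_cipher(keys[relay], onion)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner


# Hypothetical volunteer relays, each with its own random key.
relays = ["relay-A", "relay-B", "relay-C", "relay-D", "relay-E"]
keys = {name: secrets.token_bytes(32) for name in relays}

path = random.sample(relays, 3)  # a randomized three-hop path
onion = build_onion(b"hello", path, keys, "destination.example")

hop = path[0]
for _ in path:
    print(f"{hop} peels one layer")
    hop, onion = peel(hop, onion, keys)
print(f"delivered to {hop}: {onion!r}")
```

Because every hop has to remove its own layer before forwarding, the extra processing and the indirect path are also a simple way to see why this style of routing adds latency.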
Who uses Tor?
The Tor Project claims that more than 2 million people use Tor daily. But despite the reputation of the dark web as being a haven for criminal activity, a recent survey concluded that only 45% of .onion sites appear to host illegal activity. And it’s not as vast as some people have made it out to be. While the surface web hosts billions of different sites, it is estimated that Tor hidden sites number only in the thousands, perhaps tens of thousands but no more. Therefore, we can conclude that its reputation for being a place where criminals go to hang out is not entirely accurate—or at the very least, not a complete picture of the situation.
In fact, many use Tor for perfectly legitimate purposes. Activists from countries that suppress freedoms, journalists looking to protect their sources, and even law enforcement and the military use Tor to establish secure communications, avoid surveillance, and get around censorship. Facebook has even launched a hidden version of its site on Tor, through which more than 1 million users, including those in countries where Facebook is banned, can access the social networking platform.
Many also think that merely downloading the Tor browser is a sign of criminal activity. While law enforcement and intelligence authorities may monitor Tor downloads, downloading the browser is not in itself illegal, although some countries do view it as a signal of possible nefarious activity.