File Transfer Protocol
File Transfer Protocol (FTP) is a standard network protocol used to transfer files from one host to another host over a TCP-based network, such as the Internet.
FTP is built on a client-server architecture and uses separate control and data connections between the client and the server.[1] FTP users may authenticate themselves using a clear-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that hides (encrypts) the username and password, and encrypts the content, FTP is often secured with SSL/TLS ("FTPS"). SSH File Transfer Protocol ("SFTP") is sometimes also used instead, but is technologically different.
The first FTP client applications were command-line applications developed before operating systems had graphical user interfaces, and are still shipped with most Windows, Unix, and Linux operating systems.[2][3] Dozens of FTP clients and automation utilities have since been developed for desktops, servers, mobile devices, and hardware, and FTP has been incorporated into hundreds of productivity applications, such as Web page editors.
History
The original specification for the File Transfer Protocol was written by Abhay Bhushan and published as RFC 114 on 16 April 1971. It was later replaced by RFC 765 (June 1980) and then by RFC 959 (October 1985), the current specification. Several proposed standards amend RFC 959; for example, RFC 2228 (June 1997) proposes security extensions and RFC 2428 (September 1998) adds support for IPv6 and defines a new type of passive mode.[4]
Protocol overview
Communication and data transfer
The protocol was first specified in June 1980 and updated in RFC 959,[2] which is summarized here.[5]
Illustration of starting a passive connection using port 21
FTP may run in active or passive mode, which determines how the data connection is established.[6] In active mode, the client creates a TCP control connection to the server and sends the server the client's IP address and an arbitrary client port number, and then waits until the server initiates the data connection over TCP to that client IP address and client port number.[7] In situations where the client is behind a firewall and unable to accept incoming TCP connections, passive mode may be used. In this mode, the client uses the control connection to send a PASV command to the server and then receives a server IP address and server port number from the server,[7][6] which the client then uses to open a data connection from an arbitrary client port to the server IP address and server port number received.[5] Both modes were updated in September 1998 to support IPv6. Further changes were introduced to the passive mode at that time, updating it to extended passive mode.[8]
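As a concrete illustration, the following minimal Python sketch uses the standard library's ftplib module to open a control connection and switch between the two modes; the host name and credentials are placeholders, not part of the specification.

from ftplib import FTP

ftp = FTP()
ftp.connect('ftp.example.com', 21)   # control connection to a hypothetical server on port 21
ftp.login('user', 'password')        # placeholder credentials
ftp.set_pasv(True)                   # passive mode: the client opens the data connection (PASV/EPSV)
ftp.retrlines('LIST')                # the directory listing travels over the data connection
ftp.set_pasv(False)                  # active mode: the server connects back to the client (PORT/EPRT)
ftp.retrlines('LIST')
ftp.quit()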
The server responds over the control connection with three-digit status codes in ASCII with an optional text message. For example "200" (or "200 OK") means that the last command was successful. The numbers represent the code for the response and the optional text represents a human-readable explanation or request (e.g. <Need account for storing file>).[1] An ongoing transfer of file data over the data connection can be aborted using an interrupt message sent over the control connection.
While transferring data over the network, four data representations can be used:[2][3][4]
* ASCII mode: used for text. Data is converted, if needed, from the sending host's character representation to "8-bit ASCII" before transmission, and (again, if necessary) to the receiving host's character representation. As a consequence, this mode is inappropriate for files that contain data other than plain text.
* Image mode (commonly called Binary mode): the sending machine sends each file byte for byte, and the recipient stores the bytestream as it receives it. (Image mode support has been recommended for all implementations of FTP).
* EBCDIC mode: used for plain text between hosts using the EBCDIC character set. This mode is otherwise like ASCII mode.
* Local mode: allows two computers with identical setups to send data in a proprietary format without the need to convert it to ASCII.
For text files, different format control and record structure options are provided. These features were designed to facilitate the transfer of files containing Telnet or ASA carriage-control formatting.
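The distinction between ASCII and image mode can be seen with Python's ftplib, which issues TYPE A before a line-oriented transfer and TYPE I before a binary one; this is only a sketch, and the server and file names are invented for the example.

from ftplib import FTP

ftp = FTP('ftp.example.com')   # hypothetical server
ftp.login()                    # anonymous login

# ASCII mode (TYPE A): text is converted to and from each host's line-ending conventions
ftp.retrlines('RETR readme.txt', print)

# Image mode (TYPE I): the byte stream is stored exactly as received
with open('archive.zip', 'wb') as f:
    ftp.retrbinary('RETR archive.zip', f.write)

ftp.quit()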
Data transfer can be done in any of three modes:[1][2]
* Stream mode: Data is sent as a continuous stream, relieving FTP from doing any processing. Rather, all processing is left up to TCP. No End-of-file indicator is needed, unless the data is divided into records.
* Block mode: FTP breaks the data into several blocks (block header, byte count, and data field) and then passes it on to TCP.[4]
* Compressed mode: Data is compressed using a single algorithm (usually run-length encoding).
Login
FTP login utilizes a normal username and password scheme for granting access.[2] The username is sent to the server using the USER command, and the password is sent using the PASS command.[2] If the information provided by the client is accepted by the server, the server will send a greeting to the client and the session will commence.[2] If the server supports it, users may log in without providing login credentials, but the same server may authorize only limited access for such sessions.[2]
Anonymous FTP
A host that provides an FTP service may provide anonymous FTP access.[2] Users typically log into the service with an 'anonymous' (lower-case and case-sensitive in some FTP servers) account when prompted for user name. Although users are commonly asked to send their email address instead of a password,[3] no verification is actually performed on the supplied data.[9] Many FTP hosts whose purpose is to provide software updates will allow anonymous logins.[3]
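With Python's ftplib, calling login() without arguments performs exactly this kind of anonymous sign-in, sending the user name anonymous together with a conventional e-mail-style password; the host below is a placeholder.

from ftplib import FTP

ftp = FTP('ftp.example.com')   # hypothetical host that permits anonymous access
ftp.login()                    # sends USER anonymous with a dummy password such as anonymous@
print(ftp.getwelcome())        # the server's greeting banner
ftp.quit()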
NAT and firewall traversal
FTP normally transfers data by having the server connect back to the client, after the PORT command is sent by the client. This is problematic for both NATs and firewalls, which do not allow connections from the Internet towards internal hosts.[10] For NATs, an additional complication is that the representation of the IP addresses and port number in the PORT command refer to the internal host's IP address and port, rather than the public IP address and port of the NAT.
There are two approaches to this problem. One is that the FTP client and FTP server use the PASV command, which causes the data connection to be established from the FTP client to the server.[10] This is widely used by modern FTP clients. Another approach is for the NAT to alter the values of the PORT command, using an application-level gateway for this purpose.[10]
Differences from HTTP
FTP is considered out-of-band control, as opposed to in-band control which is used by HTTP.[11]
Web browser support
Most common web browsers can retrieve files hosted on FTP servers, although they may not support protocol extensions such as FTPS.[3][12] When an FTP URL rather than an HTTP URL is supplied, the accessible contents on the remote server are presented in a manner that is similar to that used for other Web content. A full-featured FTP client can be run within Firefox in the form of an extension called FireFTP.
Syntax
FTP URL syntax is described in RFC 1738,[13] taking the form ftp://[<user>[:<password>]@]<host>[:<port>]/<url-path> (the bracketed parts are optional).
More details on specifying a username and password may be found in the browsers' documentation, such as, for example, Firefox [14] and Internet Explorer.[15] By default, most web browsers use passive (PASV) mode, which more easily traverses end-user firewalls.
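The optional parts of an FTP URL can be picked apart with the standard library's urllib.parse; the URL below is an invented example that exercises every component.

from urllib.parse import urlparse

parts = urlparse('ftp://user:secret@ftp.example.com:2121/pub/file.txt')
print(parts.scheme)     # 'ftp'
print(parts.username)   # 'user'
print(parts.password)   # 'secret'
print(parts.hostname)   # 'ftp.example.com'
print(parts.port)       # 2121
print(parts.path)       # '/pub/file.txt'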
Security
FTP was not designed to be a secure protocol, especially by today's standards, and has many security weaknesses.[16] In May 1999, the authors of RFC 2577 listed FTP's vulnerability to the following problems:[17]
* Brute force attacks
* Bounce attacks
* Packet capture (sniffing)
* Port stealing
* Spoof attacks
* Username protection
FTP is not able to encrypt its traffic; all transmissions are in clear text, and usernames, passwords, commands and data can be easily read by anyone able to perform packet capture (sniffing) on the network.[2][16] This problem is common to many of the Internet Protocol specifications (such as SMTP, Telnet, POP and IMAP) that were designed prior to the creation of encryption mechanisms such as TLS or SSL.[4] A common solution to this problem is to use the "secure", TLS-protected versions of the insecure protocols (e.g. FTPS for FTP, TelnetS for Telnet, etc.) or a different, more secure protocol that can handle the job, such as the SFTP/SCP tools included with most implementations of the Secure Shell protocol.
Secure FTP
There are several methods of securely transferring files that have been called "Secure FTP" at one point or another.
FTPS
Explicit FTPS is an extension to the FTP standard that allows clients to request that the FTP session be encrypted. This is done by sending the "AUTH TLS" command. The server has the option of allowing or denying connections that do not request TLS. This protocol extension is defined in the proposed standard RFC 4217. Implicit FTPS is a deprecated standard for FTP that required the use of an SSL or TLS connection. It was specified to use different ports than plain FTP.
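Python's ftplib includes an FTP_TLS class that performs the explicit-FTPS handshake described above; the sketch below assumes a hypothetical server that accepts AUTH TLS, and the credentials are placeholders.

from ftplib import FTP_TLS

ftps = FTP_TLS('ftp.example.com')   # hypothetical server offering explicit FTPS
ftps.login('user', 'password')      # ftplib sends AUTH TLS before transmitting the credentials
ftps.prot_p()                       # PROT P: encrypt the data channel as well
ftps.retrlines('LIST')
ftps.quit()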
SFTP
SFTP, the "SSH File Transfer Protocol," is not related to FTP except that it also transfers files and has a similar command set for users. SFTP, or secure FTP, is a program that uses Secure Shell (SSH) to transfer files. Unlike standard FTP, it encrypts both commands and data, preventing passwords and sensitive information from being transmitted openly over the network. It is functionally similar to FTP, but because it uses a different protocol, standard FTP clients cannot be used to talk to an SFTP server, nor can one connect to an FTP server with a client that supports only SFTP.
FTP over SSH (not SFTP)
FTP over SSH (not SFTP) refers to the practice of tunneling a normal FTP session over an SSH connection.[16] Because FTP uses multiple TCP connections (unusual for a TCP/IP protocol that is still in use), it is particularly difficult to tunnel over SSH. With many SSH clients, attempting to set up a tunnel for the control channel (the initial client-to-server connection on port 21) will protect only that channel; when data is transferred, the FTP software at either end will set up new TCP connections (data channels), which bypass the SSH connection and thus have no confidentiality or integrity protection, etc.
Otherwise, it is necessary for the SSH client software to have specific knowledge of the FTP protocol, to monitor and rewrite FTP control channel messages and autonomously open new packet forwardings for FTP data channels. Software packages that support this mode include:
* Tectia ConnectSecure (Win/Linux/Unix) of SSH Communications Security's software suite
* Tectia Server for IBM z/OS of SSH Communications Security's software suite
* FONC (GPL-licensed)
* Co:Z FTPSSH Proxy
FTP over SSH is sometimes referred to as secure FTP; this should not be confused with other methods of securing FTP, such as SSL/TLS (FTPS). Other methods of transferring files using SSH that are not related to FTP include SFTP and SCP; in each of these, the entire conversation (credentials and data) is always protected by the SSH protocol.
List of FTP commands
Below is a list of FTP commands that may be sent to an FTP server, including all commands that are standardized in RFC 959 by the IETF. All commands below are RFC 959-based unless stated otherwise. Note that most command-line FTP clients present their own set of commands to users. For example, GET is the common user command to download a file instead of the raw command RETR.
Command   RFC        Description
ABOR                 Abort an active file transfer
ACCT                 Account information
ADAT      RFC 2228   Authentication/Security Data
ALLO                 Allocate sufficient disk space to receive a file
APPE                 Append (with create)
AUTH      RFC 2228   Authentication/Security Mechanism
CCC       RFC 2228   Clear Command Channel
CDUP                 Change to parent directory
CONF      RFC 2228   Confidentiality Protection Command
CWD       RFC 697    Change working directory
DELE                 Delete file
ENC       RFC 2228   Privacy Protected Channel
EPRT      RFC 2428   Specifies an extended address and port to which the server should connect
EPSV      RFC 2428   Enter extended passive mode
FEAT      RFC 2389   Get the feature list implemented by the server
HELP                 Help
LANG      RFC 2640   Language negotiation
LIST                 Returns information about a file or directory if specified, otherwise about the current working directory
LPRT      RFC 1639   Specifies a long address and port to which the server should connect
LPSV      RFC 1639   Enter long passive mode
MDTM      RFC 3659   Return the last-modified time of a specified file
MIC       RFC 2228   Integrity Protected Command
MKD                  Make directory
MLSD      RFC 3659   Lists the contents of a directory if a directory is named
MLST      RFC 3659   Provides data about exactly the object named on its command line, and no others
MODE                 Sets the transfer mode (Stream, Block, or Compressed)
NLST                 Returns a list of file names in a specified directory
NOOP                 No operation (dummy packet; used mostly on keepalives)
OPTS      RFC 2389   Select options for a feature
PASS                 Authentication password
PASV                 Enter passive mode
PBSZ      RFC 2228   Protection Buffer Size
PORT                 Specifies an address and port to which the server should connect
PROT      RFC 2228   Data Channel Protection Level
PWD                  Print working directory; returns the current directory of the host
QUIT                 Disconnect
REIN                 Re-initializes the connection
REST      RFC 3659   Restart transfer from the specified point
RETR                 Transfer a copy of the file
RMD                  Remove a directory
RNFR                 Rename from
RNTO                 Rename to
SITE                 Sends site-specific commands to the remote server
SIZE      RFC 3659   Return the size of a file
SMNT                 Mount file structure
STAT                 Returns the current status
STOR                 Accept the data and store it as a file at the server site
STOU                 Store file uniquely
STRU                 Set file transfer structure
SYST                 Return system type
TYPE                 Sets the transfer mode (ASCII/Binary)
USER                 Authentication username
XCUP      RFC 775    Change to the parent of the current working directory
XMKD      RFC 775    Make a directory
XPWD      RFC 775    Print the current working directory
XRCP      RFC 743
XRMD      RFC 775    Remove the directory
XRSQ      RFC 743
XSEM      RFC 737    Send, mail if cannot
XSEN      RFC 737    Send to terminal
FTP reply codes
Main article: List of FTP server return codes
Below is a summary of the reply codes that may be returned by an FTP server. These codes have been standardized in RFC 959 by the IETF. As stated earlier in this article, the reply code is a three-digit value. The first digit indicates one of three possible outcomes: success, failure, or an error or incomplete reply:
* 2yz – Success reply
* 4yz or 5yz – Failure Reply
* 1yz or 3yz – Error or Incomplete reply
The second digit defines the kind of error:
* x0z – Syntax. These replies refer to syntax errors.
* x1z – Information. Replies to requests for information.
* x2z – Connections. Replies referring to the control and data connections.
* x3z – Authentication and accounting. Replies for the login process and accounting procedures.
* x4z – Not defined.
* x5z – File system. These replies relay status codes from the server file system.
The third digit of the reply code is used to provide additional detail for each of the categories defined by the second digit.
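The digit-by-digit interpretation above can be expressed as a small helper function; this is only a sketch of the classification scheme described here, not an exhaustive decoder.

def classify_reply(reply):
    """Classify an FTP reply line by its three-digit code."""
    code = reply[:3]
    if code[0] == '2':
        outcome = 'success'
    elif code[0] in ('4', '5'):
        outcome = 'failure'
    else:  # '1' or '3'
        outcome = 'error or incomplete reply'
    kind = {'0': 'syntax', '1': 'information', '2': 'connections',
            '3': 'authentication and accounting', '4': 'not defined',
            '5': 'file system'}.get(code[1], 'unknown')
    return code, outcome, kind

print(classify_reply('200 Command okay.'))                   # ('200', 'success', 'syntax')
print(classify_reply('331 User name okay, need password.'))  # ('331', 'error or incomplete reply', 'authentication and accounting')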
World Wide Web
The World Wide Web (abbreviated as WWW or W3,[2] commonly known as the web), is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks.
Using concepts from his earlier hypertext systems such as ENQUIRE, British engineer and computer scientist Sir Tim Berners-Lee, then an employee of CERN and now Director of the World Wide Web Consortium (W3C), wrote a proposal in March 1989 for what would eventually become the World Wide Web.[1] At CERN, a European research organisation near Geneva straddling the border between France and Switzerland,[3] Berners-Lee and Belgian computer scientist Robert Cailliau proposed in 1990 to use hypertext "to link and access information of various kinds as a web of nodes in which the user can browse at will",[4] and they publicly introduced the project in December of the same year.[5]
History
Main article: History of the World Wide Web
The NeXT Computer used by Berners-Lee. The handwritten label declares, "This machine is a server. DO NOT POWER IT DOWN!!"
In the May 1970 issue of Popular Science magazine, Arthur C. Clarke predicted that satellites would someday "bring the accumulated knowledge of the world to your fingertips" using a console that would combine the functionality of the photocopier, telephone, television and a small computer, allowing data transfer and video conferencing around the globe.[6]
In March 1989, Tim Berners-Lee wrote a proposal that referenced ENQUIRE, a database and software project he had built in 1980, and described a more elaborate information management system.[7]
With help from Robert Cailliau, he published a more formal proposal (on 12 November 1990) to build a "Hypertext project" called "WorldWideWeb" (one word, also "W3") as a "web" of "hypertext documents" to be viewed by "browsers" using a client–server architecture.[4] This proposal estimated that a read-only web would be developed within three months and that it would take six months to achieve "the creation of new links and new material by readers, [so that] authorship becomes universal" as well as "the automatic notification of a reader when new material of interest to him/her has become available." While the read-only goal was met, accessible authorship of web content took longer to mature, with the wiki concept, blogs, Web 2.0 and RSS/Atom.[8]
The proposal was modeled after the SGML reader Dynatext by Electronic Book Technology, a spin-off from the Institute for Research in Information and Scholarship at Brown University. The Dynatext system, licensed by CERN, was a key player in the extension of SGML ISO 8879:1986 to Hypermedia within HyTime, but it was considered too expensive and had an inappropriate licensing policy for use in the general high energy physics community, namely a fee for each document and each document alteration.
The CERN datacenter in 2010 housing some WWW servers
A NeXT Computer was used by Berners-Lee as the world's first web server and also to write the first web browser, WorldWideWeb, in 1990. By Christmas 1990, Berners-Lee had built all the tools necessary for a working Web:[9] the first web browser (which was a web editor as well); the first web server; and the first web pages,[10] which described the project itself. On 6 August 1991, he posted a short summary of the World Wide Web project on the alt.hypertext newsgroup.[11] This date also marked the debut of the Web as a publicly available service on the Internet. Many news media have reported that the first photo on the web was uploaded by Berners-Lee in 1992, an image of the CERN house band Les Horribles Cernettes taken by Silvano de Gennaro; Gennaro has disclaimed this story, writing that media were "totally distorting our words for the sake of cheap sensationalism."[12]
The first server outside Europe was set up at the Stanford Linear Accelerator Center (SLAC) in Palo Alto, California, to host the SPIRES-HEP database. Accounts differ substantially as to the date of this event. The World Wide Web Consortium says December 1992,[13] whereas SLAC itself claims 1991.[14][15] This is supported by a W3C document titled A Little History of the World Wide Web.[16]
The crucial underlying concept of hypertext originated with older projects from the 1960s, such as the Hypertext Editing System (HES) at Brown University, Ted Nelson's Project Xanadu, and Douglas Engelbart's oN-Line System (NLS). Both Nelson and Engelbart were in turn inspired by Vannevar Bush's microfilm-based "memex", which was described in the 1945 essay "As We May Think".[17]
Berners-Lee's breakthrough was to marry hypertext to the Internet. In his book Weaving The Web, he explains that he had repeatedly suggested that a marriage between the two technologies was possible to members of both technical communities, but when no one took up his invitation, he finally assumed the project himself. In the process, he developed three essential technologies:
1. a system of globally unique identifiers for resources on the Web and elsewhere, the universal document identifier (UDI), later known as uniform resource locator (URL) and uniform resource identifier (URI);
2. the publishing language HyperText Markup Language (HTML);
3. the Hypertext Transfer Protocol (HTTP).[18]
The World Wide Web had a number of differences from other hypertext systems available at the time. The web required only unidirectional links rather than bidirectional ones, making it possible for someone to link to another resource without action by the owner of that resource. It also significantly reduced the difficulty of implementing web servers and browsers (in comparison to earlier systems), but in turn presented the chronic problem of link rot. Unlike predecessors such as HyperCard, the World Wide Web was non-proprietary, making it possible to develop servers and clients independently and to add extensions without licensing restrictions. On 30 April 1993, CERN announced that the World Wide Web would be free to anyone, with no fees due.[19] Coming two months after the announcement that the server implementation of the Gopher protocol was no longer free to use, this produced a rapid shift away from Gopher and towards the Web. An early popular web browser was ViolaWWW for Unix and the X Window System.
Robert Cailliau, Jean-François Abramatic of IBM, and Tim Berners-Lee at the 10th anniversary of the World Wide Web Consortium.
Scholars generally agree that a turning point for the World Wide Web began with the introduction[20] of the Mosaic web browser[21] in 1993, a graphical browser developed by a team at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign (NCSA-UIUC), led by Marc Andreessen. Funding for Mosaic came from the U.S. High-Performance Computing and Communications Initiative and the High Performance Computing and Communication Act of 1991, one of several computing developments initiated by U.S. Senator Al Gore.[22] Prior to the release of Mosaic, graphics were not commonly mixed with text in web pages and the web's popularity was less than older protocols in use over the Internet, such as Gopher and Wide Area Information Servers (WAIS). Mosaic's graphical user interface allowed the Web to become, by far, the most popular Internet protocol.
The World Wide Web Consortium (W3C) was founded by Tim Berners-Lee after he left the European Organization for Nuclear Research (CERN) in October 1994. It was founded at the Massachusetts Institute of Technology Laboratory for Computer Science (MIT/LCS) with support from the Defense Advanced Research Projects Agency (DARPA), which had pioneered the Internet; a year later, a second site was founded at INRIA (a French national computer research lab) with support from the European Commission DG InfSo; and in 1996, a third continental site was created in Japan at Keio University. By the end of 1994, while the total number of websites was still minute compared to present standards, quite a number of notable websites were already active, many of which are the precursors or inspiration for today's most popular services.
Connected by the existing Internet, other websites were created around the world, adding international standards for domain names and HTML. Since then, Berners-Lee has played an active role in guiding the development of web standards (such as the markup languages in which web pages are composed), and has advocated his vision of a Semantic Web. The World Wide Web enabled the spread of information over the Internet through an easy-to-use and flexible format. It thus played an important role in popularizing use of the Internet.[23] Although the two terms are sometimes conflated in popular use, World Wide Web is not synonymous with Internet.[24] The web is a collection of documents and both client and server software using Internet protocols such as TCP/IP and HTTP. Tim Berners-Lee was knighted in 2004 by Queen Elizabeth II for his contribution to the World Wide Web.
Function
The terms Internet and World Wide Web are often used in everyday speech without much distinction. However, the Internet and the World Wide Web are not the same. The Internet is a global system of interconnected computer networks. In contrast, the web is one of the services that runs on the Internet. It is a collection of text documents and other resources, linked by hyperlinks and URLs, usually accessed by web browsers from web servers. In short, the web can be thought of as an application "running" on the Internet.[25]
Viewing a web page on the World Wide Web normally begins either by typing the URL of the page into a web browser or by following a hyperlink to that page or resource. The web browser then initiates a series of communication messages, behind the scenes, in order to fetch and display it. In the 1990s, using a browser to view web pages—and to move from one web page to another through hyperlinks—came to be known as 'browsing,' 'web surfing,' or 'navigating the web'. Early studies of this new behavior investigated user patterns in using web browsers. One study, for example, found five user patterns: exploratory surfing, window surfing, evolved surfing, bounded navigation and targeted navigation.[26]
The following example demonstrates how a web browser works. Consider accessing a page with the URL http://example.org/wiki/World_Wide_Web.
First, the browser resolves the server-name portion of the URL (example.org) into an Internet Protocol address using the globally distributed database known as the Domain Name System (DNS); this lookup returns an IP address such as 208.80.152.2. The browser then requests the resource by sending an HTTP request across the Internet to the computer at that particular address. It makes the request to a particular application port in the underlying Internet Protocol Suite so that the computer receiving the request can distinguish an HTTP request from other network protocols it may be servicing such as e-mail delivery; the HTTP protocol normally uses port 80. The content of the HTTP request can be as simple as the two lines of text
GET /wiki/World_Wide_Web HTTP/1.1
Host: example.org
The computer receiving the HTTP request delivers it to web server software listening for requests on port 80. If the web server can fulfill the request it sends an HTTP response back to the browser indicating success, which can be as simple as
HTTP/1.0 200 OK
Content-Type: text/html; charset=UTF-8
followed by the content of the requested page. The Hypertext Markup Language for a basic web page looks like
<html>
<head>
<title>Example.org – The World Wide Web</title>
</head>
<body>
<p>The World Wide Web, abbreviated as WWW and commonly known ...</p>
</body>
</html>
The web browser parses the HTML, interpreting the markup (<title>, <p> for paragraph, and such) that surrounds the words in order to draw the text on the screen.
Many web pages use HTML to reference the URLs of other resources such as images, other embedded media, scripts that affect page behavior, and Cascading Style Sheets that affect page layout. The browser will make additional HTTP requests to the web server for these other Internet media types. As it receives their content from the web server, the browser progressively renders the page onto the screen as specified by its HTML and these additional resources.
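The same exchange can be reproduced with Python's standard http.client module, which resolves the host name and sends a request equivalent to the one shown above; example.org and the path are taken from the example URL, and the exact response depends on that server.

import http.client

conn = http.client.HTTPConnection('example.org', 80)  # the host name is resolved via DNS when connecting
conn.request('GET', '/wiki/World_Wide_Web')            # http.client adds the Host: header automatically
response = conn.getresponse()
print(response.status, response.reason)                # e.g. 200 OK, or a redirect or 404 for this path
print(response.getheader('Content-Type'))
body = response.read()                                 # the HTML that a browser would parse and render
conn.close()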
Most web pages contain hyperlinks to other related pages and perhaps to downloadable files, source documents, definitions and other web resources. In the underlying HTML, a hyperlink looks like
<a href="http://example.org/wiki/Main_Page">Example.org, a free encyclopedia</a>
Graphic representation of a minute fraction of the WWW, demonstrating hyperlinks
Such a collection of useful, related resources, interconnected via hypertext links is dubbed a web of information. Publication on the Internet created what Tim Berners-Lee first called the WorldWideWeb (in its original CamelCase, which was subsequently discarded) in November 1990.[4]
The hyperlink structure of the WWW is described by the webgraph: the nodes of the webgraph correspond to the web pages (or URLs) and the directed edges between them correspond to the hyperlinks.
Over time, many web resources pointed to by hyperlinks disappear, relocate, or are replaced with different content. This makes hyperlinks obsolete, a phenomenon referred to in some circles as link rot, and the hyperlinks affected by it are often called dead links. The ephemeral nature of the Web has prompted many efforts to archive web sites. The Internet Archive, active since 1996, is the best known of such efforts.
Dynamic updates of web pages
Main article: Ajax (programming)
JavaScript is a scripting language that was initially developed in 1995 by Brendan Eich, then of Netscape, for use within web pages.[27] The standardised version is ECMAScript.[27] To make web pages more interactive, some web applications also use JavaScript techniques such as Ajax (asynchronous JavaScript and XML). Client-side script is delivered with the page that can make additional HTTP requests to the server, either in response to user actions such as mouse movements or clicks, or based on lapsed time. The server's responses are used to modify the current page rather than creating a new page with each response, so the server needs only to provide limited, incremental information. Multiple Ajax requests can be handled at the same time, and users can interact with the page while data is being retrieved. Web pages may also regularly poll the server to check whether new information is available.[28]
WWW prefix
Many domain names used for the World Wide Web begin with www because of the long-standing practice of naming Internet hosts (servers) according to the services they provide. The hostname for a web server is often www, in the same way that it may be ftp for an FTP server, and news or nntp for a USENET news server. These host names appear as Domain Name System (DNS) subdomain names, as in www.example.com. The use of 'www' as a subdomain name is not required by any technical or policy standard and many web sites do not use it; indeed, the first ever web server was called nxoc01.cern.ch.[29] According to Paolo Palazzi,[30] who worked at CERN along with Tim Berners-Lee, the popular use of the 'www' subdomain was accidental; the World Wide Web project page was intended to be published at www.cern.ch while info.cern.ch was intended to be the CERN home page, but the DNS records were never switched, and the practice of prepending 'www' to an institution's website domain name was subsequently copied. Many established websites still use 'www', or they invent other subdomain names such as 'www2', 'secure', etc. Many such web servers are set up so that both the domain root (e.g., example.com) and the www subdomain (e.g., www.example.com) refer to the same site; others require one form or the other, or they may map to different web sites.
The use of a subdomain name is useful for load balancing incoming web traffic by creating a CNAME record that points to a cluster of web servers. Since, currently, only a subdomain can be used in a CNAME, the same result cannot be achieved by using the bare domain root.
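A name's aliases and addresses can be inspected from Python with the standard socket module; whether www.example.com is actually a CNAME depends on how that zone is configured, so the output below is only indicative.

import socket

canonical, aliases, addresses = socket.gethostbyname_ex('www.example.com')
print(canonical)   # the canonical name the alias chain resolves to
print(aliases)     # alias names (e.g. the www subdomain, if it is a CNAME)
print(addresses)   # the IPv4 addresses behind the name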
When a user submits an incomplete domain name to a web browser in its address bar input field, some web browsers automatically try adding the prefix "www" to the beginning of it and possibly ".com", ".org" and ".net" at the end, depending on what might be missing. For example, entering 'microsoft' may be transformed to http://www.microsoft.com/ and 'openoffice' to http://www.openoffice.org. This feature started appearing in early versions of Mozilla Firefox, when it still had the working title 'Firebird' in early 2003, from an earlier practice in browsers such as Lynx.[31] It is reported that Microsoft was granted a US patent for the same idea in 2008, but only for mobile devices.[32]
In English, www is usually read as double-u double-u double-u. Some users pronounce it dub-dub-dub, particularly in New Zealand. Stephen Fry, in his "Podgrammes" series of podcasts, pronounces it wuh wuh wuh. The English writer Douglas Adams once quipped in The Independent on Sunday (1999): "The World Wide Web is the only thing I know of whose shortened form takes three times longer to say than what it's short for". In Mandarin Chinese, World Wide Web is commonly translated via a phono-semantic matching to wànwéi wǎng (万维网), which satisfies www and literally means "myriad-dimensional net",[33] a translation that reflects the design concept and proliferation of the World Wide Web. Tim Berners-Lee's web-space states that World Wide Web is officially spelled as three separate words, each capitalised, with no intervening hyphens.[34]
Use of the www prefix is declining as Web 2.0 web applications seek to brand their domain names and make them easily pronounceable.[35] As the mobile web grows in popularity, services like Gmail.com, MySpace.com, Facebook.com, Bebo.com and Twitter.com are most often discussed without adding www to the domain (or, indeed, the .com).
Scheme specifiers: http and https
The scheme specifier http:// or https:// at the start of a web URI refers to Hypertext Transfer Protocol or HTTP Secure respectively. Unlike www, which has no specific purpose, these specify the communication protocol to be used for the request and response. The HTTP protocol is fundamental to the operation of the World Wide Web, and the added encryption layer in HTTPS is essential when confidential information such as passwords or banking details is exchanged over the public Internet. Web browsers usually prepend http:// to addresses if the scheme is omitted.
Web servers
Main article: Web server
The primary function of a web server is to deliver web pages on request to clients. This means delivery of HTML documents and any additional content that may be included by a document, such as images, style sheets and scripts.
Privacy
Main article: Internet privacy
Every time a web page is requested from a web server, the server can identify, and usually logs, the IP address from which the request arrived. Equally, unless set not to do so, most web browsers record the web pages that have been requested and viewed in a history feature, and usually cache much of the content locally. Unless HTTPS encryption is used, web requests and responses travel in plain text across the Internet and can be viewed, recorded and cached by intermediate systems.
When a web page asks for, and the user supplies, personally identifiable information such as their real name, address, e-mail address, etc., then a connection can be made between the current web traffic and that individual. If the website uses HTTP cookies, username and password authentication, or other tracking techniques, then it will be able to relate other web visits, before and after, to the identifiable information provided. In this way it is possible for a web-based organisation to develop and build a profile of the individual people who use its site or sites. It may be able to build a record for an individual that includes information about their leisure activities, their shopping interests, their profession, and other aspects of their demographic profile. These profiles are obviously of potential interest to marketers, advertisers and others. Depending on the website's terms and conditions and the local laws that apply, information from these profiles may be sold, shared, or passed to other organisations without the user being informed. For many ordinary people, this means little more than some unexpected e-mails in their inbox, or some uncannily relevant advertising on a future web page. For others, it can mean that time spent indulging an unusual interest can result in a deluge of further targeted marketing that may be unwelcome. Law enforcement, counter-terrorism and espionage agencies can also identify, target and track individuals based on what appear to be their interests or proclivities on the web.
Social networking sites make a point of trying to get the user to truthfully expose their real names, interests and locations. This makes the social networking experience more realistic and therefore engaging for all their users. On the other hand, photographs uploaded and unguarded statements made will be identified to the individual, who may regret some decisions to publish these data. Employers, schools, parents and other relatives may be influenced by aspects of social networking profiles that the posting individual did not intend for these audiences. On-line bullies may make use of personal information to harass or stalk users. Modern social networking websites allow fine grained control of the privacy settings for each individual posting, but these can be complex and not easy to find or use, especially for beginners.[36]
Photographs and videos posted onto websites have caused particular problems, as they can add a person's face to an on-line profile. With modern and potential facial recognition technology, it may then be possible to relate that face with other, previously anonymous, images, events and scenarios that have been imaged elsewhere. Because of image caching, mirroring and copying, it is difficult to remove an image from the World Wide Web.
Intellectual property
Main article: Intellectual property
The intellectual property rights for any creative work initially rest with its creator. Web users who want to publish their work onto the World Wide Web, however, need to be aware of the details of the way they do it. If artwork, photographs, writings, poems, or technical innovations are published by their creator onto a privately owned web server, then they may choose the copyright and other conditions freely themselves. This is unusual though; more commonly work is uploaded to web sites and servers that are owned by other organizations. It depends upon the terms and conditions of the site or service provider to what extent the original owner automatically signs over rights to their work by the choice of destination and by the act of uploading.[citation needed]
Many users of the web erroneously assume that everything they may find on line is freely available to them as if it was in the public domain. This is almost never the case, unless the web site publishing the work clearly states that it is. On the other hand, content owners are aware of this widespread belief, and expect that sooner or later almost everything that is published will probably be used in some capacity somewhere without their permission. Many publishers therefore embed visible or invisible digital watermarks in their media files, sometimes charging users to receive unmarked copies for legitimate use. Digital rights management includes forms of access control technology that further limit the use of digital content even after it has been bought or downloaded.[citation needed]
Security
The web has become criminals' preferred pathway for spreading malware. Cybercrime carried out on the web can include identity theft, fraud, espionage and intelligence gathering.[37] Web-based vulnerabilities now outnumber traditional computer security concerns,[38][39] and as measured by Google, about one in ten web pages may contain malicious code.[40] Most web-based attacks take place on legitimate websites, and most, as measured by Sophos, are hosted in the United States, China and Russia.[41] The most common of all malware threats is SQL injection attacks against websites.[42] Through HTML and URIs the web was vulnerable to attacks like cross-site scripting (XSS) that came with the introduction of JavaScript[43] and were exacerbated to some degree by Web 2.0 and Ajax web design that favors the use of scripts.[44] Today by one estimate, 70% of all websites are open to XSS attacks on their users.[45]
Proposed solutions vary to extremes. Large security vendors like McAfee already design governance and compliance suites to meet post-9/11 regulations,[46] and some, like Finjan have recommended active real-time inspection of code and all content regardless of its source.[37] Some have argued that for enterprise to see security as a business opportunity rather than a cost center,[47] "ubiquitous, always-on digital rights management" enforced in the infrastructure by a handful of organizations must replace the hundreds of companies that today secure data and networks.[48] Jonathan Zittrain has said users sharing responsibility for computing safety is far preferable to locking down the Internet.[49]
Standards
Main article: Web standards
Many formal standards and other technical specifications and software define the operation of different aspects of the World Wide Web, the Internet, and computer information exchange. Many of the documents are the work of the World Wide Web Consortium (W3C), headed by Berners-Lee, but some are produced by the Internet Engineering Task Force (IETF) and other organizations.
Usually, when web standards are discussed, the following publications are seen as foundational:
* Recommendations for markup languages, especially HTML and XHTML, from the W3C. These define the structure and interpretation of hypertext documents.
* Recommendations for stylesheets, especially CSS, from the W3C.
* Standards for ECMAScript (usually in the form of JavaScript), from Ecma International.
* Recommendations for the Document Object Model, from W3C.
Additional publications provide definitions of other essential technologies for the World Wide Web, including, but not limited to, the following:
* Uniform Resource Identifier (URI), which is a universal system for referencing resources on the Internet, such as hypertext documents and images. URIs, often called URLs, are defined by the IETF's RFC 3986 / STD 66: Uniform Resource Identifier (URI): Generic Syntax, as well as its predecessors and numerous URI scheme-defining RFCs;
* HyperText Transfer Protocol (HTTP), especially as defined by RFC 2616: HTTP/1.1 and RFC 2617: HTTP Authentication, which specify how the browser and server authenticate each other.
Accessibility
Main article: Web accessibility
There are methods available for accessing the web in alternative mediums and formats, so as to enable use by individuals with disabilities. These disabilities may be visual, auditory, physical, speech-related, cognitive, neurological, or some combination thereof. Accessibility features also help people with temporary disabilities, such as a broken arm, and the aging population as their abilities change.[50] The Web is used for receiving information as well as providing information and interacting with society. The World Wide Web Consortium claims that it is essential that the Web be accessible in order to provide equal access and equal opportunity to people with disabilities.[51] Tim Berners-Lee once noted, "The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect."[50] Many countries regulate web accessibility as a requirement for websites.[52] International cooperation in the W3C Web Accessibility Initiative led to simple guidelines that web content authors as well as software developers can use to make the Web accessible to persons who may or may not be using assistive technology.[50][53]
Internationalization
The W3C Internationalization Activity ensures that web technology works in all languages, scripts, and cultures.[54] Beginning in 2004 or 2005, Unicode gained ground and eventually, in December 2007, surpassed both ASCII and Western European as the Web's most frequently used character encoding.[55] Originally RFC 3986 allowed resources to be identified by URI in a subset of US-ASCII. RFC 3987 allows more characters—any character in the Universal Character Set—and now a resource can be identified by IRI in any language.[56]
Statistics
Between 2005 and 2010, the number of web users doubled, and was expected to surpass two billion in 2010.[57] Early studies in 1998 and 1999 estimating the size of the web using capture/recapture methods showed that much of the web was not indexed by search engines and the web was much larger than expected.[58][59] According to a 2001 study, there were over 550 billion documents on the Web, mostly in the invisible Web, or Deep Web.[60] A 2002 survey of 2,024 million web pages[61] determined that by far the most web content was in the English language: 56.4%; next were pages in German (7.7%), French (5.6%), and Japanese (4.9%). A more recent study, which used web searches in 75 different languages to sample the web, determined that there were over 11.5 billion web pages in the publicly indexable web as of the end of January 2005.[62] As of March 2009, the indexable web contains at least 25.21 billion pages.[63] On 25 July 2008, Google software engineers Jesse Alpert and Nissan Hajaj announced that Google Search had discovered one trillion unique URLs.[64] As of May 2009, over 109.5 million domains operated.[65] Of these, 74% were commercial or other sites operating in the .com generic top-level domain.[65]
Statistics measuring a website's popularity are usually based either on the number of page views or on associated server 'hits' (file requests) that it receives.
Speed issues
Frustration over congestion issues in the Internet infrastructure and the high latency that results in slow browsing has led to a pejorative name for the World Wide Web: the World Wide Wait.[66] Speeding up the Internet is an ongoing discussion over the use of peering and QoS technologies. Other solutions to reduce the congestion can be found at W3C.[67] Guidelines for web response times are:[68]
* 0.1 second (one tenth of a second). Ideal response time. The user does not sense any interruption.
* 1 second. Highest acceptable response time. Download times above 1 second interrupt the user experience.
* 10 seconds. Unacceptable response time. The user experience is interrupted and the user is likely to leave the site or system.
Caching
Main article: Web cache
If a user revisits a web page after only a short interval, the page data may not need to be re-obtained from the source web server. Almost all web browsers cache recently obtained data, usually on the local hard drive. HTTP requests sent by a browser will usually ask only for data that has changed since the last download. If the locally cached data are still current, they will be reused. Caching helps reduce the amount of web traffic on the Internet. The decision about expiration is made independently for each downloaded file, whether image, stylesheet, JavaScript, HTML, or other web resource. Thus even on sites with highly dynamic content, many of the basic resources need to be refreshed only occasionally. Web site designers find it worthwhile to collate resources such as CSS data and JavaScript into a few site-wide files so that they can be cached efficiently. This helps reduce page download times and lowers demands on the Web server.
There are other components of the Internet that can cache web content. Corporate and academic firewalls often cache Web resources requested by one user for the benefit of all. (See also caching proxy server.) Some search engines also store cached content from websites. Apart from the facilities built into web servers that can determine when files have been updated and so need to be re-sent, designers of dynamically generated web pages can control the HTTP headers sent back to requesting users, so that transient or sensitive pages are not cached. Internet banking and news sites frequently use this facility. Data requested with an HTTP 'GET' is likely to be cached if other conditions are met; data obtained in response to a 'POST' is assumed to depend on the data that was POSTed and so is not cached.
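A conditional request of the kind browsers use for cache validation can be sketched with Python's http.client; example.org is a placeholder host, and a real server may or may not honour the header.

import http.client
from email.utils import formatdate

conn = http.client.HTTPConnection('example.org')
conn.request('GET', '/', headers={
    'If-Modified-Since': formatdate(usegmt=True)   # an HTTP-date; here, the current time
})
response = conn.getresponse()
print(response.status)   # 304 (no body) if unchanged since that time, otherwise 200 with the full page
conn.close()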
Browser
Browse, browser or browsing may refer to:
* Browse, a kind of orienting strategy in animals and human beings
* Browsing (herbivory), a type of feeding behavior in herbivores
* Web browser, used to access the World Wide Web
* File browser, also known as a file manager, used to manage files and related objects
* Help browser, for reading online help
* Code browser, for navigating source code
* Browser service, a feature of Microsoft Windows to let users browse and locate shared resources in neighboring computers
How to Surf the Internet Safely
There are many criminals on the internet, always looking for a moment of carelessness while we surf, especially now that doing business online has become so attractive. Caution is therefore essential, so that intruders cannot get into the system and wreak havoc.
Below are some tips for staying out of the reach of malicious hands online.
1. Use Favorites or Bookmarks
Using Favorites or Bookmarks helps ensure that the website being visited really is the online-business website you have joined, because many attempts to steal usernames and passwords rely on fake websites that look exactly like the original and use a URL that closely resembles it. If anything seems odd during a session, such as the page layout changing or the connection dropping and a page appearing that asks you to re-enter your username and password, treat it with suspicion.
2. Use an antivirus
Make sure an antivirus is installed on the computer; use a professional, licensed product such as Norton Antivirus, McAfee Antivirus, Kaspersky or F-Secure. Antivirus software goes a long way toward keeping viruses off the PC, and keeping it updated is very useful for dealing with newly circulating viruses.
3. Use anti-spyware and anti-adware
Besides viruses, spyware and adware also need to be watched for. Spyware is a small program that gets onto our computer to spy on our internet activity and steal important data, including usernames and passwords; adware is similar but is aimed at advertising, popping up windows while we browse, often in the form of ads for pornographic websites.
4. Use a firewall
To further strengthen the computer's defences, use a firewall. Windows XP and Vista can use the built-in firewall, and there are several third-party firewalls capable of keeping intruders out, such as Comodo Firewall and ZoneAlarm.
5. Use a better web browser
Rather than the Internet Explorer bundled with Windows, consider an alternative browser that is safer and has stronger protection against attackers. Several vendors compete to offer users the best browser, such as Mozilla Firefox, Opera and Google Chrome.
6. Remove your traces
Windows and browsers normally store cookies, history and other records of a user's internet activity. These are a source of information for attackers, who can learn what the user has been doing and steal the usernames and passwords used online; attackers also commonly plant their data-stealing files in the folders where these cookies and history are kept. (A cookie is a file placed on the computer when a website is visited; the history is the list of internet activity kept by the browser.) Always delete these traces so that attackers cannot use them to get into the computer.
7. Change passwords as often as possible
Most important of all, change the passwords you use as often as possible; however skilled attackers may be at stealing a username and password, the theft is useless if the password has already changed by the time they try to log in to the online-business website.
8. Create passwords that are hard to guess
Do not base a password on a birth date, a family name, a pet's name, or a short phrase in everyday use. Make the password as long as is allowed, combine upper- and lower-case letters, and use special characters such as ? > ) / & % $. Most importantly, do not keep a note of the password in a file on the computer; write it on a piece of paper, keep it in a safe place beside the computer, and disguise it so it does not look like a password note. Do not keep it in your wallet: if the wallet is lost, you will have trouble later.
9. Do not be fooled by fake e-mails
If you receive an e-mail that appears to come from the operator of an online business or e-gold service you use and that asks you to send your username and password, ignore it and delete it immediately; do not click any links in it and do not open any attachments. Legitimate online-business and e-gold operators never send such e-mails.
Domain Name System
The Domain Name System (DNS) is a hierarchical distributed naming system for computers, services, or any resource connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities. Most prominently, it translates easily memorised domain names to the numerical IP addresses needed for the purpose of locating computer services and devices worldwide. By providing a worldwide, distributed keyword-based redirection service, the Domain Name System is an essential component of the functionality of the Internet.
An often-used analogy to explain the Domain Name System is that it serves as the phone book for the Internet by translating human-friendly computer hostnames into IP addresses. For example, the domain name www.example.com translates to the addresses 192.0.43.10 (IPv4) and 2001:500:88:200::10 (IPv6). Unlike a phone book, the DNS can be quickly updated, allowing a service's location on the network to change without affecting the end users, who continue to use the same host name. Users take advantage of this when they use meaningful Uniform Resource Locators (URLs) and e-mail addresses without having to know how the computer actually locates the services.
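The translation step described above is, in practice, a single library call on most systems. A minimal sketch in Python using the system resolver (the addresses printed depend on the resolver consulted and the records published at the time, so they may differ from the values quoted above):
    import socket

    # Ask the operating system's resolver for the addresses behind a host name.
    # AF_UNSPEC requests both IPv4 and IPv6 results where they are available.
    for family, _, _, _, sockaddr in socket.getaddrinfo(
            "www.example.com", 80,
            family=socket.AF_UNSPEC, type=socket.SOCK_STREAM):
        print(family.name, sockaddr[0])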
The Domain Name System distributes the responsibility of assigning domain names and mapping those names to IP addresses by designating authoritative name servers for each domain. Authoritative name servers are assigned to be responsible for their particular domains, and in turn can assign other authoritative name servers for their sub-domains. This mechanism has made the DNS distributed and fault tolerant and has helped avoid the need for a single central register to be continually consulted and updated. Additionally, the responsibility for maintaining and updating the master record for the domains is spread among many domain name registrars, who compete for the end-user's (the domain-owner's) business. Domains can be moved from registrar to registrar at any time.
The Domain Name System also specifies the technical functionality of this database service. It defines the DNS protocol, a detailed specification of the data structures and data communication exchanges used in DNS, as part of the Internet Protocol Suite.
The Internet maintains two principal namespaces, the domain name hierarchy[1] and the Internet Protocol (IP) address spaces.[2] The Domain Name System maintains the domain name hierarchy and provides translation services between it and the address spaces. Internet name servers and a communication protocol implement the Domain Name System.[3] A DNS name server is a server that stores the DNS records for a domain name, such as address (A or AAAA) records, name server (NS) records, and mail exchanger (MX) records (see also list of DNS record types); a DNS name server responds with answers to queries against its database.
History
The practice of using a name as a simpler, more memorable abstraction of a host's numerical address on a network dates back to the ARPANET era. Before the DNS was invented in 1983, each computer on the network retrieved a file called HOSTS.TXT from a computer at SRI (now SRI International).[4][5] The HOSTS.TXT file mapped names to numerical addresses. A hosts file still exists on most modern operating systems by default and generally contains a mapping of "localhost" to the IP address 127.0.0.1. Many operating systems use name resolution logic that allows the administrator to configure selection priorities for available name resolution methods.
The rapid growth of the network made a centrally maintained, hand-crafted HOSTS.TXT file unsustainable; it became necessary to implement a more scalable system capable of automatically disseminating the requisite information.
At the request of Jon Postel, Paul Mockapetris invented the Domain Name System in 1983 and wrote the first implementation. The original specifications were published by the Internet Engineering Task Force in RFC 882 and RFC 883, which were superseded in November 1987 by RFC 1034[1] and RFC 1035.[3] Several additional Requests for Comments have proposed various extensions to the core DNS protocols.
In 1984, four Berkeley students—Douglas Terry, Mark Painter, David Riggle, and Songnian Zhou—wrote the first Unix name server implementation, called The Berkeley Internet Name Domain (BIND) Server.[6] In 1985, Kevin Dunlap of DEC significantly re-wrote the DNS implementation. Mike Karels, Phil Almquist, and Paul Vixie have maintained BIND since then. BIND was ported to the Windows NT platform in the early 1990s.
BIND was widely distributed, especially on Unix systems, and is the dominant DNS software in use on the Internet.[7] Alternative name servers have been developed, partly motivated by a desire to improve upon BIND's record of vulnerability to attack. BIND version 9, however, was written from scratch and has a security record comparable to other modern DNS software.[citation needed]
Structure
Domain name space
The domain name space consists of a tree of domain names. Each node or leaf in the tree has zero or more resource records, which hold information associated with the domain name. The tree sub-divides into zones beginning at the root zone. A DNS zone may consist of only one domain, or may consist of many domains and sub-domains, depending on the administrative authority delegated to the manager.
The hierarchical Domain Name System, organized into zones, each served by a name server
Administrative responsibility over any zone may be divided by creating additional zones. Authority is said to be delegated for a portion of the old space, usually in the form of sub-domains, to another name server and administrative entity. The old zone ceases to be authoritative for the new zone.
Domain name syntax
The definitive descriptions of the rules for forming domain names appear in RFC 1035, RFC 1123, and RFC 2181. A domain name consists of one or more parts, technically called labels, that are conventionally concatenated, and delimited by dots, such as example.com.
* The right-most label conveys the top-level domain; for example, the domain name www.example.com belongs to the top-level domain com.
* The hierarchy of domains descends from right to left; each label to the left specifies a subdivision, or subdomain, of the domain to the right. For example, the label example specifies a subdomain of the com domain, and www is a subdomain of example.com. This tree of subdivisions may have up to 127 levels.
* Each label may contain up to 63 characters. The full domain name may not exceed the length of 253 characters in its textual representation.[1] In the internal binary representation of the DNS the maximum length requires 255 octets of storage, since it also stores the length of the name.[3] In practice, some domain registries may have shorter limits.[citation needed]
* DNS names may technically consist of any character representable in an octet. However, the allowed formulation of domain names in the DNS root zone, and most other subdomains, uses a preferred format and character set. The characters allowed in a label are a subset of the ASCII character set and include the letters a through z, A through Z, the digits 0 through 9, and the hyphen. This rule is known as the LDH rule (letters, digits, hyphen). Domain names are interpreted in a case-independent manner.[8] Labels may not start or end with a hyphen.[9] There is an additional rule that essentially requires that top-level domain names not be all-numeric.[10] (A short validation sketch follows this list.)
* A hostname is a domain name that has at least one IP address associated. For example, the domain names www.example.com and example.com are also hostnames, whereas the com domain is not.
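A minimal validation sketch in Python, based only on the LDH rule and the length limits listed above (the function name is illustrative; real registries and resolvers apply additional, registry-specific checks):
    import re

    # One label: 1-63 characters, letters/digits/hyphen, not starting or ending
    # with a hyphen (the LDH rule).
    LDH_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

    def is_plausible_hostname(name: str) -> bool:
        name = name.rstrip(".")          # a trailing dot denotes the root
        if not name or len(name) > 253:  # textual length limit
            return False
        return all(LDH_LABEL.match(label) for label in name.split("."))

    print(is_plausible_hostname("www.example.com"))  # True
    print(is_plausible_hostname("-bad-.example"))    # False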
Internationalized domain names
The limited set of ASCII characters permitted in the DNS prevented the representation of names and words of many languages in their native alphabets or scripts. To make this possible, ICANN approved the Internationalizing Domain Names in Applications (IDNA) system, by which user applications, such as web browsers, map Unicode strings into the valid DNS character set using Punycode. In 2009 ICANN approved the installation of internationalized domain name country code top-level domains. In addition, many registries of existing top-level domains (TLDs) have adopted the IDNA system.
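Python's standard library ships an "idna" codec implementing the older IDNA 2003 mapping (current registries generally follow IDNA2008, usually via a third-party package), but it is enough to illustrate the Punycode conversion:
    # 'bücher.example' is encoded to the ASCII-compatible form used on the wire.
    name = "bücher.example"
    ascii_form = name.encode("idna")      # b'xn--bcher-kva.example'
    print(ascii_form)
    print(ascii_form.decode("idna"))      # round-trips to 'bücher.example'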
Name servers
Main article: Name server
The Domain Name System is maintained by a distributed database system, which uses the client-server model. The nodes of this database are the name servers. Each domain has at least one authoritative DNS server that publishes information about that domain and the name servers of any domains subordinate to it. The top of the hierarchy is served by the root name servers, the servers to query when looking up (resolving) a TLD.
Authoritative name server
An authoritative name server is a name server that gives answers that have been configured by an original source, for example, the domain administrator or by dynamic DNS methods, in contrast to answers that were obtained via a regular DNS query to another name server. An authoritative-only name server only returns answers to queries about domain names that have been specifically configured by the administrator.
In other words, an authoritative name server lets recursive name servers know what DNS data (the IPv4 IP, the IPv6 IP, a list of incoming mail servers, etc.) a given host name (such as "www.example.com") has. As just one example, the authoritative name server for "example.com" tells recursive name servers that "www.example.com" has the IPv4 IP 192.0.43.10.
An authoritative name server can either be a master server or a slave server. A master server is a server that stores the original (master) copies of all zone records. A slave server uses an automatic updating mechanism of the DNS protocol in communication with its master to maintain an identical copy of the master records.
A set of authoritative name servers has to be assigned for every DNS zone. NS records listing the servers of that set must be stored in the parent zone, and in the zone itself as a self-reference.
When domain names are registered with a domain name registrar, their installation at the domain registry of a top level domain requires the assignment of a primary name server and at least one secondary name server. The requirement of multiple name servers aims to make the domain still functional even if one name server becomes inaccessible or inoperable.[11] The designation of a primary name server is solely determined by the priority given to the domain name registrar. For this purpose, generally only the fully qualified domain name of the name server is required, unless the servers are contained in the registered domain, in which case the corresponding IP address is needed as well.
Primary name servers are often master name servers, while secondary name servers may be implemented as slave servers.
An authoritative server indicates its status of supplying definitive answers, deemed authoritative, by setting a software flag (a protocol structure bit), called the Authoritative Answer (AA) bit in its responses.[3] This flag is usually reproduced prominently in the output of DNS administration query tools (such as dig) to indicate that the responding name server is an authority for the domain name in question.[3]
Operation
Address resolution mechanism
Domain name resolvers determine the appropriate domain name servers responsible for the domain name in question by a sequence of queries starting with the right-most (top-level) domain label.
A DNS recursor consults three name servers to resolve the address www.wikipedia.org.
The process entails:
1. A network host is configured with an initial cache (so-called hints) of the known addresses of the root name servers. Such a hint file is updated periodically by an administrator from a reliable source.
2. A query to one of the root servers to find the server authoritative for the top-level domain.
3. A query to the obtained TLD server for the address of a DNS server authoritative for the second-level domain.
4. Repetition of the previous step to process each domain name label in sequence, until the final step which returns the IP address of the host sought.
The diagram illustrates this process for the host www.wikipedia.org.
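The same iteration can be sketched with the third-party dnspython library (an assumption; 198.41.0.4 is a.root-servers.net). The sketch omits caching, CNAME handling, retries over TCP, and the case where a referral arrives without glue addresses:
    import dns.message
    import dns.query
    import dns.rdatatype

    def iterative_resolve(qname, server="198.41.0.4"):
        """Follow referrals from a root server down to an A record (simplified)."""
        while True:
            query = dns.message.make_query(qname, dns.rdatatype.A)
            response = dns.query.udp(query, server, timeout=5)
            if response.answer:                       # final step: the address
                return [rr.address for rrset in response.answer
                        for rr in rrset if rr.rdtype == dns.rdatatype.A]
            # Otherwise this is a referral: take a glue address for one of the
            # name servers listed in the authority section and ask it next.
            glue = [rr.address for rrset in response.additional
                    for rr in rrset if rr.rdtype == dns.rdatatype.A]
            if not glue:
                raise RuntimeError("referral without IPv4 glue; a full resolver "
                                   "would now resolve the name server's name")
            server = glue[0]

    print(iterative_resolve("www.wikipedia.org"))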
The mechanism in this simple form would place a large operating burden on the root servers, with every search for an address starting by querying one of them. Given how critical the root servers are to the overall functioning of the system, such heavy use would create an insurmountable bottleneck for the trillions of queries placed every day. In practice, caching is used by DNS servers to overcome this problem, and as a result root name servers are actually involved in only a very small fraction of the total traffic.
Recursive and caching name server
In theory, authoritative name servers are sufficient for the operation of the Internet. However, with only authoritative name servers operating, every DNS query must start with recursive queries at the root zone of the Domain Name System and each user system would have to implement resolver software capable of recursive operation.
To improve efficiency, reduce DNS traffic across the Internet, and increase performance in end-user applications, the Domain Name System supports DNS cache servers which store DNS query results for a period of time determined in the configuration (time-to-live) of the domain name record in question. Typically, such caching DNS servers, also called DNS caches, also implement the recursive algorithm necessary to resolve a given name starting with the DNS root through to the authoritative name servers of the queried domain. With this function implemented in the name server, user applications gain efficiency in design and operation.
As one example, if a client wants to know the IP address for "www.example.com", it sends, to a recursive caching name server, a DNS request stating "I would like the IPv4 address for 'www.example.com'." The recursive name server then queries authoritative name servers until it gets an answer to that query (or returns an error if no answer can be obtained); in this case the answer is 192.0.43.10.
The combination of DNS caching and recursive functions in a name server is not mandatory; the functions can be implemented independently in servers for special purposes.
Internet service providers (ISPs) typically provide recursive and caching name servers for their customers. In addition, many home networking routers implement DNS caches and recursors to improve efficiency in the local network.
DNS resolvers
See also: resolv.conf
The client-side of the DNS is called a DNS resolver. It is responsible for initiating and sequencing the queries that ultimately lead to a full resolution (translation) of the resource sought, e.g., translation of a domain name into an IP address.
A DNS query may be either a non-recursive query or a recursive query:
* A non-recursive query is one in which the DNS server provides a record for a domain for which it is authoritative itself, or it provides a partial result without querying other servers.
* A recursive query is one for which the DNS server will fully answer the query (or give an error) by querying other name servers as needed. DNS servers are not required to support recursive queries.
The resolver, or another DNS server acting recursively on behalf of the resolver, negotiates use of recursive service using bits in the query headers.
Resolving usually entails iterating through several name servers to find the needed information. However, some resolvers function more simply by communicating only with a single name server. These simple resolvers (called "stub resolvers") rely on a recursive name server to perform the work of finding information for them.
Circular dependencies and glue records
Name servers in delegations are identified by name, rather than by IP address. This means that a resolving name server must issue another DNS request to find out the IP address of the server to which it has been referred. If the name given in the delegation is a subdomain of the domain for which the delegation is being provided, there is a circular dependency. In this case the name server providing the delegation must also provide one or more IP addresses for the authoritative name server mentioned in the delegation. This information is called glue. The delegating name server provides this glue in the form of records in the additional section of the DNS response, and provides the delegation in the answer section of the response.
For example, if the authoritative name server for example.org is ns1.example.org, a computer trying to resolve www.example.org first resolves ns1.example.org. Since ns1 is contained in example.org, this requires resolving example.org first, which presents a circular dependency. To break the dependency, the name server for the org top level domain includes glue along with the delegation for example.org. The glue records are address records that provide IP addresses for ns1.example.org. The resolver uses one or more of these IP addresses to query one of the domain's authoritative servers, which allows it to complete the DNS query.
Record caching
The DNS Resolution Process reduces the load on individual servers by caching DNS request records for a period of time after a response. This entails the local recording and subsequent consultation of the copy instead of initiating a new request upstream. The time for which a resolver caches a DNS response is determined by a value called the time to live (TTL) associated with every record. The TTL is set by the administrator of the DNS server handing out the authoritative response. The period of validity may vary from just seconds to days or even weeks.
As a noteworthy consequence of this distributed and caching architecture, changes to DNS records do not propagate throughout the network immediately, but require all caches to expire and refresh after the TTL. RFC 1912 conveys basic rules for determining appropriate TTL values.
Some resolvers may override TTL values, as the protocol supports caching for up to 68 years or no caching at all. Negative caching, i.e. the caching of the fact of non-existence of a record, is determined by name servers authoritative for a zone, which must include the Start of Authority (SOA) record when reporting that no data of the requested type exists. The value of the MINIMUM field of the SOA record and the TTL of the SOA itself are used to establish the TTL for the negative answer.
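The negative-answer TTL rule described above (specified in RFC 2308) reduces to taking the smaller of the two values; a one-line sketch:
    def negative_ttl(soa_ttl: int, soa_minimum: int) -> int:
        # RFC 2308: a negative answer is cached for min(TTL of the SOA, SOA MINIMUM).
        return min(soa_ttl, soa_minimum)

    print(negative_ttl(soa_ttl=86400, soa_minimum=3600))   # 3600 seconds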
Reverse lookup
A reverse lookup is a query of the DNS for domain names when the IP address is known. Multiple domain names may be associated with an IP address. The DNS stores IP addresses in the form of domain names as specially formatted names in pointer (PTR) records within the infrastructure top-level domain arpa. For IPv4, the domain is in-addr.arpa. For IPv6, the reverse lookup domain is ip6.arpa. The IP address is represented as a name in reverse-ordered octet representation for IPv4, and reverse-ordered nibble representation for IPv6.
When performing a reverse lookup, the DNS client converts the address into these formats, and then queries the name for a PTR record following the delegation chain as for any DNS query. For example, assume the IPv4 address 208.80.152.2 is assigned to Wikimedia. It is represented as a DNS name in reverse order like this: 2.152.80.208.in-addr.arpa. When the DNS resolver gets a PTR (reverse-lookup) request, it begins by querying the root servers (which point to The American Registry For Internet Numbers' (ARIN's) servers for the 208.in-addr.arpa zone). On ARIN's servers, 152.80.208.in-addr.arpa is assigned to Wikimedia, so the resolver sends another query to the Wikimedia name server for 2.152.80.208.in-addr.arpa, which results in an authoritative response.
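Python's standard library can construct these specially formatted names directly, and the system resolver performs the same transformation internally when asked for a reverse lookup (the host name actually returned depends on how the PTR record is configured today):
    import ipaddress
    import socket

    addr = "208.80.152.2"
    # Build the PTR query name: octets reversed under in-addr.arpa.
    print(ipaddress.ip_address(addr).reverse_pointer)   # 2.152.80.208.in-addr.arpa
    # Perform the actual reverse lookup via the system resolver.
    print(socket.gethostbyaddr(addr)[0])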
Client lookup
DNS resolution sequence
Users generally do not communicate directly with a DNS resolver. Instead DNS resolution takes place transparently in applications such as web browsers, e-mail clients, and other Internet applications. When an application makes a request that requires a domain name lookup, such programs send a resolution request to the DNS resolver in the local operating system, which in turn handles the communications required.
The DNS resolver will almost invariably have a cache (see above) containing recent lookups. If the cache can provide the answer to the request, the resolver will return the value in the cache to the program that made the request. If the cache does not contain the answer, the resolver will send the request to one or more designated DNS servers. In the case of most home users, the Internet service provider to which the machine connects will usually supply this DNS server: such a user will either have configured that server's address manually or allowed DHCP to set it; however, where systems administrators have configured systems to use their own DNS servers, their DNS resolvers point to separately maintained name servers of the organization. In any event, the name server thus queried will follow the process outlined above, until it either successfully finds a result or does not. It then returns its results to the DNS resolver; assuming it has found a result, the resolver duly caches that result for future use, and hands the result back to the software which initiated the request.
Broken resolvers
An additional level of complexity emerges when resolvers violate the rules of the DNS protocol. A number of large ISPs have configured their DNS servers to violate rules (presumably to allow them to run on less-expensive hardware than a fully compliant resolver), such as by disobeying TTLs, or by indicating that a domain name does not exist just because one of its name servers does not respond.[12]
As a final level of complexity, some applications (such as web browsers) also have their own DNS cache, in order to reduce the use of the DNS resolver library itself. This practice can add extra difficulty when debugging DNS issues, as it obscures the freshness of data, and/or what data comes from which cache. These caches typically use very short caching times—on the order of one minute.[13]
Internet Explorer represents a notable exception: versions up to IE 3.x cache DNS records for 24 hours by default. Internet Explorer 4.x and later versions (up to IE 8) decrease the default timeout value to half an hour, which may be changed in the corresponding registry keys.[14]
Other applications
The system outlined above provides a somewhat simplified scenario. The Domain Name System includes several other functions:
* Hostnames and IP addresses do not necessarily match on a one-to-one basis. Multiple hostnames may correspond to a single IP address: combined with virtual hosting, this allows a single machine to serve many web sites. Alternatively, a single hostname may correspond to many IP addresses: this can facilitate fault tolerance and load distribution, and also allows a site to move physical locations seamlessly.
* There are many uses of DNS besides translating names to IP addresses. For instance, Mail transfer agents use DNS to find out where to deliver e-mail for a particular address. The domain to mail exchanger mapping provided by MX records accommodates another layer of fault tolerance and load distribution on top of the name to IP address mapping.
* E-mail blacklists: The DNS is used for efficient storage and distribution of the IP addresses of blacklisted e-mail hosts. The usual method is putting the IP address of the subject host into a sub-domain of a higher-level domain name, and resolving that name to different records to indicate a positive or a negative result. Here is a hypothetical example blacklist:
o 102.3.4.5 is blacklisted => creates 5.4.3.102.blacklist.example, which resolves to 127.0.0.1
o 102.3.4.6 is not => 6.4.3.102.blacklist.example is not found, or defaults to 127.0.0.2
o E-mail servers can then query blacklist.example through the DNS mechanism to find out whether a specific host connecting to them is in the blacklist. Many such blacklists, either free or subscription-based, are available today, mainly for use by e-mail administrators and anti-spam software. (A minimal lookup sketch follows this list.)
* Sender Policy Framework and DomainKeys, instead of creating their own record types, were designed to take advantage of another DNS record type, the TXT record.
* To provide resilience in the event of computer failure, multiple DNS servers are usually provided for coverage of each domain, and at the top level, thirteen very powerful root name servers exist, with additional "copies" of several of them distributed worldwide via Anycast.
* Dynamic DNS (sometimes called DDNS) allows clients to update their DNS entry as their IP address changes, as it does, for example, when moving between ISPs or mobile hot spots.
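A minimal lookup against the hypothetical blacklist.example zone described in the list above could look like this in Python (the zone name comes from the example; a real deployment would use the zone of an actual blacklist operator):
    import socket

    def is_listed(ip: str, zone: str = "blacklist.example") -> bool:
        # Reverse the octets and prepend them to the blacklist zone,
        # e.g. 102.3.4.5 -> 5.4.3.102.blacklist.example
        query_name = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(query_name)   # any A answer means "listed"
            return True
        except socket.gaierror:                # name not found: not listed
            return False

    print(is_listed("102.3.4.5"))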
Protocol details
DNS primarily uses User Datagram Protocol (UDP) on port number 53 to serve requests.[3] DNS queries consist of a single UDP request from the client followed by a single UDP reply from the server. The Transmission Control Protocol (TCP) is used when the response data size exceeds 512 bytes, or for tasks such as zone transfers. Some resolver implementations use TCP for all queries.
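A query on the wire is small enough to build by hand. The sketch below assembles a standard recursive query for an A record and sends it over UDP port 53; the resolver address 192.0.2.53 is a documentation placeholder and must be replaced by a reachable recursive resolver:
    import socket
    import struct

    def build_query(qname: str, qtype: int = 1) -> bytes:
        """Build a minimal DNS query (QTYPE 1 = A record, QCLASS 1 = IN)."""
        header = struct.pack("!HHHHHH",
                             0x1234,   # transaction ID (arbitrary)
                             0x0100,   # flags: standard query, recursion desired
                             1, 0, 0, 0)
        qname_wire = b"".join(bytes([len(label)]) + label.encode("ascii")
                              for label in qname.rstrip(".").split("."))
        return header + qname_wire + b"\x00" + struct.pack("!HH", qtype, 1)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(5)
    sock.sendto(build_query("www.example.com"), ("192.0.2.53", 53))
    reply, _ = sock.recvfrom(512)                 # classic DNS-over-UDP limit
    answer_count = struct.unpack("!H", reply[6:8])[0]
    print(len(reply), "bytes,", answer_count, "answer record(s)")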
DNS resource records
Further information: List of DNS record types
A Resource Record (RR) is the basic data element in the domain name system. Each record has a type (A, MX, etc.), an expiration time limit, a class, and some type-specific data. Resource records of the same type define a resource record set (RRset). The order of resource records in a set, returned by a resolver to an application, is undefined, but often servers implement round-robin ordering to achieve Global Server Load Balancing. DNSSEC, however, works on complete resource record sets in a canonical order.
When sent over an IP network, all records use the common format specified in RFC 1035:[15]
RR (resource record) fields (field: description; length in octets):
NAME: Name of the node to which this record pertains (variable length)
TYPE: Type of RR in numeric form (e.g. 15 for MX RRs) (2 octets)
CLASS: Class code (2 octets)
TTL: Count of seconds that the RR stays valid; the maximum is 2^31 - 1, which is about 68 years (4 octets)
RDLENGTH: Length of the RDATA field (2 octets)
RDATA: Additional RR-specific data (variable length)
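A sketch in Python of reading these fixed-size fields from a response packet, assuming the offset just past the encoded NAME is already known (decoding the NAME itself, including label compression, is omitted):
    import struct

    def parse_rr_fixed_fields(packet: bytes, offset: int):
        """Read TYPE, CLASS, TTL and RDLENGTH, the 10 octets that follow NAME."""
        rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", packet, offset)
        rdata = packet[offset + 10: offset + 10 + rdlength]
        # For an A record (TYPE 1, CLASS 1), rdata holds the 4-octet IPv4 address.
        return rtype, rclass, ttl, rdata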
NAME is the fully qualified domain name of the node in the tree. On the wire, the name may be shortened using label compression, where the ends of domain names mentioned earlier in the packet can be substituted for the end of the current domain name. A free-standing @ is used to denote the current origin.
TYPE is the record type. It indicates the format of the data and it gives a hint of its intended use. For example, the A record is used to translate from a domain name to an IPv4 address, the NS record lists which name servers can answer lookups on a DNS zone, and the MX record specifies the mail server used to handle mail for a domain specified in an e-mail address (see also List of DNS record types).
RDATA is data of type-specific relevance, such as the IP address for address records, or the priority and hostname for MX records. Well known record types may use label compression in the RDATA field, but "unknown" record types must not (RFC 3597).
The CLASS of a record is set to IN (for Internet) for common DNS records involving Internet hostnames, servers, or IP addresses. In addition, the classes Chaos (CH) and Hesiod (HS) exist.[16] Each class is an independent name space with potentially different delegations of DNS zones.
In addition to resource records defined in a zone file, the domain name system also defines several request types that are used only in communication with other DNS nodes (on the wire), such as when performing zone transfers (AXFR/IXFR) or for EDNS (OPT).
Wildcard DNS records
Main article: Wildcard DNS record
The domain name system supports wildcard domain names, which are names that start with the asterisk label, '*', e.g., *.example.[1][17] DNS records belonging to wildcard domain names specify rules for generating resource records within a single DNS zone by substituting whole labels with matching components of the query name, including any specified descendants. For example, in the DNS zone x.example, a configuration like the fragment shown below specifies that all subdomains (including subdomains of subdomains) of x.example use the mail exchanger a.x.example. The records for a.x.example are needed to specify the mail exchanger. As this has the result of excluding this domain name and its subdomains from the wildcard matches, all subdomains of a.x.example must be defined in a separate wildcard statement.
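A representative zone fragment for this example, reconstructed along the lines of the wildcard illustration in RFC 1034 (the address is illustrative):
    x.example.     MX  10  a.x.example.
    *.x.example.   MX  10  a.x.example.
    a.x.example.   MX  10  a.x.example.
    a.x.example.   A   192.0.2.1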
The role of wildcard records was refined in RFC 4592, because the original definition in RFC 1034 was incomplete and resulted in misinterpretations by implementers.[17]
Protocol extensions
The original DNS protocol had limited provisions for extension with new features. In 1999, Paul Vixie published in RFC 2671 an extension mechanism, called Extension mechanisms for DNS (EDNS) that introduced optional protocol elements without increasing overhead when not in use. This was accomplished through the OPT pseudo-resource record that only exists in wire transmissions of the protocol, but not in any zone files. Initial extensions were also suggested (EDNS0), such as increasing the DNS message size in UDP datagrams.
Dynamic zone updates
Dynamic DNS updates use the UPDATE DNS opcode to add or remove resource records dynamically from a zone database maintained on an authoritative DNS server. The feature is described in RFC 2136. This facility is useful to register network clients into the DNS when they boot or become otherwise available on the network. Since a booting client may be assigned a different IP address each time from a DHCP server, it is not possible to provide static DNS assignments for such clients.
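A dynamic update can be sketched with the third-party dnspython library (an assumption; the zone, record, and server address below are illustrative, and real deployments normally authenticate updates with a TSIG key, which is omitted here):
    import dns.update
    import dns.query

    # Build an UPDATE message for the zone example.com.
    update = dns.update.Update("example.com")
    update.replace("laptop", 300, "A", "192.0.2.55")   # laptop.example.com -> 192.0.2.55

    # Send it to the zone's primary (master) server; 192.0.2.53 is a placeholder.
    response = dns.query.tcp(update, "192.0.2.53", timeout=5)
    print(response.rcode())    # 0 (NOERROR) if the server accepted the update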
Security issues
Originally, security concerns were not major design considerations for DNS software or any software for deployment on the early Internet, as the network was not open for participation by the general public. However, the expansion of the Internet into the commercial sector in the 1990s changed the requirements for security measures to protect data integrity and user authentication.
Several vulnerability issues were discovered and exploited by malicious users. One such issue is DNS cache poisoning, in which data is distributed to caching resolvers under the pretense of being an authoritative origin server, thereby polluting the data store with potentially false information and long expiration times (time-to-live). Subsequently, legitimate application requests may be redirected to network hosts operated with malicious intent.
DNS responses are traditionally not cryptographically signed, leading to many attack possibilities; the Domain Name System Security Extensions (DNSSEC) modify DNS to add support for cryptographically signed responses. Several extensions have been devised to secure zone transfers as well.
Some domain names may be used to achieve spoofing effects. For example, paypal.com and paypa1.com are different names, yet users may be unable to distinguish them in a graphical user interface depending on the user's chosen typeface. In many fonts the letter l and the numeral 1 look very similar or even identical. This problem is acute in systems that support internationalized domain names, since many character codes in ISO 10646 may appear identical on typical computer screens. This vulnerability is occasionally exploited in phishing.[18]
Techniques such as forward-confirmed reverse DNS can also be used to help validate DNS results.
Domain name registration
The right to use a domain name is delegated by domain name registrars which are accredited by the Internet Corporation for Assigned Names and Numbers (ICANN), the organization charged with overseeing the name and number systems of the Internet. In addition to ICANN, each top-level domain (TLD) is maintained and serviced technically by an administrative organization, operating a registry. A registry is responsible for maintaining the database of names registered within the TLD it administers. The registry receives registration information from each domain name registrar authorized to assign names in the corresponding TLD and publishes the information using a special service, the WHOIS protocol.
ICANN publishes the complete list of TLD registries and domain name registrars. Registrant information associated with domain names is maintained in an online database accessible with the WHOIS service. For most of the more than 290 country code top-level domains (ccTLDs), the domain registries maintain the WHOIS (Registrant, name servers, expiration dates, etc.) information. For instance, DENIC, Germany NIC, holds the DE domain data. Since about 2001, most gTLD registries have adopted this so-called thick registry approach, i.e. keeping the WHOIS data in central registries instead of registrar databases.
For COM and NET domain names, a thin registry model is used. The domain registry (e.g., VeriSign) holds only basic WHOIS data (i.e., registrar, name servers, etc.). The detailed WHOIS information (registrant, name servers, expiry dates, etc.) can be found at the registrars.
Some domain name registries, often called network information centers (NIC), also function as registrars to end-users. The major generic top-level domain registries, such as for the COM, NET, ORG, INFO domains, use a registry-registrar model consisting of many domain name registrars.[19][20] In this method of management, the registry only manages the domain name database and the relationship with the registrars. The registrants (users of a domain name) are customers of the registrar, in some cases through additional layers of resellers.