URLs Explained

URL = Uniform Resource Locator

RFC 1738, updated by RFC 1808, RFC 2368, and RFC 2396

www.w3.org calls these URIs (Uniform Resource Identifiers)

*** see also: A Guide to URL's; A Beginner's Guide to URLs; A Guide to URLs; and RFC 1738

The Web is an information space overlaid on top of the underlying Internet structure. URLs are the addresses of points in that space. These addresses are the locations of resources on the Web: usually a website, but they can also point to documents, images, downloadable files, services, electronic mailboxes, and other resources. They make resources that are available under a variety of naming schemes and access methods, such as HTTP, FTP, and Internet mail, addressable in the same simple way.

 

URI, URL, and URN

You will probably only need to know the acronym, URL.  But all three are closely linked:

URL (Uniform Resource Locator)

 - the worldwide web path of either a local (your computer) or remote (on the internet) file.  For example, the URL to my home page is:

The generic URL syntax is:   scheme://machine.domain/full-path-of-file

For example, this web page is actually a single file on a server.  The server is connected to the Internet, and is connected to the Worldwide Web structure (www).  The file, just like a file on your PC, has a path, which is:

Components of an HTTP URL

Types of URLs

*** info below is from http://www.netspace.org/users/dwb/url-guide.html#what 

Below are listed the most common URL schemes and their syntax. Generally, you will only run across the HTTP, FTP, Gopher, News, and Mailto schemes, but the others are included for completeness.

HyperText Transfer Protocol (HTTP)

HTTP is the Internet protocol designed specifically for use with the World Wide Web, and thus will be the most common scheme you are likely to use. Its syntax is:

http://<host>:<port>/<path>?<searchpart>

host:port - host is the Internet address of the WWW server, and port is the port number to connect to (the default HTTP port number is 80). In most cases, :<port> (the colon and the port) is omitted.

path - the path tells the WWW server which file you want. If the path is omitted, you are asking for the "home page" of the system, and the server will typically look for a default file such as "index.html", "index.htm", "default.html", or "default.htm".
WARNING: if no default file exists for the path, the server may instead return a listing of all files in that directory, depending on how it is configured.

?searchpart - may be used to pass information to the server, often to an executable CGI script, but for most WWW documents is not used. Generally, this part of the URL is omitted, along with the preceding question-mark.

Another character that may be frequently encountered when browsing the WWW is the pound sign (#), which can be used to point to a named anchor. An author of an HTML document can allow browsers to point to a specific section of a document by creating a named anchor within that document. Then, a URL with a pound sign and the anchor's name appended will reference that specific section. Named anchors are used throughout this document, and as an example, the following URL points directly to the section "What are URLs?":

http://www.netspace.org/users/dwb/url-guide.html#what
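The components described above can be pulled apart programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the host www.example.com, port 8080, and the query values are made up for illustration, while the second URL is the named-anchor example from this guide.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL exercising every part of the HTTP syntax.
url = "http://www.example.com:8080/docs/guide.html?topic=urls&lang=en#what"
parts = urlsplit(url)

print(parts.scheme)    # the scheme, before "://"
print(parts.hostname)  # <host>
print(parts.port)      # <port> as an integer
print(parts.path)      # /<path>
print(parts.query)     # <searchpart>, without the "?"
print(parts.fragment)  # named anchor, without the "#"

# The searchpart is usually a list of name=value pairs.
query = parse_qs(parts.query)

# When ":<port>" is omitted, the parser reports no port at all;
# the client is expected to fall back to 80 for HTTP.
guide = urlsplit("http://www.netspace.org/users/dwb/url-guide.html#what")
print(guide.port, guide.fragment)
```

Note that the fragment is handled by the browser; it is not normally sent to the server as part of the request.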

File Transfer Protocol (FTP)

FTP is a well-used means for transmitting files over the Internet. While there are many advantages to using HTTP instead, many systems don't offer full support of HTTP and clients are not as well developed as they are for FTP. Thus, many times files are distributed via FTP. Its syntax is:

ftp://<user>:<password>@<host>:<port>/<cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>

If contacting a site which provides general FTP access, the user and password can be omitted, including the colon between them and the at-symbol afterwards. The host is the Internet address of the FTP site. The port and its preceding colon can be omitted as well. The portion "<cwd1>/<cwd2>/.../<cwdN>" refers to the series of "change directory" commands a client must use to move to the directory in which the desired file resides. The name is the filename of the desired file. The construction ";type=<typecode>" allows for a transmission method (e.g. ascii vs. binary) to be specified, but I haven't found any clients which support this syntax, and in fact, most incorrectly assume that it is part of the filename. For now, avoid using the typecode.
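As a rough illustration of the FTP syntax, the sketch below splits a URL into its user, password, host, port, directory list, filename, and typecode with Python's urllib.parse.urlparse. The account, password, host, and path are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical FTP URL with every optional part present.
ftp_url = "ftp://user:secret@ftp.example.com:21/pub/docs/readme.txt;type=a"
parts = urlparse(ftp_url)

# urlparse (unlike urlsplit) separates the ";type=..." suffix into .params.
directories = parts.path.strip("/").split("/")[:-1]  # the <cwd1>/.../<cwdN> series
filename = parts.path.rsplit("/", 1)[-1]              # the <name>

print(parts.username, parts.password, parts.hostname, parts.port)
print(directories, filename, parts.params)

# The common case: user, password, port, and typecode all omitted.
anon = urlparse("ftp://ftp.example.com/pub/docs/readme.txt")
print(anon.username, anon.port, anon.params)
```

A client that does not understand the typecode would treat "readme.txt;type=a" as the filename, which is the failure described above.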

Gopher Protocol (Gopher)

The Gopher protocol syntax is very similar to FTP and HTTP:

gopher://<host>:<port>/<gopher-path>

The host indicates the Internet address of the Gopher server, while the port, as in the previous cases, can generally be omitted along with its preceding colon. The gopher-path specifies the type of Gopher resource, a selector string, and perhaps other information. A detailed discussion of Gopher queries is not within the scope of this document, but generally you can determine a document's gopher-path from information provided by your browser.

Electronic Mail (Mailto)

The Mailto URL scheme is different from the previous three schemes, and it does not identify a file available over the Internet, but rather the email address of someone that can be reached via the Internet. The syntax is:

mailto:<account@site>

The account@site is the Internet email address of the person you wish to contact, as defined by RFC 822. Note that when encoded in WWW documents, some WWW browsers may not understand the Mailto scheme. Support for Mailto is increasing, but for now, one can switch to a different browser or interpret the Mailto URL manually.
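Because a Mailto URL has no "//" and no host part, a generic parser sees everything after the colon as the path. A small sketch in Python, using a made-up address:

```python
from urllib.parse import urlsplit

# Hypothetical address; any account@site works the same way.
mail = urlsplit("mailto:someone@example.com")

print(mail.scheme)  # the scheme name
print(mail.netloc)  # empty: there is no "//<host>" section
print(mail.path)    # the account@site part

# Split the address into its two halves.
account, site = mail.path.split("@", 1)
print(account, site)
```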

Usenet News (News)

The News URL scheme allows for the referencing of Usenet newsgroups or specific articles. The syntax is either of the following:

news:<newsgroup-name>
news:<message-id>

The newsgroup-name is the Usenet newsgroup name (e.g. comp.infosystems.www.providers) and generally will tell the browser to retrieve the titles of all the available articles within that newsgroup. If the newsgroup-name is "*", the URL refers to "all available newsgroups." The message-id corresponds to the Message-ID of the specific article to obtain, and can be found within the article's header information.

Note that the News URL does not specify how a client is to obtain this information. A client must be properly configured to know where to obtain Usenet newsgroups and articles, generally from a specific NNTP server.

USENET News Using NNTP Access (NNTP)

The NNTP URL scheme is an alternative method to the News scheme for referencing Usenet articles and newsgroups. It has the syntax of:

nntp://<host>:<port>/<newsgroup-name>/<article-number>

The items within this syntax are all as described in previous schemes. Generally, it is better to use the News scheme and trust that the client knows how to obtain Usenet items. The NNTP scheme specifies that the NNTP protocol is used, and also specifies a specific NNTP server, designated by the host, to be used; most NNTP servers do not provide universal access. Thus, use News whenever possible.

Telnet to Remote Host (Telnet)

The Telnet URL designates an interactive session to a remote host on the Internet via the Telnet protocol. Its syntax is:

telnet://<user>:<password>@<host>:<port>/

The user and password tokens can be omitted, and are included only for advisory purposes. The host refers to the site to connect to, and port can be omitted, defaulting to the standard "23".
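Several of the schemes so far allow the port to be omitted. One way a client might fill in the default is sketched below in Python. The helper name effective_port and the host names are hypothetical; the port numbers are the defaults given in RFC 1738, of which this guide mentions only 80 and 23.

```python
from urllib.parse import urlsplit

# Default ports per RFC 1738 (only http and telnet are stated in this guide).
DEFAULT_PORTS = {
    "http": 80,
    "ftp": 21,
    "gopher": 70,
    "nntp": 119,
    "telnet": 23,
    "wais": 210,
}

def effective_port(url):
    """Return the explicit port if present, else the scheme's default (or None)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("telnet://guest@host.example.com/"))  # port omitted
print(effective_port("telnet://host.example.com:2323/"))   # explicit port
```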

Telnet to Remote Hosts Requiring 3270 Emulation (TN3270)

The TN3270 URL scheme is for telnetting to systems which require 3270 terminal emulation, such as IBM mainframes. This is not a scheme defined by RFC 1738, but is a proposed addition. It is almost identical to the Telnet URL, and has the syntax:

tn3270://<user>:<password>@<host>:<port>/

Wide Area Information Search (WAIS)

The WAIS URL refers to WAIS databases, searches, or documents on a WAIS database. The WAIS URL scheme has one of the three following forms:

wais://<host>:<port>/<database>
wais://<host>:<port>/<database>?<search>
wais://<host>:<port>/<database>/<wtype>/<wpath>

The host and port (which can be omitted) are the same constructs as in the previously described schemes. The first syntax indicates a specific WAIS database, the second a particular search, and the third a specific document.

Host-Specific File Names (File)

The File URL scheme indicates a file which can be obtained by the client machine. In many sources, this scheme is confused with the FTP scheme. FTP refers to a specific protocol for file transmission, while the File URL leaves the retrieval method up to the client, which in some circumstances might be the FTP protocol. When the file is intended to be obtained via FTP, I recommend designating that URL scheme. The syntax for the File scheme is:

file://<host>/<path>

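Example - to access the file temp1.txt in the root of your C drive, type this address into your web browser:

file:///c:/temp1.txt

Notice the three slashes after "file:". The host, which sits between the second and third slash, is left empty because the file is local. Notice also the forward slashes in the path. In Windows or DOS, paths use a backslash:   c:\temp1.txt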

The host is the fully qualified domain name of the system, and the path is the hierarchical directory path of the form "directory/directory/.../filename". The host can be left as an empty string or "localhost" to refer to local files on the client on which the URL is being interpreted.
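To see how the <host> slot of a File URL is read, here is a brief sketch using Python's urllib.parse. It compares an empty host, "localhost", and a form with only two slashes before the drive letter, which a generic parser reads as a host named "c:". The file name temp1.txt is the one from the example above.

```python
from urllib.parse import urlsplit

# Empty host: three slashes in a row, then the path.
local = urlsplit("file:///c:/temp1.txt")
print(repr(local.netloc), local.path)

# Explicit "localhost" host: same local file.
named = urlsplit("file://localhost/c:/temp1.txt")
print(repr(named.netloc), named.path)

# Only two slashes: the drive letter lands in the host slot.
two_slash = urlsplit("file://c:/temp1.txt")
print(repr(two_slash.netloc), two_slash.path)
```

Many browsers tolerate the two-slash form, but the three-slash form is the one that matches the syntax file://<host>/<path> with an empty host.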


The 'http' is the protocol. It tells the browser how to retrieve the file it will be opening. There are plenty of other protocols, such as FTP (File Transfer Protocol), News (Newsgroups), Gopher (menu-based browsing), and Mailto (Send e-mail). The protocol name is usually followed by a colon (:) and 2 slashes (//). The exceptions are News and Mailto, which are followed by just a colon.

'www.w3nation.com' is the server where the file is located. The browser looks up this name, the host 'www' in the w3nation.com domain, which points to an IP address. An IP address (in its most familiar form, a series of 4 numbers) identifies a machine on the Internet, and every machine connected to the Internet has one. The default file on most web servers is 'index.html', and that is the first file displayed when someone visits w3nation.com.

Here is the URL you are currently at:

http://www.w3nation.com/learning/html/url.htm

Notice the forward slashes. Many people who use Windows are confused by this because Windows uses backslashes ("\") to separate the directories in its file system.

After the server, the URL points to the exact path and filename where the file is located. This is how the web browser locates every file it comes across.

URLs can also be used locally. They don't have to point to the web server address. It would be quite a chore to have to enter the full path of the URL every time you create a new page and make a link. Luckily, there is a way around that.

There are two types of URLs, Absolute and Relative. The current URL as it is displayed in your browser and as it was displayed just above, is an Absolute URL. An Absolute URL is the full URL path. Your browser will always display Absolute URLs.

You can also use a shorter kind of Absolute URL that leaves out the protocol and the server and starts with a forward slash ("/") at the root directory of the site. Strictly speaking, this is a relative reference with an absolute path, and it works on any web server, not only UNIX ones. For example, the absolute URL of this page would be:

/resources/learning/html/url.htm

The / specifies the root directory. resources/, learning/, and html/ are all directories. We know this because they all have a trailing forward slash. Think of these URLs as 'from the top' URLs. They make it easier to move files around without having to deal with all the problems of full absolute URLs while working locally.

The 'from the top' Absolute URL to the home page would be:

/index.html

The forward slash is the top and 'index.html' is the filename. Notice there was no trailing slash after the filename.
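A browser turns a 'from the top' URL into a full one by keeping the protocol and server of the current page and replacing the whole path. The sketch below imitates that with Python's urllib.parse.urljoin. The base address is an assumption, built by putting the w3nation.com server in front of the example path above.

```python
from urllib.parse import urljoin

# Assumed address of the current page (server + the example path above).
current = "http://www.w3nation.com/resources/learning/html/url.htm"

# A leading "/" discards the current path and starts again at the root.
home = urljoin(current, "/index.html")
print(home)

same_page = urljoin(current, "/resources/learning/html/url.htm")
print(same_page)
```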

Relative URLs are a little different, and are what most web developers use. With a relative URL, you navigate from the current URL, either backwards or forwards.

'../' (2 periods and a forward slash) - means back one level. If we were at http://www.w3nation.com/learning/index.html - then using the ../ would bring us back into the main directory.
'name/' (a directory name and a forward slash) - means forward one level, into the directory that is named.

So, if I was working on the w3nation.com home page, located at:

http://www.w3nation.com/index.html

and wanted to create a new link to the learning center, I would use this as the link:

resources/learning/

(The trailing forward slash isn't necessary, but is recommended. Without it, the server doesn't know you are asking for a directory and will first look for a file of that name; many servers then redirect the browser to the address with the slash added. Including the slash marks the link as a directory and saves that extra step.)

Now, if I wanted to link back to my home page from my learning page, I would use this as a link:

../../index.html
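The relative links in this section can be checked by resolving them the way a browser would. This is a small sketch using Python's urllib.parse.urljoin; the address of the learning page is an assumption, taken from the link resources/learning/ used above.

```python
from urllib.parse import urljoin

home_page = "http://www.w3nation.com/index.html"

# Forward: from the home page into the learning center.
learning = urljoin(home_page, "resources/learning/")
print(learning)

# Backward: two levels up from the learning center to the home page.
back_home = urljoin(learning, "../../index.html")
print(back_home)

# One level up from a page inside /learning/.
main_dir = urljoin("http://www.w3nation.com/learning/index.html", "../")
print(main_dir)
```

Each "../" removes one directory from the end of the current path, which is why the learning page, two directories deep, needs two of them to reach the home page.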