Below is the structure of a basic HTML document. It is very plain and simple, but also very important: if this structure is not followed, web browsers and other tools may not read our web pages the way we intend.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<title>Page title goes here</title>
</head>
<body>
<p>Page content goes here.</p>
</body>
</html>
So this is a basic web page, not much to look at. If you copy the sample code into a text editor and save the file as "sample.htm", you can then open the file in a web browser to see that it is indeed working. So let's start going through this file line by line to see what is actually happening.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
This first line is not technically part of HTML, but it is important nonetheless. It is called the Document Type Declaration, or DocType, and it tells the browser (or any other program reading the file) what type of document it is reading; in this case, an HTML 4.01 document using the Strict subset. Now how do we know that? Let's break it down.
The first part, "!DOCTYPE", tells the web browser that this is a DocType. It is kind of self-explanatory, but it should be explained for completeness.
The second part is "HTML". This tells us what the top-level (root) tag should be, and it can change depending on the DocType.
Next is "PUBLIC". This says that the DocType is identified by a publicly known name, which follows as the fourth part. This third part could also be "SYSTEM", in which case there is no public name and only the location of the file is given, for example a file on your local computer.
The fourth part, "-//W3C//DTD HTML 4.01//EN", is known as the Formal Public Identifier and has four different parts separated by "//". The first part is the "-" sign, which means that the organization that wrote it is not registered with the International Organization for Standardization (ISO); a "+" sign would mean that the organization is registered with the ISO. The second part, "W3C", is the name of the organization that wrote this DocType, in this case the World Wide Web Consortium (W3C). The third part, "DTD HTML 4.01", combines two things: "DTD" denotes the type of file being referenced, in this case an actual Document Type Definition (DTD), and "HTML 4.01" is the human-readable name and version of the DocType being used. The last part, "EN", denotes which language the DocType itself is written in. The language of the DocType plays no role in the language of the actual document.
The fifth part of the DocType is the URI where the DocType file can be found, in this case "http://www.w3.org/TR/html4/strict.dtd". If you type that URI into your browser's address bar, you can actually go and look at it, but it probably won't make much sense yet.
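To see how these five parts fit together, it helps to compare our DocType with the two other DocTypes that HTML 4.01 defines. The listing below is only for comparison and is not a page you can save and open, since a real document has exactly one DocType on its first line. Notice that only the name inside the Formal Public Identifier and the URI change from one to the next.

```html
<!-- Strict: the one used in our sample page -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<!-- Transitional -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<!-- Frameset -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
```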
HTML comes in three different flavors: Strict, Transitional and Frameset. The flavors have many tags in common, but the key is really their differences.
Frameset allows you to use frames, which let you divide up one page and display several pages within it. Frameset should not be used anymore; it was only created as a nicety to older websites while they upgraded. The Transitional flavor allows web designers to tell the browser how a page should be laid out directly within the HTML. This was also created as a nicety for older websites until they were upgraded, and it should not be used anymore. The reason is that if you determine the layout of a web page within the HTML, you cannot easily change how it is displayed without changing each individual page.
The last flavor, the one we are using, is Strict. HTML Strict gets rid of many tags and attributes whose sole purpose is presentation. With these gone, web pages will look flat and very blocky unless other technologies are used. Cascading Style Sheets (CSS) should be used for layout, as it can express positioning, coloring and even simple animations much better than HTML can. CSS also allows us to use one file to determine the look and feel of the entire website, but we will talk about that later on. It's time to talk about the next line in our document.
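Before moving on, here is a rough preview of what keeping the layout out of the HTML can look like. This is only a sketch: the style rule (navy, centered text) is made up for illustration, and a real site would more likely keep its rules in a separate file shared by every page.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<title>Page title goes here</title>
<!-- The look of the page lives here, not in the body -->
<style type="text/css">
p { color: navy; text-align: center; }
</style>
</head>
<body>
<!-- The paragraph itself carries no layout information -->
<p>Page content goes here.</p>
</body>
</html>
```

Changing the one rule in the style section changes every paragraph on the page, without touching the content.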
<html lang="en">
This tag is the root tag; every HTML document should have it, and there can only be one. This tag is saying that anything inside of it is HTML. It also has one attribute set, the language attribute (lang). Although this attribute is optional, it is highly recommended that you set it so that browsers, search engines and screen readers know which language the document is written in; in this case it is "en", or English. The language attribute can also be localized: if I were to put "en-CA", that would mean that the document is written in Canadian English. The lang attribute uses the ISO 639-1 standard for the language codes and ISO 3166-1 for the country codes. You can go to wikipedia.org to find a list of language codes and a list of country codes. You can also read more about why defining a language is important at the W3C language information page.
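As a small sketch of the localized form, here is the sample page marked as Canadian English. The second paragraph is my own addition to show that the lang attribute can also be set on a single tag inside the body when one passage is in a different language from the rest of the page.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en-CA">
<head>
<title>Language example</title>
</head>
<body>
<!-- Inherits "en-CA" from the html tag -->
<p>This paragraph is written in Canadian English.</p>
<!-- Overrides the page language for this paragraph only -->
<p lang="fr">Bonjour tout le monde.</p>
</body>
</html>
```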
<head>
This line is the start of the head tag. The head tag is used to store information about the document that is not normally visible to the users themselves. Search engines, screen readers, PDAs and teletype machines will use the information within the head tags differently, and sometimes not at all, but it is good practice to set it, as it almost always helps users in some way.
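To give an idea of what else can live in the head besides the title, here is a sketch of a slightly fuller head section. The description text and the file name "style.css" are placeholders I made up for illustration; none of these extra lines are required for the sample page to work.

```html
<head>
<!-- Tells the browser which character encoding the file uses -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Page title goes here</title>
<!-- A short summary that search engines may show under the title -->
<meta name="description" content="A short summary of this page.">
<!-- Points to a separate style sheet (hypothetical file name) -->
<link rel="stylesheet" type="text/css" href="style.css">
</head>
```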
<title>Page title goes here</title>
This is the title tag, which denotes the title of the document. This tag's contents normally show up in the top part of the web browser. The title should be kept short, but as descriptive and relevant to the content as possible. This tag is normally the most important tag to search engines, as this is the text they show most prominently in the search results. It is also normally the first thing that blind users will hear when using screen readers, so it is almost always the document's first impression on its audience.
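As a rough illustration of "short but descriptive", compare the two titles below. Both are made-up examples shown side by side; a real document has only one title tag.

```html
<!-- Too vague: says nothing about the content -->
<title>Home</title>

<!-- Short, but tells readers and search engines what the page covers -->
<title>The structure of a basic HTML document</title>
```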
</head>
<body>
I have included two tags here, as there really is nothing to say about the closing head tag; it simply states that the head section is done. The other tag is the opening "body" tag. The body of a document is what is normally shown to the visitor, and it is where all of your content goes. Everything from web links to pictures to plain old text is held here.
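Here is a small sketch of a body holding the three kinds of content just mentioned. The link target and the image file name "photo.jpg" are placeholders; note that in HTML Strict the link and the picture each sit inside a paragraph rather than directly inside the body.

```html
<body>
<!-- Plain old text -->
<p>Page content goes here.</p>
<!-- A web link -->
<p>Visit <a href="http://www.w3.org/">the W3C website</a> to learn more.</p>
<!-- A picture; the alt text is read out by screen readers -->
<p><img src="photo.jpg" alt="A short description of the picture"></p>
</body>
```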
<p>Page content goes here.</p>
In this line we have a paragraph tag. It marks this section as a paragraph: a collection of sentences that share one central idea or point. Many web browsers will present the paragraph with a blank line both above and below it, and sometimes with an indent, depending on the user's locale. It is important to use paragraph tags when writing prose so that things like search engines and screen readers can understand it properly. It is also good to note that in HTML 4.01 Strict the body tag cannot be empty; it must contain at least one block-level tag such as a paragraph. I put this paragraph tag here for exactly that reason. Most browsers will still display an empty body as a blank page, but the document will no longer be valid, and invalid pages can behave unpredictably in some browsers and tools.
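The difference can be sketched as follows. The two bodies are shown side by side for comparison only, and the paragraph text is made up; a real document has a single body.

```html
<!-- Not valid in HTML 4.01 Strict: the body has no content -->
<body>
</body>

<!-- Valid: each idea gets its own paragraph tag -->
<body>
<p>The first paragraph covers one central idea.</p>
<p>A new idea starts a new paragraph.</p>
</body>
```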
</body>
</html>
We are now at the end of the document. These two lines simply state that we have reached the end of the body section and the end of the HTML document, and that no more content follows. Nothing should appear after the closing HTML tag; once a browser encounters it, it can finish displaying the page.
This work is licensed under a Creative Commons License.