This chapter explains the overall structure of an HTML document, including what types of information are contained in the <head> and the <body>. It also explains how to organize the various sections of a typical web page.
Basic HTML Document Structure
HTML documents (web pages) need to follow a few basic structural rules in order to work properly and be read accurately by web browsers.
The document must begin by declaring a DOCTYPE. Several different HTML (and related) standards have been in use over the years, so it is important to specify which standard your document follows.
Today, the correct DOCTYPE is almost always simply html. So an HTML document should begin with:
<!DOCTYPE html>
This isn’t exactly an HTML tag in the proper sense, but rather it tells the browser how to interpret all the other tags that follow.
After the DOCTYPE declaration, the opening tag is the <html> tag. The closing </html> tag will be the last line of the document.
Inside the HTML tag, you can specify the language of the document (in this case, English).
<!DOCTYPE html>
<html lang="en">
  .
  .
  .
  <!-- entire contents of page -->
  .
  .
</html>
Nested inside the <html> tag are two sections, the <head> and the <body>. The body contains all the visible content, while the head contains information about the document itself. Nothing is outside of these two sections.
<!DOCTYPE html>
<html lang="en">
<head>
  .
  <!-- Info about document here. -->
  .
</head>
<body>
  .
  .
  <!-- Contents of document here. -->
  .
  .
</body>
</html>
This is the basic structure of every HTML document. Everything else is extra.
The <head> element of an HTML document usually contains all the information needed by a browser to properly render the document, plus additional information describing the contents (for the benefit of aggregators and bots).
The <meta> tag is used several times in the <head> to specify various metadata (data about the document).
Meta tags are empty tags, requiring no closing tag. You may end them with the self-closing slash ( />), but this is not required (and some people specifically discourage it).
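As a small illustration of the point above, the two lines below are alternative spellings of the same tag, not a pair to use together. The author name is a placeholder chosen for this sketch.

```html
<!-- Form 1: no trailing slash. -->
<meta name="author" content="Example Author">

<!-- Form 2: with the optional self-closing slash. Browsers treat it the same as Form 1. -->
<meta name="author" content="Example Author" />
```

Whichever form you pick, using it consistently throughout a document makes the markup easier to scan.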
There are several different common ways to encode characters (letters, numbers, and punctuation) in computer memory. If you don’t specify which one you are using, the web browser may mess up and display some of the wrong characters.
Most of the time, these days, you want to specify the UTF-8 character set.
<meta charset="utf-8">
(Older encodings can’t represent everything UTF-8 can. Plain ASCII, for example, doesn’t have extended characters like em-dashes and curly quotes. If you’ve ever seen weird type glitches where quotation marks or apostrophes have been replaced with seemingly random characters, it’s usually because the document was written in UTF-8 but displayed using a different encoding, which means someone didn’t specify the correct character set in the document.)
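A minimal test page along these lines can help check that an encoding declaration is working. This is only a sketch: the title and sample sentence are invented for illustration, and it assumes the file itself is saved as UTF-8.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare the encoding near the top of the head, before any text that uses extended characters. -->
  <meta charset="utf-8">
  <title>Encoding test</title>
</head>
<body>
  <!-- Curly quotes, a curly apostrophe, and an accented letter: none of these exist in plain ASCII. -->
  <p>“It’s a café,” she said.</p>
</body>
</html>
```

If the paragraph shows stray symbols instead of quotation marks, the encoding the browser used probably does not match the encoding the file was saved in.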
Description, Author, and Keywords
Basic information about the document (who wrote it and what it is about) is also conveyed through <meta> tags. These each have two attributes: name, which identifies the kind of metadata, and content, which holds its value.
<meta name="description" content="A page about HTML.">
<meta name="keywords" content="HTML, tags, metadata">
<meta name="author" content="Adam Michael Wood">
This kind of information used to be especially important for SEO purposes. It no longer plays a huge role in SEO, though it can still have some effect. More importantly, having correct and detailed information in these elements contributes to a semantic web, where all content is easily findable and parsable by machines.
(If you use a Content Management System, the tags and post descriptions you write in the editor screen will usually be displayed in these meta tags.)
The <title> tag appears in the head, and usually does not have any attributes. It encloses the title.
<title> This is the title of the page. </title>
The title should be accurate and, if possible, match the on-page visible title (usually in an <h1> headline tag) in the body. The contents of the title are typically displayed in the tab at the top of the browser window.
It is not a good idea to nest any other tags in the title (like <i>) because they will usually not display properly.
An HTML document can only specify one title.
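The relationship between the title and the visible headline could look something like the sketch below. It reuses the "Guide to HTML" title from this chapter's examples and assumes the headline is an <h1>, as in the body examples later in the chapter.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- Exactly one title, plain text only (no nested tags). -->
  <title>Guide to HTML</title>
</head>
<body>
  <!-- The visible headline matches the title shown in the browser tab. -->
  <h1>Guide to HTML</h1>
</body>
</html>
```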
Style sheets, written in the CSS (Cascading Style Sheets) language, are separate documents which provide information about how to display a page in a browser. Information about sizes, colors, placement, and fonts is all contained in the style sheet. Keeping these details separate from the main HTML document makes it easier to change them without affecting the content of the document itself.
CSS style sheets are linked to within the <head> of the HTML document, using the <link> tag. The href attribute specifies the URL of the style sheet file, and the rel attribute specifies that the link is a stylesheet link (there are other types of links).
<link href="/css/style.css" rel="stylesheet">
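A page is not limited to a single style sheet. As a hedged sketch, a <head> fragment might link a main sheet plus a second one for printing. The /css/print.css path is hypothetical, and the media attribute used here is a standard <link> attribute that this chapter does not otherwise cover.

```html
<head>
  <!-- Fragment only: a real head would also contain the charset, title, and so on. -->

  <!-- Main style sheet, applied everywhere. -->
  <link href="/css/style.css" rel="stylesheet">

  <!-- Hypothetical second sheet, applied only when the page is printed. -->
  <link href="/css/print.css" rel="stylesheet" media="print">
</head>
```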
RSS — Rich Site Summary, or Really Simple Syndication — is a way of providing a feed of site updates (like new blog posts) to subscribers, so that they are informed of new content as it is posted and can read that content from an RSS reader without having to visit your site.
If you are using a Content Management System, it will generally create an RSS feed for you, which is an XML document available at its own URL. That URL should be linked to from the <head> of your blog’s main index page, so that RSS readers and web browsers can find it easily.
<link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.xml" />
The rel="alternate" attribute means that the linked URL contains the same content (a list of blog posts), but in an alternative format. The type attribute specifies the type of format (RSS).
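To show how the type attribute distinguishes formats, here is a sketch of a site that offers its feed in two formats. Atom is another feed format not discussed in this chapter, and the /atom.xml path is an assumption made for illustration.

```html
<!-- The same list of posts, offered as RSS. -->
<link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.xml">

<!-- The same list of posts, offered as Atom (hypothetical URL). -->
<link rel="alternate" type="application/atom+xml" title="Atom" href="/atom.xml">
```

Both links use rel="alternate"; only the type and href differ, which is how a feed reader can tell the formats apart.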
A lot of additional details about a document frequently appear in the <head>. These will be covered in more detail later, in the relevant chapters.
It is possible to link to JS files from within the head, and this is a common practice. However, it is generally better to place these at the end of the document if possible.
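The two placements described above might look like the following sketch. The script file names are hypothetical. The defer attribute shown in the first option is a standard <script> attribute that asks the browser to run the script only after the document has been parsed.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Script placement</title>
  <!-- Option 1: link from the head. With defer, the script runs after the page is parsed. -->
  <script src="/js/head-script.js" defer></script>
</head>
<body>
  <p>Page content.</p>
  <!-- Option 2: link at the end of the body, so the content above is parsed first. -->
  <script src="/js/site.js"></script>
</body>
</html>
```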
Example of an HTML document with the <head> element completed
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="description" content="A page about HTML.">
  <meta name="keywords" content="HTML, tags, metadata">
  <meta name="author" content="Adam Michael Wood">
  <link href="/css/style.css" rel="stylesheet">
  <title>Guide to HTML</title>
  <link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.xml" />
</head>
<body>
  .
  .
  <!-- Contents of document here. -->
  .
  .
</body>
</html>
The <body> element is the main portion of the HTML document, and may contain all sorts of things.
Typically, the structure of an HTML body can be divided into several sections, each possibly having one or more subsections:
- header
  - logo / branding / site title
  - main navigation
  - search bar
- main content
  - one or more articles
    - article title
    - article content
    - article metadata (author, tags, date posted)
- sidebar
  - secondary navigation (archives by date, category, or tag)
- footer
  - copyright / license info
  - tertiary navigation
  - contact info
    - address / phone
    - social links
Not all of these sections will be included in every page, or appear the same way. However, this provides a good starting point for an example of how these different pieces would be put together into the <body> of a document.
The most generic block-level element for structuring a webpage is the <div> element. This was once used for every section and subsection of the page contents.
This resulted in a lot of nested <div> elements:
<body>
  <div class="header">
    <div class="logo">
      <!-- logo here -->
    </div>
    <div class="main-nav">
      <!-- main navigation menu here -->
    </div>
    <div class="search-bar">
      <!-- Search bar form here -->
    </div>
  </div>
  <div class="page-content">
    <div class="main">
      <div class="article">
        <div class="article-header">
          <h1>Title of Article</h1>
          <div class="article-meta">
            <!-- Date, Author -->
          </div>
        </div>
        <div class="article-content">
          <p>Article.</p>
          <p>Content.</p>
        </div>
        <div class="article-footer">
          <!-- Tags, Categories, etc. -->
        </div>
        <div class="comments">
          <!-- Article comments and commenting form. -->
        </div>
      </div>
    </div>
    <div class="sidebar">
      <!-- Sidebar content, widgets, etc. -->
    </div>
  </div>
  <div class="footer">
    <div class="license">
      <!-- Copyright info -->
    </div>
    <div class="contact-info">
      <!-- Contact information -->
    </div>
  </div>
</body>
Thanks to an extended set of structural tags in the latest HTML standard (HTML5), this can be made easier to read and more meaningful to search engines and other systems that extract information from your page (like screen readers for the blind).
Semantic structural tags
Many (but not all) of the <div> elements above can be replaced by newer semantic elements introduced in HTML5.
“Semantic” means, basically, “linguistically meaningful.” Rather than just a generic <div>, semantic tags have specific meanings related to how they are used on the page.
The most important semantic tags for page structure are:
- <header>: Used for both document header information (page title, logo, navigation) and article header information (post title, metadata). Don’t confuse it with <head>, which contains metadata for the entire document.
- <nav>: A container for navigation menus.
- <main>: The primary, unique content of a page. There can only be one <main> element in a document.
- <article>: A single piece of content. A blog index page might have several <article> elements, but the permanent page of a post would have just the one.
- <section>: A section of a document.
- <aside>: Can be used for secondary content, like a sidebar. Can also be used within an <article>, for example to display pull-quotes or for comments (which are, by nature, tangential to the article).
- <footer>: The footer for an entire document or a section of a document (like an <article>).
- <address>: Used to contain the primary contact information related to the author or publisher of a page. Should not be used for arbitrary postal addresses contained in page content, but only for the contact information (including postal address, if relevant) of the author or publisher of a page or article.
Using these tags, let’s recreate the example document above with elements that actually specify their semantic meaning.
<body>
  <header>
    <div class="logo">
      <!-- logo here -->
    </div>
    <nav>
      <!-- main navigation menu here -->
    </nav>
    <div class="search-bar">
      <!-- Search bar form here -->
    </div>
  </header>
  <div class="page-content">
    <main>
      <article>
        <header>
          <h1>Title of Article</h1>
          <div class="article-meta">
            <!-- Date, Author -->
          </div>
        </header>
        <section class="article-content">
          <p>Article.</p>
          <p>Content.</p>
        </section>
        <footer>
          <!-- Tags, Categories, etc. -->
        </footer>
        <aside class="comments">
          <!-- Article comments and commenting form. -->
        </aside>
      </article>
    </main>
    <aside>
      <!-- Sidebar content, widgets, etc. -->
    </aside>
  </div>
  <footer>
    <div class="copyright">
      <!-- Copyright info -->
    </div>
    <address>
      <!-- Contact information -->
    </address>
  </footer>
</body>
Using semantic tags (tags that actually mean something specific) makes the markup easier to read, because there are fewer repeated <div> tags. There’s also less need to make sure everything has a meaningful class attribute related to its use in the document.
Of course, some <div> tags are still needed, but far fewer.
But making markup easier to read only provides a benefit when developing or working with the code (debugging, updating your template). The bigger benefit to semantic markup is that it provides more detailed information to screen readers and bots about how your page is structured. This makes it more accessible to the blind, which is important. It also provides SEO benefit.
(More information about semantic markup, and related benefits, is covered in another chapter.)
A note about <main>
As of this writing, Internet Explorer does not support the <main> tag; it simply doesn’t understand what it means.
You can correct this by telling IE what the element is being used for with the role attribute:
<main role="main"> <!-- main content here --> </main>
A note about <article>
The <article> element is intended to be used for a piece of “stand-alone” content, with the most obvious example being a blog post. However, it does not need to be thought of in the “newspaper article” sense of the word “article.”
Each comment on a post can be an <article>, nested inside the larger <article>. Also, each widget in a sidebar could be considered an individual <article>.
It seems likely, however, that having a multitude of <article> elements on each page of a site could tend toward confusion about which content is actually central and which is not.
There is no definitive answer about whether this may be the case, but there also seems to be no real benefit to an abundant use of the <article> tag. For this reason, the most sensible option is likely to restrict its use to the “primary” content of a page. Comments can then be included as an aside to the article text, or (if you prefer) outside the <article> element, in a separate element.
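The two placements for comments could be sketched as follows. These are alternatives, so a real page would use one or the other. The use of <section> for the second option and the comments class name are assumptions made for this sketch, not something the chapter prescribes.

```html
<!-- Option A: comments as an aside inside the single article. -->
<main>
  <article>
    <h1>Title of Article</h1>
    <p>Article content.</p>
    <aside class="comments">
      <!-- Article comments and commenting form. -->
    </aside>
  </article>
</main>

<!-- Option B: comments outside the article, in a separate element. -->
<main>
  <article>
    <h1>Title of Article</h1>
    <p>Article content.</p>
  </article>
  <section class="comments">
    <!-- Article comments and commenting form. -->
  </section>
</main>
```

Either way, the page keeps a single <article> for its primary content.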
Making <div> tags easier to read
If you still find it difficult to keep track of which closing </div> tags relate to which <div> elements, you can use comments as a helpful reminder. This strategy is used by many Content Management Systems and theme developers, especially when pieces of their HTML document are actually broken up across several different PHP template files.
The easiest way to do this is to put the class (or ID) name into a comment on the same line as the closing </div> tag. Following CSS and jQuery convention, class names are prefixed with a period ( . ) and IDs with a hash sign ( # ).
<div class="wrapper">
  <div class="container">
    <div id="center-div">
    </div> <!-- / #center-div -->
  </div> <!-- / .container -->
</div> <!-- / .wrapper -->
This has no actual impact on anything, but can make future debugging and ongoing development easier, especially in a particularly complicated or long HTML document.