Semantic vs. presentational HTML

2008-02-14

Today I debated with my colleagues the differences and merits of semantic HTML versus presentational HTML. This may seem a fairly esoteric topic to non-developers. However, for web developers it touches upon a fundamental issue, namely that of best coding practices. Should HTML be coded with semantic or presentational preference? Are there different situations where one coding style is more appropriate than the other? And what constitutes semantic versus presentational HTML in the first place?

Since my colleagues and I left these issues unresolved, I am going to consider them in more detail here. Web developers are divided into two camps, the semantic HTML advocates and the presentational HTML advocates. My colleague seemed to be arguing for a presentational approach. Before we look at the reasoning behind each of the two positions, let us define the terms.

Semantic HTML is the subset of HTML that describes the content and structure of a document, whereas presentational HTML is the subset of HTML used to determine the appearance of the document. While this definition is straightforward and unambiguous, in practice it is often difficult to draw the exact boundaries of these sets. In other words, it’s not always easy to tell whether a given tag belongs in the semantic or in the presentational category.

Some HTML tags can be assigned quite easily, however. For example, <address>, <abbr>, <body>, <code> and <kbd> are all semantic tags, while <center>, <font>, <hr>, <b> and <br> (“bed and breakfast markup”) are all presentational. The same categorisation can be extended to distinguish between presentational and semantic attributes. In some cases, HTML offers semantic and presentational alternatives that achieve the same thing. For example, most browsers render <i> (presentational) exactly like <em> (semantic), and <b> (presentational) exactly like <strong> (semantic).

To make things even more complicated, there are HTML tags which have both presentational and semantic aspects and other tags which have neither. Tags like <button>, <caption>, and <table> are examples of hybrids, whereas <script>, <applet>, <object> are neither semantic nor presentational but constitute containers for other types of content.
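The four groups described above can be written down as a small lookup table. The following is only a sketch: the tag lists are the examples from this post and are not complete, and the names CATEGORIES, categorise and count_categories are my own choices for illustration. It uses Python's standard html.parser module to count how many start tags of each group a snippet contains.

```python
from collections import Counter
from html.parser import HTMLParser

# Groups as named in this post; the lists are illustrative, not complete.
CATEGORIES = {
    "semantic": {"address", "abbr", "body", "code", "kbd", "em", "strong"},
    "presentational": {"center", "font", "hr", "b", "br", "i"},
    "hybrid": {"button", "caption", "table"},
    "container": {"script", "applet", "object"},
}


def categorise(tag):
    """Return the group a tag belongs to, or "unclassified"."""
    for name, tags in CATEGORIES.items():
        if tag in tags:
            return name
    return "unclassified"


class TagCounter(HTMLParser):
    """Counts start tags per group while parsing."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        self.counts[categorise(tag)] += 1


def count_categories(html):
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


# Made-up sample markup, mixing tags from several groups.
sample = ("<body><center><b>Title</b></center><br>"
          "<address>Somewhere</address><table></table></body>")
print(count_categories(sample))
```

Any tag that is not in the table ends up as "unclassified", which is a fair reflection of how many tags resist a clean assignment.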

There are two principal arguments for preferring semantic markup over presentational markup. The first is that semantic markup makes documents easier to understand for machine parsers, such as search engine robots, agents, screen readers, accessibility software and the like. The second argument is that it is always a good idea to separate content from presentation, because it aids automation and simplifies maintenance. This argument gained momentum with the introduction of style sheets, which allow the appearance aspects to be moved to an external document.

There are also good arguments for preferring presentational markup over semantic markup. For example, there is the ease-of-use aspect. It is simply easier to write <b> than <strong> or <span style="font-weight:bold">. Then there is the backward-compatibility aspect. Most if not all of the presentational HTML markup is understood by even the most outdated browsers. The first pro-semantic argument can also be called into question, because today’s robots and search engine spiders are sophisticated enough to interpret the presentational aspects of a document and derive document structure from them.
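To give an idea of how a parser might derive structure from presentation, here is a toy heuristic. It is a sketch under my own assumptions and not a description of how any real search engine works: text inside a <font> tag with a large size attribute is guessed to be a heading. The names HeadingGuesser, looks_large and guess_headings are hypothetical, and the thresholds are arbitrary.

```python
from html.parser import HTMLParser


def looks_large(size):
    """Guess whether a <font size> value is heading-sized.

    The attribute takes 1..7 (3 being the default) or a relative value
    such as "+2". The thresholds below are arbitrary assumptions.
    """
    if size.startswith("+"):
        return size[1:].isdigit() and int(size[1:]) >= 2
    return size.isdigit() and int(size) >= 5


class HeadingGuesser(HTMLParser):
    """Collects text that sits inside a large <font> element."""

    def __init__(self):
        super().__init__()
        self.stack = []     # one flag per open <font>: True if it looks large
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            size = dict(attrs).get("size") or ""
            self.stack.append(looks_large(size))

    def handle_endtag(self, tag):
        if tag == "font" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text counts as a heading if any enclosing <font> is large.
        if any(self.stack) and data.strip():
            self.headings.append(data.strip())


def guess_headings(html):
    parser = HeadingGuesser()
    parser.feed(html)
    parser.close()
    return parser.headings


page = ('<font size="+2"><b>Intro</b></font><br>Some text.<br>'
        '<font size="2">note</font><font size="6">Details</font>')
print(guess_headings(page))
```

A real robot would presumably combine many such signals, but even this crude rule recovers an outline from purely presentational markup.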

Finally, the strongest point for giving presentational HTML preference is that HTML itself is designed for document presentation, not for document storage or structuring. My own point of view is that the distinction between presentational and semantic HTML is quite academic and probably irrelevant. We have to live with the fact that HTML is a bit messy by design. In practice, presentational elements are often (ab-)used to create document structure, for example by using <br> for paragraph separation. The reverse is also the case. Semantic and structural elements are often (ab-)used to create a certain visual appearance, as is the case, for instance, with the <blockquote> tag or the various tags used in conjunction with tables.
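The <br> case shows how thin the line is, because the structure can be recovered mechanically. The sketch below assumes that a run of two or more <br> tags was meant as a paragraph break, while a single <br> is an ordinary line break. The function name br_to_paragraphs is my own, and a regular expression is of course not a full HTML parser.

```python
import re

# Two or more <br> tags in a row, in any case, with or without a slash.
PARAGRAPH_BREAK = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)


def br_to_paragraphs(html):
    """Wrap the chunks between runs of <br> tags in <p> elements."""
    chunks = PARAGRAPH_BREAK.split(html)
    paragraphs = [chunk.strip() for chunk in chunks if chunk.strip()]
    return "\n".join("<p>{}</p>".format(p) for p in paragraphs)


text = "First paragraph.<br><br>Second<br>line.<BR /><br>Third."
print(br_to_paragraphs(text))
```

The single <br> inside the second chunk is left alone, so the visual result stays the same while the paragraphs become explicit.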

I tend to see HTML as a language that is chiefly concerned with presentation. In this capacity it has been extremely practical and successful. Ideally, HTML takes care of the document structure whereas CSS takes care of the finer aspects of visual appearance. In practice, however, it is rather difficult to achieve a complete separation. Therefore I suggest abandoning the attempt to rigidly structure content with semantic markup at the expense of visual definition.

If semantic structuring is a design goal, then choose a fitting XML format. XML is much better suited to that task and HTML can be generated quite easily from XML. The semantic approach only makes sense in those cases where rather simple documents are created in HTML and where HTML is the primary format. Otherwise semantics would have to be foisted onto the limited HTML vocabulary. Since the number of dynamically generated pages is outgrowing the number of static pages on the Internet, and since the use of XML is increasing, the distinction becomes less and less important.
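As a rough illustration of generating HTML from XML, the following sketch uses Python's standard xml.etree.ElementTree module. The XML vocabulary (<article>, <headline>, <para>, <warning>) is invented for this example, as are the names TEMPLATES and to_html. In a real project an XSLT style sheet would be another common way to do the same job.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from a made-up XML vocabulary to HTML output.
# The XML carries the meaning; the HTML is free to be presentational.
TEMPLATES = {
    "headline": "<h1>{}</h1>",
    "para": "<p>{}</p>",
    "warning": "<p><b>{}</b></p>",
}


def to_html(xml_text):
    """Render the direct children of the root element as HTML lines."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        template = TEMPLATES.get(child.tag)
        if template is None:
            continue  # elements without a template are skipped
        text = escape((child.text or "").strip())
        parts.append(template.format(text))
    return "\n".join(parts)


source = ("<article><headline>A &amp; B</headline>"
          "<para>Text</para><warning>Hot</warning></article>")
print(to_html(source))
```

The point of the design is that the semantics live in the XML source, so the generated HTML no longer has to carry them.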