Page Semantics

To ensure that assistive software and robots understand the content and context of the page, it is important to lay it out in a logical fashion. This section of tutorials explains the importance of using appropriate structural (semantic) codes so that blind users can gain an initial overview of the page and its structure. These semantic codes are also used by search robots, such as Google, and academic research tools, so their correct application helps all web users.

Although it sounds technical, using proper semantics is really quite simple and logical. The idea is that the semantic codes, such as headings, paragraphs, list items, labels and legends provide the basic skeleton of the page structure. They mark up individual pieces of text and explain how they all fit together as a logical story.

Another advantage of using these semantic codes is that you can design your own style for each and thereby easily maintain a consistent look to your site. One word of warning though - only use these codes for structuring your content. Do not use them to make the page look pretty. Some of these codes are stored by assistive software in a small memory cache to help the user. If you use a heading code to format a paragraph of text you can easily fill the user's cache and cause the browser to crash.

Basic Page Structure

Introduction

The inventors of the Web wanted to make sure that their content could be read by machines and therefore focused much of their attention on the logical structure of the page. This is referred to as a "Semantic Structure". You will often hear people talking about "The Semantic Web", which is a continuation of that policy. Machines such as search robots and assistive software tools can only work properly if they have these clear, unambiguous instructions. By following these semantic rules you are able to present your page information to the user in a logical format regardless of which platform or automated tool they use.

The main structure that these tools are looking for is as follows:

DOCTYPE - Which version of HTML/XHTML is used
TITLE - The title of the document
METADATA - Key information about the document
H1 - Main heading of the page
H2 - Sub heading
H3 - Sub-sub heading
H4 - Sub-sub-sub heading
H5 - Sub-sub-sub-sub heading
H6 - Sub-sub-sub-sub-sub heading

Most web pages will only use the top two or three heading levels. Between each of the heading levels the automated software expects to find blocks of text (paragraphs) that expand upon (explain) the heading above. Each lower level of heading will provide more detail of the relevant topic until, at the lowest level (H6), we are dealing with the minutiae of the subject.

Doctype

The <!DOCTYPE> declaration is the very first thing in your document, even before the opening <html> tag. This declaration tells the browser which version of HTML or XHTML is being used. It is important for the browser or assistive software to know this so that it can interpret your code correctly. The example of a !DOCTYPE declaration given below tells the browser that the code complies with the Transitional (loose) standard of HTML 4.01, English version. The declaration also tells the browser where to find the document type definition (DTD) for that standard.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

More examples of !DOCTYPE declarations can be found on the W3C web site. It is best not to use XHTML unless you are using an XML base, and understand the restrictions of XHTML, because it is less forgiving than standard HTML.

The Title Element

The Title of a page is the first thing that a search engine looks at. It is also the item that a browser publishes in its top border area and the first thing that a blind person hears when a new page opens. The TITLE is so important that it is embedded into the "head" section of the page code. Given this importance it is rather surprising that Google lists over 27 million pages that have the title “untitled document” and a further 2 million pages with the title “No Title”. That's 29 million documents that search engines will not catalogue correctly!

Most web authoring software asks the author to provide a title for each new document it creates. The W3C states that “every HTML document must have a TITLE element in the HEAD section.” This title should be “context rich” (i.e. not a single word or phrase such as “Introduction” but rather “Introduction to Medieval Bee-Keeping”) so that the visitor (and computer software) has a clear, initial impression of what the document is about. It is also important to only use the standard set of ASCII characters for this title. Avoid using characters like &, $, @ and ? that might be interpreted as code by programs such as PHP. Also avoid using the | (bar) character (bottom left of keyboard), or the hyphen (minus) character to separate parts of the title. The | character is read out by screen readers as “bar” and the hyphen (-) character as “minus”. These can cause confusion for blind people when they hear the title read out aloud. To separate phrases in the TITLE use a comma (,).

The title element should ideally be less than 64 characters in length. Titles that are longer than 64 characters may be truncated by the browser. Whilst there are some techniques for improving search engine rankings within a title, their overall impact is minimal. From an accessibility point of view it is more important that the title makes sense to the reader and clearly indicates the content of the document.

Avoid using the same word or phrase at the beginning of each page’s title, as it is harder for blind people to navigate a site if all the page titles start with the same phrase. “Ways to contact Mycompany” is also more user-friendly than “Mycompany, contact us”.
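
As a minimal sketch of the advice above, the head section of the bee-keeping page might contain a title like the one below. The exact wording is only an illustration, and "Mycompany" is the placeholder site name used in the example above.

```html
<head>
<!-- Context-rich title, under 64 characters, with the page topic first
     and a comma (not a bar or hyphen) separating the site name -->
<title>Introduction to Medieval Bee-Keeping, Mycompany</title>
</head>
```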

The meta elements

The <meta> element provides meta-information about your page, such as descriptions and keywords for search engines. Assistive software can also be set to read some of these elements out for blind users to get a better idea of the page content. Please do not use these elements to attract search engines by using irrelevant key words or phrases (it won’t work and will confuse blind users).

Some meta elements that should be on every page include:

  • Specifying the content type and character encoding
  • Providing a description of the page contents
  • Specifying relevant keywords that might be useful to search engines
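
A minimal sketch of these three elements, assuming an HTML 4.01 page. The character set, description and keywords shown are placeholder values chosen for illustration, not values taken from any real page.

```html
<head>
<!-- Content type and character encoding (UTF-8 is an assumed choice) -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- A short, honest description of the page contents -->
<meta name="description" content="An introduction to medieval bee-keeping methods">
<!-- Keywords that are genuinely relevant to the page -->
<meta name="keywords" content="bee-keeping, medieval, hives, honey">
<title>Introduction to Medieval Bee-Keeping</title>
</head>
```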

The meta element can also be used to set refresh rates so that the page reloads itself after a set time. Please do not use this refresh attribute as it causes real problems for many disabled users. If the user is slow to work down the page it may automatically refresh itself before the user gets to the bottom. The user is returned to the top of the page each time it refreshes. This can be very frustrating. If you are running a live news feed page that requires refreshing to get the latest news then provide a “refresh” button for the user to click on.
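
One possible way to provide such a button without any scripting is a small form that simply requests the page again. This is a sketch only: the file name news.html is a hypothetical placeholder for the address of the news page itself.

```html
<!-- The page reloads only when the user chooses to press the button -->
<form action="news.html" method="get">
<p><input type="submit" value="Refresh the news"></p>
</form>
```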

More details on meta data can be found on the W3C web site at http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.4

For the rest of this lesson we shall concentrate upon the body of the page as it is shown in the browser window.

Headings

The heading elements structure the page into meaningful blocks of information. These help assistive software (screen readers) and search robots understand the structure of the page and what is important within the context of the page. For blind users these elements can be vital as their screen reader can list just the section headings and enable the user to skip to a particular section of interest. For sighted users the section headings stand out from the body text because they are formatted differently (using bold or large text for example). Blind users cannot see these stylistic attributes and therefore have to rely upon the underlying HTML code to identify section headings.

Every page must start with a single top level heading element. This top-level heading (H1) must reflect the overall content of the page and will complement the page title explained earlier. For this reason only one top level heading is allowed per page.

Section headings and their respective subsection headings need to follow a "tree" structure so that the user knows where they are within the structure of the page (see below). In order to interpret the page correctly it is important to apply heading levels sequentially. If you skip a level of heading (for example from h2 to h4 without an intervening h3) you break the sequence and thus the logic of the content.

<head>
<title>Page Title</title>
</head>
<body>
<h1>Page Heading</h1>
<p>Some introductory text</p>
<h2>First section heading</h2>
<p>A paragraph or two of text</p>
<h2>Next section heading</h2>
<p>Some text</p>
<h3>Subsection heading</h3>
<p>Some text</p>
<h3>Next subsection</h3>
<p>More text</p>
<h2>Site Navigation</h2>
<ul>
<li>first link</li>
<li>second link</li>
<li>third link</li>
</ul>
</body>

Typical semantic structure of a page.

The correct use of headings makes it much easier for blind people to read and understand your web page. The underlying code allows screen readers to present the page in such a way as to allow the user to skip up and down the page picking out just the sections of interest in a similar fashion to that used by sighted users.

Screen readers load the page headings into a cached memory area so that they can be processed and presented to the user whenever required. To work reliably these headings need to be short and context rich. Try to limit headings to just one line and do not include images within the heading element. Even quite modern screen reader software can seize up if the memory cache gets overloaded with images. If your web site causes problems that require the user to re-boot their computer they are most unlikely to return!

Why sequencing headings helps blind users

Imagine that you are a blind user and you are interrupted whilst reading a document (or perhaps just not paying full attention at the time). You hear the screen reader read out aloud "heading level 3 i minus sensys 5360 ". This makes no sense to you. If you were sighted you could quickly scan the page to remind yourself of the context, there might be a picture nearby or you could see the page title at the top. A blind person only has his or her memory to help them put this heading into context. If the page has been constructed correctly a blind person can press a button and hear the preceding level heading (in this case a level 2 heading). They would hear the screen reader say "heading level 2 laser printers for home and business" and immediately remember that they were looking for a printer to buy as a present for their daughter going to University. Without the preceding heading they would have to cursor up and down the page listening to the surrounding text until something jogged their memory.

Paragraphs

The paragraph element surrounds a block of text that has a common theme or purpose within the context of the preceding heading. Ideally a paragraph should contain no more than four or five sentences so that the reader or listener has a chance to digest the content before proceeding to the next paragraph. As explained in lesson 2, each sentence should be short, perhaps no more than one or two lines long. As a result you should be aiming for paragraphs that are normally between four and ten lines long.

Soft Line Breaks

If you need to create a special line ending within a paragraph you can use the <br> (soft break) element without breaking out of the paragraph block. This is particularly useful if you are writing a poem or song and want to make sure that each verse stays as a single paragraph. However, the use of this soft return code is purely for styling; it has no effect on the semantics of the page. For this reason you should never use two consecutive soft returns to create a new paragraph. Screen readers and search engines do not stop for soft returns in the same way that they do for paragraph endings.
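
As a short illustration of the point above, the sketch below keeps each verse in its own paragraph and uses soft breaks only for the line endings. The verse text is placeholder wording.

```html
<!-- One verse stays a single paragraph; br only forces the line endings -->
<p>First line of the verse<br>
Second line of the verse<br>
Third line of the verse</p>

<!-- A new verse is a new paragraph, not two consecutive br elements -->
<p>First line of the next verse<br>
Second line of the next verse</p>
```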

Data tables

Tables are designed to show collections of data such as numbers, schedules or statistics in a way that they can be read in a two dimensional form. By having rows and columns, it is easy to make references either by row or column. Visually it is faster to scan across a table of data than to constantly repeat references used throughout a list. For example, a school timetable has the days of the week and times of the day. If this is placed into a data table, with the times across the top and days down the left, all the lessons can be placed in the remaining spaces of the grid.

This concept of tabular data is used extensively in computing with things like spreadsheets and databases. However, for a blind person having this information converted to speech, they won’t get the visual reference and will either have to memorise all the references or have them repeated.

Table element

In HTML, the table element is used to mark up tabular data into the two dimensional grid of rows and columns. Each row and column can be treated as a block of inter-related information. Individually these bits of data may not make any sense if read out of context, but within the table format their relationship to each other is clear.

Summary attribute

Because tables are complex structures the opening <table> tag should always have a summary attribute that can be read out by screen readers or other assistive technology to help explain the table.

The summary text is not displayed in regular browsers but is available to assistive technology like screen readers, so that it can be read out. This summary text should be used to describe the primary purpose of the table and give an indication of its overall structure. Most assistive output technologies will read the summary first to provide the user with information to help them interpret and use the table. It becomes very important with more complex tables, offering additional information to help people understand the tabular data that follows.
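
A hedged sketch of a summary attribute, using a cut-down version of the residents' menu table that appears later in this section. The wording of the summary is an illustration only. Note that summary is an HTML 4.01 attribute, the version of HTML assumed throughout these notes.

```html
<!-- The summary describes the purpose and layout of the table -->
<table summary="Meals served to residents. Days of the week run across the top and mealtimes run down the left.">
<tr><th>Meal</th><th>Monday</th><th>Tuesday</th></tr>
<tr><th>Breakfast</th><td>Boiled egg</td><td>Fried egg</td></tr>
</table>
```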

 

Long Data Tables

For long tables, i.e. tables that might go over more than one page when printed, it is possible to set the table heading and footer rows so that they repeat at the top and bottom of each new printed page. This option works in some browsers but not all, so it is best avoided where possible by placing such data into documents such as PDF files.

Avoid Columns of Empty Table Cells

Developers sometimes use columns of empty header and data cells to provide a space between the columns in a table. This doesn't follow correct semantic code and may cause problems to assistive technologies.

The JAWS screen reader, for example, voices the word "blank" every time it encounters an empty cell, and this can reduce both the usability and accessibility of data tables. Using CSS padding or margins rather than empty cells to control the presentation follows correct semantic code and helps assistive technologies like screen readers.

Table presentation

When presenting tabular data, it is important to consider how people may be viewing it. Having large areas of white space between columns can give problems to people who use screen magnifiers, as typically only a very small part of the screen is visible at any one time. Also, having things too close together may make it difficult for people with literacy related problems.

In some cases, having light shading on alternate rows can assist people in scanning across each row of data. This is most helpful when a table spans across a wide screen or has some rows with large areas of white space between columns.

The caption element is the most accessible and semantically correct way of providing a table with an identifying title. By default, ‘caption’ will place the title in the centre immediately above the table. However, CSS can be used to change the style and on screen position of the ‘caption’ element. For example, the title (caption) can be put underneath the table as is commonly done in scientific and academic publications.

When coding a table, the caption element should come immediately after the opening table element and before anything else.
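
The sketch below shows where the caption sits in the code, with an optional style rule that moves it beneath the table. The table is a cut-down version of the residents' menu used in this section; in a real page the style element would belong in the head section.

```html
<style type="text/css">
/* Optional: show the caption underneath the table */
caption { caption-side: bottom; }
</style>

<table>
<!-- caption comes immediately after the opening table tag -->
<caption>Weekly menus for residents</caption>
<tr><th>Meal</th><th>Monday</th></tr>
<tr><th>Breakfast</th><td>Boiled egg</td></tr>
</table>
```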

Example of a data table

The table below shows the meals available at certain times on certain days. If we look at the Wednesday column and the Lunch row we can tell that we will have roast beef for lunch on Wednesday. This is easy to understand visually as the column and row headings are at the top and far left of the table, and printed in bold.

Weekly menus for residents

Meal       Monday      Tuesday       Wednesday   Thursday
Breakfast  Boiled egg  Fried egg     Bacon       Poached egg
Lunch      Lamb Stew   Grilled fish  Roast Beef  Lamb Chops
Supper     Pasta       Pizza         Hamburgers  Spaghetti

Blind people using screen readers do not get the benefit of seeing the whole table; they only hear a string of text read from left to right, one row at a time. For assistive software such as screen readers to be able to understand the relationships within the table and present these effectively to the user, it is important to use semantic code. Every table heading cell should be marked up with the TH element and every table data cell with the TD element. The software can then store these headings in memory and reference them to the relevant data cells within the table. A blind person using a screen reader is thus able to pause in the Roast Beef cell and check the column heading (Wednesday) and the row heading (Lunch).
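
As a sketch of the markup described above, the menu table could be coded as follows, with TH for every heading cell and TD for every data cell.

```html
<table>
<caption>Weekly menus for residents</caption>
<tr>
<th>Meal</th><th>Monday</th><th>Tuesday</th><th>Wednesday</th><th>Thursday</th>
</tr>
<tr>
<th>Breakfast</th><td>Boiled egg</td><td>Fried egg</td><td>Bacon</td><td>Poached egg</td>
</tr>
<tr>
<th>Lunch</th><td>Lamb Stew</td><td>Grilled fish</td><td>Roast Beef</td><td>Lamb Chops</td>
</tr>
<tr>
<th>Supper</th><td>Pasta</td><td>Pizza</td><td>Hamburgers</td><td>Spaghetti</td>
</tr>
</table>
```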

 

Without these headings a blind person would have to remember the sequence of column headings and the title of the current row as they moved through the table. This becomes virtually impossible with larger tables.

Scoping the headings

Because we use the same heading element for both column and row headings it is advisable to add the scope attribute in order to clarify if the heading is a column or row heading.

We have added scope="col" to the column headings so that the assistive software knows that these headings relate to the data in the cells directly beneath. We have also added scope="row" to the row headings so that the assistive software knows that these headings relate to the current row (horizontally).
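
A cut-down sketch of the menu table with the scope attribute in place might look like this; only the first two days and the first meal are shown.

```html
<table>
<tr>
<!-- Column headings apply to the cells beneath them -->
<th scope="col">Meal</th>
<th scope="col">Monday</th>
<th scope="col">Tuesday</th>
</tr>
<tr>
<!-- The row heading applies to the cells in the same row -->
<th scope="row">Breakfast</th>
<td>Boiled egg</td>
<td>Fried egg</td>
</tr>
</table>
```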

Abbreviations for headings

Because we have added the "scope" attribute to the table headings, assistive software such as a screen reader may now automatically read out both the row and column headings for each table data cell as the user moves around the table. For example, as a screen reader user reads across the second row of the previous table, they would hear something like:
"Heading row 2 Breakfast, Monday, Boiled egg, Tuesday, Fried egg, Wednesday, Bacon, Thursday, Poached egg".
Exactly what is read out, and how, will vary from one piece of assistive software to another, but placing this information in the code gives the assistive technology the best chance of working effectively.

It can get a bit tedious for blind people to hear the full title of long column or row headings, so HTML provides an abbreviation (abbr) attribute for the <th> element. In our case above we might use the abbreviations "mon", "tue" and "wed" for the days of the week. When deciding on an abbreviation for a heading please bear in mind what it will sound like when processed by a screen reader. These abbreviations are not seen by sighted users, so it is good practice to write them phonetically (i.e. spelling is not as important as what it sounds like).
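
A sketch of the abbreviations suggested above, applied to the column headings of the menu table. How (and whether) the abbr value is voiced depends on the assistive software in use.

```html
<table>
<tr>
<th scope="col">Meal</th>
<!-- The full heading is displayed; abbr offers a short spoken form -->
<th scope="col" abbr="mon">Monday</th>
<th scope="col" abbr="tue">Tuesday</th>
<th scope="col" abbr="wed">Wednesday</th>
</tr>
<tr>
<th scope="row">Breakfast</th>
<td>Boiled egg</td>
<td>Fried egg</td>
<td>Bacon</td>
</tr>
</table>
```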

 

Complex data tables

The scope attribute really comes into its own when we have more complex tables as shown below.

Meal       Type        Monday        Tuesday          Wednesday
Breakfast  Normal      Boiled egg    Fried egg        Bacon
           Vegetarian  Cereal        Cereal           Cereal
           Nut Free    Boiled egg    Fried egg        Bacon
Lunch      Normal      Lamb Stew     Grilled fish     Roast Beef
           Vegetarian  Lentil Chili  Fennel Bake      Tofu Paella
           Nut Free    Lamb Stew     Grilled fish     Roast Beef
Supper     Normal      Pasta         Pizza            Hamburgers
           Vegetarian  Couscous      Savoury Crumble  Tofu Burgers
           Nut Free    Pasta         Pasta            Fish cakes

Each row has two headings, the mealtime and the type of meal. Each mealtime (breakfast, lunch, supper) is only written once but spans three separate rows, so it needs to be repeated for each of the meal types as the screen reader or other assistive technology works through the table.
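
One possible way to code this, shown as a sketch rather than the only correct markup, is to let each mealtime heading span three rows and give it scope="rowgroup" inside its own tbody. Only the Breakfast group is written out here; Lunch and Supper would each follow the same pattern in a tbody of their own.

```html
<table>
<thead>
<tr>
<th scope="col">Meal</th>
<th scope="col">Type</th>
<th scope="col">Monday</th>
<th scope="col">Tuesday</th>
<th scope="col">Wednesday</th>
</tr>
</thead>
<tbody>
<tr>
<!-- Written once, but applies to all three rows of the group -->
<th scope="rowgroup" rowspan="3">Breakfast</th>
<th scope="row">Normal</th>
<td>Boiled egg</td><td>Fried egg</td><td>Bacon</td>
</tr>
<tr>
<th scope="row">Vegetarian</th>
<td>Cereal</td><td>Cereal</td><td>Cereal</td>
</tr>
<tr>
<th scope="row">Nut Free</th>
<td>Boiled egg</td><td>Fried egg</td><td>Bacon</td>
</tr>
</tbody>
</table>
```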

For really complex tables - where the relationship between the column headings and the data is not clear - it is possible to tie individual cells to individual headings using the "id" and "headers" attributes. Here each heading cell is given a unique "id", and each data cell lists the ids of the headings that apply to it in its "headers" attribute.
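
A minimal sketch of the "id" and "headers" technique, using a small part of the menu table. The id values (mon, tue, lunch) are assumed names chosen for illustration; any unique values would do.

```html
<table>
<tr>
<th id="meal">Meal</th>
<th id="mon">Monday</th>
<th id="tue">Tuesday</th>
</tr>
<tr>
<th id="lunch" headers="meal">Lunch</th>
<!-- headers lists the ids of every heading that applies to the cell -->
<td headers="lunch mon">Lamb Stew</td>
<td headers="lunch tue">Grilled fish</td>
</tr>
</table>
```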

The List Elements

The list elements (ul, ol and dl) group related items of information together so that they can be treated as a single block. There are three types of list structure –

  1. Unordered lists are simple lists usually referred to as “bulleted lists” because each item in the list can start with a graphic such as a bullet icon.
  2. Ordered lists are sequentially numbered lists of items or comments as used here.
  3. Definition Lists are more complex than the other two as they have two parts to the list item. The first part is the item title and the second part is some explanatory text.

Ordered and unordered lists

Assistive software and robots handle unordered and numbered lists in a special way. They do not expect lists to contain sentences (or phrases) that make sense when single items are taken out of the list context. Assistive software expects a list to contain a number of common items, such as the colours of a rainbow, members of a class, parts of a washing machine or a group of links to other pages on the site. Because assistive software treats a list as a single block it can let the blind user scroll up and down the list quickly. The software can even tell the user when they have reached the end of the list. If the list had been constructed using a series of paragraphs or soft returns the screen reader would not be able to do this.

This is an important concept to bear in mind because these tools do not expect to find paragraphs of text within a list. A paragraph is a complete entity in itself and will make sense to a reader when read out of context. A list item is part of a sequence and only makes sense when read as part of that sequence. If you need to force a new line within a list item you should use the line break <br> element. If you use the paragraph <p> element within a list, some software will assume that you have finished the list. This will destroy the meaning of the list for automated software and be confusing to blind users.

Because all the items in the list will share a common theme it is good practice to precede a list with a relevant section heading. It is possible to hide the heading from visual users using a style sheet. For example, we could define a class .hidden {display:none} that prevents browsers from showing the heading. Screen readers that ignore the screen style sheet will still announce it audibly, although many current screen readers honour display:none and skip the heading as well.

Screen Output

  1. Paul
  2. Mary
  3. John

Audio output

"Heading level 2 List of students in the junior class. List item one Paul, list item two Mary, list item three John"

The HTML code for the list shown above would read as :-

<h2 class="hidden">List of students in the junior class</h2>
<ol>
<li>Paul</li>
<li>Mary</li>
<li>John</li>
</ol>

The CSS code for this is written as :-

#lesson .hidden {display: none;}

Screen readers that ignore the screen's style sheet will read the list heading out aloud. Be aware, though, that many current screen readers honour display: none and stay silent, so test the result with the assistive software your visitors use. Standard browsers such as Internet Explorer will not display the heading "List of students.." because the style sheet has told them not to display it. Adding these little bits of (hidden) text just for blind users can make their experience of your website a whole lot better. You have already seen how we can hide the expansion of an abbreviation and in later lessons we shall look at how to provide hidden text to explain images and other "non-text" items.
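
Where a screen reader does honour display: none, a commonly used alternative is to move the text off screen instead of removing it. The sketch below is that alternative, not the method used in these notes; the class name offscreen is an assumed name, and in a real page the style element would belong in the head section.

```html
<style type="text/css">
/* Assumed class name: the text is positioned off screen, not removed,
   so it stays available to assistive software */
.offscreen {
  position: absolute;
  left: -9999px;
  width: 1px;
  height: 1px;
  overflow: hidden;
}
</style>

<h2 class="offscreen">List of students in the junior class</h2>
<ol>
<li>Paul</li>
<li>Mary</li>
<li>John</li>
</ol>
```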

Nested lists

It is possible to "nest" one list inside another to produce a sub-list. This is done by defining the new list within an existing list item element as shown below.

HTML code

<h2>Office Stuff</h2>
<ul>
<li>Paper
<ul>
<li>Notepaper</li>
<li>Letterhead</li>
<li>Photocopy</li>
</ul>
</li>
<li>Pens
<ul>
<li>Blue</li>
<li>Black</li>
<li>Red</li>
</ul>
</li>
<li>Staplers</li>
<li>Paper Clips</li>
</ul>

Screen output

Office Stuff
  • Paper
    • Notepaper
    • Letterhead
    • Photocopy
  • Pens
    • Blue
    • Black
    • Red
  • Staplers
  • Paper Clips

Note that the originating list item becomes the heading for the nested list. Thus the item "Paper" becomes the heading for the three types (Notepaper, Letterhead and Photocopy).

For visual users nested lists make sense because the nested part is indented further from the page margin. For blind users and text-only browsers these visual clues are not available. The result is that the screen reader tends to follow on after the last sub item so quickly that the user is not aware that the sub list has finished. The way to overcome this is to add an "end of list" phrase at the end of each nested list so that the user knows that the following item will be the next part of the main list. In the above example the solution would be to add the phrase "End of paper list" immediately after the word "Photocopy" and hide it from visual browsers using a CSS class.

<ul>
<li>Notepaper</li>
<li>Letterhead</li>
<li>Photocopy&nbsp;<span class="hidden"> End of paper list</span></li>
</ul>

Note the non-breaking space (&nbsp;) coded before the phrase itself; this is to make the screen reader take a breath before reading out the phrase.

Definition lists

Each item of a definition list has two parts, the definition heading and the definition content. A good example of a definition list would be a list of an organisation’s departments, where the definition title is the name of the department and the definition description is a summary of the department’s responsibilities. Another use would be a summary of news items as shown below.

<dl>
<dt>New help on the way for Merthyr job seekers</dt>
<dd>Merthyr job seekers are set to benefit from a new team of specialist employment advisors coming to the area</dd>
<dt>New Recycling project announced</dt>
<dd>A major new initiative is now underway to develop state-of-the-art facilities for recycling food waste and other residual waste in a beneficial </dd>
<dt>Counterfeit vodka found in Merthyr Tydfil</dt>
<dd>Merthyr Tydfil County Borough Council was notified by the Food Standards Agency that there is counterfeit vodka currently in circulation. </dd>
</dl>

The output of the above code is shown in the following box.

New help on the way for Merthyr job seekers
Merthyr job seekers are set to benefit from a new team of specialist employment advisors coming to the area
New Recycling project announced
A major new initiative is now underway to develop state-of-the-art facilities for recycling food waste and other residual waste in a beneficial
Counterfeit vodka found in Merthyr Tydfil
Merthyr Tydfil County Borough Council was notified by the Food Standards Agency that there is counterfeit vodka currently in circulation.


The definition data text is indented as a block by default, but this can be over-ridden by the style sheet if you want. We can include images in the definition data element (dd) safely, but it is best if we do not include images in the title as these are cached by screen readers. However it is perfectly acceptable to make the title a link to a page containing more relevant information. If you provide a library of PDF documents for downloading then the title would link to the PDF file with the definition providing a short summary of the document so that visitors can make an informed choice before downloading any large documents.
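
A sketch of the PDF library idea described above. The file names and document titles are hypothetical placeholders.

```html
<dl>
<!-- The title links to the file; the description helps the visitor decide -->
<dt><a href="annual-report.pdf">Annual report (PDF)</a></dt>
<dd>A short summary of what the report covers, so that visitors can decide whether to download it.</dd>
<dt><a href="recycling-guide.pdf">Recycling guide (PDF)</a></dt>
<dd>A short summary of the guide and who it is intended for.</dd>
</dl>
```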

 

Page structure for more complex pages

In this session of notes we have used a very simple model of a web page to explain how the various components of a page could be organised in HTML. We have placed the main content of the page as near to the top of the page as practicable to help users of assistive software and text-only browsers to get to the content they want fairly quickly. This is perfectly acceptable for linear content such as these lesson notes. However most websites are a lot more complicated and you will want to offer your users a more practical method of navigation.

Users who come from a search engine may well find that they are not really interested in the page content and want to go elsewhere quite quickly. If all your navigation links are at the bottom of the page the user has to read the whole page to find them or, more probably, they will use their "back" button to return to the search engine (i.e. leave your site). To keep these "casual visitors" on your site you need to offer them some quick alternatives to show that you have more useful content elsewhere on the site.

The traditional approach is to start each page with a small set of "top level" links that includes links to the home page and the site map. This format has become so common that it is considered to be an intuitive system (i.e. users feel comfortable with it). We could therefore improve the page structure by making it more "intuitive" so that it now looks like this :-

  1. Top level navigation bar
  2. Page content
  3. Secondary navigation bar
  4. Anything else (e.g. advertisements)
  5. Footer navigation bar

The problem now is that there may be several links in the top level navigation, then objects to tab through in the content, then more links in the secondary navigation before the links in the footer. This means many presses of the tab key to get through the entire page. Keyboard users, people who use older screen readers, and people who use specialist input devices such as touch screens or hand-held mobile devices may have a hard time getting through your site.

A solution is to use named page anchors. These are links in your page that take you to somewhere else in the same page. They are sometimes called "skip to" links.

An example of the HTML code follows:

<a name="content"></a>

Then at the top of the page we can provide a link to that anchor such as :

<a href="#content">Skip to content</a>

This will allow returning users to skip straight to the content and ignore the top level navigation links.

We can also use this technique to allow users to "Skip to" the secondary navigation by placing a named anchor at the beginning of the secondary navigation list and then providing a link to this anchor. Most assistive technology and text-only browsers present the page content in the order in which it is written, so you should always try to write your secondary navigation after your main content. Doing this will also help search engines to index your pages better and avoid the need for people to scroll past all your secondary navigation before seeing any content, which should be the main focus of the page.
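
Putting these pieces together, a page body might be ordered as in the sketch below. The anchor name "secondary" and the file names in the links are assumptions made for illustration.

```html
<body>
<!-- Skip links come first so that keyboard users reach them immediately -->
<p>
<a href="#content">Skip to content</a>
<a href="#secondary">Skip to secondary navigation</a>
</p>

<!-- Top level navigation -->
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="sitemap.html">Site map</a></li>
</ul>

<!-- Main content -->
<a name="content"></a>
<h1>Page Heading</h1>
<p>Main content of the page.</p>

<!-- Secondary navigation is written after the main content -->
<a name="secondary"></a>
<h2>Site Navigation</h2>
<ul>
<li><a href="first.html">first link</a></li>
<li><a href="second.html">second link</a></li>
</ul>
</body>
```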

Exactly how you structure your pages will depend upon how complex or detailed the page content is, or how the page fits into the overall structure of the website. The web is a flexible medium that allows you to adopt whichever structure best suits your needs, but it is important that you always remember to make sure that visitors using assistive technologies can obtain a quick overview of the page and jump quickly to the content or links that interest them on the page.

What next

Theoretically you could now create all the pages of your website and publish them for people to see. You are able to write the material you need for your content. You can structure pages in a semantic way so that browsers and assistive software can understand the page logic. You just need to add a site map with links to each page and you will be well on your way to having an accessible website. However your pages would only contain text, be presented in a linear fashion (line by line down the page) and be rather dull. We need to add some style to the pages to make them look more appealing. We should also add some images to help make our text easier to understand and generally make it a better user experience.

In the next few lessons we shall learn how to create an interesting "look and feel" for our pages without introducing anything that makes them harder for disabled people to use. In fact, by making our content more visually appealing, we are helping many people have a better experience using the site, making it easier and more enjoyable to use.