Markup Languages
HTML is the main ‘markup’ language used for web pages and web applications.
By a (digital) markup language we mean a way of creating and interpreting a
digital document in which the document contains tags (and their attributes)
that the rendering software interprets in a specific way, while the tags and
attributes themselves do not normally appear in the output presented to the
user. In what follows we describe how this concept works for documents that
concentrate on textual output, although the same concepts also apply to
documents containing other types of material (such as pictures or sounds).
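The idea that tags steer the rendering but do not themselves reach the reader can be illustrated with a short Python sketch. It assumes the standard-library html.parser module as a stand-in for rendering software; the class name TextOnly, the helper visible_text and the sample string are hypothetical names chosen for this illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the text of a document and discards the tags around it."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Called only for text between tags; the tags never arrive here.
        self.pieces.append(data)


def visible_text(markup):
    """Return the text a reader would see, with all tags removed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.pieces)


sample = "<p>Markup is <em>interpreted</em>, not displayed.</p>"
rendered_text = visible_text(sample)
print(rendered_text)  # Markup is interpreted, not displayed.
```

The <em> tag would normally cause the word to be emphasised; this sketch simply drops it, which is enough to show that the tag is an instruction rather than content.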
There are many different
mark-up languages used in different contexts. For example, LaTeX (and TeX, the
underlying mark-up language on which LaTeX is based) is a tool for preparing
mathematically orientated documents. It uses the backslash character (“\”) and
braces (“{” and “}”) to tell the software rendering the document that relevant
text needs to be interpreted in a specific manner. Text of the form “E &=
\frac{mc^2}{\sqrt{1-\frac{v^2}{c^2}}}” is rendered by a TeX viewer roughly
along the lines of the following:
    E = mc² / √(1 − v²/c²)
Here the
“\frac{numerator}{denominator}” tells the software to render the text formed by
the numerator and the denominator as a fraction, and “\sqrt{argument}” tells the
software to render the text formed by the argument as a square root. Markup can
be nested: in the example, the second \frac sits inside the \sqrt, which in turn
sits inside the denominator of the first \frac.
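As a rough illustration of nesting, the following Python sketch counts how deeply the braces in the TeX fragment above are nested. The function name max_brace_depth is a hypothetical name chosen here, and the sketch assumes every brace is a grouping brace (it does not handle escaped braces such as \{), so it is not a TeX parser.

```python
def max_brace_depth(tex):
    """Return the deepest level of {...} nesting in a TeX fragment.

    Assumes every brace is a grouping brace; escaped braces are not handled.
    """
    depth = 0
    deepest = 0
    for ch in tex:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("closing brace without a matching opening brace")
    if depth != 0:
        raise ValueError("opening brace was never closed")
    return deepest


formula = r"\frac{mc^2}{\sqrt{1-\frac{v^2}{c^2}}}"
print(max_brace_depth(formula))  # 3
```

The depth of 3 corresponds to the inner fraction, which lies inside the square root, which lies inside the outer denominator.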
Certain features are
shared by virtually all digital mark-up languages, including HTML. These are:
(a) The mark-up language needs to be specified and interpreted in a
consistent fashion. This is harder to arrange than it looks for languages that
develop over time, since the equivalent of different dialects can then
emerge.
HTML 4.01 was for many years the latest formally adopted version of HTML. The
World Wide Web Consortium (W3C) issued HTML 5 as a formal recommendation in
October 2014 and has also developed a parallel XML-based language, XHTML 5.1.
XML stands for “eXtensible Mark-up Language”. Most leading browsers will
interpret an HTML document using HTML 5 conventions, but some older browsers
may not. Modern browsers can be told that a document follows older conventions,
if necessary, by including a suitable document type declaration (DOCTYPE) at
the top of the document. HTML 4 itself comes in three different versions, i.e.
Strict, Transitional and Frameset. Loosely speaking, Strict excludes deprecated
presentational features, Transitional allows them, and Frameset additionally
allows frames.
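The document-level declaration referred to above is, in practice, the DOCTYPE line at the top of an HTML document. The Python sketch below shows one way such a declaration might be read back out of a page, assuming the standard-library html.parser module; DoctypeFinder, find_doctype and the two sample pages are hypothetical names and data made up for this illustration.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Remembers the first DOCTYPE declaration seen in a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def find_doctype(markup):
    """Return the DOCTYPE declaration text, or None if there is none."""
    finder = DoctypeFinder()
    finder.feed(markup)
    finder.close()
    return finder.doctype


html5_page = "<!DOCTYPE html><html><body><p>Hello</p></body></html>"
html4_page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
    "<html><body><p>Hello</p></body></html>"
)

print(find_doctype(html5_page))  # DOCTYPE html
print(find_doctype(html4_page))
```

The short HTML 5 declaration names no version at all, whereas the HTML 4.01 declaration identifies the Strict variant explicitly.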
(b) The language generally needs to be able to nest tags within other tags.
This requires the language to have the concept of opening a tag and then
closing it, with the text between the opening and closing tags being
interpreted in a specific manner. With TeX, the nesting process makes use of
open and close braces (“{” and “}” respectively). With HTML, tags generally
take a form akin to <xxx> … </xxx>, where <xxx> is the opening tag, </xxx> is
the closing tag and xxx represents the type of tag involved. Strictly, the
opening tag, its content and the closing tag together form an ‘element’,
although the two terms are often used interchangeably. More sophisticated tags
take the form:
<xxx yyy> … </xxx>
where yyy defines the tag’s attributes, i.e. provides added information on the
element / tag.
For example, any text in a webpage between an opening <script> tag and the
corresponding closing </script> tag is generally interpreted as JavaScript
code. Any text between an opening <a> and a closing </a> is the text used when
rendering a hyperlink. The address of the document to which the hyperlink
points is included as an element attribute, e.g. the full tag might be:
<a href="http://www.nematrian.com/Introduction.aspx"> Introduction to Nematrian website </a>
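To make the distinction between an element's attributes and its content concrete, here is a minimal Python sketch that pulls both out of the hyperlink example. It assumes the standard-library html.parser module; LinkCollector and collect_links are hypothetical names chosen for this illustration, and the sketch ignores <a> tags that have no href attribute.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the opening tag.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(markup):
    """Return a list of (href, link text) pairs found in the markup."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


sample_link = (
    '<a href="http://www.nematrian.com/Introduction.aspx"> '
    "Introduction to Nematrian website </a>"
)
print(collect_links(sample_link))
```

The address comes from the attribute in the opening tag, while the text shown to the reader comes from the content between the opening and closing tags.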
Some mark-up languages such as XML require all opened tags to be explicitly
closed, e.g. with any <x> ultimately closed by a </x> (in XML it is also
possible to open and close a tag at the same time, using a format such as
<x />). Others, like HTML, do not require this for tags that never contain
anything. For example, in HTML the tag <br> means insert a line break, i.e.
start a new line, and is written without a closing </br>.
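The difference in strictness can be seen by feeding the same fragment to an XML parser and to an HTML tokenizer. The Python sketch below assumes the standard-library xml.etree.ElementTree and html.parser modules; is_well_formed_xml, TagLogger, html_tag_events and the two sample strings are hypothetical names made up for this illustration. Note that html.parser is a lenient tokenizer rather than a browser, so it shows only that an unclosed <br> is accepted, not how a browser would lay it out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


class TagLogger(HTMLParser):
    """Records opening and closing tags in the order they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_tag_events(markup):
    """Return the list of tag events the HTML tokenizer reports."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


html_style = "<p>line one<br>line two</p>"    # <br> never closed
xml_style = "<p>line one<br />line two</p>"   # <br /> opens and closes at once

print(is_well_formed_xml(html_style))  # False
print(is_well_formed_xml(xml_style))   # True
print(html_tag_events(html_style))
```

The XML parser rejects the unclosed <br> because the closing </p> no longer matches the most recently opened tag, whereas the HTML tokenizer reports the <br> and carries on.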