Current rant of the month, which might become a talk or article if it matures: The history of encoding messages and the history of the ways we transmit them are intertwined in really fascinating ways, touching on linguistics, human travel, the discovery of electricity, and computers.
Case in point: a fascinating quality of Hangul (the Korean alphabet, explicitly designed by Sejong the Great, nerd king among nerd kings, to build literacy) is that it maps letters to spoken phonemes instead of concepts.
That made the boundary between “encoding meaning into words” and “encoding words for transmission via paper/etc” more explicit, and it shrank the set of symbols learners needed to memorize. Conveniently, it also made Hangul a better fit for movable type than many neighboring writing systems.
Jumping around in time a bit, the idea of using puffs of smoke or torches in the dark to send messages over long distances has been with us for a looooong time — but most systems used specific colors or patterns to represent specific messages (enemies incoming, all is well, etc.)
The Greek historian Polybius came up with a set of positions torches could be held in that, taken together, could “encode” the entire Greek alphabet. Suddenly, pre-arranged signals were a convenience, not a necessity. You could signal anything, limited only by the time it took to spell it out.
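To make that concrete: the usual Latin-alphabet reconstruction of his scheme splits the alphabet across a 5×5 grid (the original used the 24-letter Greek alphabet; I and J share a cell here so 25 letters fit):

      1  2  3  4  5
    1 A  B  C  D  E
    2 F  G  H  I/J K
    3 L  M  N  O  P
    4 Q  R  S  T  U
    5 V  W  X  Y  Z

Each letter becomes a (row, column) pair (as Polybius described it, torches raised on the left for the row and on the right for the column), so “HELP” goes out as 2-3, 1-5, 3-1, 3-5.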
With the arrival of electricity, telegraph shorthand followed similar patterns, eventually producing standards like Morse code. Once we got past punch cards, computers handled human I/O by sending electronic pulses to teletype terminals…
…But different manufacturers bolted their own extra codes and symbols onto the bare minimum for the Roman alphabet, leading to the development of standards like ASCII. “ASCII plaintext” became a useful baseline for cross-computer transmission of … well, anything.
The idea of ‘plain text’ was the baseline for markup languages like SCRIPT, developed at IBM in the 60s to simplify document formatting. Instead of embedding invisible control codes to represent formatting changes, it used readable ASCII commands right in the doc, like a line reading .ce for ‘center the next line’.
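For flavor, a SCRIPT source file looked roughly like this (control words reconstructed from old IBM/Waterloo SCRIPT manuals, so treat the specifics as illustrative rather than gospel):

    .ce
    A Centered Heading
    .sp 2
    Ordinary text just flows until a line starting with
    a period asks the formatter to do something.
    .br
    Like force a line break.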
That led to arguments about what KINDS of ASCII codes should be used in the document — formatting and presentation stuff, or structural notes about the meaning of the document that could be mapped to formatting later? Sound familiar? That’s how “GML” was born in 1969.
GML — Generalized Markup Language — was just a set of macros for SCRIPT that let document editors prefix lines of text with codes designating their structural purpose. :p. for paragraphs, :ol. for ordered lists, :li. for list items… spooky music plays
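A document in the GML “starter set” looked something like this (reconstructed from IBM documentation; exact tag spellings varied between versions, so consider it a sketch):

    :h1.Getting Started
    :p.This is an ordinary paragraph.
    :ol.
    :li.The first item in an ordered list.
    :li.The second.
    :eol.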
GML eventually gave way to SGML, the STANDARD Generalized Markup Language, which among other things replaced the fussy colon-letters-period format of the codes with the now-familiar angle-bracketed <tags>, </closing-tags>, and <tag properties="stuff">…
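Here’s the same sketch in SGML-style syntax (schematic only: a real SGML document type also needs a DTD, and tag-minimization rules let you omit many end tags, which is why the <li>s below don’t need closing):

    <h1>Getting Started</h1>
    <p>This is an ordinary paragraph.</p>
    <ol>
    <li>The first item in an ordered list.
    <li>The second.
    </ol>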
At this point, you can probably see the roots of HTML (which was originally defined as a particular type of SGML document) and XML (which was an attempt to simplify SGML’s… wild feature set and make it easier to parse and process, among other things)
Now, where does this squirrelly digression bring us? A lot of the challenges we face today are (IMO at least) either caused by or exacerbated by language mismatches: using a content or design tool for a task that its inherent ‘vocabulary’ is a poor match for.
HTML, remember, wasn’t designed as a generalized markup language to describe any kind of content. It was a description of one particular kind of document you could produce with SGML. It encoded the broad conventions of cross-referenced academic papers.
As it grew in popularity it was extended with new tags to support things like images, tables, and so on — but its fundamental nature as “a way of marking up a particular kind of document” remained. Something like (say) an image gallery has no direct semantic representation.
So you get statements like “Image galleries should be ordered lists” which are less about actual meaning and more about trying to carve out agreements about how to say certain things in a given language without adding new words. i.e., 99% of “semantic markup” arguments in the 00s.
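The canonical move from those arguments, more or less (filenames and alt text are placeholders, and the class name is pure convention):

    <ol class="gallery">
      <li><img src="photo-1.jpg" alt="First photo"></li>
      <li><img src="photo-2.jpg" alt="Second photo"></li>
      <li><img src="photo-3.jpg" alt="Third photo"></li>
    </ol>

Nothing in HTML itself says “this is a gallery”; the <ol> only says “these things come in a meaningful order,” and the rest is convention plus CSS.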
And huge swaths of design work that was implemented in HTML during the pre-CSS era suffer from that tension. An old-school book like “Creating Killer Web Sites” is the markup equivalent of writing a song in French that, when sung, rhymes with what you want to say in German.