The ASCII character encoding
may now be somewhat obsolete,
but its influences on computing
There are also
interesting things to learn
about how it was designed.
All of this, of course,
is not at all new.
ASCII is now over fifty years old,
but I had never before given it much thought.
When I actually looked into it,
I was pleasantly surprised.
This is a classic ASCII code chart,
which is laid out in a way
that shows many of its design niceties.
An obvious property
is the overall ordering of the characters.
The characters are laid out such that
common separators appear first,
e.g. space, hyphen, full stop and slash,
followed by numbers,
and finally minuscule letters.
This layout means that
ASCII strings can easily be sorted
by comparing their numerical values.
The resulting order is called ASCIIbetical.
Hurray for lazy programmers!
ASCII for humans
The layout of the chart also reveals
how human-readable ASCII can be.
By looking at the first
two or three most significant bits,
many of the characters become obvious.
Values starting with
represent the digit character
corresponding to the last four bits.
0110011 is the character “3”.
Values starting with
the shifted symbol corresponding to the digit.
This was based on the layouts of electric typewriters at the time,
so doesn’t hold up as much today.
It is still accurate for 1, 3, 4 and 5 on US QWERTY layouts.
0100011 is the character “#”.
Values starting with
represent the capital letter at
the position in the alphabet of the last 5 bits.
1000011 is the character “C”.
Similarly, values starting with
represent the minuscule letter at
that position in the alphabet.
1100011 is the character “c”.
There are also other aligned shifted characters,
which have their third most significant bit flipped.
| also correspond to
with their second most significant bit flipped.
This human-readability would have been
much more useful than it is now
back when ASCII text was often stored on punched paper tape.
Paper tape is also the reason
there is one control-code
that doesn’t appear in the first 32 positions
with the rest.
At position 127,
DEL is represented in binary as
On paper tape,
this means that any previously punched character
can be replaced with
DEL by punching the remaining holes.
This clever bit of design means that
characters can be “erased” even when represented physically.
Although this clever design is no longer relevant,
there are some less clever decisions that have had
a lot of influence.
The caret, backtick, and tilde characters
were included in ASCII as
a sort of work-around,
intended to be used as diacritics
rather than including accented letters themselves.
But even though we now have proper accented characters
thanks to Unicode,
the ASCII diacritics are still commonly used in programming,
and are still standard on keyboard layouts.
I sometimes wonder if a non-programmer
has ever purposely typed a backtick.
By far the most interesting thing
I discovered when reading about ASCII
was the origin of caret notation
and many common terminal key sequences.
For example, the control-C sequence, or
On old systems,
the control key was used literally to enter control-codes.
The way this worked was
that each character in the
(the capital letters and some punctuation),
corresponded to a control-code in the
These sequences were represented with caret notation,
SOH, and so on.
1111111 was represented by
This means that the control-code
for a key could be calculated by
inverting the most significant bit.
Most of the ASCII control-codes are no longer relevant,
but some of the key sequences
are still very common in the terminal.
The most common of these is,
used to interrupt program execution.
The corresponding control-code is
end of text.
which indicates end-of-file,
EOT, end of transmission, in ASCII.
Another somewhat common sequence is
for suspending processes.
This comes from the control-code
Many terminals still clear the screen on
which corresponds to
FF, form feed.
This originally caused printers to load the next page.
Probably less well known
are the sequences
which can be used to pause and resume output.
In ASCII, these were simply
two of the four device-specific control-codes.
The Teletype Model 33 implemented these
which turned the tape reader off and on.
This definition stuck around.
Those are basically what I found
to be the interesting parts of
the good old 7-bit character encoding
from a modern perspective.
It might not be useful information day-to-day,
but it hopefully makes for fun trivia.