Tiffany Shar's Bears Know Best


I cleaned up Tiffany Shar's Bears Know Best story, removing all of those little OSC squares from 2010-2012, when BCTS was NOT as stable as it is now. I also removed the lulu.com links and put the Amazon store page address there instead, so you can purchase from there as well. The author will have to update the included text for each chapter to reflect this.

Man, this was a heck of a lot of work. Removing those squares requires a special program and constant back-and-forth of copy/paste, copy/paste.

Sephrena

Comments

Thank you!

Thank you Sephrena! I guess I haven't looked through it for a while, and hadn't seen those. I appreciate everything! (hug)

Culprit


So I found the culprit that brought all those Bears Know Best chapters to the top of my Tracker. Thank You. I haven't read that story for probably a decade and now I get to go back and read it.

Keep Smiling, Keep Writing
Teek

Character corruption in older postings

Thank you, Sephrena, for your long efforts behind the scenes to clean up the “character corruptions” in older posts.

These character corruptions are a legacy of a decision made very early on in the development of electronic computers. Unfortunately, most of the designers of electronic computers were based in a country whose language and culture lean rather toward the “everybody should speak and do as we do” variety. Because of that, and because US English has only 26 letters in its alphabet, the initial 7-bit character encoding (with 128 code points) was deemed more than sufficient. The upper and lower case letters plus the ten decimal digits use up 62 of those code points, which supposedly left more than enough for punctuation, control characters and some special-use characters.
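
Just to see that arithmetic spelled out, here is a tiny illustrative Python snippet; nothing in it is specific to any particular system:

    # 26 upper case + 26 lower case letters + 10 digits = 62 of 128 code points
    used = 26 * 2 + 10
    print(used)                # 62
    print(128 - used)          # 66 left for punctuation, controls and specials
    print(ord("A"), ord("z"))  # 65 122 -- everything fits in 7 bits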

But the official German alphabet has 30 letters, 29 of which also have an upper case variant. The official Spanish alphabet even has 33 letters in both upper and lower case, with only one of those “extra” letters shared with German. French (as far as I know) has at least 37 letters, and Portuguese at least 38, not to mention the additional letters used in the Nordic and Slavic languages. So it turns out that very soon at least another 28 letters were actually required just for the western European languages.

The computer industry quickly standardized on the 8-bit character, or byte, which provides 256 code points for character encoding. The lower 128 code points (0 to 127) have universally become known as the ASCII encoding, while the upper 128 code points initially became a free-for-all for defining special characters. For example, British English had virtually no use for the dollar symbol »$« but desperately needed the pound symbol »£«. Thus emerged many incompatible character encoding tables, not to mention the incompatibilities caused by proprietary encodings used by hardware manufacturers such as IBM and Apple at that time.
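
A minimal Python sketch of that free-for-all; the byte value 0xA3 is just a convenient example, decoded against three encodings that really existed:

    # One and the same byte means different characters under different tables
    b = bytes([0xA3])
    print(b.decode("latin-1"))  # '£' (ISO-8859-1)
    print(b.decode("cp437"))    # 'ú' (the original IBM PC code page)
    print(b.decode("cp1252"))   # '£' (the Windows "ANSI" character set)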

By the 1980s, serious efforts to standardize the character tables were underway to reduce the multitude to a manageable level. But even so, many stakeholders insisted [at least initially] on their own interpretation (aka variation) of the standards. So we had the IBM PC with 16 distinct “Code Pages”, Microsoft with 9 distinct “Character Sets”, and the ISO-8859 effort with 15 distinct encoding tables. By the mid-1990s the ISO-8859 encodings started to prevail, mainly due to user demand for data-exchange compatibility.

But with the emergence of the Asian economies, it quickly became clear that an 8-bit character encoding was insufficient. Driven by the need for “wide character” handling of text (mainly for Chinese, Japanese and Korean), some visionaries had started working in the early 1980s on a 16-bit universal character encoding that could be used for any and all world languages. But the Unicode Consortium, established in January 1991, quickly realized the need for more code points to cover not only a multitude of languages but also scripts. So by 1996 the Unicode Standard had been extended beyond 16 bits via the surrogate mechanism, which allows for 1,114,112 code points.
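
For the curious, here is a small illustrative Python snippet showing how the UTF-8 encoding spends one to four bytes per code point; the sample characters are arbitrary:

    # UTF-8 is variable length: ASCII stays one byte, higher code points grow
    for ch in ("A", "£", "“", "𝄞"):
        print(ch, hex(ord(ch)), ch.encode("utf-8"))
    # A 0x41    b'A'                 -- 1 byte (plain ASCII)
    # £ 0xa3    b'\xc2\xa3'          -- 2 bytes
    # “ 0x201c  b'\xe2\x80\x9c'      -- 3 bytes
    # 𝄞 0x1d11e b'\xf0\x9d\x84\x9e'  -- 4 bytes (beyond the old 16-bit range)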

By the turn of the century, Unicode with the UTF-8 encoding had become the virtual industry standard. But it took until the early 2010s for the older encoding standards to disappear from general use, as older hardware and software became obsolete.
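
And that transition is presumably where the “little squares” in the old postings came from. A sketch of the mechanism in Python, using a cp1252 original purely as an example:

    # Curly quotes stored under cp1252 are single bytes that are invalid UTF-8,
    # so a UTF-8 reader shows the replacement character (the "little square"):
    old = "“Bears Know Best”".encode("cp1252")     # b'\x93Bears Know Best\x94'
    print(old.decode("utf-8", errors="replace"))   # '�Bears Know Best�'
    # The reverse mistake (UTF-8 bytes read as cp1252) gives classic mojibake:
    print("“".encode("utf-8").decode("cp1252"))    # 'â€œ'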

As far as I know, Microsoft's Windows NT line had already used the Unicode Standard for all internal character handling since the early 1990s, and WinXP (released in October 2001) finally brought that to the consumer versions of Windows. That marked the start of Unicode UTF-8 becoming the de-facto standard for information interchange on the Internet and in the Information Technology industry in general.

So unless some authors are still using some truly Jurassic software, the “problem” of character encoding corruption should have disappeared by the early to mid 2010s. But we all know that in practice theory and practice are further apart than in theory.