Weird character in some posted stories

I sometimes download stories so I can read them even when I don't have internet access, and I noticed that some of them have the Unicode character #009d scattered through them. Usually it's after a close double quote (#201d).

According to the table in the Wikipedia article on Unicode code points 0-0FFF, this is the control character "Operating System Command."
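For anyone who wants to check the character's properties directly, here is a small Python sketch using the standard `unicodedata` module. The helper name `describe` is made up for illustration; it only reports the code point, the general category, and whether the character falls in the C1 control range (U+0080 to U+009F).

```python
import unicodedata

def describe(ch: str) -> dict:
    """Return a few basic Unicode properties of a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "category": unicodedata.category(ch),
        # The C1 control block runs from U+0080 through U+009F.
        "is_c1_control": 0x80 <= ord(ch) <= 0x9F,
    }

print(describe("\u009d"))  # the stray character: category "Cc" (control)
print(describe("\u201d"))  # the close double quote: category "Pf" (final punctuation)
```

The category "Cc" confirms that #009d is a control character rather than anything printable, which fits with display programs having no proper glyph for it.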

I've just been deleting them, since they show up as some sort of ugly "not in font" glyph. But does anyone know why those characters are there? It's only some posts, and sometimes only some posts in a series. My inclination is to blame MicroSloth, since a number of people mention using MS Word, and MS is responsible for all manner of abominations, but I really don't know. (Maybe MS hopes that on some non-MS systems, displaying it will reformat the disk? ☺)

Comments

Turn off smart quotes

You really don't need them on-line. Microsoft, in its infinite wisdom (well, infinite something anyway) decided that everyone who uses Word should have the bejazus formatted out of everything they write whether they wanted it or not.

In practice only the Marketing Department would even notice whether you used smart quotes or not. On-line, when you don't even know the typeface the end reader has installed, you have no idea how the things will be rendered. Much better to go with default ASCII double quotes, which are rendered and recognised all over the world.
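If the editor has already put smart quotes into a story, one way to swap them back before posting might look like the Python sketch below. The function name `to_ascii_quotes` and the choice of exactly these four quote characters are assumptions for illustration, not something any particular editor or site does.

```python
# Map the four common "smart" quote characters to their ASCII equivalents.
SMART_TO_ASCII = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
})

def to_ascii_quotes(text: str) -> str:
    """Replace curly quotes with plain ASCII quotes; leave everything else alone."""
    return text.translate(SMART_TO_ASCII)

print(to_ascii_quotes("\u201cDon\u2019t,\u201d she said."))
```

This only touches the quote characters themselves, so any other typographic characters in the text are left as they were.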

Penny

Nothing to turn off

Neither my browser nor the BigCloset website seems to have anything to turn on or off for "smart quotes" (whatever they are). I get whatever BigCloset sends me, and some stories seem to have plain ASCII quotes and some have the left and right variety (Unicode #201c and #201d). It's pretty clearly coming from the original posting. And I obviously have no control over what the authors use.

The question wasn't about the left and right quote characters, which I actually like (when they're the correct ones!), it was about this apparently unnecessary control character that litters the page.

BTW, long ago, when I had access to MS-Word (at work), I tried saving something as HTML, and the HTML that MS-Word produced was so ugly and bloated that ever since then, I've just written my web pages directly in HTML using a text editor (e.g., Notepad).

It's a feature

Smart quotes are a feature of the editor you are using to prepare your text for posting. Turning it on or off will be done there rather than in your browser or in the website.

Silly technical details like this can be a royal pain in the petutie.

I love your work
Crescenda

AKA

Your friend
Crash

There was a time

There was a time when Word did OK at producing HTML. When I did my web site (back in the 80s or 90s) I was using Word and did my web pages with its HTML feature. Since then, I've had two upgrades of Word and the HTML it produces now is nearly unreadable in my browser. You have to be pretty fluent in HTML to even clean it up so that it will display in a decent format.

Note to Microsoft: IF IT AIN'T BROKE, DON'T FIX IT.

Hugs
Patricia

Happiness is being all dressed up and HAVING some place to go.
Semper in femineo gerunt

Word and HTML (sort of)

Using Word to generate HTML is painful, once you look at the result. I once had a need to use MS Word to create .rtf files and I was astonished by the appalling amount of micromanagement Word did on the files, ostensibly to "preserve" the original formatting - which I didn't care about anyway. A 10K text file bloated to several MB, as I recall.

I suspect that .html is handled much the same way, to ensure that the result is rendered identically to what the author sees, even if the result is to be used somewhere else. I have written elsewhere about those who abuse on-line file formats when the author has no idea how the result will be rendered.

However, I do use LibreOffice Writer to write my stories, as it happens. If I select HTML as the original file format and .html as the file extension and save it as such, then there is reasonably little cr*p that has to be cleaned up before posting here. There are caveats: inserting "<<" and ">>" around Norse speech screws with the (poor) formatting defaults and I have to go round and remove all the unintentional junk. 'Replace All' fixes most of it, and the rest stands out and is easy to find.

But it can be done. Just don't go near MS Word, though these days that is easy enough if you don't wish to get sucked into their subscription model of Office 365.

What I'd like to find is a simple, Linux-compatible, WYSIWYG HTML editor along the lines of Notepad that I could write in, without having to jump through all the hoops.

Penny

7p's

jacquimac's picture

Or you could apply the 7 P's to Microsoft:
Proper Prior Planning Prevents Piss Poor Performance.

I hear Windows 11 is out, hope it's better than 10

Appears harmless, but has a scary name.

I dug around for a while for both "0009d" and "OSC", and I came up empty, other than that it is a Unicode character ... which you knew. It does have a scary-sounding name ... which you knew. I couldn't find that it 'does' anything.

I can't see that this character is any kind of a threat or problem. It seems perfectly safe to leave in, or to delete, your choice.

As to "why". We have a lot of natural languages, and it's nice for everybody to be able to read, enter, write and print in their own language.

Some codes, especially those less than 128 or 256 (decimal), are control characters: line feed, end of line, ring a bell, form feed. These are things that make sense on a printer, or for controlling a teleprinter (teletypewriter). I'll guess there are some codes to specify printing right-to-left (Arabic and such) or maybe top-to-bottom (Chinese? Japanese?).

Enjoy the (downloaded) stories!

It's ugly

I presume that the #009d character is in the Unicode standard because someone requested it; they would presumably have some idea what they wanted to use it for. Most of the control characters seem to be intended to be used by various serial communications protocols and don't really belong in a text file.

The thing is, my browser and other display programs render this character as something ugly, which is why I edit the downloaded file to remove it.
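For what it's worth, the hand editing could be automated. Below is a minimal Python sketch, assuming the downloaded story has already been read into a string as Unicode text. The name `strip_c1_controls` is made up, and removing the whole C1 range (U+0080 to U+009F) rather than just #009d is a choice of mine. One caveat: that range includes U+0085 (Next Line), which a few systems use as a line break, so narrow the pattern if that matters for your files.

```python
import re

# C1 control characters occupy U+0080 through U+009F; #009d is one of them.
C1_CONTROLS = re.compile("[\u0080-\u009f]")

def strip_c1_controls(text: str) -> str:
    """Remove every C1 control character and leave all other text untouched."""
    return C1_CONTROLS.sub("", text)

sample = "\u201cHello,\u201d\u009d she said."
cleaned = strip_c1_controls(sample)
print(len(sample), len(cleaned))  # the cleaned string is one character shorter
```

The left and right quote characters are outside the C1 range, so they survive the cleanup.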

MS End Quote

Piper's picture

Having fixed this for many things, it's usually the Microsoft Smart Quotes "end quote" char. They use a different char for start of quote and end of quote. It's annoying and I have tried many times to filter them from BC.
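One possible connection between the end quote and #009d, offered as a guess rather than a confirmed diagnosis: the UTF-8 encoding of the close double quote (#201d) ends in the byte 0x9D. If software somewhere along the way reads those bytes one at a time as a single-byte encoding, that last byte can come out as the control character #009d. The Python sketch below shows the effect using Latin-1, which maps each byte to the code point with the same value. The helper name `misread_as_latin1` is made up for illustration.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")

closing_quote = "\u201d"
print(closing_quote.encode("utf-8"))  # b'\xe2\x80\x9d'

# Each byte becomes its own character; the last one is U+009D.
misread = misread_as_latin1(closing_quote)
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00E2', 'U+0080', 'U+009D']
```

In Windows-1252, byte 0x9D has no character assigned to it, while the matching byte for the opening quote (0x9C) does, which may be why only the end quote leaves a control character behind. This does not explain how an intact close quote and a stray #009d end up side by side in the same story, so treat it as a partial clue.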


"She was like a butterfly, full of color and vibrancy when she chose to open her wings, yet hardly visible when she closed them."
— Geraldine Brooks


Just a vote is not enough.

Just a vote is not enough. I had to comment on how I love your self-deprecating humor. Almost as much as I love your autobiographical stories.

Keep in a cool dry place.
Crescenda

AKA

Your friend
Crash