Author:
Blog About:
So this error pops up for people every once in a while, and I always forget to make a post about it. I often forget it's even a thing and spend hours trying to diagnose posting issues when it happens, only to remember, "Oh yeah, that's a thing."
The version of Drupal and the database system we are currently using can't understand emoji! The charset doesn't allow for it.
We are working on fixing this with some patchwork as well as general upgrades to the system, and it's only become an issue as more and more people are posting from phones and devices with built-in emoji keysets. But if you ever try to post something and the preview works, yet it still dies when you "submit," check it for emoji (or possibly other special characters, but emoji are the usual culprit), because that's likely what's causing your issue. Preview only renders the text and never submits it to the database. It's not till you save/publish that it hits the database, and the database says, "what the fuck is that?" and then you get a 503 Service Unavailable or Guru Meditation error.
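For the technically curious, the choke point is easy to demonstrate. Emoji live above U+FFFF and take four bytes in UTF-8, which is more than some legacy database charsets accept (MySQL's original "utf8" charset, for example, tops out at three bytes per character; its "utf8mb4" successor is the usual fix). Here is a small Python sketch of the check, offered as an illustration rather than anything actually running on this site:

```python
# Why an emoji can break a database save while plain text works:
# emoji sit above U+FFFF, so UTF-8 needs 4 bytes for them, which is
# more than a legacy 3-byte charset (e.g. MySQL's old "utf8") accepts.

def max_utf8_bytes(text: str) -> int:
    """Longest UTF-8 encoding of any single character in the text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)

plain = "Looks fine in preview"
emoji = "Looks fine in preview \U0001F600"   # grinning face emoji

print(max_utf8_bytes(plain))   # 1 -> safe in any charset
print(max_utf8_bytes(emoji))   # 4 -> rejected by 3-byte charsets
```

A pre-save check like this is one way to warn users before the database ever sees the text.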
So, for now, please avoid emojis, and I promise I will work on fixing this!
-Piper
Comments
I prefer
I prefer to simply use ASCII emojis. ;o) Like we did in the early days.
Hugs
Patricia
Happiness is being all dressed up and HAVING some place to go.
Semper in femineo gerunt
Ich bin eine Mann
Using entity encoding
With entity encoding emojis should work fine as the database (and the rest of Drupal) only gets to see plain ASCII characters.
So you can use &#128512; to get 😀 - more codes can be found in the "Full Emoji List" at unicode.org:
https://unicode.org/emoji/charts/full-emoji-list.html
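If you want to automate that workaround, the conversion in both directions is a one-liner in most languages. A Python sketch for illustration (the entity &#128512; is U+1F600, the grinning face):

```python
import html

# Turn a numeric character reference back into the real character.
assert html.unescape("&#128512;") == "\U0001F600"   # grinning face

# Turn any non-ASCII character into a numeric reference, so the
# database only ever sees plain ASCII bytes.
encoded = "I am happy \U0001F600".encode("ascii", "xmlcharrefreplace")
print(encoded)   # b'I am happy &#128512;'
```

The "xmlcharrefreplace" error handler leaves ASCII untouched and rewrites everything else as an entity, which is exactly the behavior this workaround needs.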
Download
I clicked on it to see the list. There were 1910 listed. It downloaded at a lightning 80 Kbs.
Yes! Kbs. Reminded me of the good old days
Rigid
Hi, that post shows me the code result as a blank square icon - is that what it is supposed to be? (PC screen, Chrome, W7)
People send me emojis in texts from their smart phones, but if those go to my little Nokia it shows the same blank square....
Teri Ann
"Reach for the sun."
Little Rectangles
I've had that problem on here for years. I would get a full story text listing (instead of a chapter by chapter download) and when I'd try to see the entire story I'd have those little empty rectangles everywhere. At that time there weren't any knowledgeable people here that could explain what was happening or how to fix it. Several stories ended up scrapped as a result. Note: these stories didn't have emoji so it was something else in the text that, when converted to plain text, screwed everything up. Some other characters would come out as garbage characters also.
Since that sounds like the same problem you are having, perhaps I shall learn something from your misfortune.
KJT
"Life is not measured by the breaths you take, but by the moments that take your breath away."
George Carlin
Character encoding is the culprit
The Windows operating system held out for the longest time against using UTF-8 as the default encoding for all document files. Thus you often had different code pages or encodings for different languages, and sometimes even for different variants of the same language, because different countries needed different symbols for their currency.
Case in point: more often than not, Windows in the UK used a different encoding than Windows in the USA, because UK users needed the £ (pound sterling) symbol for their currency.
Why is this happening? Glad you asked.
Back in the 1950s to 1970s, when electronic computers were being developed, the first punch cards used a seven-bit binary encoding for control codes, letters, digits and punctuation. Using binary math, that gives a range from 0 to 127, and these 128 character codes are known as the ASCII character set.
But pretty soon computers got "standardized" on 8-bit bytes, which can store up to 256 discrete values. So there were another 128 code points available for more characters. And this is where it gets messy and confusing, because each country defined its own encodings in the 128 to 255 range. So one country would define code point 192 to be À, another would assign it a different accented letter, a third would have £ there, and others would leave it undefined.
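That ambiguity is easy to see in practice: the very same byte decodes to a different character depending on which code page you assume. A quick Python illustration:

```python
raw = bytes([192])   # one byte, value 192 (0xC0)

print(raw.decode("latin-1"))   # 'À'  (Western European)
print(raw.decode("cp1251"))    # 'А'  (Cyrillic capital A)
print(raw.decode("cp437"))     # '└'  (original IBM PC box drawing)
```

Nothing in the byte itself says which of these was intended; only out-of-band knowledge of the sender's code page can disambiguate it.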
When the 1990s and the Internet rolled around, the exchange of digital documents across country and language borders became more common. And all of a sudden recipients were seeing weird garbage instead of clean text. (Who can still remember the "good old days" of e-mail exchange, when foreign correspondents used their accustomed accents and made the message hard for you to read?) Not to mention the languages with non-Roman scripts. That was when Unicode was initially conceived. They started out with 2-byte (16-bit) characters, but quickly realized that was still not enough code points, especially for the Japanese, Korean and Chinese characters. So they expanded to 4-byte (32-bit) characters.
But the developers soon realized that about 70% of all text data was wasting a lot of precious space. Remember, at that time a 250MB hard disk was massive in size and hugely expensive! So by the late 1990s the Unicode Transformation Format, 8-bit (UTF-8) encoding was established as the default standard on the Internet. However, Microsoft did not support UTF-8 in their internal interfaces and documentation until May 2019.
The benefit of UTF-8 is that it encodes each character in either 1, 2, 3 or 4 bytes. So if your text is mostly in English or West European languages, you are not wasting a lot of storage space. But that is also the reason why you sometimes get those weird multiple characters in place of accented letters, special symbols or even emojis: the author saved the document in their local encoding instead of in UTF-8.
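Both halves of that claim, the variable-length encoding and the garbage you get when it is misread, can be shown in a few lines of Python:

```python
# UTF-8 spends only as many bytes as a character needs.
for ch in ["A", "é", "€", "\U0001F600"]:
    print(repr(ch), len(ch.encode("utf-8")), "byte(s)")
# 'A' is 1 byte, 'é' is 2, '€' is 3, the emoji is 4.

# The classic garbage: UTF-8 bytes misread as Latin-1.
print("é".encode("utf-8").decode("latin-1"))   # prints 'Ã©'
```

The two-character "Ã©" in place of "é" is the telltale sign of exactly the mismatch described above.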
Especially on older systems or with older fonts, you might see a white rectangle with a question mark to indicate that the system is unable to render that specific character. Meanwhile the black diamond with a question mark (�) indicates that there is a problem with the encoding and the "character" has an invalid code point.
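You can produce that replacement character on demand by asking a decoder to substitute rather than fail when the bytes are not valid UTF-8. A Python illustration:

```python
bad = b"\xff\xfe\xfd"   # three bytes that are not valid UTF-8

# Each invalid byte becomes U+FFFD, the replacement character.
text = bad.decode("utf-8", errors="replace")
print(text)             # prints '���'
```

This is the difference in a nutshell: a blank box means "valid code point, no glyph for it in this font," while � means "the bytes themselves were broken."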
Sorry for geeking out on you, but I had just recently taken a deep dive into the whole messy history of character encoding and wanted to share it with somebody.
Gift set of rakes in grass.
For certain languages, encodings were even more "fun". There were 5(?) different Cyrillic encodings: one on old Apple machines (MacCyrillic), two different ones on DOS (cp866 and one more; cp866 was much more popular), one that commercial Unix systems used (ISO 8859-5), one that non-commercial Unix-like systems used (KOI8-R), and the one that Windows used before Unicode (cp1251).
In Windows, that joy is multiplied: you have 3 different encodings (cp866 in DOS, cp1251 in old pre-Unicode Windows apps, and Unicode in new apps). It is a whole set of rakes in the grass and a bunch of small interoperability nightmares.
PS: Before UTF-8, Windows internally used a 16-bit encoding, UTF-16 (which, as far as I know, is byte-incompatible with UTF-8).
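That incompatibility is easy to confirm: a character above U+FFFF is stored in UTF-16 as a surrogate pair of two 16-bit units, and the resulting bytes look nothing like the UTF-8 ones. A Python illustration:

```python
ch = "\U0001F600"   # grinning face emoji, code point U+1F600

print(ch.encode("utf-8").hex(" "))      # f0 9f 98 80
print(ch.encode("utf-16-le").hex(" "))  # 3d d8 00 de  (surrogate pair D83D DE00)
```

Same character, same total length of four bytes here, yet not a single byte in common, which is why a file written in one encoding reads as garbage in the other.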