Someone wrote in [personal profile] t_fischer 2019-08-10 08:10 am (UTC)

UTF-16

Agreed, Qt uses UTF-16 but can still encode all possible codepoints, since anything that does not fit into a single 16-bit unit is stored as a pair of units. After all, UTF-8 operates bytewise and can also still cover the full range.

For some more reading, see:
* https://utf8everywhere.org/
* https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
* https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings
* https://stackoverflow.com/questions/496321/utf-8-utf-16-and-utf-32

Unfortunately, Qt's string APIs are really bad in the sense that e.g. `length` doesn't return the number of codepoints, it returns the number of 16-bit code units; a QChar cannot represent every codepoint (it's just 2 bytes wide); and so on. This is similar to JavaScript and means many things break when strings contain codepoints outside the Basic Multilingual Plane, which take 4 bytes in UTF-16 because they are encoded as a surrogate pair (e.g., most emoji). I really hope they fix this mess in some future Qt version, but it does not seem to be on their roadmap. :(
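
To make the length mismatch concrete, here is a small sketch in plain Python rather than Qt. It only imitates what a UTF-16-based API counts. The helper names (`utf16_code_units`, `surrogate_pair`) are made up for illustration and are not part of Qt or any library.

```python
def utf16_code_units(text: str) -> int:
    """Count 16-bit code units, i.e. what a UTF-16 based length would report."""
    # Every UTF-16 code unit is exactly 2 bytes; "utf-16-le" adds no BOM.
    return len(text.encode("utf-16-le")) // 2


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point fits in one UTF-16 code unit")
    offset = code_point - 0x10000      # 20 bits remain after the offset
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


sample = "a\U0001F600"  # 'a' followed by U+1F600, an emoji outside the BMP

print(len(sample))                # code points: 2
print(utf16_code_units(sample))   # UTF-16 code units: 3
print([hex(u) for u in surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```

Python's `len` counts codepoints, so the two numbers differ as soon as a codepoint above U+FFFF shows up. That difference is exactly what trips up code that treats one 16-bit unit as one character.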
