Add support for Unicode surrogate characters and planes above zero. #213

Merged
MicheleC merged 1 commit from feat/unicode-planes-support into master 1 month ago
Owner

This adds support for emoticons and Unicode planes above 0, but requires using a font that supports such characters.
The changes are internal and can be backported to R14.1.x.
A more complete solution involves searching for an alternative suitable font when unsupported characters are encountered. This requires API changes and therefore will only be part of R14.2.0 and will be addressed in subsequent PRs.
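
For context, a code point in a plane above 0 (above U+FFFF) does not fit in one 16-bit unit, so UTF-16 stores it as a high/low surrogate pair. The sketch below shows that standard mapping in plain C++. It is not code from this PR, and the helper name `toSurrogates` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper, not part of TQt: split a code point from planes 1..16
// into its UTF-16 high and low surrogate.
std::pair<char16_t, char16_t> toSurrogates(char32_t cp)
{
    // Plane 0 code points need no surrogates; values above U+10FFFF are invalid.
    assert(cp >= 0x10000 && cp <= 0x10FFFF);

    const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000; // 20-bit value
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));   // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // bottom 10 bits
    return {high, low};
}
```

For example, U+1F600 (a smiley) maps to the pair 0xD83D, 0xDE00.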

MicheleC added this to the R14.1.4 release milestone 1 month ago
MicheleC added 1 commit 1 month ago
8bf8b1d9cc
Add support for Unicode surrogate characters and planes above zero.
If the selected font supports the required characters, the text will be displayed correctly.
If the selected font does not support such characters, empty boxes will be displayed in place of the expected text.

Signed-off-by: Michele Calgaro <michele.calgaro@yahoo.it>
MicheleC requested review from Core 1 month ago
MicheleC requested review from Owners 1 month ago
MicheleC force-pushed feat/unicode-planes-support from 8bf8b1d9cc to e0a38072cf 1 month ago
Poster
Owner

Solves TDE/tde#218 and partially addresses TDE/tqt3#162.

SlavekB approved these changes 1 month ago
SlavekB left a comment
Owner

I did a test build on R14.1.x and everything seems to be fine.
I do not see any side effects, and I see Unicode smileys in Kopete history.

MicheleC merged commit e0a38072cf into master 1 month ago
MicheleC deleted branch feat/unicode-planes-support 1 month ago
Collaborator

I'm a bit late but I can confirm it works with KWrite.

  • The first screenshot uses the default monospace font. The emoji indeed does not show, but the correct number of tofu boxes is shown.

  • The second screenshot uses the Noto Color Emoji font.

There seems to be a bug: the edit box still treats the emoji as two characters (I assume this is because the two characters are not being treated as one surrogate pair). The first one renders correctly as the emoji and the second one renders as white space. You can highlight either of the two characters, but the emoji then disappears unless you highlight both. Pressing backspace deletes the second character first; the first one is then left behind and renders as white space.

~~I assume this bug is related to the widget as Konqueror seems to render the emoji correctly.~~ Konqueror also seems to treat it as two characters but somehow renders it correctly as one?

Poster
Owner

Hi Philippe,
thanks for testing and for the valuable feedback. It definitely seems there is more work to do as you highlighted.
For info, here is how surrogates work at the moment:

  • they are represented as two TQChar, so a string containing a single emoji will consist of two TQChar and the length will be 2. This is the same in Qt4.
  • when rendering a surrogate character, the font engine checks whether we have a high surrogate TQChar and if so, it checks if the following TQChar is a low surrogate. If this happens, the two TQChar are represented as a single glyph (for example an emoji).
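
The high/low check described in the two points above can be sketched in plain C++ as follows. This is an illustration only, not the TQt font engine code: `std::u16string` stands in for a TQString, and the function name `toCodePoints` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: walk the 16-bit units and merge each high surrogate
// that is followed by a low surrogate into a single code point (one glyph).
std::vector<char32_t> toCodePoints(const std::u16string &s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (isHighSurrogate(c) && i + 1 < s.size() && isLowSurrogate(s[i + 1])) {
            const char32_t high = static_cast<char32_t>(c) - 0xD800;
            const char32_t low  = static_cast<char32_t>(s[i + 1]) - 0xDC00;
            out.push_back(0x10000 + (high << 10) + low);
            ++i; // the low surrogate has been consumed as part of the pair
        } else {
            // Plane 0 character, or an unpaired surrogate passed through as-is.
            out.push_back(c);
        }
    }
    assert(out.size() <= s.size()); // never more code points than 16-bit units
    return out;
}
```

With this sketch, a string holding a single emoji has length 2 but yields one code point, which matches the length and rendering behaviour described above.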

I suspect we will need some more changes to handle:

  • deletion of a surrogate character should remove two TQChars from a string. This should fix the problem that currently shows up when pressing backspace in the editor right after an emoji.
  • the widget displaying the text is clearly displaying things incorrectly now, so there is likely some logic that needs to be changed there.

I will be looking at both issues in the coming days.

Poster
Owner

I did some investigation on Qt4/KDE4 and Qt5/KDE5 and the situation is as follows.

  • Qt4/KDE4: deletion of a surrogate char only deletes the low surrogate QChar. The high surrogate is then rendered as a tofu box. Selection of characters with mouse or keyboard works fine.

  • Qt5/KDE5: deletion of a surrogate char correctly removes both QChar and no tofu boxes are displayed. Selection of characters with mouse or keyboard also works fine.

Looking at the code, QString does not do any special handling, which means it is still possible to delete the low surrogate without deleting the high surrogate if someone wants to do so. Instead, the logic for selection and deletion seems to be handled by the widget through the use of charStop (Qt4) or graphemeBoundary (Qt5, see https://codebrowser.dev/qt5/qtbase/src/widgets/widgets/qwidgetlinecontrol.cpp.html#_ZN18QWidgetLineControl3delEv) in the QCharAttributes struct.
TQt3 also has such a structure, so it seems to me the right approach is to fix the widget classes without modifying the TQString logic for char deletion.

What do you think?
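
The widget-side approach can be illustrated with a simplified sketch in plain C++. It is not the Qt or TQt implementation: the `std::vector<bool>` below is only a stand-in for the charStop flag, it handles surrogate pairs only (not full grapheme clusters), and the names `computeCharStops` and `backspace` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// stops[i] is true when the cursor may rest just before unit i.
// A low surrogate that follows a high surrogate is not a valid stop.
std::vector<bool> computeCharStops(const std::u16string &s)
{
    std::vector<bool> stops(s.size(), true);
    for (std::size_t i = 1; i < s.size(); ++i) {
        const bool prevHigh = s[i - 1] >= 0xD800 && s[i - 1] <= 0xDBFF;
        const bool curLow   = s[i] >= 0xDC00 && s[i] <= 0xDFFF;
        if (prevHigh && curLow)
            stops[i] = false;
    }
    return stops;
}

// Widget-level backspace: erase back to the nearest preceding stop, so a
// surrogate pair is removed as a whole. The string class itself stays unaware
// of surrogates. Returns the new cursor position.
std::size_t backspace(std::u16string &s, std::size_t cursor)
{
    assert(cursor <= s.size());
    if (cursor == 0)
        return 0;
    const std::vector<bool> stops = computeCharStops(s);
    std::size_t start = cursor - 1;
    while (start > 0 && !stops[start])
        --start;
    s.erase(start, cursor - start);
    return start;
}
```

In this sketch, pressing backspace right after an emoji removes both 16-bit units at once, so no unpaired high surrogate is left behind to render as a tofu box or white space.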
Collaborator

This indeed looks like an acceptable solution to me.

Poster
Owner

> This indeed looks like an acceptable solution to me.

Ok, I will need some time to study the code related to TQCharAttributes, so it may take a while to get it fixed, but I will proceed in this direction.

Poster
Owner

> Ok, I will need some time to study the code related to TQCharAttributes, so it may take a while to get it fixed, but I will proceed in this direction.

PR #217 fixes editing in TQt3, for example TQLineEdit and TQTextEdit. KEdit also benefits from this PR, while KWrite and Kate are likely to need more work in tdelibs/tdebase.


The pull request has been merged as e0a38072cf.
Reference: TDE/tqt3#213