Add surrogate pairs support for TQChar and TQString #162

Open
opened 8 months ago by obache · 5 comments
Collaborator

Basic information

  • TDE version:
  • Distribution:
  • Hardware:

Description

The TQChar class uses a 16-bit ushort to represent its character code (UCS-2).
But Unicode has not been limited to 16 bits since Unicode 2.0, and characters beyond 16 bits
(Supplementary Planes) have actually been in use since Unicode 3.1.

There are two ways to support Supplementary Planes:

  1. change TQChar to use a 32-bit code (UTF-32)
  2. keep TQChar at 16 bits and use surrogate pairs for Supplementary Planes (UTF-16)
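For reference, option 2 relies on the standard UTF-16 arithmetic: subtract 0x10000 from the code point, then split the remaining 20 bits between a high surrogate (0xD800 to 0xDBFF) and a low surrogate (0xDC00 to 0xDFFF). A minimal C++ sketch, assuming char16_t as a stand-in for TQChar's 16-bit code; toSurrogatePair is a hypothetical name, not TQt3 API:

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not TQt3 API): encodes a supplementary-plane code
// point (U+10000..U+10FFFF) as a UTF-16 surrogate pair {high, low}.
// char16_t stands in for TQChar's 16-bit ushort.
std::pair<char16_t, char16_t> toSurrogatePair(char32_t ucs4)
{
    assert(ucs4 >= 0x10000 && ucs4 <= 0x10FFFF);
    const char32_t v = ucs4 - 0x10000;                                  // 20-bit value
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));    // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return {high, low};
}
```

For the emoji in the reproduction steps, U+1F600 becomes the pair 0xD83D, 0xDE00.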

TQt3 contains a QT_QSTRING_UCS_4 switch. In addition to ucs, it introduces a
16-bit grp member (probably the plane number). However, that support code appears to be incomplete, and enabling it would break binary compatibility.

We can manually create two TQChar objects holding a surrogate pair and build a
TQString object from the two.
But TQString will treat them as two characters and will not handle them correctly.
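The mismatch can be illustrated with std::u16string, which, like TQString, stores 16-bit units: its size() counts units, so a single emoji reports 2, while a surrogate-aware count reports 1. A sketch only; codePointCount is a hypothetical helper, not a TQString method:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not TQString API): counts Unicode code points in a
// UTF-16 string. A well-formed high+low surrogate pair counts once; any
// other 16-bit unit, including an unpaired surrogate, also counts once.
std::size_t codePointCount(const std::u16string &s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i; // skip the low half of the pair
        ++count;
    }
    return count;
}
```

For u"\U0001F600", size() is 2 while codePointCount() is 1.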

Qt4 and later added surrogate pair support to QChar while keeping it 16 bits wide.
So TQt3 should also support surrogate pairs.

Steps to reproduce

  1. try to display chars in Supplementary Planes, like "Emoji" 😀
  2. it will be displayed as two tofu chars

Screenshots

obache added the SL/wishlist label 8 months ago
Owner

Hi Obata-san, this is indeed a good idea. We would have to explore the best way to do this, whether by backporting from Qt4 or by developing a local solution. Keeping TQChar at 16 bits and implementing surrogate pairs seems to be the favorable way to go.
In any case it would probably go into R14.2.0, since this would be quite a change.

Collaborator

I have been digging into the relevant code on numerous occasions. Surrogates shouldn't be hard to implement, but Qt's QString and QChar have gone through a very big refactoring since then, so we can't just backport the relevant changes.

Owner

PR TDE/tqt3#213 adds initial support for surrogate characters and planes above 0, but requires the font to support such characters. If a font that does not support those characters is used, empty boxes will be displayed (now in the correct number). This will be backported to R14.1.x too.

A more extensive solution requires several more changes:

  • extension of TQChar/TQString classes
  • rework of unicode table support
  • searching for a suitable alternative font when an unsupported character is encountered. This also requires splitting a string into more script items, so that each item is drawn with a single font.

Therefore this will go into R14.2.0.
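The script-item split from the last point in the list above could look roughly like this: walk the string by code point rather than by 16-bit unit, and start a new run whenever font coverage changes, so that each run can be drawn with one font and no surrogate pair is ever cut in half. This is a hedged sketch; the fontSupports predicate stands in for a real font-coverage query, and Run and splitByFontSupport are hypothetical names, not TQt3 API:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical type: a run of UTF-16 units [start, start + length).
struct Run {
    std::size_t start;
    std::size_t length;
    bool supported; // whether the (assumed) primary font covers this run
};

// Decodes the code point at index i and reports how many units it spans.
static char32_t codePointAt(const std::u16string &s, std::size_t i, std::size_t &units)
{
    assert(i < s.size());
    const char16_t c = s[i];
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size()
        && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
        units = 2;
        return 0x10000 + ((char32_t(c) - 0xD800) << 10) + (char32_t(s[i + 1]) - 0xDC00);
    }
    units = 1; // BMP character or unpaired surrogate
    return c;
}

// Hypothetical helper: splits s into maximal runs whose code points are all
// supported (or all unsupported) according to fontSupports.
std::vector<Run> splitByFontSupport(const std::u16string &s,
                                    const std::function<bool(char32_t)> &fontSupports)
{
    std::vector<Run> runs;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t units = 0;
        const bool ok = fontSupports(codePointAt(s, i, units));
        if (!runs.empty() && runs.back().supported == ok)
            runs.back().length += units;   // extend the current run
        else
            runs.push_back({i, units, ok}); // coverage changed: new run
        i += units;
    }
    return runs;
}
```

Each unsupported run would then be retried with a fallback font; how TQt3 would pick that font is left open here.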

MicheleC added this to the R14.2.0 release milestone 1 month ago
Owner

PR #216 adds methods for surrogate pair support to TQChar. More work is required to fully resolve this issue; this will come in future PRs.
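For comparison, Qt4's QChar exposes helpers such as isHighSurrogate(), isLowSurrogate() and surrogateToUcs4(). A sketch of equivalent free functions on a plain 16-bit unit follows; these are hypothetical stand-ins and not necessarily what PR #216 actually adds to TQChar:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical free functions modeled on Qt4's QChar helpers; they operate on
// a plain 16-bit code unit, the storage TQChar already uses.
inline bool isHighSurrogate(std::uint16_t u) { return (u & 0xFC00) == 0xD800; } // D800..DBFF
inline bool isLowSurrogate(std::uint16_t u)  { return (u & 0xFC00) == 0xDC00; } // DC00..DFFF

// Combines a high/low pair into a code point in U+10000..U+10FFFF.
inline std::uint32_t surrogateToUcs4(std::uint16_t high, std::uint16_t low)
{
    assert(isHighSurrogate(high) && isLowSurrogate(low));
    return ((std::uint32_t(high) - 0xD800) << 10) + (std::uint32_t(low) - 0xDC00) + 0x10000;
}
```

Masking with 0xFC00 keeps the top six bits, which are 110110 for every high surrogate and 110111 for every low one.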

Owner

PR #217 (still WIP) addresses the issues with editing text that contains surrogate pairs.
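One core rule when editing such text is that cursor movement, deletion and selection must treat a surrogate pair as one unit, so the cursor never lands between its two halves. A minimal sketch of that rule on std::u16string; nextCursorPosition and previousCursorPosition are hypothetical helpers and not necessarily how PR #217 does it:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static bool isHigh(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static bool isLow(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: cursor position after pos, stepping over both halves
// of a surrogate pair at once.
std::size_t nextCursorPosition(const std::u16string &s, std::size_t pos)
{
    assert(pos <= s.size());
    if (pos >= s.size())
        return s.size();
    if (isHigh(s[pos]) && pos + 1 < s.size() && isLow(s[pos + 1]))
        return pos + 2;
    return pos + 1;
}

// Hypothetical helper: cursor position before pos, again never landing
// between a high and a low surrogate.
std::size_t previousCursorPosition(const std::u16string &s, std::size_t pos)
{
    assert(pos <= s.size());
    if (pos == 0)
        return 0;
    if (pos >= 2 && isLow(s[pos - 1]) && isHigh(s[pos - 2]))
        return pos - 2;
    return pos - 1;
}
```

This covers only surrogate pairs; combining marks and multi-code-point emoji sequences need grapheme-level handling on top of it.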

Reference: TDE/tqt3#162