From d796c9dd933ab96ec83b9a634feedd5d32e1ba3f Mon Sep 17 00:00:00 2001
From: Timothy Pearson
Date: Tue, 8 Nov 2011 12:31:36 -0600
Subject: Test conversion to TQt3 from Qt3 8c6fc1f8e35fd264dd01c582ca5e7549b32ab731

---
 doc/html/qtextcodec.html | 611 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 611 insertions(+)
 create mode 100644 doc/html/qtextcodec.html

diff --git a/doc/html/qtextcodec.html b/doc/html/qtextcodec.html
new file mode 100644
index 000000000..b23455321
--- /dev/null
+++ b/doc/html/qtextcodec.html
@@ -0,0 +1,611 @@
+TQTextCodec Class

TQTextCodec Class Reference

+ +

The TQTextCodec class provides conversion between text encodings. +

Almost all the functions in this class are reentrant when TQt is built with thread support. The exceptions are ~TQTextCodec(), setCodecForTr(), setCodecForCStrings(), and TQTextCodec(). +

#include <qtextcodec.h> +

Inherited by TQBig5Codec, TQBig5hkscsCodec, TQEucJpCodec, TQEucKrCodec, TQGb18030Codec, TQJisCodec, TQHebrewCodec, TQSjisCodec, and TQTsciiCodec. +


Public Members

+ +

Static Public Members

+ +

Protected Members

+ +

Static Protected Members

+ +

Detailed Description

+ + +The TQTextCodec class provides conversion between text encodings. + + +

TQt uses Unicode to store, draw and manipulate strings. In many +situations you may wish to deal with data that uses a different +encoding. For example, most Japanese documents are still stored in +Shift-JIS or ISO2022, while Russian users often have their +documents in KOI8-R or CP1251. +

TQt provides a set of TQTextCodec classes to help with converting +non-Unicode formats to and from Unicode. You can also create your +own codec classes (see "Creating your own Codec class" below). +

The supported encodings are: +

+

TQTextCodecs can be used as follows to convert some locally encoded +string to Unicode. Suppose you have some string encoded in Russian +KOI8-R encoding, and want to convert it to Unicode. The simple way +to do this is: +

+    TQCString locallyEncoded = "..."; // text to convert
+    TQTextCodec *codec = TQTextCodec::codecForName("KOI8-R"); // get the codec for KOI8-R
+    TQString unicodeString = codec->toUnicode( locallyEncoded );
+    
+ +

After this, unicodeString holds the text converted to Unicode. +Converting a string from Unicode to the local encoding is just as +easy: +

+    TQString unicodeString = "..."; // any Unicode text
+    TQTextCodec *codec = TQTextCodec::codecForName("KOI8-R"); // get the codec for KOI8-R
+    TQCString locallyEncoded = codec->fromUnicode( unicodeString );
+    
+ +

Some care must be taken when trying to convert the data in chunks, +for example, when receiving it over a network. In such cases it is +possible that a multi-byte character will be split over two +chunks. At best this might result in the loss of a character and +at worst cause the entire conversion to fail. +

The approach to use in these situations is to create a TQTextDecoder +object for the codec and use this TQTextDecoder for the whole +decoding process, as shown below: +

+    TQTextCodec *codec = TQTextCodec::codecForName( "Shift-JIS" );
+    TQTextDecoder *decoder = codec->makeDecoder();
+
+    TQString unicodeString;
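+    while( receiving_data ) {
+        TQByteArray chunk = new_data;
+        // TQByteArray reports its length with size(), not length()
+        unicodeString += decoder->toUnicode( chunk.data(), chunk.size() );
+    }
+    delete decoder; // the caller owns the object returned by makeDecoder()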
+    
+ +

The TQTextDecoder object maintains state between chunks and therefore +works correctly even if a multi-byte character is split between +chunks. +
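To illustrate why state has to survive between chunks, here is a minimal standalone sketch in standard C++. It does not use TQt: `ChunkDecoder`, `feed()` and `hasPending()` are hypothetical names, and the sketch decodes UTF-8 rather than Shift-JIS purely to keep it short. It assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a stateful decoder; not part of TQt.
// Decodes UTF-8 into UTF-32 and carries an incomplete trailing
// sequence over to the next call. Input is assumed to be well-formed.
class ChunkDecoder {
public:
    std::u32string feed(const std::string& chunk) {
        // Prepend the bytes left over from the previous chunk.
        std::string data = pending_ + chunk;
        pending_.clear();
        std::u32string out;
        std::size_t i = 0;
        while (i < data.size()) {
            unsigned char lead = static_cast<unsigned char>(data[i]);
            std::size_t need = sequenceLength(lead);
            if (i + need > data.size()) {
                // The sequence continues in a later chunk: remember it.
                pending_ = data.substr(i);
                break;
            }
            // Payload bits of the lead byte, then 6 bits per continuation byte.
            char32_t cp = (need == 1) ? lead : (lead & (0xFF >> (need + 1)));
            for (std::size_t k = 1; k < need; ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(data[i + k]) & 0x3F);
            out.push_back(cp);
            i += need;
        }
        return out;
    }

    bool hasPending() const { return !pending_.empty(); }

private:
    static std::size_t sequenceLength(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;  // assumes a valid 4-byte lead byte
    }

    std::string pending_;
};
```

Feeding the two bytes of a character in separate calls yields nothing for that character on the first call and the complete character on the second, which is the behaviour the paragraph above describes for TQTextDecoder.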

+

Creating your own Codec class +

+

Support for new text encodings can be added to TQt by creating +TQTextCodec subclasses. +

Built-in codecs can be overridden by custom codecs since more +recently created TQTextCodec objects take precedence over earlier +ones. +

You may find it more convenient to make your codec class available +as a plugin; see the plugin + documentation for more details. +

The abstract virtual functions describe the encoder to the +system, and the codec is used as required in the different +text file formats supported by TQTextStream and, under X11, for +locale-specific character input and output. +

To add support for another 8-bit encoding to TQt, make a subclass +of TQTextCodec and implement at least the following methods: +

+    const char* name() const
+    
+ +Return the official name for the encoding. +

+    int mibEnum() const
+    
+ +Return the MIB enum for the encoding if it is listed in the +IANA character-sets encoding file. +

If the encoding is multi-byte then it will have "state"; that is, +the interpretation of some bytes will be dependent on some preceding +bytes. For such encodings, you must implement: +

+    TQTextDecoder* makeDecoder() const
+    
+ +Return a TQTextDecoder that remembers incomplete multi-byte sequence +prefixes or other required state. +

If the encoding does not require state, you should implement: +

+    TQString toUnicode(const char* chars, int len) const
+    
+ +Converts len characters from chars to Unicode. +

The base TQTextCodec class has default implementations of the above +two functions, but they are mutually recursive, so you must +re-implement at least one of them, or both for improved efficiency. +
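The mutual recursion can be modelled in a few lines of standard C++. This is a sketch of the pattern only: `ModelCodec`, `ModelDecoder` and `Latin1Model` are hypothetical classes, not TQt classes, and `std::u32string` stands in for a Unicode string.

```cpp
#include <string>

// Hypothetical stand-ins for the codec/decoder pair; not TQt classes.
class ModelCodec;

class ModelDecoder {
public:
    explicit ModelDecoder(const ModelCodec* codec) : codec_(codec) {}
    virtual ~ModelDecoder() {}
    // Default decoder: stateless, simply forwards to the codec.
    virtual std::u32string toUnicode(const char* chars, int len);
private:
    const ModelCodec* codec_;
};

class ModelCodec {
public:
    virtual ~ModelCodec() {}
    // Default: create a decoder and let it convert the input.
    virtual std::u32string toUnicode(const char* chars, int len) const {
        ModelDecoder* decoder = makeDecoder();
        std::u32string result = decoder->toUnicode(chars, len);
        delete decoder;  // the caller owns the decoder
        return result;
    }
    // Default: create a decoder that calls toUnicode() above.
    virtual ModelDecoder* makeDecoder() const { return new ModelDecoder(this); }
};

std::u32string ModelDecoder::toUnicode(const char* chars, int len) {
    return codec_->toUnicode(chars, len);
}

// Overriding toUnicode() alone is enough to break the cycle.
class Latin1Model : public ModelCodec {
public:
    std::u32string toUnicode(const char* chars, int len) const {
        std::u32string out;
        for (int i = 0; i < len; ++i)
            out.push_back(static_cast<unsigned char>(chars[i]));  // Latin-1 byte == code point
        return out;
    }
};
```

Calling `toUnicode()` on a plain `ModelCodec` would recurse without end, whereas a decoder obtained from `Latin1Model` terminates because the forwarded call lands in the override.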

For conversion from Unicode to 8-bit encodings, it is rarely necessary +to maintain state. However, two functions similar to the two above +are used for encoding: +

+    TQTextEncoder* makeEncoder() const
+    
+ +Return a TQTextEncoder. +

+    TQCString fromUnicode(const TQString& uc, int& lenInOut ) const
+    
+ +Converts lenInOut characters (of type TQChar) from the start of +the string uc, returning a TQCString result, and also returning +the length of the result in +lenInOut. +

Again, these are mutually recursive so only one needs to be implemented, +or both if greater efficiency is possible. +

Finally, you must implement: +

+    int heuristicContentMatch(const char* chars, int len) const
+    
+ +Gives a value indicating how likely it is that len characters +from chars are in the encoding. +

A good model for this function is the +TQWindowsLocalCodec::heuristicContentMatch function found in the TQt +sources. +
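As a rough idea of what such a heuristic can look like, the sketch below scores input for a hypothetical 7-bit ASCII codec. The function name `asciiContentMatch` and the scoring rule are assumptions made for illustration; they are not copied from the TQt sources.

```cpp
// Hypothetical content heuristic for a 7-bit ASCII codec; not TQt code.
// Returns -1 if a byte is undefined in the encoding, otherwise the number
// of printable characters seen, so the score ranges from 0 up to len.
int asciiContentMatch(const char* chars, int len) {
    int score = 0;
    for (int i = 0; i < len; ++i) {  // len bounds the scan: input is not null terminated
        unsigned char c = static_cast<unsigned char>(chars[i]);
        if (c >= 0x80)
            return -1;               // byte is outside the 7-bit range
        if (c >= 0x20 && c < 0x7F)
            ++score;                 // printable characters count as evidence
    }
    return score;
}
```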

A TQTextCodec subclass might have improved performance if you also +re-implement: +

+    bool canEncode( TQChar ) const
+    
+ +Test if a Unicode character can be encoded. +

+    bool canEncode( const TQString& ) const
+    
+ +Test if a string of Unicode characters can be encoded. +

+    int heuristicNameMatch(const char* hint) const
+    
+ +Test if a possibly non-standard name is referring to the codec. +

Codecs can also be created as plugins. +

See also Internationalization with TQt. + +


Member Function Documentation

+

TQTextCodec::TQTextCodec () [protected] +

Warning: This function is not reentrant.

+ + +

Constructs a TQTextCodec, and gives it the highest precedence. The +TQTextCodec should always be constructed on the heap (i.e. with new). TQt takes ownership and will delete it when the application +terminates. + +

TQTextCodec::~TQTextCodec () [virtual] +

Warning: This function is not reentrant.

+ + +

Destroys the TQTextCodec. Note that you should not delete codecs +yourself: once created they become TQt's responsibility. + +

bool TQTextCodec::canEncode ( TQChar ch ) const [virtual] +

+Returns TRUE if the Unicode character ch can be fully encoded +with this codec; otherwise returns FALSE. The default +implementation tests if the result of toUnicode(fromUnicode(ch)) +is the original ch. Subclasses may be able to improve the +efficiency. + +
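The round-trip idea behind the default implementation can be sketched without TQt. The three functions below are hypothetical and model a Latin-1 codec that substitutes '?' for anything it cannot represent.

```cpp
// Hypothetical Latin-1 model; the function names are illustrative, not TQt API.
unsigned char encodeLatin1(char32_t ch) {
    // Anything above 0xFF has no Latin-1 byte and is replaced by '?'.
    return ch <= 0xFF ? static_cast<unsigned char>(ch) : static_cast<unsigned char>('?');
}

char32_t decodeLatin1(unsigned char byte) {
    return byte;  // every Latin-1 byte equals its code point
}

// Round-trip test: a character is encodable only if decoding its
// encoded form gives the same character back.
bool canEncodeLatin1(char32_t ch) {
    return decodeLatin1(encodeLatin1(ch)) == ch;
}
```

A subclass that knows its repertoire can usually answer with a range check instead of two conversions, which is the kind of efficiency gain the description mentions.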

bool TQTextCodec::canEncode ( const TQString & s ) const [virtual] +

+This is an overloaded member function, provided for convenience. It behaves essentially like the above function. +

s contains the string being tested for encode-ability. + +

TQTextCodec * TQTextCodec::codecForCStrings () [static] +

+ +

Returns the codec used by TQString to convert to and from const +char* and TQCStrings. If this function returns 0 (the default), +TQString assumes Latin-1. +

See also setCodecForCStrings(). + +

TQTextCodec * TQTextCodec::codecForContent ( const char * chars, int len ) [static] +

+Searches all installed TQTextCodec objects, returning the one which +best matches the given content. May return 0. +

Note that this is often a poor choice, since character encodings +often use most of the available character sequences, and so only +by linguistic analysis could a true match be made. +

chars contains the string to check, and len contains the +number of characters in the string to use. +

See also heuristicContentMatch(). + +

Example: qwerty/qwerty.cpp. +

TQTextCodec * TQTextCodec::codecForIndex ( int i ) [static] +

+Returns the TQTextCodec i positions from the most recently +inserted codec, or 0 if there is no such TQTextCodec. Thus, +codecForIndex(0) returns the most recently created TQTextCodec. + +
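The indexing rule can be pictured with a small standalone model. `CodecRegistry` is a hypothetical class that stores names only; it mirrors the documented ordering and makes no claim about how TQt keeps its list internally. An empty string stands in for the 0 return value.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical registry; it models the documented ordering, not TQt's internals.
class CodecRegistry {
public:
    void add(const std::string& name) { names_.push_back(name); }

    // Index 0 is the most recently added entry.
    // An empty string stands in for the null pointer.
    std::string forIndex(int i) const {
        if (i < 0 || static_cast<std::size_t>(i) >= names_.size())
            return std::string();
        return names_[names_.size() - 1 - static_cast<std::size_t>(i)];
    }

private:
    std::vector<std::string> names_;
};
```

Under this model, a loop that starts at index 0 and stops at the first empty result visits every entry from newest to oldest.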

Example: qwerty/qwerty.cpp. +

TQTextCodec * TQTextCodec::codecForLocale () [static] +

Returns a pointer to the codec most suitable for this locale. +

Example: qwerty/qwerty.cpp. +

TQTextCodec * TQTextCodec::codecForMib ( int mib ) [static] +

+Returns the TQTextCodec which matches the MIBenum mib. + +

TQTextCodec * TQTextCodec::codecForName ( const char * name, int accuracy = 0 ) [static] +

+Searches all installed TQTextCodec objects and returns the one +which best matches name; the match is case-insensitive. Returns +0 if no codec's heuristicNameMatch() reports a match better than +accuracy, or if name is a null string. +

See also heuristicNameMatch(). + +

TQTextCodec * TQTextCodec::codecForTr () [static] +

+ +

Returns the codec used by TQObject::tr() on its argument. If this +function returns 0 (the default), tr() assumes Latin-1. +

See also setCodecForTr(). + +

void TQTextCodec::deleteAllCodecs () [static] +

+Deletes all the created codecs. +

Warning: Do not call this function. +

TQApplication calls this function just before exiting to delete +any TQTextCodec objects that may be lying around. Since various +other classes hold pointers to TQTextCodec objects, it is not safe +to call this function earlier. +

If you are using the utility classes (like TQString) but not using +TQApplication, calling this function at the very end of your +application may be helpful for chasing down memory leaks by +eliminating any TQTextCodec objects. + +

TQCString TQTextCodec::fromUnicode ( const TQString & uc, int & lenInOut ) const [virtual] +

+TQTextCodec subclasses must reimplement either this function or +makeEncoder(). It converts the first lenInOut characters of uc from Unicode to the encoding of the subclass. If lenInOut is +negative or too large, the length of uc is used instead. +

Converts lenInOut characters (not bytes) from uc, producing +a TQCString. lenInOut will be set to the length of the result (in bytes). +

The default implementation makes an encoder with makeEncoder() and +converts the input with that. Note that the default makeEncoder() +implementation makes an encoder that simply calls this function, +hence subclasses must reimplement one function or the other to +avoid infinite recursion. + +
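The in/out convention for lenInOut is easy to get wrong, so here is a standalone sketch of it. `encodePrefix` is a hypothetical function that encodes to UTF-8 for illustration and assumes every input code point is below 0x10000; it is not a TQt function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the lenInOut convention; not TQt code.
// On entry lenInOut is a character count, on return it is a byte count.
std::string encodePrefix(const std::u32string& uc, int& lenInOut) {
    // Use lenInOut only when it is a valid prefix length;
    // a negative or too large value means "the whole string".
    std::size_t n = uc.size();
    if (lenInOut >= 0 && static_cast<std::size_t>(lenInOut) < n)
        n = static_cast<std::size_t>(lenInOut);

    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        char32_t c = uc[i];
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    lenInOut = static_cast<int>(out.size());  // report the result length in bytes
    return out;
}
```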

Reimplemented in TQHebrewCodec. +

TQCString TQTextCodec::fromUnicode ( const TQString & uc ) const +

+This is an overloaded member function, provided for convenience. It behaves essentially like the above function. +

uc is the unicode source string. + +

int TQTextCodec::heuristicContentMatch ( const char * chars, int len ) const [pure virtual] +

+ +

TQTextCodec subclasses must reimplement this function. It examines +the first len bytes of chars and returns a value indicating +how likely it is that the string is a prefix of text encoded in +the encoding of the subclass. A negative return value indicates +that the text is detectably not in the encoding (e.g. it contains +characters undefined in the encoding). A return value of 0 +indicates that the text should be decoded with this codec rather +than as ASCII, but there is no particular evidence. The value +should range up to len. Thus, most decoders will return -1, 0, +or len. +

The characters are not null terminated. +

See also codecForContent(). + +

int TQTextCodec::heuristicNameMatch ( const char * hint ) const [virtual] +

+Returns a value indicating how likely it is that this decoder is +appropriate for decoding some format that has the given name. The +name is compared with the hint. +

A good match returns a positive number around the length of the +string. A bad match is negative. +

The default implementation calls simpleHeuristicNameMatch() with +the name of the codec. + +

TQTextCodec * TQTextCodec::loadCharmap ( TQIODevice * iod ) [static] +

+Reads a POSIX2 charmap definition from iod. +The parser recognizes the following lines: +

+  <code_set_name> name
+  <escape_char> character
+  % alias alias
+  CHARMAP
+  <token> /xhexbyte <Uunicode> ...
+  <token> /ddecbyte <Uunicode> ...
+  <token> /octbyte <Uunicode> ...
+  <token> /any/any... <Uunicode> ...
+  END CHARMAP
+
+

The resulting TQTextCodec is returned (and also added to the global +list of codecs). The name() of the result is taken from the +code_set_name. +

Note that a codec constructed in this way uses much more memory +and is slower than a hand-written TQTextCodec subclass, since +tables in code are kept in memory shared by all TQt applications. +
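To make the mapping-line format concrete, the sketch below parses one line of the single-byte hexadecimal form. `CharmapEntry` and `parseCharmapLine` are hypothetical names, and the helper handles only the `/x` form; it is not the parser TQt uses and it ignores the decimal, octal and multi-byte variants.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type; not part of TQt.
struct CharmapEntry {
    bool ok;                // false if the line is not a "/x" mapping line
    unsigned byte;          // encoded byte value
    unsigned long unicode;  // Unicode code point
};

// Hypothetical helper: parses "<token> /xNN <UXXXX>" and nothing else.
CharmapEntry parseCharmapLine(const std::string& line) {
    CharmapEntry e = { false, 0, 0 };
    std::string::size_type slash = line.find("/x");
    if (slash == std::string::npos)
        return e;
    // Search for "<U" only after the byte field, so a token that
    // happens to be spelled <U....> is not mistaken for the code point.
    std::string::size_type u = line.find("<U", slash);
    if (u == std::string::npos)
        return e;
    // strtoul stops at the first character that is not a hex digit.
    e.byte = static_cast<unsigned>(std::strtoul(line.c_str() + slash + 2, 0, 16));
    e.unicode = std::strtoul(line.c_str() + u + 2, 0, 16);
    e.ok = true;
    return e;
}
```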

See also loadCharmapFile(). + +

Example: qwerty/qwerty.cpp. +

TQTextCodec * TQTextCodec::loadCharmapFile ( TQString filename ) [static] +

+A convenience function for loadCharmap() that loads the charmap +definition from the file filename. + +

const char * TQTextCodec::locale () [static] +

+Returns a string representing the current language and +sublanguage, e.g. "pt" for Portuguese, or "pt_br" for Portuguese/Brazil. + +

Example: i18n/main.cpp. +

TQTextDecoder * TQTextCodec::makeDecoder () const [virtual] +

+Creates a TQTextDecoder which stores enough state to decode chunks +of char* data to create chunks of Unicode data. The default +implementation creates a stateless decoder, which is only +sufficient for the simplest encodings where each byte corresponds +to exactly one Unicode character. +

The caller is responsible for deleting the returned object. + +

TQTextEncoder * TQTextCodec::makeEncoder () const [virtual] +

+Creates a TQTextEncoder which stores enough state to encode chunks +of Unicode data as char* data. The default implementation creates +a stateless encoder, which is only sufficient for the simplest +encodings where each Unicode character corresponds to exactly one +character. +

The caller is responsible for deleting the returned object. + +

int TQTextCodec::mibEnum () const [pure virtual] +

+ +

Subclasses of TQTextCodec must reimplement this function. It +returns the MIBenum (see the + IANA character-sets encoding file for more information). +It is important that each TQTextCodec subclass returns the correct +unique value for this function. + +

Reimplemented in TQEucJpCodec. +

const char * TQTextCodec::mimeName () const [virtual] +

+Returns the preferred mime name of the encoding as defined in the +IANA character-sets encoding file. + +

Reimplemented in TQEucJpCodec, TQEucKrCodec, TQJisCodec, TQHebrewCodec, and TQSjisCodec. +

const char * TQTextCodec::name () const [pure virtual] +

+ +

TQTextCodec subclasses must reimplement this function. It returns +the name of the encoding supported by the subclass. When choosing +a name for an encoding, consider these points: +

+ +

Example: qwerty/qwerty.cpp. +

void TQTextCodec::setCodecForCStrings ( TQTextCodec * c ) [static] +

Warning: This function is not reentrant.

+ + + +

Sets the codec used by TQString to convert to and from const char* +and TQCStrings. If c is 0 (the default), TQString assumes Latin-1. +

Warning: Some codecs do not preserve the characters in the ASCII +range (0x00 to 0x7f). For example, the Japanese Shift-JIS +encoding maps the backslash character (0x5c) to the Yen character. +This leads to unexpected results when using the backslash +character to escape characters in strings used in e.g. regular +expressions. Use TQString::fromLatin1() to preserve characters in +the ASCII range when needed. +

See also codecForCStrings() and setCodecForTr(). + +

void TQTextCodec::setCodecForLocale ( TQTextCodec * c ) [static] +

+Set the codec to c; this will be returned by codecForLocale(). +This might be needed for some applications that want to use their +own mechanism for setting the locale. +

See also codecForLocale(). + +

void TQTextCodec::setCodecForTr ( TQTextCodec * c ) [static] +

Warning: This function is not reentrant.

+ + + +

Sets the codec used by TQObject::tr() on its argument to c. If +c is 0 (the default), tr() assumes Latin-1. +

If the literal quoted text in the program is not in the Latin-1 +encoding, this function can be used to set the appropriate +encoding. For example, software developed by Korean programmers +might use eucKR for all the text in the program, in which case the +main() function might look like this: +

+    int main(int argc, char** argv)
+    {
+        TQApplication app(argc, argv);
+        ... install any additional codecs ...
+        TQTextCodec::setCodecForTr( TQTextCodec::codecForName("eucKR") );
+        ...
+    }
+    
+ +

Note that this is not the way to select the encoding that the user has chosen. For example, to convert an application containing +literal English strings to Korean, all that is needed is for the +English strings to be passed through tr() and for translation +files to be loaded. For details of internationalization, see the +TQt internationalization documentation. +

See also codecForTr() and setCodecForCStrings(). + +

int TQTextCodec::simpleHeuristicNameMatch ( const char * name, const char * hint ) [static protected] +

+A simple utility function for heuristicNameMatch(): it does some +very minor character-skipping so that almost-exact matches score +high. name is the text we're matching and hint is used for +the comparison. + +
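One way to picture this kind of character-skipping is the standalone sketch below. The function `nameMatchScore`, its helpers and the exact scores are assumptions chosen for illustration; the real scoring is whatever the TQt sources implement.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Keep only letters and digits, lower-cased.
static std::string lettersAndDigits(const char* s) {
    std::string out;
    for (; *s; ++s) {
        unsigned char c = static_cast<unsigned char>(*s);
        if (std::isalnum(c))
            out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Case-insensitive comparison of two complete strings.
static bool equalIgnoringCase(const char* a, const char* b) {
    for (; *a && *b; ++a, ++b) {
        if (std::tolower(static_cast<unsigned char>(*a)) !=
            std::tolower(static_cast<unsigned char>(*b)))
            return false;
    }
    return *a == *b;  // both strings must end together
}

// Hypothetical scoring: full length for a case-insensitive match, one less
// when only the letters and digits agree, and 0 when nothing matches.
int nameMatchScore(const char* name, const char* hint) {
    if (!name || !hint || !*name || !*hint)
        return 0;
    int len = static_cast<int>(std::strlen(hint));
    if (equalIgnoringCase(name, hint))
        return len;
    std::string n = lettersAndDigits(name);
    if (!n.empty() && n == lettersAndDigits(hint))
        return len - 1;
    return 0;
}
```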

TQString TQTextCodec::toUnicode ( const char * chars, int len ) const [virtual] +

+TQTextCodec subclasses must reimplement this function or +makeDecoder(). It converts the first len characters of chars +to Unicode. +

The default implementation makes a decoder with makeDecoder() and +converts the input with that. Note that the default makeDecoder() +implementation makes a decoder that simply calls +this function, hence subclasses must reimplement one function or +the other to avoid infinite recursion. + +

TQString TQTextCodec::toUnicode ( const TQByteArray & a, int len ) const +

+This is an overloaded member function, provided for convenience. It behaves essentially like the above function. +

a contains the source characters; len contains the number of +characters in a to use. + +

TQString TQTextCodec::toUnicode ( const TQByteArray & a ) const +

+This is an overloaded member function, provided for convenience. It behaves essentially like the above function. +

a contains the source characters. + +

TQString TQTextCodec::toUnicode ( const TQCString & a, int len ) const +

+This is an overloaded member function, provided for convenience. It behaves essentially like the above function. +

a contains the source characters; len contains the number of +characters in a to use. + +

TQString TQTextCodec::toUnicode ( const TQCString & a ) const +

+This is an overloaded member function, provided for convenience. It behaves essentially like the above function. +

a contains the source characters. + +

TQString TQTextCodec::toUnicode ( const char * chars ) const +

+This is an overloaded member function, provided for convenience. It behaves essentially like the above function. +

chars contains the source characters. + + +


+This file is part of the TQt toolkit. +Copyright © 1995-2007 +Trolltech. All Rights Reserved.


TQt 3.3.8
+
+ -- cgit v1.2.1