From 4aed2c8219774f5d797760606b8489a92ddc5163 Mon Sep 17 00:00:00 2001 From: toma Date: Wed, 25 Nov 2009 17:56:58 +0000 Subject: Copy the KDE 3.5 branch to branches/trinity for new KDE 3.5 features. BUG:215923 git-svn-id: svn://anonsvn.kde.org/home/kde/branches/trinity/kdebase@1054174 283d02a7-25f6-0310-bc7c-ecb5cbfe19da --- doc/kate/highlighting.docbook | 931 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 931 insertions(+) create mode 100644 doc/kate/highlighting.docbook (limited to 'doc/kate/highlighting.docbook') diff --git a/doc/kate/highlighting.docbook b/doc/kate/highlighting.docbook new file mode 100644 index 000000000..76952d26a --- /dev/null +++ b/doc/kate/highlighting.docbook @@ -0,0 +1,931 @@ + + + + + + + +Working with Syntax Highlighting + + + +Overview + +Syntax Highlighting is what makes the editor automatically +display text in different styles/colors, depending on the function of +the string in relation to the purpose of the file. In program source +code for example, control statements may be rendered bold, while data +types and comments get different colors from the rest of the +text. This greatly enhances the readability of the text, and thus +helps the author to be more efficient and productive. + + + +A Perl function, rendered with syntax +highlighting. +A Perl function, rendered with syntax highlighting. + + + + + +The same Perl function, without +highlighting. +The same Perl function, without highlighting. + + +Of the two examples, which is easiest to read? + +&kate; comes with a flexible, configurable and capable system +for doing syntax highlighting, and the standard distribution provides +definitions for a wide range of programming, scripting and markup +languages and other text file formats. In addition you can +provide your own definitions in simple &XML; files. 
+ +&kate; will automatically detect the right syntax rules when you +open a file, based on the &MIME; Type of the file, determined by its +extension, or, if it has none, the contents. If the wrong syntax is +chosen, you can manually set the syntax to use from the +Documents > Highlight +Mode menu. + +The styles and colors used by each syntax highlight definition +can be configured using the Appearance page of the +Config Dialog, while the &MIME; Types +it should be used for are handled by the Highlight +page. + + +Syntax highlighting is there to enhance the readability of +correct text, but you cannot trust it to validate your text. Marking +text for syntax is difficult depending on the format you are using, +and in some cases the authors of the syntax rules will be proud if 98% +of text gets correctly rendered, though most often you need a rare +style to see the incorrect 2%. + + + +You can download updated or additional syntax highlight +definitions from the &kate; website by clicking the +Download button in the Highlight Page of the Config Dialog. + + + + + + +The &kate; Syntax Highlight System + +This section will discuss the &kate; syntax highlighting +mechanism in more detail. It is for you if you want to know about +it, or if you want to change or create syntax definitions. + + + +How it Works + +Whenever you open a file, one of the first things the &kate; +editor does is detect which syntax definition to use for the +file. While reading the text of the file, and while you type away in +it, the syntax highlighting system will analyze the text using the +rules defined by the syntax definition and mark in it where different +contexts and styles begin and end. + +When you type in the document, the new text is analyzed and marked on the +fly, so that if you delete a character that is marked as the beginning or end +of a context, the style of surrounding text changes accordingly.
+ +The syntax definitions used by the &kate; Syntax Highlighting System are +&XML; files, containing + +Rules for detecting the role of text, organized into context blocks +Keyword lists +Style Item definitions + + + +When analyzing the text, the detection rules are evaluated in +the order in which they are defined, and if the beginning of the +current string matches a rule, the related context is used. The start +point in the text is moved to the final point at which that rule +matched and a new loop of the rules begins, starting in the context +set by the matched rule. + + + + +Rules + +The detection rules are the heart of the highlighting detection +system. A rule is a string, character or regular expression against which +to match the text being analyzed. It contains information about which +style to use for the matching part of the text. It may switch the +working context of the system either to an explicitly mentioned +context or to the previous context used by the text. + +Rules are organized in context groups. A context group is used +for main text concepts within the format, for example quoted text +strings or comment blocks in program source code. This ensures that +the highlighting system does not need to loop through all rules when +it is not necessary, and that some character sequences in the text can +be treated differently depending on the current context. + + +Contexts may be generated dynamically to allow the usage of instance +specific data in rules. + + + + +Context Styles and Keywords + +In some programming languages, integer numbers are treated +differently than floating point ones by the compiler (the program that +converts the source code to a binary executable), and there may be +characters having a special meaning within a quoted string. In such +cases, it makes sense to render them differently from the surroundings +so that they are easy to identify while reading the text. 
 So even if +they do not represent special contexts, they may be seen as such by +the syntax highlighting system, so that they can be marked for +different rendering. + +A syntax definition may contain as many styles as required to +cover the concepts of the format it is used for. + +In many formats, there are lists of words that represent a +specific concept. For example in programming languages, control +statements are one concept, data type names another, and built-in +functions of the language a third. The &kate; Syntax Highlighting +System can use such lists to detect and mark words in the text to +emphasize concepts of the text formats. + + + + +Default Styles + +If you open a C++ source file, a &Java; source file and an +HTML document in &kate;, you will see that even +though the formats are different, and thus different words are chosen +for special treatment, the colors used are the same. This is because +&kate; has a predefined list of Default Styles which are employed by +the individual syntax definitions. + +This makes it easy to recognize similar concepts in different +text formats. For example comments are present in almost any +programming, scripting or markup language, and when they are rendered +using the same style in all languages, you do not have to stop and +think to identify them within the text. + + +All styles in a syntax definition use one of the default +styles. A few syntax definitions use more styles than there are +defaults, so if you use a format often, it may be worth launching the +configuration dialog to see if some concepts are using the same +style. For example there is only one default style for strings, but as +the Perl programming language operates with two types of strings, you +can enhance the highlighting by configuring those to be slightly +different. All available default styles +will be explained later.
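As a hedged illustration of the Perl case above, a definition could declare two itemData entries that both map to the default string style, with one of them overriding only the color. The entry names and the color value are hypothetical and are not taken from the shipped Perl definition; the itemData element and its attributes are described in the format sections that follow.

```xml
<itemDatas>
  <!-- Hypothetical names: both entries start from the default string style. -->
  <itemData name="String" defStyleNum="dsString" />
  <!-- The second string type overrides only the color so it stands out. -->
  <itemData name="Interpolated String" defStyleNum="dsString" color="#0000ff" />
</itemDatas>
```

Because both entries keep dsString as their base, a user who has customized the default string style still sees a familiar look for both kinds of string.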
+ + + + + + + +The Highlight Definition &XML; Format + + +Overview + +This section is an overview of the Highlight Definition &XML; +format. Based on a small example it will describe the main components +and their meaning and usage. The next section will go into detail with +the highlight detection rules. + +The formal definition, the DTD, is stored +in the file language.dtd which should be +installed on your system in the folder +$KDEDIR/share/apps/katepart/syntax. + + + +Main sections of &kate; Highlight Definition files + + +A highlighting file contains a header that sets the XML version and the doctype: + + +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE language SYSTEM "language.dtd"> + + + + + +The root of the definition file is the element language. +Available attributes are: + + +Required attributes: +name sets the name of the language. It appears in the menus and dialogs afterwards. +section specifies the category. +extensions defines file extensions, like "*.cpp;*.h" + +Optional attributes: +mimetype associates files with the definition based on their &MIME; Type. +version specifies the current version of the definition file. +kateversion specifies the latest supported &kate; version. +casesensitive defines whether the keywords are case-sensitive or not. +priority is necessary if another highlight definition file uses the same extensions. The higher priority will win. +author contains the name of the author and their email address. +license contains the license, such as LGPL, Artistic or GPL. +hidden defines whether the name should appear in &kate;'s menus. +So the next line may look like this: + +<language name="C++" version="1.00" kateversion="2.4" section="Sources" extensions="*.cpp;*.h" > + + + + + + +Next comes the highlighting element, which +contains the optional element list and the required +elements contexts and itemDatas. + +list elements contain a list of keywords. In +this case the keywords are class and const. +You can add as many lists as you need.
+The contexts element contains all contexts. +The first context is by default the start of the highlighting. There are +two rules in the context Normal Text: one that matches +the list of keywords with the name somename, and one +that detects a quote and switches the context to string. +To learn more about rules read the next chapter. +The third part is the itemDatas element. It +contains all color and font styles needed by the contexts and rules. +In this example, the itemDatas Normal Text, +String and Keyword are used. + + + <highlighting> + <list name="somename"> + <item> class </item> + <item> const </item> + </list> + <contexts> + <context attribute="Normal Text" lineEndContext="#pop" name="Normal Text" > + <keyword attribute="Keyword" context="#stay" String="somename" /> + <DetectChar attribute="String" context="string" char="&quot;" /> + </context> + <context attribute="String" lineEndContext="#stay" name="string" > + <DetectChar attribute="String" context="#pop" char="&quot;" /> + </context> + </contexts> + <itemDatas> + <itemData name="Normal Text" defStyleNum="dsNormal" /> + <itemData name="Keyword" defStyleNum="dsKeyword" /> + <itemData name="String" defStyleNum="dsString" /> + </itemDatas> + </highlighting> + + + + + +The last part of a highlight definition is the optional +general section. It may contain information +about keywords, code folding, comments and indentation. + + +The comment section defines with what +string a single line comment is introduced. You can also define +multiline comments using multiLine with the +additional attribute end. This is used if the +user presses the corresponding shortcut for comment/uncomment. +The keywords section defines whether +keyword lists are case-sensitive or not. Other attributes will be +explained later.
+ + <general> + <comments> + <comment name="singleLine" start="#"/> + </comments> + <keywords casesensitive="1"/> + </general> +</language> + + + + + + + + + + +The Sections in Detail +This part will describe all available attributes for contexts, +itemDatas, keywords, comments, code folding and indentation. + + + +The element context belongs to the group +contexts. A context itself defines context specific +rules like what should happen if the highlight system reaches the end of a +line. Available attributes are: + + + +name is the context name. Rules will use this name +to specify the context to switch to if the rule matches. +lineEndContext defines the context the highlight +system switches to if it reaches the end of a line. This may either be a name +of another context, #stay to not switch the context +(i.e. do nothing) or #pop which leaves this +context. It is possible to use for example #pop#pop#pop +to pop three times. +lineBeginContext defines the context to switch to when the beginning +of a line is encountered. Default: #stay. +fallthrough defines if the highlight system switches +to the context specified in fallthroughContext if no rule matches. +Default: false. +fallthroughContext specifies the next context +if no rule matches. +dynamic if true, the context +remembers strings/placeholders saved by dynamic rules. This is needed for HERE +documents for example. Default: false. + + + + + +The element itemData is in the group +itemDatas. It defines the font style and colors. +So it is possible to define your own styles and colors. However, we +recommend sticking to the default styles if possible so that the user +will always see the same colors used in different languages. Sometimes, +though, there is no other way and it is necessary to change color +and font attributes. The attributes name and defStyleNum are required, +the others are optional. Available attributes are: + + +name sets the name of the itemData.
+Contexts and rules will use this name in their attribute +attribute to reference an itemData. +defStyleNum defines which default style to use. +Available default styles are explained in detail later. +color defines a color. Valid formats are +'#rrggbb' or '#rgb'. +selColor defines the selection color. +italic if true, the text will be italic. +bold if true, the text will be bold. +underline if true, the text will be underlined. +strikeout if true, the text will be struck out. + + + + + +The element keywords in the group +general defines keyword properties. Available attributes are: + + +casesensitive may be true +or false. If true, all keywords +are matched case-sensitively. +weakDeliminator is a list of characters that +do not act as word delimiters. For example, the dot '.' +is a word delimiter by default. If a keyword in a list contains +a dot, it will only match if you specify the dot as a weak delimiter. +additionalDeliminator defines additional delimiters. +wordWrapDeliminator defines characters after which a +line wrap may occur. +Default delimiters and word wrap delimiters are the characters +.():!+,-<=>%&*/;?[]^{|}~\, space (' ') +and tabulator ('\t'). + + + + + +The element comment in the group +comments defines comment properties which are used +for Tools > Comment and +Tools > Uncomment. +Available attributes are: + + +name is either singleLine +or multiLine. If you choose multiLine +the attributes end and region are +required. +start defines the string used to start a comment. +In C++ this would be "/*". +end defines the string used to close a comment. +In C++ this would be "*/". +region should be the name of the foldable +multiline comment. If you have beginRegion="Comment" +... endRegion="Comment" in your rules, you should use +region="Comment". This way uncomment works even if you +do not select all the text of the multiline comment. The cursor only needs to be +inside the multiline comment.
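A minimal sketch of a comments section for a C-style language, assembled from the attributes listed above. It assumes that the rules of the definition mark block comments with beginRegion="Comment" and endRegion="Comment"; the region name is only illustrative.

```xml
<general>
  <comments>
    <!-- Single-line comments start with // and run to the end of the line. -->
    <comment name="singleLine" start="//" />
    <!-- Multiline comments need start, end and the name of the folding region. -->
    <comment name="multiLine" start="/*" end="*/" region="Comment" />
  </comments>
</general>
```

With the region set, uncommenting should work with the cursor anywhere inside the block comment, as described for the region attribute.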
+ + + + + +The element folding in the group +general defines code folding properties. +Available attributes are: + + +indentationsensitive if true, the code folding markers +are added based on indentation, like in the scripting language Python. Usually you +do not need to set it, as it defaults to false. + + + + + +The element indentation in the group +general defines which indenter will be used. However, we strongly +recommend omitting this element, as the indenter usually will be set by either defining +a File Type or by adding a mode line to the text file. If you specify an indenter though, +you will force a specific indentation on the user, which they might not like at all. +Available attributes are: + + +mode is the name of the indenter. Available indenters +right now are: normal, cstyle, csands, xml, python and +varindent. + + + + + + + + + + +Available Default Styles +Default Styles were already explained. +In short, default styles are predefined font and color styles. + + +The available default styles are: + +dsNormal, used for normal text. +dsKeyword, used for keywords. +dsDataType, used for data types. +dsDecVal, used for decimal values. +dsBaseN, used for values with a base other than 10. +dsFloat, used for float values. +dsChar, used for a character. +dsString, used for strings. +dsComment, used for comments. +dsOthers, used for 'other' things. +dsAlert, used for warning messages. +dsFunction, used for function calls. +dsRegionMarker, used for region markers. +dsError, used for error highlighting and wrong syntax. + + + + + + + + + +Highlight Detection Rules + +This section describes the syntax detection rules. + +Each rule can match zero or more characters at the beginning of +the string it is tested against. If the rule matches, the matching +characters are assigned the style or attribute +defined by the rule, and a rule may ask that the current context is +switched.
+ +A rule looks like this: + +<RuleName attribute="(identifier)" context="(identifier)" [rule specific attributes] /> + +The attribute identifies the style to use +for matched characters by name, and the context +identifies the context to use from here. + +The context can be identified by: + + + +An identifier, which is the name of the other +context. + + +An instruction telling the engine to stay in the +current context (#stay), or to pop back to a +previous context used in the string (#pop). +To go back more steps, the #pop keyword can be repeated: +#pop#pop#pop + + + +Some rules can have child rules which are +then evaluated only if the parent rule matched. The entire matched +string will be given the attribute defined by the parent rule. A rule +with child rules looks like this: + + +<RuleName (attributes)> + <ChildRuleName (attributes) /> + ... +</RuleName> + + + +Rule-specific attributes vary and are described in the +following sections. + + + +Common attributes +All rules have the following attributes in common; they are +available wherever (common attributes) appears. +attribute and context +are required attributes; all others are optional. + + + +attribute: An attribute maps to a defined itemData. + + +context: Specify the context to which the highlighting system switches if the rule matches. + + +beginRegion: Start a code folding block. Default: unset. + + +endRegion: Close a code folding block. Default: unset. + + +lookAhead: If true, the +highlighting system will not advance past the matched text. +Default: false. + + +firstNonSpace: Match only if the string is +the first non-whitespace in the line. Default: false. + + +column: Match only if the column matches. Default: unset. + + + + +Dynamic rules +Some rules allow the optional attribute dynamic +of type boolean that defaults to false.
 If dynamic is +true, a rule can use placeholders in its +string or char attributes. +These placeholders represent the text matched by the +regular expression rule that switched to the +current context. In a string, +the placeholder %N (where N is a number) will be +replaced with the corresponding capture N +from the calling regular expression. In a +char the placeholder must be a number +N and it will be replaced with the first character of +the corresponding capture N from the calling regular +expression. Whenever a rule allows this attribute, its description will contain +(dynamic). + + +dynamic: may be (true|false). + + + + +The Rules in Detail + + + +DetectChar + +Detect a single specific character. Commonly used for example to +find the ends of quoted strings. +<DetectChar char="(character)" (common attributes) (dynamic) /> +The char attribute defines the character +to match. + + + + +Detect2Chars + +Detect two specific characters in a defined order. +<Detect2Chars char="(character)" char1="(character)" (common attributes) (dynamic) /> +The char attribute defines the first character to match, +char1 the second. + + + + +AnyChar + +Detect one character of a set of specified characters. +<AnyChar String="(string)" (common attributes) /> +The String attribute defines the set of +characters. + + + + +StringDetect + +Detect an exact string. +<StringDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) /> +The String attribute defines the string +to match. The insensitive attribute defaults to +false and is passed to the string comparison +function. If the value is true, case-insensitive +comparison is used. + + + + +RegExpr + +Matches against a regular expression. +<RegExpr String="(string)" [insensitive="true|false"] [minimal="true|false"] (common attributes) (dynamic) /> +The String attribute defines the regular +expression. +insensitive defaults to +false and is passed to the regular expression +engine.
+minimal defaults to +false and is passed to the regular expression +engine. +Because the rules are always matched against the beginning of +the current string, a regular expression starting with a caret +(^) indicates that the rule should only be +matched against the start of a line. +See Regular Expressions +for more information on those. + + + + +keyword + +Detect a keyword from a specified list. +<keyword String="(list name)" (common attributes) /> +The String attribute identifies the +keyword list by name. A list with that name must exist. + + + + +Int + +Detect an integer number. +<Int (common attributes) (dynamic) /> +This rule has no specific attributes. Child rules are typically +used to detect combinations of L and +U after the number, indicating the integer type +in program code. In practice all rules are accepted as child rules, although +the DTD only allows the child rule StringDetect. +The following example matches integer numbers followed by the character 'L'. + +<Int attribute="Decimal" context="#stay" > + <StringDetect attribute="Decimal" context="#stay" String="L" insensitive="true"/> +</Int> + + + + + + +Float + +Detect a floating point number. +<Float (common attributes) /> +This rule has no specific attributes. AnyChar is +allowed as a child rule and typically used to detect combinations, see rule +Int for reference. + + + + +HlCOct + +Detect an octal number representation. +<HlCOct (common attributes) /> +This rule has no specific attributes. + + + + +HlCHex + +Detect a hexadecimal number representation. +<HlCHex (common attributes) /> +This rule has no specific attributes. + + + + +HlCStringChar + +Detect an escaped character. +<HlCStringChar (common attributes) /> +This rule has no specific attributes. + +It matches literal representations of characters commonly used in +program code, for example \n +(newline) or \t (TAB). + +The following characters will match if they follow a backslash +(\): +abefnrtv"'?\.
 Additionally, escaped +hexadecimal numbers such as \xff and +escaped octal numbers such as \033 will +match. + + + + + +HlCChar + +Detect a C character. +<HlCChar (common attributes) /> +This rule has no specific attributes. + +It matches C characters enclosed in single quotes (Example: 'c'). +The quotes may contain a simple character or an escaped character. +See HlCStringChar for matched escaped character sequences. + + + + + +RangeDetect + +Detect a string with defined start and end characters. +<RangeDetect char="(character)" char1="(character)" (common attributes) /> +char defines the character starting the range, +char1 the character ending the range. +Useful to detect for example small quoted strings and the like, but +note that since the highlighting engine works on one line at a time, this +will not find strings spanning over a line break. + + + + +LineContinue + +Matches at end of line. +<LineContinue (common attributes) /> +This rule has no specific attributes. +This rule is useful for switching context at end of line, if the last +character is a backslash ('\'). This is needed for +example in C/C++ to continue macros or strings. + + + + +IncludeRules + +Include rules from another context or language/file. +<IncludeRules context="contextlink" [includeAttrib="true|false"] /> + +The context attribute defines which context to include. +If it is a simple string, all rules defined in that context are included in the current context, for example: +<IncludeRules context="anotherContext" /> + + +If the string begins with ## the highlight system +will look for another language definition with the given name, for example: +<IncludeRules context="##C++" /> +If the includeAttrib attribute is +true, change the destination attribute to the one of +the source. This is required to make for example commenting work, if text +matched by the included context is a different highlight than the host +context. + + + + + + +DetectSpaces + +Detect whitespace.
+<DetectSpaces (common attributes) /> + +This rule has no specific attributes. +Use this rule if you know that there can be several whitespace characters ahead, +for example at the beginning of indented lines. This rule will skip all +whitespace at once, instead of testing multiple rules and skipping one character at a +time due to no match. + + + + + +DetectIdentifier + +Detect identifier strings (as a regular expression: [a-zA-Z_][a-zA-Z0-9_]*). +<DetectIdentifier (common attributes) /> + +This rule has no specific attributes. +Use this rule to skip a string of word characters at once, rather than +testing with multiple rules and skipping one character at a time due to no match. + + + + + + + +Tips & Tricks + + +Once you have understood how the context switching works it will be +easy to write highlight definitions. However, you should carefully consider which +rule to choose in which situation. Regular expressions are very powerful, but +they are slow compared to the other rules. So you may consider the following +tips. + + + +If you only match two characters use Detect2Chars +instead of StringDetect. The same applies to +DetectChar. + + +Regular expressions are easy to use but often there is another much +faster way to achieve the same result. Suppose you only want to match +the character '#' if it is the first non-whitespace character in the +line. A regular expression based solution would look like this: +<RegExpr attribute="Macro" context="macro" String="^\s*#" /> +You can achieve the same much faster by using: +<DetectChar attribute="Macro" context="macro" char="#" firstNonSpace="true" /> +If you want to match the regular expression '^#' you +can still use DetectChar with the attribute column="0". +The attribute column counts characters, so a tabulator still counts as only one character. + + + +You can switch contexts without processing characters. Assume that you +want to switch context when you meet the string */, but +need to process that string in the next context.
 The rule below will match, and +the lookAhead attribute will cause the highlighter to +keep the matched string for the next context. +<Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" lookAhead="true" /> + + + +Use DetectSpaces if you know that whitespace occurs frequently. + + +Use DetectIdentifier instead of the regular expression '[a-zA-Z_]\w*'. + + +Use default styles whenever you can. This way the user will find a familiar environment. + + +Look into other XML files to see how other people implement tricky rules. + + +You can validate every XML file by using the command +xmllint --dtdvalid language.dtd mySyntax.xml. + + +If you repeat a complex regular expression very often you can use +ENTITIES. Example: + +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE language SYSTEM "language.dtd" +[ + <!ENTITY myref "[A-Za-z_:][\w.:_-]*"> +]> + +Now you can use &myref; instead of the regular +expression. + + + + + + + -- cgit v1.2.1