Diffstat (limited to 'doc/kate/highlighting.docbook')
-rw-r--r-- | doc/kate/highlighting.docbook | 931 |
1 files changed, 931 insertions, 0 deletions
diff --git a/doc/kate/highlighting.docbook b/doc/kate/highlighting.docbook new file mode 100644 index 000000000..3a64d9d2c --- /dev/null +++ b/doc/kate/highlighting.docbook @@ -0,0 +1,931 @@ +<appendix id="highlight"> +<appendixinfo> +<authorgroup> +<author><personname><firstname></firstname></personname></author> +<!-- TRANS:ROLES_OF_TRANSLATORS --> +</authorgroup> +</appendixinfo> +<title>Working with Syntax Highlighting</title> + +<sect1 id="highlight-overview"> + +<title>Overview</title> + +<para>Syntax Highlighting is what makes the editor automatically +display text in different styles/colors, depending on the function of +the string in relation to the purpose of the file. In program source +code for example, control statements may be rendered bold, while data +types and comments get different colors from the rest of the +text. This greatly enhances the readability of the text, and thus +helps the author to be more efficient and productive.</para> + +<mediaobject> +<imageobject><imagedata format="PNG" fileref="highlighted.png"/></imageobject> +<textobject><phrase>A Perl function, rendered with syntax +highlighting.</phrase></textobject> +<caption><para>A Perl function, rendered with syntax highlighting.</para> +</caption> +</mediaobject> + +<mediaobject> +<imageobject><imagedata format="PNG" fileref="unhighlighted.png"/></imageobject> +<textobject><phrase>The same Perl function, without +highlighting.</phrase></textobject> +<caption><para>The same Perl function, without highlighting.</para></caption> +</mediaobject> + +<para>Of the two examples, which is easiest to read?</para> + +<para>&kate; comes with a flexible, configurable and capable system +for doing syntax highlighting, and the standard distribution provides +definitions for a wide range of programming, scripting and markup +languages and other text file formats. 
In addition you can +provide your own definitions in simple &XML; files.</para> + +<para>&kate; will automatically detect the right syntax rules when you +open a file, based on the &MIME; Type of the file, determined by its +extension, or, if it has none, the contents. Should you experience a +bad choice, you can manually set the syntax to use from the +<menuchoice><guimenu>Documents</guimenu><guisubmenu>Highlight +Mode</guisubmenu></menuchoice> menu.</para> + +<para>The styles and colors used by each syntax highlight definition +can be configured using the <link +linkend="config-dialog-editor-appearance">Appearance</link> page of the +<link linkend="config-dialog">Config Dialog</link>, while the &MIME; Types +it should be used for are handled by the <link +linkend="config-dialog-editor-highlighting">Highlight</link> +page.</para> + +<note> +<para>Syntax highlighting is there to enhance the readability of +correct text, but you cannot trust it to validate your text. Marking +text for syntax can be difficult, depending on the format you are using, +and in some cases the authors of the syntax rules will be proud if 98% +of text gets correctly rendered, though most often you need a rare +style to see the incorrect 2%.</para> +</note> + +<tip> +<para>You can download updated or additional syntax highlight +definitions from the &kate; website by clicking the +<guibutton>Download</guibutton> button in the <link +linkend="config-dialog-editor-highlighting">Highlight Page</link> of the <link +linkend="config-dialog">Config Dialog</link>.</para> +</tip> + +</sect1> + +<sect1 id="katehighlight-system"> + +<title>The &kate; Syntax Highlight System</title> + +<para>This section will discuss the &kate; syntax highlighting +mechanism in more detail. 
It is for you if you want to know about +it, or if you want to change or create syntax definitions.</para> + +<sect2 id="katehighlight-howitworks"> + +<title>How it Works</title> + +<para>Whenever you open a file, one of the first things the &kate; +editor does is detect which syntax definition to use for the +file. While reading the text of the file, and while you type away in +it, the syntax highlighting system will analyze the text using the +rules defined by the syntax definition and mark in it where different +contexts and styles begin and end.</para> + +<para>When you type in the document, the new text is analyzed and marked on the +fly, so that if you delete a character that is marked as the beginning or end +of a context, the style of surrounding text changes accordingly.</para> + +<para>The syntax definitions used by the &kate; Syntax Highlighting System are +&XML; files, containing +<itemizedlist> +<listitem><para>Rules for detecting the role of text, organized into context blocks</para></listitem> +<listitem><para>Keyword lists</para></listitem> +<listitem><para>Style Item definitions</para></listitem> +</itemizedlist> +</para> + +<para>When analyzing the text, the detection rules are evaluated in +the order in which they are defined, and if the beginning of the +current string matches a rule, the related context is used. The start +point in the text is moved to the final point at which that rule +matched and a new loop of the rules begins, starting in the context +set by the matched rule.</para> + +</sect2> + +<sect2 id="highlight-system-rules"> +<title>Rules</title> + +<para>The detection rules are the heart of the highlighting detection +system. A rule is a string, character or <link +linkend="regular-expressions">regular expression</link> against which +to match the text being analyzed. It contains information about which +style to use for the matching part of the text. 
It may switch the +working context of the system either to an explicitly mentioned +context or to the previous context used by the text.</para> + +<para>Rules are organized in context groups. A context group is used +for main text concepts within the format, for example quoted text +strings or comment blocks in program source code. This ensures that +the highlighting system does not need to loop through all rules when +it is not necessary, and that some character sequences in the text can +be treated differently depending on the current context. +</para> + +<para>Contexts may be generated dynamically to allow the usage of instance +specific data in rules.</para> + +</sect2> + +<sect2 id="highlight-context-styles-keywords"> +<title>Context Styles and Keywords</title> + +<para>In some programming languages, integer numbers are treated +differently than floating point ones by the compiler (the program that +converts the source code to a binary executable), and there may be +characters having a special meaning within a quoted string. In such +cases, it makes sense to render them differently from the surroundings +so that they are easy to identify while reading the text. So even if +they do not represent special contexts, they may be seen as such by +the syntax highlighting system, so that they can be marked for +different rendering.</para> + +<para>A syntax definition may contain as many styles as required to +cover the concepts of the format it is used for.</para> + +<para>In many formats, there are lists of words that represent a +specific concept. For example, in programming languages, control +statements are one concept, data type names another, and built-in +functions of the language a third. 
The &kate; Syntax Highlighting +System can use such lists to detect and mark words in the text to +emphasize concepts of the text formats.</para> + +</sect2> + +<sect2 id="kate-highlight-system-default-styles"> +<title>Default Styles</title> + +<para>If you open a C++ source file, a &Java; source file and an +<acronym>HTML</acronym> document in &kate;, you will see that even +though the formats are different, and thus different words are chosen +for special treatment, the colors used are the same. This is because +&kate; has a predefined list of Default Styles which are employed by +the individual syntax definitions.</para> + +<para>This makes it easy to recognize similar concepts in different +text formats. For example, comments are present in almost any +programming, scripting or markup language, and when they are rendered +using the same style in all languages, you do not have to stop and +think to identify them within the text.</para> + +<tip> +<para>All styles in a syntax definition use one of the default +styles. A few syntax definitions use more styles than there are +defaults, so if you use a format often, it may be worth launching the +configuration dialog to see if some concepts are using the same +style. For example, there is only one default style for strings, but as +the Perl programming language operates with two types of strings, you +can enhance the highlighting by configuring those to be slightly +different. All <link linkend="kate-highlight-default-styles">available default styles</link> +are explained later.</para> +</tip> + +</sect2> + +</sect1> + +<sect1 id="katehighlight-xml-format"> +<title>The Highlight Definition &XML; Format</title> + +<sect2> +<title>Overview</title> + +<para>This section is an overview of the Highlight Definition &XML; +format. Based on a small example it will describe the main components +and their meaning and usage. 
The next section will go into detail with +the highlight detection rules.</para> + +<para>The formal definition, the <acronym>DTD</acronym>, is stored +in the file <filename>language.dtd</filename> which should be +installed on your system in the folder +<filename>$<envar>TDEDIR</envar>/share/apps/katepart/syntax</filename>. +</para> + +<variablelist> +<title>Main sections of &kate; Highlight Definition files</title> + +<varlistentry> +<term>A highlighting file contains a header that sets the XML version and the doctype:</term> +<listitem> +<programlisting> +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE language SYSTEM "language.dtd"> +</programlisting> +</listitem> +</varlistentry> + +<varlistentry> +<term>The root of the definition file is the element <userinput>language</userinput>. +Available attributes are:</term> + +<listitem> +<para>Required attributes:</para> +<para><userinput>name</userinput> sets the name of the language. It appears in the menus and dialogs afterwards.</para> +<para><userinput>section</userinput> specifies the category.</para> +<para><userinput>extensions</userinput> defines file extensions, like "*.cpp;*.h".</para> + +<para>Optional attributes:</para> +<para><userinput>mimetype</userinput> associates files based on their &MIME; Type.</para> +<para><userinput>version</userinput> specifies the current version of the definition file.</para> +<para><userinput>kateversion</userinput> specifies the latest supported &kate; version.</para> +<para><userinput>casesensitive</userinput> defines whether the keywords are case sensitive or not.</para> +<para><userinput>priority</userinput> is necessary if another highlight definition file uses the same extensions. 
The higher priority will win.</para> +<para><userinput>author</userinput> contains the name of the author and his email-address.</para> +<para><userinput>license</userinput> contains the license, usually LGPL, Artistic, GPL and others.</para> +<para><userinput>hidden</userinput> defines, whether the name should appear in &kate;'s menus.</para> +<para>So the next line may look like this:</para> +<programlisting> +<language name="C++" version="1.00" kateversion="2.4" section="Sources" extensions="*.cpp;*.h" /> +</programlisting> +</listitem> +</varlistentry> + + +<varlistentry> +<term>Next comes the <userinput>highlighting</userinput> element, which +contains the optional element <userinput>list</userinput> and the required +elements <userinput>contexts</userinput> and <userinput>itemDatas</userinput>.</term> +<listitem> +<para><userinput>list</userinput> elements contain a list of keywords. In +this case the keywords are <emphasis>class</emphasis> and <emphasis>const</emphasis>. +You can add as many lists as you need.</para> +<para>The <userinput>contexts</userinput> element contains all contexts. +The first context is by default the start of the highlighting. There are +two rules in the context <emphasis>Normal Text</emphasis>, which match +the list of keywords with the name <emphasis>somename</emphasis> and a +rule that detects a quote and switches the context to <emphasis>string</emphasis>. +To learn more about rules read the next chapter.</para> +<para>The third part is the <userinput>itemDatas</userinput> element. It +contains all color and font styles needed by the contexts and rules. +In this example, the <userinput>itemData</userinput> <emphasis>Normal Text</emphasis>, +<emphasis>String</emphasis> and <emphasis>Keyword</emphasis> are used. 
+</para> +<programlisting> + <highlighting> + <list name="somename"> + <item> class </item> + <item> const </item> + </list> + <contexts> + <context attribute="Normal Text" lineEndContext="#pop" name="Normal Text" > + <keyword attribute="Keyword" context="#stay" String="somename" /> + <DetectChar attribute="String" context="string" char="&quot;" /> + </context> + <context attribute="String" lineEndContext="#stay" name="string" > + <DetectChar attribute="String" context="#pop" char="&quot;" /> + </context> + </contexts> + <itemDatas> + <itemData name="Normal Text" defStyleNum="dsNormal" /> + <itemData name="Keyword" defStyleNum="dsKeyword" /> + <itemData name="String" defStyleNum="dsString" /> + </itemDatas> + </highlighting> +</programlisting> +</listitem> +</varlistentry> + +<varlistentry> +<term>The last part of a highlight definition is the optional +<userinput>general</userinput> section. It may contain information +about keywords, code folding, comments and indentation.</term> + +<listitem> +<para>The <userinput>comment</userinput> section defines with what +string a single line comment is introduced. You can also define +multiline comments using <emphasis>multiLine</emphasis> with the +additional attribute <emphasis>end</emphasis>. This is used if the +user presses the corresponding shortcut for <emphasis>comment/uncomment</emphasis>.</para> +<para>The <userinput>keywords</userinput> section defines whether +keyword lists are case sensitive or not. 
Other attributes will be +explained later.</para> +<programlisting> + <general> + <comments> + <comment name="singleLine" start="#"/> + </comments> + <keywords casesensitive="1"/> + </general> +</language> +</programlisting> +</listitem> +</varlistentry> + +</variablelist> + + +</sect2> + +<sect2 id="kate-highlight-sections"> +<title>The Sections in Detail</title> +<para>This part will describe all available attributes for contexts, +itemDatas, keywords, comments, code folding and indentation.</para> + +<variablelist> +<varlistentry> +<term>The element <userinput>context</userinput> belongs to the group +<userinput>contexts</userinput>. A context itself defines context specific +rules like what should happen if the highlight system reaches the end of a +line. Available attributes are:</term> + + +<listitem> +<para><userinput>name</userinput> the context name. Rules will use this name +to specify the context to switch to if the rule matches.</para> +<para><userinput>lineEndContext</userinput> defines the context the highlight +system switches to if it reaches the end of a line. This may be either the name +of another context, <userinput>#stay</userinput> to not switch the context +(i.e. do nothing) or <userinput>#pop</userinput> which causes the system to leave this +context. It is possible to use for example <userinput>#pop#pop#pop</userinput> +to pop three times.</para> +<para><userinput>lineBeginContext</userinput> defines the context to switch to when the beginning +of a line is encountered. Default: #stay.</para> +<para><userinput>fallthrough</userinput> defines if the highlight system switches +to the context specified in fallthroughContext if no rule matches. +Default: <emphasis>false</emphasis>.</para> +<para><userinput>fallthroughContext</userinput> specifies the next context +if no rule matches.</para> +<para><userinput>dynamic</userinput> if <emphasis>true</emphasis>, the context +remembers strings/placeholders saved by dynamic rules. This is needed for HERE +documents for example. 
Default: <emphasis>false</emphasis>.</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term>The element <userinput>itemData</userinput> is in the group +<userinput>itemDatas</userinput>. It defines the font style and colors. +It is possible to define your own styles and colors; however, we +recommend sticking to the default styles if possible so that the user +will always see the same colors used in different languages. Sometimes, though, +there is no other way and it is necessary to change color +and font attributes. The attributes name and defStyleNum are required, +the others are optional. Available attributes are:</term> + +<listitem> +<para><userinput>name</userinput> sets the name of the itemData. +Contexts and rules will use this name in their attribute +<emphasis>attribute</emphasis> to reference an itemData.</para> +<para><userinput>defStyleNum</userinput> defines which default style to use. +Available default styles are explained in detail later.</para> +<para><userinput>color</userinput> defines a color. Valid formats are +'#rrggbb' or '#rgb'.</para> +<para><userinput>selColor</userinput> defines the selection color.</para> +<para><userinput>italic</userinput> if <emphasis>true</emphasis>, the text will be italic.</para> +<para><userinput>bold</userinput> if <emphasis>true</emphasis>, the text will be bold.</para> +<para><userinput>underline</userinput> if <emphasis>true</emphasis>, the text will be underlined.</para> +<para><userinput>strikeout</userinput> if <emphasis>true</emphasis>, the text will be struck out.</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term>The element <userinput>keywords</userinput> in the group +<userinput>general</userinput> defines keyword properties. Available attributes are:</term> + +<listitem> +<para><userinput>casesensitive</userinput> may be <emphasis>true</emphasis> +or <emphasis>false</emphasis>. 
If <emphasis>true</emphasis>, all keywords +are matched case sensitively.</para> +<para><userinput>weakDeliminator</userinput> is a list of characters that +do not act as word delimiters. For example the dot <userinput>'.'</userinput> +is a word delimiter. If a keyword in a <userinput>list</userinput> contains +a dot, it will only match if you specify the dot as a weak delimiter.</para> +<para><userinput>additionalDeliminator</userinput> defines additional delimiters.</para> +<para><userinput>wordWrapDeliminator</userinput> defines characters after which a +line wrap may occur.</para> +<para>Default delimiters and word wrap delimiters are the characters +<userinput>.():!+,-<=>%&*/;?[]^{|}~\</userinput>, space (<userinput>' '</userinput>) +and tabulator (<userinput>'\t'</userinput>).</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term>The element <userinput>comment</userinput> in the group +<userinput>comments</userinput> defines comment properties which are used +for <menuchoice><guimenu>Tools</guimenu><guimenuitem>Comment</guimenuitem></menuchoice> and +<menuchoice><guimenu>Tools</guimenu><guimenuitem>Uncomment</guimenuitem></menuchoice>. +Available attributes are:</term> + +<listitem> +<para><userinput>name</userinput> is either <emphasis>singleLine</emphasis> +or <emphasis>multiLine</emphasis>. If you choose <emphasis>multiLine</emphasis> +the attributes <emphasis>end</emphasis> and <emphasis>region</emphasis> are +required.</para> +<para><userinput>start</userinput> defines the string used to start a comment. +In C++ this would be "/*".</para> +<para><userinput>end</userinput> defines the string used to close a comment. +In C++ this would be "*/".</para> +<para><userinput>region</userinput> should be the name of the foldable +multiline comment. If you have <emphasis>beginRegion="Comment"</emphasis> +... <emphasis>endRegion="Comment"</emphasis> in your rules, you should use +<emphasis>region="Comment"</emphasis>. 
This way uncomment works even if you +do not select all the text of the multiline comment. The cursor only must be +in the multiline comment.</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term>The element <userinput>folding</userinput> in the group +<userinput>general</userinput> defines code folding properties. +Available attributes are:</term> + +<listitem> +<para><userinput>indentationsensitive</userinput> if <emphasis>true</emphasis>, the code folding markers +will be added indentation based, like in the scripting language Python. Usually you +do not need to set it, as it defaults to <emphasis>false</emphasis>.</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term>The element <userinput>indentation</userinput> in the group +<userinput>general</userinput> defines which indenter will be used, however we strongly +recommend to omit this element, as the indenter usually will be set by either defining +a File Type or by adding a mode line to the text file. If you specify an indenter though, +you will force a specific indentation on the user, which he might not like at all. +Available attributes are:</term> + +<listitem> +<para><userinput>mode</userinput> is the name of the indenter. 
Available indenters +right now are: <emphasis>normal, cstyle, csands, xml, python</emphasis> and +<emphasis>varindent</emphasis>.</para> +</listitem> +</varlistentry> + + +</variablelist> + + +</sect2> + +<sect2 id="kate-highlight-default-styles"> +<title>Available Default Styles</title> +<para>Default Styles were <link linkend="kate-highlight-system-default-styles">already explained</link>. +In short, default styles are predefined font and color styles.</para> +<variablelist> +<varlistentry> +<term>The available default styles are:</term> +<listitem> +<para><userinput>dsNormal</userinput>, used for normal text.</para> +<para><userinput>dsKeyword</userinput>, used for keywords.</para> +<para><userinput>dsDataType</userinput>, used for data types.</para> +<para><userinput>dsDecVal</userinput>, used for decimal values.</para> +<para><userinput>dsBaseN</userinput>, used for values with a base other than 10.</para> +<para><userinput>dsFloat</userinput>, used for float values.</para> +<para><userinput>dsChar</userinput>, used for a character.</para> +<para><userinput>dsString</userinput>, used for strings.</para> +<para><userinput>dsComment</userinput>, used for comments.</para> +<para><userinput>dsOthers</userinput>, used for 'other' things.</para> +<para><userinput>dsAlert</userinput>, used for warning messages.</para> +<para><userinput>dsFunction</userinput>, used for function calls.</para> +<para><userinput>dsRegionMarker</userinput>, used for region markers.</para> +<para><userinput>dsError</userinput>, used for error highlighting and wrong syntax.</para> +</listitem> +</varlistentry> +</variablelist> + +</sect2> + +</sect1> + +<sect1 id="kate-highlight-rules-detailled"> +<title>Highlight Detection Rules</title> + +<para>This section describes the syntax detection rules.</para> + +<para>Each rule can match zero or more characters at the beginning of +the string it is tested against. 
If the rule matches, the matching +characters are assigned the style or <emphasis>attribute</emphasis> +defined by the rule, and a rule may ask that the current context is +switched.</para> + +<para>A rule looks like this:</para> + +<programlisting><RuleName attribute="(identifier)" context="(identifier)" [rule specific attributes] /></programlisting> + +<para>The <emphasis>attribute</emphasis> identifies the style to use +for matched characters by name, and the <emphasis>context</emphasis> +identifies the context to use from here.</para> + +<para>The <emphasis>context</emphasis> can be identified by:</para> + +<itemizedlist> +<listitem> +<para>An <emphasis>identifier</emphasis>, which is the name of the other +context.</para> +</listitem> +<listitem> +<para>An <emphasis>order</emphasis> telling the engine to stay in the +current context (<userinput>#stay</userinput>), or to pop back to a +previous context used in the string (<userinput>#pop</userinput>).</para> +<para>To go back more steps, the #pop keyword can be repeated: +<userinput>#pop#pop#pop</userinput></para> +</listitem> +</itemizedlist> + +<para>Some rules can have <emphasis>child rules</emphasis> which are +then evaluated only if the parent rule matched. The entire matched +string will be given the attribute defined by the parent rule. A rule +with child rules looks like this:</para> + +<programlisting> +<RuleName (attributes)> + <ChildRuleName (attributes) /> + ... +</RuleName> +</programlisting> + + +<para>Rule specific attributes vary and are described in the +following sections.</para> + + +<itemizedlist> +<title>Common attributes</title> +<para>All rules have the following attributes in common; they are +available wherever <userinput>(common attributes)</userinput> appears. +<emphasis>attribute</emphasis> and <emphasis>context</emphasis> +are required attributes; all others are optional. 
+</para> + +<listitem> +<para><emphasis>attribute</emphasis>: An attribute maps to a defined <emphasis>itemData</emphasis>.</para> +</listitem> +<listitem> +<para><emphasis>context</emphasis>: Specify the context to which the highlighting system switches if the rule matches.</para> +</listitem> +<listitem> +<para><emphasis>beginRegion</emphasis>: Start a code folding block. Default: unset.</para> +</listitem> +<listitem> +<para><emphasis>endRegion</emphasis>: Close a code folding block. Default: unset.</para> +</listitem> +<listitem> +<para><emphasis>lookAhead</emphasis>: If <emphasis>true</emphasis>, the +highlighting system will not process the length of the match, so the matched text is tested but not consumed. +Default: <emphasis>false</emphasis>.</para> +</listitem> +<listitem> +<para><emphasis>firstNonSpace</emphasis>: Match only if the string is +the first non-whitespace in the line. Default: <emphasis>false</emphasis>.</para> +</listitem> +<listitem> +<para><emphasis>column</emphasis>: Match only if the column matches. Default: unset.</para> +</listitem> +</itemizedlist> + +<itemizedlist> +<title>Dynamic rules</title> +<para>Some rules allow the optional attribute <userinput>dynamic</userinput> +of type boolean that defaults to <emphasis>false</emphasis>. If dynamic is +<emphasis>true</emphasis>, a rule can use placeholders representing the text +matched by a <emphasis>regular expression</emphasis> rule that switched to the +current context in its <userinput>string</userinput> or +<userinput>char</userinput> attributes. In a <userinput>string</userinput>, +the placeholder <replaceable>%N</replaceable> (where N is a number) will be +replaced with the corresponding capture <replaceable>N</replaceable> +from the calling regular expression. In a +<userinput>char</userinput> the placeholder must be a number +<replaceable>N</replaceable> and it will be replaced with the first character of +the corresponding capture <replaceable>N</replaceable> from the calling regular +expression. 
Whenever a rule allows this attribute it will contain a +<emphasis>(dynamic)</emphasis>.</para> + +<listitem> +<para><emphasis>dynamic</emphasis>: may be <emphasis>(true|false)</emphasis>.</para> +</listitem> +</itemizedlist> + +<sect2 id="highlighting-rules-in-detail"> +<title>The Rules in Detail</title> + +<variablelist> +<varlistentry> +<term>DetectChar</term> +<listitem> +<para>Detect a single specific character. Commonly used for example to +find the ends of quoted strings.</para> +<programlisting><DetectChar char="(character)" (common attributes) (dynamic) /></programlisting> +<para>The <userinput>char</userinput> attribute defines the character +to match.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>Detect2Chars</term> +<listitem> +<para>Detect two specific characters in a defined order.</para> +<programlisting><Detect2Chars char="(character)" char1="(character)" (common attributes) (dynamic) /></programlisting> +<para>The <userinput>char</userinput> attribute defines the first character to match, +<userinput>char1</userinput> the second.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>AnyChar</term> +<listitem> +<para>Detect one character of a set of specified characters.</para> +<programlisting><AnyChar String="(string)" (common attributes) /></programlisting> +<para>The <userinput>String</userinput> attribute defines the set of +characters.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>StringDetect</term> +<listitem> +<para>Detect an exact string.</para> +<programlisting><StringDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) /></programlisting> +<para>The <userinput>String</userinput> attribute defines the string +to match. The <userinput>insensitive</userinput> attribute defaults to +<emphasis>false</emphasis> and is passed to the string comparison +function. 
If the value is <emphasis>true</emphasis> insensitive +comparing is used.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>RegExpr</term> +<listitem> +<para>Matches against a regular expression.</para> +<programlisting><RegExpr String="(string)" [insensitive="true|false"] [minimal="true|false"] (common attributes) (dynamic) /></programlisting> +<para>The <userinput>String</userinput> attribute defines the regular +expression.</para> +<para><userinput>insensitive</userinput> defaults to +<emphasis>false</emphasis> and is passed to the regular expression +engine.</para> +<para><userinput>minimal</userinput> defaults to +<emphasis>false</emphasis> and is passed to the regular expression +engine.</para> +<para>Because the rules are always matched against the beginning of +the current string, a regular expression starting with a caret +(<literal>^</literal>) indicates that the rule should only be +matched against the start of a line.</para> +<para>See <link linkend="regular-expressions">Regular Expressions</link> +for more information on those.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>keyword</term> +<listitem> +<para>Detect a keyword from a specified list.</para> +<programlisting><keyword String="(list name)" (common attributes) /></programlisting> +<para>The <userinput>String</userinput> attribute identifies the +keyword list by name. A list with that name must exist.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>Int</term> +<listitem> +<para>Detect an integer number.</para> +<para><programlisting><Int (common attributes) (dynamic) /></programlisting></para> +<para>This rule has no specific attributes. Child rules are typically +used to detect combinations of <userinput>L</userinput> and +<userinput>U</userinput> after the number, indicating the integer type +in program code. 
In practice all rules are allowed as child rules, though +the <acronym>DTD</acronym> only allows the child rule <userinput>StringDetect</userinput>.</para> +<para>The following example matches integer numbers followed by the character 'L'. +<programlisting> +<Int attribute="Decimal" context="#stay" > + <StringDetect attribute="Decimal" context="#stay" String="L" insensitive="true"/> +</Int> +</programlisting></para> + +</listitem> +</varlistentry> + +<varlistentry> +<term>Float</term> +<listitem> +<para>Detect a floating point number.</para> +<para><programlisting><Float (common attributes) /></programlisting></para> +<para>This rule has no specific attributes. <userinput>AnyChar</userinput> is +allowed as a child rule and is typically used to detect combinations; see rule +<userinput>Int</userinput> for reference.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>HlCOct</term> +<listitem> +<para>Detect an octal number representation.</para> +<para><programlisting><HlCOct (common attributes) /></programlisting></para> +<para>This rule has no specific attributes.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>HlCHex</term> +<listitem> +<para>Detect a hexadecimal number representation.</para> +<para><programlisting><HlCHex (common attributes) /></programlisting></para> +<para>This rule has no specific attributes.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>HlCStringChar</term> +<listitem> +<para>Detect an escaped character.</para> +<para><programlisting><HlCStringChar (common attributes) /></programlisting></para> +<para>This rule has no specific attributes.</para> + +<para>It matches literal representations of characters commonly used in +program code, for example <userinput>\n</userinput> +(newline) or <userinput>\t</userinput> (TAB).</para> + +<para>The following characters will match if they follow a backslash +(<literal>\</literal>): +<userinput>abefnrtv"'?\</userinput>. 
Additionally, escaped +hexadecimal numbers, for example <userinput>\xff</userinput>, and +escaped octal numbers, for example <userinput>\033</userinput>, will +match.</para> + +</listitem> +</varlistentry> + +<varlistentry> +<term>HlCChar</term> +<listitem> +<para>Detect a C character.</para> +<para><programlisting><HlCChar (common attributes) /></programlisting></para> +<para>This rule has no specific attributes.</para> + +<para>It matches C characters enclosed in ticks (Example: <userinput>'c'</userinput>). +The ticks may enclose a simple character or an escaped character. +See HlCStringChar for matched escaped character sequences.</para> + +</listitem> +</varlistentry> + +<varlistentry> +<term>RangeDetect</term> +<listitem> +<para>Detect a string with defined start and end characters.</para> +<programlisting><RangeDetect char="(character)" char1="(character)" (common attributes) /></programlisting> +<para><userinput>char</userinput> defines the character starting the range, +<userinput>char1</userinput> the character ending the range.</para> +<para>Useful to detect, for example, small quoted strings and the like, but +note that since the highlighting engine works on one line at a time, this +will not find strings spanning a line break.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>LineContinue</term> +<listitem> +<para>Matches at end of line.</para> +<programlisting><LineContinue (common attributes) /></programlisting> +<para>This rule has no specific attributes.</para> +<para>This rule is useful for switching context at end of line, if the last +character is a backslash (<userinput>'\'</userinput>). 
This is needed, for +example, in C/C++ to continue macros or strings.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>IncludeRules</term> +<listitem> +<para>Include rules from another context or language/file.</para> +<programlisting><IncludeRules context="contextlink" [includeAttrib="true|false"] /></programlisting> + +<para>The <userinput>context</userinput> attribute defines which context to include.</para> +<para>If it is a simple string, it includes all defined rules into the current context, for example: +<programlisting><IncludeRules context="anotherContext" /></programlisting></para> + +<para> +If the string begins with <userinput>##</userinput>, the highlight system +will look for another language definition with the given name, for example: +<programlisting><IncludeRules context="##C++" /></programlisting></para> +<para>If the <userinput>includeAttrib</userinput> attribute is +<emphasis>true</emphasis>, the destination attribute is changed to that of +the source. This is required to make, for example, commenting work if text +matched by the included context has a different highlight than the host +context. +</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term>DetectSpaces</term> +<listitem> +<para>Detect whitespace.</para> +<programlisting><DetectSpaces (common attributes) /></programlisting> + +<para>This rule has no specific attributes.</para> +<para>Use this rule if you know that there can be several whitespace characters ahead, +for example at the beginning of indented lines. 
This rule will skip all +whitespace at once, instead of testing multiple rules and skipping one character at a +time due to no match.</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term>DetectIdentifier</term> +<listitem> +<para>Detect identifier strings (as a regular expression: [a-zA-Z_][a-zA-Z0-9_]*).</para> +<programlisting><DetectIdentifier (common attributes) /></programlisting> + +<para>This rule has no specific attributes.</para> +<para>Use this rule to skip a string of word characters at once, rather than +testing with multiple rules and skipping one character at a time due to no match.</para> +</listitem> +</varlistentry> + +</variablelist> +</sect2> + +<sect2> +<title>Tips & Tricks</title> + +<itemizedlist> +<para>Once you have understood how context switching works, it will be +easy to write highlight definitions. However, you should carefully check which +rule you choose in which situation. Regular expressions are very powerful, but +they are slow compared to the other rules. So you may want to consider the following +tips. +</para> + +<listitem> +<para>If you only match two characters, use <userinput>Detect2Chars</userinput> +instead of <userinput>StringDetect</userinput>. The same applies to +<userinput>DetectChar</userinput> for a single character.</para> +</listitem> +<listitem> +<para>Regular expressions are easy to use, but often there is another much +faster way to achieve the same result. Suppose you only want to match +the character <userinput>'#'</userinput> if it is the first non-whitespace character in the +line. 
A regular expression based solution would look like this: +<programlisting><RegExpr attribute="Macro" context="macro" String="^\s*#" /></programlisting> +You can achieve the same much faster by using: +<programlisting><DetectChar attribute="Macro" context="macro" char="#" firstNonSpace="true" /></programlisting> +If you want to match the regular expression <userinput>'^#'</userinput> you +can still use <userinput>DetectChar</userinput> with the attribute <userinput>column="0"</userinput>. +The attribute <userinput>column</userinput> counts characters, so a tab is still only one character. +</para> +</listitem> +<listitem> +<para>You can switch contexts without processing characters. Assume that you +want to switch context when you meet the string <userinput>*/</userinput>, but +need to process that string in the next context. The rule below will match, and +the <userinput>lookAhead</userinput> attribute will cause the highlighter to +keep the matched string for the next context. +<programlisting><Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" lookAhead="true" /></programlisting> +</para> +</listitem> +<listitem> +<para>Use <userinput>DetectSpaces</userinput> if you know that a lot of whitespace occurs.</para> +</listitem> +<listitem> +<para>Use <userinput>DetectIdentifier</userinput> instead of the regular expression <userinput>'[a-zA-Z_]\w*'</userinput>.</para> +</listitem> +<listitem> +<para>Use default styles whenever you can. This way the user will find a familiar environment.</para> +</listitem> +<listitem> +<para>Look into other XML files to see how other people implement tricky rules.</para> +</listitem> +<listitem> +<para>You can validate every XML file by using the command +<command>xmllint --dtdvalid language.dtd mySyntax.xml</command>.</para> +</listitem> +<listitem> +<para>If you repeat a complex regular expression very often, you can use +<emphasis>ENTITIES</emphasis>. 
Example:</para> +<programlisting> +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE language SYSTEM "language.dtd" +[ + <!ENTITY myref "[A-Za-z_:][\w.:_-]*"> +]> +</programlisting> +<para>Now you can use <emphasis>&myref;</emphasis> instead of the regular +expression.</para> +</listitem> +</itemizedlist> +</sect2> + +</sect1> + +</appendix>