summaryrefslogtreecommitdiffstats
path: root/tde-i18n-en_GB/docs/tdebase/kate/highlighting.docbook
blob: 2785ea10b0952117fefb59d4f763827b7d8cc855 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
<appendix id="highlight">
<appendixinfo>
<authorgroup>
<author><personname><firstname></firstname></personname></author>
<othercredit role="translator"><firstname>Malcolm</firstname><surname>Hunter</surname><affiliation><address><email>[email protected]</email></address></affiliation><contrib>Conversion to British English</contrib></othercredit> 
</authorgroup>
</appendixinfo>
<title>Working with Syntax Highlighting</title>

<sect1 id="highlight-overview">

<title>Overview</title>

<para>Syntax Highlighting is what makes the editor automatically display text in different styles/colours, depending on the function of the string in relation to the purpose of the file. In program source code for example, control statements may be rendered bold, while data types and comments get different colours from the rest of the text. This greatly enhances the readability of the text, and thus helps the author to be more efficient and productive.</para>

<mediaobject>
<imageobject><imagedata format="PNG" fileref="highlighted.png"/></imageobject>
<textobject><phrase>A perl function, rendered with syntax highlighting.</phrase></textobject>
<caption><para>A perl function, rendered with syntax highlighting.</para>
</caption>
</mediaobject>

<mediaobject>
<imageobject><imagedata format="PNG" fileref="unhighlighted.png"/></imageobject>
<textobject><phrase>The same perl function, without highlighting.</phrase></textobject>
<caption><para>The same perl function, without highlighting.</para></caption>
</mediaobject>

<para>Of the two examples, which is easiest to read?</para>

<para>&kate; comes with a flexible, configurable and capable system for doing syntax highlighting, and the standard distribution provides definitions for a wide range of programming languages, markup and scripting languages and other text file formats. In addition you can provide your own definitions in simple &XML; files.</para>

<para>&kate; will automatically detect the right syntax rules when you open a file, based on the &MIME; Type of the file, determined by its extension, or, if it has none, the contents. Should you experience a bad choice, you can manually set the syntax to use from the <menuchoice><guimenu>Documents</guimenu><guisubmenu>Highlight Mode</guisubmenu></menuchoice> menu.</para>

<para>The styles and colours used by each syntax highlight definition, as well as which &MIME;types it should be used for, can be configured using the <link linkend="config-dialog-editor-hl"> Highlight</link> page of the <link linkend="config-dialog">Config Dialogue</link>.</para>

<note>
<para>Syntax highlighting is there to enhance the readability of correct text, but you cannot trust it to validate your text. Marking text for syntax is difficult depending on the format you are using, and in some cases the authors of the syntax rules will be proud if 98% of text gets correctly rendered, though most often you need a rare style to see the incorrect 2%.</para>
</note>

<tip>
<para>You can download updated or additional syntax highlight definitions from the &kate; website by clicking the <guibutton>Download</guibutton> button in the <link linkend="config-dialog-editor-hl">Highlight Page</link> of the <link linkend="config-dialog">Config Dialogue</link>.</para>
</tip>

</sect1>

<sect1 id="katehighlight-system">

<title>The &kate; Syntax Highlight System</title>

<para>This section will discuss the &kate; syntax highlighting mechanism in more detail. It is for you if you want to know know about it, or if you want to change or create syntax definitions.</para>

<sect2 id="katehighlight-howitworks">

<title>How it Works</title>

<para>Whenever you open a file, one of the first things the &kate; editor does is detect which syntax definition to use for the file. While reading the text of the file, and while you type away in it, the syntax highlighting system will analyse the text using the rules defined by the syntax definition and mark in it where different contexts and styles begin and end.</para>

<para>When you type in the document, the new text is analysed and marked on the fly, so that if you delete a character that is marked as the beginning or end of a context, the style of surrounding text changes accordingly.</para>

<para>The syntax definitions used by the &kate; syntax highlighting system are &XML; files, containing <itemizedlist>
<listitem><para>Rules for detecting the role of text, organised into context blocks</para></listitem>
<listitem><para>Keyword lists</para></listitem>
<listitem><para>Style Item definitions</para></listitem>
</itemizedlist>
</para>

<para>When analysing the text, the detection rules are evaluated in the order in which they are defined, and if the beginning of the current string matches a rule, the related context is used. The start point in the text is moved to the final point at which that rule matched and a new loop of the rules begins, starting in the context set by the matched rule.</para>

</sect2>

<sect2 id="highlight-system-rules">
<title>Rules</title>

<para>The detection rules are the heart of the highlighting detection system. A rule is a string, character or <link linkend="regular-expressions">regular expression</link> against which to match the text being analysed. It contains information about which style to use for the matching part of the text. It may switch the working context of the system either to an explicitly mentioned context or to the previous context used by the text.</para>

<para>Rules are organised in context groups. A context group is used for main text concepts within the format, for example quoted text strings or comment blocks in program source code. This ensures that the highlighting system does not need to loop through all rules when it is not necessary, and that some character sequences in the text can be treated differently depending on the current context. </para>

</sect2>

<sect2 id="highlight-context-styles-keywords">
<title>Context Styles and Keywords</title>

<para>In some programming languages, integer numbers are treated differently than floating point ones by the compiler (the program that converts the source code to a binary executable), and there may be characters having a special meaning within a quoted string. In such cases, it makes sense to render them differently from the surroundings so that they are easy to identify while reading the text. So even if they do not represent special contexts, they may be seen as such by the syntax highlighting system, so that they can be marked for different rendering.</para>

<para>A syntax definition may contain as many styles as required to cover the concepts of the format it is used for.</para>

<para>In many formats, there are lists of words that represent a specific concept. For example in programming languages, the control statements is one concept, data type names another, and built in functions of the language a third. The &kate; Syntax Highlighting System can use such lists to detect and mark words in the text to emphasise concepts of the text formats.</para>

</sect2>

<sect2 id="kate-highlight-system-default-styles">
<title>Default Styles</title>

<para>If you open a C++ source file, a &Java; source file and an <acronym>HTML</acronym> document in &kate;, you will see that even though the formats are different, and thus different words are chosen for special treatment, the colours used are the same. This is because &kate; has a predefined list of Default Styles, that are employed by the individual syntax definitions.</para>

<para>This makes it easy to recognise similar concepts in different text formats. For example comments are present in almost any programming, scripting or markup language, and when they are rendered using the same style in all languages, you do not have to stop and think to identify them within the text.</para>

<tip>
<para>All styles in a syntax definition use one of the default styles. A few syntax definitions use more styles that there are defaults, so if you use a format often, it may be worth launching the configuration dialogue to see if some concepts are using the same style. For example there is only one default style for strings, but as the perl programming language operates with two types of strings, you can enhance the highlighting by configuring those to be slightly different.</para>
</tip>

</sect2>

</sect1>

<sect1 id="katehighlight-xml-format">
<title>The Highlight Definition &XML; Format</title>

<sect2>
<title>Overview</title>

<para>This section is an overview of the Highlight Definition &XML; format. It will describe the main components and their meaning and usage, and go into detail with the detection rules.</para>

<para>The formal definition, aka the <acronym>DTD</acronym> is stored in the file <filename>language.dtd</filename> which should be installed on your system in the folder <filename>$<envar>TDEDIR</envar>/share/apps/kate/syntax</filename>.</para>

<variablelist>
<title>Main components of &kate; Highlight Definitions</title>

<varlistentry>
<term>The General Section</term>
<listitem>
<para>The General Section contains information on the comment format of the described language, and defines whether keywords are case sensitive.</para> 
</listitem>
</varlistentry>

<varlistentry>
<term>Highlighting</term>
<listitem>
<para>The Highlighting section contains all data required to analyse and render the text. This includes:</para>

<variablelist>
<varlistentry>
<term>ItemDatas</term>
<listitem><para>Contains ItemData elements, each defining a style.</para></listitem> 
</varlistentry>

<varlistentry>
<term>Keyword lists</term>
<listitem>
<para>Each list has a name, and may contain any number of items.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>Contexts</term>
<listitem>
<para>Contains contexts, which again contain the syntax detection rules.</para>
</listitem>
</varlistentry>
</variablelist>

</listitem>
</varlistentry>

</variablelist>

</sect2>

</sect1>

<sect1 id="kate-highlight-rules-detailled">
<title>Highlight Detection Rules</title>

<para>This section describes the syntax detection rules.</para>

<para>Each rule can match zero or more characters at the beginning of the string they are asked to test. If the rule matches, the matching characters are assigned the style or <emphasis>attribute</emphasis> defined by the rule, and a rule may ask that the current context is switched.</para>

<para>The <emphasis>attribute</emphasis> and <emphasis>context</emphasis> attributes are common to all rules.</para>

<para>A rule looks like this:</para>

<programlisting>&lt;RuleName attribute=&quot;(identifier)&quot; context=&quot;(identifier|order)&quot; [rule specific attributes] /&gt;</programlisting>

<para>The <emphasis>attribute</emphasis> identifies the style to use for matched characters by name or index, and the <emphasis>context</emphasis> identifies the context to use from here.</para>

<para>The <emphasis>attribute</emphasis> can be identified either by name, or by its zero-based index in the ItemDatas group.</para>

<para>The <emphasis>context</emphasis> can be identified by:</para>

<itemizedlist>
<listitem>
<para>An <emphasis>identifier</emphasis>, currently only its zero-based index in the contexts group.</para>
</listitem>
<listitem>
<para>An <emphasis>order</emphasis> telling the engine to stay in the current context (<userinput>#stay</userinput>), or to pop back to a previous context used in the string (<userinput>#pop</userinput>).</para> 
<para>To go back more steps, the #pop keyword can be repeated: <userinput>#pop#pop#pop</userinput></para> 
</listitem>
</itemizedlist>

<para>Some rules can have <emphasis>child rules</emphasis> which are then evaluated if and only if the parent rule matched. The entire matched string will be given the attribute defined by the parent rule. A rule with child rules looks like this:</para>

<programlisting>&lt;RuleName (attributes)&gt;
  &lt;ChildRuleName (attributes) /&gt;
  ...
&lt;/RuleName&gt;
</programlisting>


<para>Rule specific attributes varies and are described in the following list.</para> 

<variablelist>
<title>The Rules in Detail</title>

<varlistentry>
<term>DetectChar</term>
<listitem>
<para>Detect a single specific character. Commonly used for example to find the ends of quoted strings.</para> 
<programlisting>&lt;DetectChar char=&quot;(character)&quot; (common attributes) /&gt;</programlisting>
<para>The <userinput>char</userinput> attribute defines the character to match.</para> 
</listitem>
</varlistentry>

<varlistentry>
<term>Detect2Chars</term>
<listitem>
<para>Detect two specific characters in a defined order.</para>
<programlisting>&lt;Detect2Chars char=&quot;(character)&quot; char1=&quot;(character)&quot; (common attributes) /&gt;</programlisting>
<para>The <userinput>char</userinput> attribute defines the first character to match, <userinput>char1</userinput> the second.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>AnyChar</term>
<listitem>
<para>Detect one character of a set of specified characters.</para>
<programlisting>&lt;AnyChar String=&quot;(string)&quot; (common attributes) /&gt;</programlisting>
<para>The <userinput>String</userinput> attribute defines the set of characters.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>StringDetect</term>
<listitem>
<para>Detect an exact string.</para>
<programlisting>&lt;StringDetect String=&quot;(string)&quot; [insensitive=&quot;TRUE|FALSE;&quot;] (common attributes) /&gt;</programlisting>
<para>The <userinput>String</userinput> attribute defines the string to match. The <userinput>insensitive</userinput> attribute defaults to <userinput>FALSE</userinput> and is fed to the string comparison function. If the value is <userinput>TRUE</userinput> insensitive comparing is used.</para> 
</listitem>
</varlistentry>

<varlistentry>
<term>RegExpr</term>
<listitem>
<para>Matches against a regular expression.</para>
<programlisting>&lt;RegExpr String=&quot;(string)&quot; [insensitive=&quot;TRUE|FALSE;&quot;] [minimal=&quot;TRUE|FALSE&quot;] (common attributes) /&gt;</programlisting>
<para>The <userinput>String</userinput> attribute defines the regular expression.</para>
<para><userinput>insensitive</userinput> defaults to <userinput>FALSE</userinput> and is fed to the regular expression engine.</para>
<para><userinput>minimal</userinput> defaults to <userinput>FALSE</userinput> and is fed to the regular expression engine.</para>
<para>Because the rules are always matched against the beginning of the current string, a regular expression starting with a caret (<literal>^</literal>) indicates that the rule should only be matched against the start of a line.</para>
<para>See <link linkend="regular-expressions">Regular Expressions</link> for more information on those.</para> 
</listitem>
</varlistentry>

<varlistentry>
<term>Keyword</term>
<listitem>
<para>Detect a keyword from a specified list.</para>
<programlisting>&lt;keyword String=&quot;(list name)&quot; (common attributes) /&gt;</programlisting>
<para>The <userinput>String</userinput> attribute identifies the keyword list by name. A list with that name must exist.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>Int</term>
<listitem>
<para>Detect an integer number.</para>
<para><programlisting>&lt;Int (common attributes) /&gt;</programlisting></para>
<para>This rule has no specific attributes. Child rules are typically used to detect combinations of <userinput>L</userinput> and <userinput>U</userinput> after the number, indicating the integer type in program code.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>Float</term>
<listitem>
<para>Detect a floating point number.</para>
<para><programlisting>&lt;Float (common attributes)
/&gt;</programlisting></para> 
<para>This rule has no specific attributes.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>HlCOct</term>
<listitem>
<para>Detect an octal point number representation.</para>
<para><programlisting>&lt;HlCOct (common attributes) /&gt;</programlisting></para>
<para>This rule has no specific attributes.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>HlCHex</term>
<listitem>
<para>Detect a hexadecimal number representation.</para>
<para><programlisting>&lt;Int (common attributes) /&gt;</programlisting></para>
<para>This rule has no specific attributes.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>HlCStringChar</term>
<listitem>
<para>Detect an escaped character.</para>
<para><programlisting>&lt;HlCStringChar (common attributes)
/&gt;</programlisting></para>
<para>This rule has no specific attributes.</para>

<para>It matches letteral representations of invisible characters commonly used in program code, for example <userinput>\n</userinput> (newline) or <userinput>\t</userinput> (TAB).</para>

<para>The following characters will match if they follow a backslash (<literal>\</literal>): <userinput>abefnrtv&quot;'?</userinput>. Additionally, escaped hexadecimal numbers like for example <userinput>\xff</userinput> and escaped octal numbers, for example <userinput>\033</userinput> will match.</para>

</listitem>
</varlistentry>

<varlistentry>
<term>RangeDetect</term>
<listitem>
<para>Detect a string with defined start and end characters.</para>
<programlisting>&lt;RangeDetect char=&quot;(character)&quot;  char1=&quot;(character)&quot; (common attributes) /&gt;</programlisting>
<para><userinput>char</userinput> defines the character starting the range, <userinput>char2</userinput> the character ending the range.</para>
<para>Useful to detect for example small quoted strings and the like, but note that since the hl engine works on one line at a time, this will not find strings spanning over a line break.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>LineContinue</term>
<listitem>
<para>Matches at end of line.</para>
<programlisting>&lt;LineContinue (common attributes) /&gt;</programlisting>
<para>This rule has no specific attributes.</para>
<para>This rule is useful for switching context at end of line.</para>
</listitem>
</varlistentry>

</variablelist>

</sect1>

</appendix>