summaryrefslogtreecommitdiffstats
path: root/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook
diff options
context:
space:
mode:
Diffstat (limited to 'tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook')
-rw-r--r--tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook1167
1 files changed, 217 insertions, 950 deletions
diff --git a/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook b/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook
index c692da92cd5..5adc38a3f0c 100644
--- a/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook
+++ b/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook
@@ -1,491 +1,162 @@
<appendix id="regular-expressions">
<appendixinfo>
<authorgroup>
-<author
->&Anders.Lund; &Anders.Lund.mail;</author>
-<othercredit role="translator"
-><firstname
->Malcolm</firstname
-><surname
->Hunter</surname
-><affiliation
-><address
-><email
-></address
-></affiliation
-><contrib
->Conversion to British English</contrib
-></othercredit
->
+<author>&Anders.Lund; &Anders.Lund.mail;</author>
+<othercredit role="translator"><firstname>Malcolm</firstname><surname>Hunter</surname><affiliation><address><email>[email protected]</email></address></affiliation><contrib>Conversion to British English</contrib></othercredit>
</authorgroup>
</appendixinfo>
-<title
->Regular Expressions</title>
+<title>Regular Expressions</title>
-<synopsis
->This Appendix contains a brief but hopefully sufficient and
-covering introduction to the world of <emphasis
->regular
-expressions</emphasis
->. It documents regular expressions in the form
+<synopsis>This Appendix contains a brief but hopefully sufficient and
+covering introduction to the world of <emphasis>regular
+expressions</emphasis>. It documents regular expressions in the form
available within &kate;, which is not compatible with the regular
expressions of perl, nor with those of for example
-<command
->grep</command
->.</synopsis>
+<command>grep</command>.</synopsis>
<sect1>
-<title
->Introduction</title>
-
-<para
-><emphasis
->Regular Expressions</emphasis
-> provides us with a way to describe some possible contents of a text string in a way understood by a small piece of software, so that it can investigate if a text matches, and also in the case of advanced applications with the means of saving pieces or the matching text.</para>
-
-<para
->An example: Say you want to search a text for paragraphs that starts with either of the names <quote
->Henrik</quote
-> or <quote
->Pernille</quote
-> followed by some form of the verb <quote
->say</quote
->.</para>
-
-<para
->With a normal search, you would start out searching for the first name, <quote
->Henrik</quote
-> maybe followed by <quote
->sa</quote
-> like this: <userinput
->Henrik sa</userinput
->, and while looking for matches, you would have to discard those not being the beginning of a paragraph, as well as those in which the word starting with the letters <quote
->sa</quote
-> was not either <quote
->says</quote
->, <quote
->said</quote
-> or so. And then of cause repeat all of that with the next name...</para>
-
-<para
->With Regular Expressions, that task could be accomplished with a single search, and with a larger degree of preciseness.</para>
-
-<para
->To achieve this, Regular Expressions defines rules for expressing in details a generalisation of a string to match. Our example, which we might literally express like this: <quote
->A line starting with either <quote
->Henrik</quote
-> or <quote
->Pernille</quote
-> (possibly following up to 4 blanks or tab characters) followed by a whitespace followed by <quote
->sa</quote
-> and then either <quote
->ys</quote
-> or <quote
->id</quote
-></quote
-> could be expressed with the following regular expression:</para
-> <para
-><userinput
->^[ \t]{0,4}(Henrik|Pernille) sa(ys|id)</userinput
-></para>
-
-<para
->The above example demonstrates all four major concepts of modern Regular Expressions, namely:</para>
+<title>Introduction</title>
+
+<para><emphasis>Regular Expressions</emphasis> provides us with a way to describe some possible contents of a text string in a way understood by a small piece of software, so that it can investigate if a text matches, and also in the case of advanced applications with the means of saving pieces or the matching text.</para>
+
+<para>An example: Say you want to search a text for paragraphs that starts with either of the names <quote>Henrik</quote> or <quote>Pernille</quote> followed by some form of the verb <quote>say</quote>.</para>
+
+<para>With a normal search, you would start out searching for the first name, <quote>Henrik</quote> maybe followed by <quote>sa</quote> like this: <userinput>Henrik sa</userinput>, and while looking for matches, you would have to discard those not being the beginning of a paragraph, as well as those in which the word starting with the letters <quote>sa</quote> was not either <quote>says</quote>, <quote>said</quote> or so. And then of cause repeat all of that with the next name...</para>
+
+<para>With Regular Expressions, that task could be accomplished with a single search, and with a larger degree of preciseness.</para>
+
+<para>To achieve this, Regular Expressions defines rules for expressing in details a generalisation of a string to match. Our example, which we might literally express like this: <quote>A line starting with either <quote>Henrik</quote> or <quote>Pernille</quote> (possibly following up to 4 blanks or tab characters) followed by a whitespace followed by <quote>sa</quote> and then either <quote>ys</quote> or <quote>id</quote></quote> could be expressed with the following regular expression:</para> <para><userinput>^[ \t]{0,4}(Henrik|Pernille) sa(ys|id)</userinput></para>
+
+<para>The above example demonstrates all four major concepts of modern Regular Expressions, namely:</para>
<itemizedlist>
-<listitem
-><para
->Patterns</para
-></listitem>
-<listitem
-><para
->Assertions</para
-></listitem>
-<listitem
-><para
->Quantifiers</para
-></listitem>
-<listitem
-><para
->Back references</para
-></listitem>
+<listitem><para>Patterns</para></listitem>
+<listitem><para>Assertions</para></listitem>
+<listitem><para>Quantifiers</para></listitem>
+<listitem><para>Back references</para></listitem>
</itemizedlist>
-<para
->The caret (<literal
->^</literal
->) starting the expression is an assertion, being true only if the following matching string is at the start of a line.</para>
-
-<para
->The stings <literal
->[ \t]</literal
-> and <literal
->(Henrik|Pernille) sa(ys|id)</literal
-> are patterns. The first one is a <emphasis
->character class</emphasis
-> that matches either a blank or a (horizontal) tab character; the other pattern contains first a subpattern matching either <literal
->Henrik</literal
-> <emphasis
->or</emphasis
-> <literal
->Pernille</literal
->, then a piece matching the exact string <literal
-> sa</literal
-> and finally a subpattern matching either <literal
->ys</literal
-> <emphasis
->or</emphasis
-> <literal
->id</literal
-></para>
-
-<para
->The string <literal
->{0,4}</literal
-> is a quantifier saying <quote
->anywhere from 0 up to 4 of the previous</quote
->.</para>
-
-<para
->Because regular expression software supporting the concept of <emphasis
->back references</emphasis
-> saves the entire matching part of the string as well as sub-patterns enclosed in parentheses, given some means of access to those references, we could get our hands on either the whole match (when searching a text document in an editor with a regular expression, that is often marked as selected) or either the name found, or the last part of the verb.</para>
-
-<para
->All together, the expression will match where we wanted it to, and only there.</para>
-
-<para
->The following sections will describe in details how to construct and use patterns, character classes, assertions, quantifiers and back references, and the final section will give a few useful examples.</para>
+<para>The caret (<literal>^</literal>) starting the expression is an assertion, being true only if the following matching string is at the start of a line.</para>
+
+<para>The stings <literal>[ \t]</literal> and <literal>(Henrik|Pernille) sa(ys|id)</literal> are patterns. The first one is a <emphasis>character class</emphasis> that matches either a blank or a (horizontal) tab character; the other pattern contains first a subpattern matching either <literal>Henrik</literal> <emphasis>or</emphasis> <literal>Pernille</literal>, then a piece matching the exact string <literal> sa</literal> and finally a subpattern matching either <literal>ys</literal> <emphasis>or</emphasis> <literal>id</literal></para>
+
+<para>The string <literal>{0,4}</literal> is a quantifier saying <quote>anywhere from 0 up to 4 of the previous</quote>.</para>
+
+<para>Because regular expression software supporting the concept of <emphasis>back references</emphasis> saves the entire matching part of the string as well as sub-patterns enclosed in parentheses, given some means of access to those references, we could get our hands on either the whole match (when searching a text document in an editor with a regular expression, that is often marked as selected) or either the name found, or the last part of the verb.</para>
+
+<para>All together, the expression will match where we wanted it to, and only there.</para>
+
+<para>The following sections will describe in details how to construct and use patterns, character classes, assertions, quantifiers and back references, and the final section will give a few useful examples.</para>
</sect1>
<sect1 id="regex-patterns">
-<title
->Patterns</title>
+<title>Patterns</title>
-<para
->Patterns consists of literal strings and character classes. Patterns may contain sub-patterns, which are patterns enclosed in parentheses.</para>
+<para>Patterns consists of literal strings and character classes. Patterns may contain sub-patterns, which are patterns enclosed in parentheses.</para>
<sect2>
-<title
->Escaping characters</title>
+<title>Escaping characters</title>
-<para
->In patterns as well as in character classes, some characters have a special meaning. To literally match any of those characters, they must be marked or <emphasis
->escaped</emphasis
-> to let the regular expression software know that it should interpret such characters in their literal meaning.</para>
+<para>In patterns as well as in character classes, some characters have a special meaning. To literally match any of those characters, they must be marked or <emphasis>escaped</emphasis> to let the regular expression software know that it should interpret such characters in their literal meaning.</para>
-<para
->This is done by prepending the character with a backslash (<literal
->\</literal
->).</para>
+<para>This is done by prepending the character with a backslash (<literal>\</literal>).</para>
-<para
->The regular expression software will silently ignore escaping a character that does not have any special meaning in the context, so escaping for example a <quote
->j</quote
-> (<userinput
->\j</userinput
->) is safe. If you are in doubt whether a character could have a special meaning, you can therefore escape it safely.</para>
+<para>The regular expression software will silently ignore escaping a character that does not have any special meaning in the context, so escaping for example a <quote>j</quote> (<userinput>\j</userinput>) is safe. If you are in doubt whether a character could have a special meaning, you can therefore escape it safely.</para>
-<para
->Escaping of cause includes the backslash character it self, to literally match a such, you would write <userinput
->\\</userinput
->.</para>
+<para>Escaping of cause includes the backslash character it self, to literally match a such, you would write <userinput>\\</userinput>.</para>
</sect2>
<sect2>
-<title
->Character Classes and abbreviations</title>
-
-<para
->A <emphasis
->character class</emphasis
-> is an expression that matches one of a defined set of characters. In Regular Expressions, character classes are defined by putting the legal characters for the class in square brackets, <literal
->[]</literal
->, or by using one of the abbreviated classes described below.</para>
-
-<para
->Simple character classes just contains one or more literal characters, for example <userinput
->[abc]</userinput
-> (matching either of the letters <quote
->a</quote
->, <quote
->b</quote
-> or <quote
->c</quote
->) or <userinput
->[0123456789]</userinput
-> (matching any digit).</para>
-
-<para
->Because letters and digits have a logical order, you can abbreviate those by specifying ranges of them: <userinput
->[a-c]</userinput
-> is equal to <userinput
->[abc]</userinput
-> and <userinput
->[0-9]</userinput
-> is equal to <userinput
->[0123456789]</userinput
->. Combining these constructs, for example <userinput
->[a-fynot1-38]</userinput
-> is completely legal (the last one would match, of cause, either of <quote
->a</quote
->,<quote
->b</quote
->,<quote
->c</quote
->,<quote
->d</quote
->, <quote
->e</quote
->,<quote
->f</quote
->,<quote
->y</quote
->,<quote
->n</quote
->,<quote
->o</quote
->,<quote
->t</quote
->, <quote
->1</quote
->,<quote
->2</quote
->,<quote
->3</quote
-> or <quote
->8</quote
->).</para>
-
-<para
->As capital letters are different characters from their non-capital equivalents, to create a caseless character class matching <quote
->a</quote
-> or <quote
->b</quote
->, in any case, you need to write it <userinput
->[aAbB]</userinput
->.</para>
-
-<para
->It is of cause possible to create a <quote
->negative</quote
-> class matching as <quote
->anything but</quote
-> To do so put a caret (<literal
->^</literal
->) at the beginning of the class: </para>
-
-<para
-><userinput
->[^abc]</userinput
-> will match any character <emphasis
->but</emphasis
-> <quote
->a</quote
->, <quote
->b</quote
-> or <quote
->c</quote
->.</para>
-
-<para
->In addition to literal characters, some abbreviations are defined, making life still a bit easier: <variablelist>
+<title>Character Classes and abbreviations</title>
+
+<para>A <emphasis>character class</emphasis> is an expression that matches one of a defined set of characters. In Regular Expressions, character classes are defined by putting the legal characters for the class in square brackets, <literal>[]</literal>, or by using one of the abbreviated classes described below.</para>
+
+<para>Simple character classes just contains one or more literal characters, for example <userinput>[abc]</userinput> (matching either of the letters <quote>a</quote>, <quote>b</quote> or <quote>c</quote>) or <userinput>[0123456789]</userinput> (matching any digit).</para>
+
+<para>Because letters and digits have a logical order, you can abbreviate those by specifying ranges of them: <userinput>[a-c]</userinput> is equal to <userinput>[abc]</userinput> and <userinput>[0-9]</userinput> is equal to <userinput>[0123456789]</userinput>. Combining these constructs, for example <userinput>[a-fynot1-38]</userinput> is completely legal (the last one would match, of cause, either of <quote>a</quote>,<quote>b</quote>,<quote>c</quote>,<quote>d</quote>, <quote>e</quote>,<quote>f</quote>,<quote>y</quote>,<quote>n</quote>,<quote>o</quote>,<quote>t</quote>, <quote>1</quote>,<quote>2</quote>,<quote>3</quote> or <quote>8</quote>).</para>
+
+<para>As capital letters are different characters from their non-capital equivalents, to create a caseless character class matching <quote>a</quote> or <quote>b</quote>, in any case, you need to write it <userinput>[aAbB]</userinput>.</para>
+
+<para>It is of cause possible to create a <quote>negative</quote> class matching as <quote>anything but</quote> To do so put a caret (<literal>^</literal>) at the beginning of the class: </para>
+
+<para><userinput>[^abc]</userinput> will match any character <emphasis>but</emphasis> <quote>a</quote>, <quote>b</quote> or <quote>c</quote>.</para>
+
+<para>In addition to literal characters, some abbreviations are defined, making life still a bit easier: <variablelist>
<varlistentry>
-<term
-><userinput
->\a</userinput
-></term>
-<listitem
-><para
->This matches the <acronym
->ASCII</acronym
-> bell character (BEL, 0x07).</para
-></listitem>
+<term><userinput>\a</userinput></term>
+<listitem><para>This matches the <acronym>ASCII</acronym> bell character (BEL, 0x07).</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\f</userinput
-></term>
-<listitem
-><para
->This matches the <acronym
->ASCII</acronym
-> form feed character (FF, 0x0C).</para
-></listitem>
+<term><userinput>\f</userinput></term>
+<listitem><para>This matches the <acronym>ASCII</acronym> form feed character (FF, 0x0C).</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\n</userinput
-></term>
-<listitem
-><para
->This matches the <acronym
->ASCII</acronym
-> line feed character (LF, 0x0A, Unix newline).</para
-></listitem>
+<term><userinput>\n</userinput></term>
+<listitem><para>This matches the <acronym>ASCII</acronym> line feed character (LF, 0x0A, Unix newline).</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\r</userinput
-></term>
-<listitem
-><para
->This matches the <acronym
->ASCII</acronym
-> carriage return character (CR, 0x0D).</para
-></listitem>
+<term><userinput>\r</userinput></term>
+<listitem><para>This matches the <acronym>ASCII</acronym> carriage return character (CR, 0x0D).</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\t</userinput
-></term>
-<listitem
-><para
->This matches the <acronym
->ASCII</acronym
-> horizontal tab character (HT, 0x09).</para
-></listitem>
+<term><userinput>\t</userinput></term>
+<listitem><para>This matches the <acronym>ASCII</acronym> horizontal tab character (HT, 0x09).</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\v</userinput
-></term>
-<listitem
-><para
->This matches the <acronym
->ASCII</acronym
-> vertical tab character (VT, 0x0B).</para
-></listitem>
+<term><userinput>\v</userinput></term>
+<listitem><para>This matches the <acronym>ASCII</acronym> vertical tab character (VT, 0x0B).</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\xhhhh</userinput
-></term>
-
-<listitem
-><para
->This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo (&ie;, \zero ooo) matches the <acronym
->ASCII</acronym
->/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).</para
-></listitem>
+<term><userinput>\xhhhh</userinput></term>
+
+<listitem><para>This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo (&ie;, \zero ooo) matches the <acronym>ASCII</acronym>/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->.</userinput
-> (dot)</term>
-<listitem
-><para
->This matches any character (including newline).</para
-></listitem>
+<term><userinput>.</userinput> (dot)</term>
+<listitem><para>This matches any character (including newline).</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\d</userinput
-></term>
-<listitem
-><para
->This matches a digit. Equal to <literal
->[0-9]</literal
-></para
-></listitem>
+<term><userinput>\d</userinput></term>
+<listitem><para>This matches a digit. Equal to <literal>[0-9]</literal></para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\D</userinput
-></term>
-<listitem
-><para
->This matches a non-digit. Equal to <literal
->[^0-9]</literal
-> or <literal
->[^\d]</literal
-></para
-></listitem>
+<term><userinput>\D</userinput></term>
+<listitem><para>This matches a non-digit. Equal to <literal>[^0-9]</literal> or <literal>[^\d]</literal></para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\s</userinput
-></term>
-<listitem
-><para
->This matches a whitespace character. Practically equal to <literal
->[ \t\n\r]</literal
-></para
-></listitem>
+<term><userinput>\s</userinput></term>
+<listitem><para>This matches a whitespace character. Practically equal to <literal>[ \t\n\r]</literal></para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\S</userinput
-></term>
-<listitem
-><para
->This matches a non-whitespace. Practically equal to <literal
->[^ \t\r\n]</literal
->, and equal to <literal
->[^\s]</literal
-></para
-></listitem>
+<term><userinput>\S</userinput></term>
+<listitem><para>This matches a non-whitespace. Practically equal to <literal>[^ \t\r\n]</literal>, and equal to <literal>[^\s]</literal></para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\w</userinput
-></term>
-<listitem
-><para
->Matches any <quote
->word character</quote
-> - in this case any letter or digit. Note that underscore (<literal
->_</literal
->) is not matched, as is the case with perl regular expressions. Equal to <literal
->[a-zA-Z0-9]</literal
-></para
-></listitem>
+<term><userinput>\w</userinput></term>
+<listitem><para>Matches any <quote>word character</quote> - in this case any letter or digit. Note that underscore (<literal>_</literal>) is not matched, as is the case with perl regular expressions. Equal to <literal>[a-zA-Z0-9]</literal></para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\W</userinput
-></term>
-<listitem
-><para
->Matches any non-word character - anything but letters or numbers. Equal to <literal
->[^a-zA-Z0-9]</literal
-> or <literal
->[^\w]</literal
-></para
-></listitem>
+<term><userinput>\W</userinput></term>
+<listitem><para>Matches any non-word character - anything but letters or numbers. Equal to <literal>[^a-zA-Z0-9]</literal> or <literal>[^\w]</literal></para></listitem>
</varlistentry>
@@ -493,69 +164,31 @@ expressions of perl, nor with those of for example
</para>
-<para
->The abbreviated classes can be put inside a custom class, for example to match a word character, a blank or a dot, you could write <userinput
->[\w \.]</userinput
-></para
->
+<para>The abbreviated classes can be put inside a custom class, for example to match a word character, a blank or a dot, you could write <userinput>[\w \.]</userinput></para>
-<note
-> <para
->The POSIX notation of classes, <userinput
->[:&lt;class name&gt;:]</userinput
-> is currently not supported.</para
-> </note>
+<note> <para>The POSIX notation of classes, <userinput>[:&lt;class name&gt;:]</userinput> is currently not supported.</para> </note>
<sect3>
-<title
->Characters with special meanings inside character classes</title>
+<title>Characters with special meanings inside character classes</title>
-<para
->The following characters has a special meaning inside the <quote
->[]</quote
-> character class construct, and must be escaped to be literally included in a class:</para>
+<para>The following characters has a special meaning inside the <quote>[]</quote> character class construct, and must be escaped to be literally included in a class:</para>
<variablelist>
<varlistentry>
-<term
-><userinput
->]</userinput
-></term>
-<listitem
-><para
->Ends the character class. Must be escaped unless it is the very first character in the class (may follow an unescaped caret)</para
-></listitem>
+<term><userinput>]</userinput></term>
+<listitem><para>Ends the character class. Must be escaped unless it is the very first character in the class (may follow an unescaped caret)</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->^</userinput
-> (caret)</term>
-<listitem
-><para
->Denotes a negative class, if it is the first character. Must be escaped to match literally if it is the first character in the class.</para
-></listitem
->
+<term><userinput>^</userinput> (caret)</term>
+<listitem><para>Denotes a negative class, if it is the first character. Must be escaped to match literally if it is the first character in the class.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->-</userinput
-> (dash)</term>
-<listitem
-><para
->Denotes a logical range. Must always be escaped within a character class.</para
-></listitem>
+<term><userinput>-</userinput> (dash)</term>
+<listitem><para>Denotes a logical range. Must always be escaped within a character class.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\</userinput
-> (backslash)</term>
-<listitem
-><para
->The escape character. Must always be escaped.</para
-></listitem>
+<term><userinput>\</userinput> (backslash)</term>
+<listitem><para>The escape character. Must always be escaped.</para></listitem>
</varlistentry>
</variablelist>
@@ -566,240 +199,110 @@ expressions of perl, nor with those of for example
<sect2>
-<title
->Alternatives: matching <quote
->one of</quote
-></title>
-
-<para
->If you want to match one of a set of alternative patterns, you can separate those with <literal
->|</literal
-> (vertical bar character).</para>
-
-<para
->For example to find either <quote
->John</quote
-> or <quote
->Harry</quote
-> you would use an expression <userinput
->John|Harry</userinput
->.</para>
+<title>Alternatives: matching <quote>one of</quote></title>
+
+<para>If you want to match one of a set of alternative patterns, you can separate those with <literal>|</literal> (vertical bar character).</para>
+
+<para>For example to find either <quote>John</quote> or <quote>Harry</quote> you would use an expression <userinput>John|Harry</userinput>.</para>
</sect2>
<sect2>
-<title
->Sub Patterns</title>
+<title>Sub Patterns</title>
-<para
-><emphasis
->Sub patterns</emphasis
-> are patterns enclosed in parentheses, and they have several uses in the world of regular expressions.</para>
+<para><emphasis>Sub patterns</emphasis> are patterns enclosed in parentheses, and they have several uses in the world of regular expressions.</para>
<sect3>
-<title
->Specifying alternatives</title>
-
-<para
->You may use a sub pattern to group a set of alternatives within a larger pattern. The alternatives are separated by the character <quote
->|</quote
-> (vertical bar).</para>
-
-<para
->For example to match either of the words <quote
->int</quote
->, <quote
->float</quote
-> or <quote
->double</quote
->, you could use the pattern <userinput
->int|float|double</userinput
->. If you only want to find one if it is followed by some whitespace and then some letters, put the alternatives inside a subpattern: <userinput
->(int|float|double)\s+\w+</userinput
->.</para>
+<title>Specifying alternatives</title>
+
+<para>You may use a sub pattern to group a set of alternatives within a larger pattern. The alternatives are separated by the character <quote>|</quote> (vertical bar).</para>
+
+<para>For example to match either of the words <quote>int</quote>, <quote>float</quote> or <quote>double</quote>, you could use the pattern <userinput>int|float|double</userinput>. If you only want to find one if it is followed by some whitespace and then some letters, put the alternatives inside a subpattern: <userinput>(int|float|double)\s+\w+</userinput>.</para>
</sect3>
<sect3>
-<title
->Capturing matching text (back references)</title>
-
-<para
->If you want to use a back reference, use a sub pattern to have the desired part of the pattern remembered.</para>
-
-<para
->For example, it you want to find two occurrences of the same word separated by a comma and possibly some whitespace, you could write <userinput
->(\w+),\s*\1</userinput
->. The sub pattern <literal
->\w+</literal
-> would find a chunk of word characters, and the entire expression would match if those were followed by a comma, 0 or more whitespace and then an equal chunk of word characters. (The string <literal
->\1</literal
-> references <emphasis
->the first sub pattern enclosed in parentheses</emphasis
->)</para>
-
-<!-- <para
->See also <link linkend="backreferences"
->Back references</link
->.</para
-> -->
+<title>Capturing matching text (back references)</title>
+
+<para>If you want to use a back reference, use a sub pattern to have the desired part of the pattern remembered.</para>
+
+<para>For example, it you want to find two occurrences of the same word separated by a comma and possibly some whitespace, you could write <userinput>(\w+),\s*\1</userinput>. The sub pattern <literal>\w+</literal> would find a chunk of word characters, and the entire expression would match if those were followed by a comma, 0 or more whitespace and then an equal chunk of word characters. (The string <literal>\1</literal> references <emphasis>the first sub pattern enclosed in parentheses</emphasis>)</para>
+
+<!-- <para>See also <link linkend="backreferences">Back references</link>.</para> -->
</sect3>
<sect3 id="lookahead-assertions">
-<title
->Lookahead Assertions</title>
-
-<para
->A lookahead assertion is a sub pattern, starting with either <literal
->?=</literal
-> or <literal
->?!</literal
->.</para>
-
-<para
->For example to match the literal string <quote
->Bill</quote
-> but only if not followed by <quote
-> Gates</quote
->, you could use this expression: <userinput
->Bill(?! Gates)</userinput
->. (This would find <quote
->Bill Clinton</quote
-> as well as <quote
->Billy the kid</quote
->, but silently ignore the other matches.)</para>
-
-<para
->Sub patterns used for assertions are not captured.</para>
-
-<para
->See also <link linkend="assertions"
->Assertions</link
-></para>
+<title>Lookahead Assertions</title>
+
+<para>A lookahead assertion is a sub pattern, starting with either <literal>?=</literal> or <literal>?!</literal>.</para>
+
+<para>For example to match the literal string <quote>Bill</quote> but only if not followed by <quote> Gates</quote>, you could use this expression: <userinput>Bill(?! Gates)</userinput>. (This would find <quote>Bill Clinton</quote> as well as <quote>Billy the kid</quote>, but silently ignore the other matches.)</para>
+
+<para>Sub patterns used for assertions are not captured.</para>
+
+<para>See also <link linkend="assertions">Assertions</link></para>
</sect3>
</sect2>
<sect2 id="special-characters-in-patterns">
-<title
->Characters with a special meaning inside patterns</title>
+<title>Characters with a special meaning inside patterns</title>
-<para
->The following characters have meaning inside a pattern, and must be escaped if you want to literally match them: <variablelist>
+<para>The following characters have meaning inside a pattern, and must be escaped if you want to literally match them: <variablelist>
<varlistentry>
-<term
-><userinput
->\</userinput
-> (backslash)</term>
-<listitem
-><para
->The escape character.</para
-></listitem>
+<term><userinput>\</userinput> (backslash)</term>
+<listitem><para>The escape character.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->^</userinput
-> (caret)</term>
-<listitem
-><para
->Asserts the beginning of the string.</para
-></listitem>
+<term><userinput>^</userinput> (caret)</term>
+<listitem><para>Asserts the beginning of the string.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->$</userinput
-></term>
-<listitem
-><para
->Asserts the end of string.</para
-></listitem>
+<term><userinput>$</userinput></term>
+<listitem><para>Asserts the end of string.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->()</userinput
-> (left and right parentheses)</term>
-<listitem
-><para
->Denotes sub patterns.</para
-></listitem>
+<term><userinput>()</userinput> (left and right parentheses)</term>
+<listitem><para>Denotes sub patterns.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->{}</userinput
-> (left and right curly braces)</term>
-<listitem
-><para
->Denotes numeric quantifiers.</para
-></listitem>
+<term><userinput>{}</userinput> (left and right curly braces)</term>
+<listitem><para>Denotes numeric quantifiers.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->[]</userinput
-> (left and right square brackets)</term>
-<listitem
-><para
->Denotes character classes.</para
-></listitem>
+<term><userinput>[]</userinput> (left and right square brackets)</term>
+<listitem><para>Denotes character classes.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->|</userinput
-> (vertical bar)</term>
-<listitem
-><para
->logical OR. Separates alternatives.</para
-></listitem>
+<term><userinput>|</userinput> (vertical bar)</term>
+<listitem><para>logical OR. Separates alternatives.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->+</userinput
-> (plus sign)</term>
-<listitem
-><para
->Quantifier, 1 or more.</para
-></listitem>
+<term><userinput>+</userinput> (plus sign)</term>
+<listitem><para>Quantifier, 1 or more.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->*</userinput
-> (asterisk)</term>
-<listitem
-><para
->Quantifier, 0 or more.</para
-></listitem>
+<term><userinput>*</userinput> (asterisk)</term>
+<listitem><para>Quantifier, 0 or more.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->?</userinput
-> (question mark)</term>
-<listitem
-><para
->An optional character. Can be interpreted as a quantifier, 0 or 1.</para
-></listitem>
+<term><userinput>?</userinput> (question mark)</term>
+<listitem><para>An optional character. Can be interpreted as a quantifier, 0 or 1.</para></listitem>
</varlistentry>
</variablelist>
@@ -811,125 +314,58 @@ expressions of perl, nor with those of for example
</sect1>
<sect1 id="quantifiers">
-<title
->Quantifiers</title>
-
-<para
-><emphasis
->Quantifiers</emphasis
-> allows a regular expression to match a specified number or range of numbers of either a character, character class or sub pattern.</para>
-
-<para
->Quantifiers are enclosed in curly brackets (<literal
->{</literal
-> and <literal
->}</literal
->) and have the general form <literal
->{[minimum-occurrences][,[maximum-occurrences]]}</literal
-> </para>
-
-<para
->The usage is best explained by example: <variablelist>
+<title>Quantifiers</title>
+
+<para><emphasis>Quantifiers</emphasis> allows a regular expression to match a specified number or range of numbers of either a character, character class or sub pattern.</para>
+
+<para>Quantifiers are enclosed in curly brackets (<literal>{</literal> and <literal>}</literal>) and have the general form <literal>{[minimum-occurrences][,[maximum-occurrences]]}</literal> </para>
+
+<para>The usage is best explained by example: <variablelist>
<varlistentry>
-<term
-><userinput
->{1}</userinput
-></term>
-<listitem
-><para
->Exactly 1 occurrence</para
-></listitem>
+<term><userinput>{1}</userinput></term>
+<listitem><para>Exactly 1 occurrence</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->{0,1}</userinput
-></term>
-<listitem
-><para
->Zero or 1 occurrences</para
-></listitem>
+<term><userinput>{0,1}</userinput></term>
+<listitem><para>Zero or 1 occurrences</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->{,1}</userinput
-></term>
-<listitem
-><para
->The same, with less work;)</para
-></listitem>
+<term><userinput>{,1}</userinput></term>
+<listitem><para>The same, with less work;)</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->{5,10}</userinput
-></term>
-<listitem
-><para
->At least 5 but maximum 10 occurrences.</para
-></listitem>
+<term><userinput>{5,10}</userinput></term>
+<listitem><para>At least 5 but maximum 10 occurrences.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->{5,}</userinput
-></term>
-<listitem
-><para
->At least 5 occurrences, no maximum.</para
-></listitem>
+<term><userinput>{5,}</userinput></term>
+<listitem><para>At least 5 occurrences, no maximum.</para></listitem>
</varlistentry>
</variablelist>
</para>
-<para
->Additionally, there are some abbreviations: <variablelist>
+<para>Additionally, there are some abbreviations: <variablelist>
<varlistentry>
-<term
-><userinput
->*</userinput
-> (asterisk)</term>
-<listitem
-><para
->similar to <literal
->{0,}</literal
->, find any number of occurrences.</para
-></listitem>
+<term><userinput>*</userinput> (asterisk)</term>
+<listitem><para>similar to <literal>{0,}</literal>, find any number of occurrences.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->+</userinput
-> (plus sign)</term>
-<listitem
-><para
->similar to <literal
->{1,}</literal
->, at least 1 occurrence.</para
-></listitem>
+<term><userinput>+</userinput> (plus sign)</term>
+<listitem><para>similar to <literal>{1,}</literal>, at least 1 occurrence.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->?</userinput
-> (question mark)</term>
-<listitem
-><para
->similar to <literal
->{0,1}</literal
->, zero or 1 occurrence.</para
-></listitem>
+<term><userinput>?</userinput> (question mark)</term>
+<listitem><para>similar to <literal>{0,1}</literal>, zero or 1 occurrence.</para></listitem>
</varlistentry>
</variablelist>
@@ -938,98 +374,39 @@ expressions of perl, nor with those of for example
<sect2>
-<title
->Greed</title>
+<title>Greed</title>
-<para
->When using quantifiers with no maximum, regular expressions defaults to match as much of the searched string as possible, commonly known as <emphasis
->greedy</emphasis
-> behaviour.</para>
+<para>When using quantifiers with no maximum, regular expressions defaults to match as much of the searched string as possible, commonly known as <emphasis>greedy</emphasis> behaviour.</para>
-<para
->Modern regular expression software provides the means of <quote
->turning off greediness</quote
->, though in a graphical environment it is up to the interface to provide you with access to this feature. For example a search dialogue providing a regular expression search could have a check box labelled <quote
->Minimal matching</quote
-> as well as it ought to indicate if greediness is the default behaviour.</para>
+<para>Modern regular expression software provides the means of <quote>turning off greediness</quote>, though in a graphical environment it is up to the interface to provide you with access to this feature. For example a search dialogue providing a regular expression search could have a check box labelled <quote>Minimal matching</quote> as well as it ought to indicate if greediness is the default behaviour.</para>
</sect2>
<sect2>
-<title
->In context examples</title>
+<title>In context examples</title>
-<para
->Here are a few examples of using quantifiers</para>
+<para>Here are a few examples of using quantifiers</para>
<variablelist>
<varlistentry>
-<term
-><userinput
->^\d{4,5}\s</userinput
-></term>
-<listitem
-><para
->Matches the digits in <quote
->1234 go</quote
-> and <quote
->12345 now</quote
->, but neither in <quote
->567 eleven</quote
-> nor in <quote
->223459 somewhere</quote
-></para
-></listitem>
+<term><userinput>^\d{4,5}\s</userinput></term>
+<listitem><para>Matches the digits in <quote>1234 go</quote> and <quote>12345 now</quote>, but neither in <quote>567 eleven</quote> nor in <quote>223459 somewhere</quote></para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\s+</userinput
-></term>
-<listitem
-><para
->Matches one or more whitespace characters</para
-></listitem>
+<term><userinput>\s+</userinput></term>
+<listitem><para>Matches one or more whitespace characters</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->(bla){1,}</userinput
-></term>
-<listitem
-><para
->Matches all of <quote
->blablabla</quote
-> and the <quote
->bla</quote
-> in <quote
->blackbird</quote
-> or <quote
->tabla</quote
-></para
-></listitem>
+<term><userinput>(bla){1,}</userinput></term>
+<listitem><para>Matches all of <quote>blablabla</quote> and the <quote>bla</quote> in <quote>blackbird</quote> or <quote>tabla</quote></para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->/?&gt;</userinput
-></term>
-<listitem
-><para
->Matches <quote
->/&gt;</quote
-> in <quote
->&lt;closeditem/&gt;</quote
-> as well as <quote
->&gt;</quote
-> in <quote
->&lt;openitem&gt;</quote
->.</para
-></listitem>
+<term><userinput>/?&gt;</userinput></term>
+<listitem><para>Matches <quote>/&gt;</quote> in <quote>&lt;closeditem/&gt;</quote> as well as <quote>&gt;</quote> in <quote>&lt;openitem&gt;</quote>.</para></listitem>
</varlistentry>
</variablelist>
@@ -1039,164 +416,56 @@ expressions of perl, nor with those of for example
</sect1>
<sect1 id="assertions">
-<title
->Assertions</title>
-
-<para
-><emphasis
->Assertions</emphasis
-> allows a regular expression to match only under certain controlled conditions.</para>
-
-<para
->An assertion does not need a character to match, it rather investigates the surroundings of a possible match before acknowledging it. For example the <emphasis
->word boundary</emphasis
-> assertion does not try to find a non word character opposite a word one at its position, instead it makes sure that there is not a word character. This means that the assertion can match where there is no character, &ie; at the ends of a searched string.</para>
-
-<para
->Some assertions actually does have a pattern to match, but the part of the string matching that will not be a part of the result of the match of the full expression.</para>
-
-<para
->Regular Expressions as documented here supports the following assertions: <variablelist>
-
-<varlistentry
->
-<term
-><userinput
->^</userinput
-> (caret: beginning of string)</term
->
-<listitem
-><para
->Matches the beginning of the searched string.</para
-> <para
->The expression <userinput
->^Peter</userinput
-> will match at <quote
->Peter</quote
-> in the string <quote
->Peter, hey!</quote
-> but not in <quote
->Hey, Peter!</quote
-> </para
-> </listitem>
+<title>Assertions</title>
+
+<para><emphasis>Assertions</emphasis> allows a regular expression to match only under certain controlled conditions.</para>
+
+<para>An assertion does not need a character to match, it rather investigates the surroundings of a possible match before acknowledging it. For example the <emphasis>word boundary</emphasis> assertion does not try to find a non word character opposite a word one at its position, instead it makes sure that there is not a word character. This means that the assertion can match where there is no character, &ie; at the ends of a searched string.</para>
+
+<para>Some assertions actually does have a pattern to match, but the part of the string matching that will not be a part of the result of the match of the full expression.</para>
+
+<para>Regular Expressions as documented here supports the following assertions: <variablelist>
+
+<varlistentry>
+<term><userinput>^</userinput> (caret: beginning of string)</term>
+<listitem><para>Matches the beginning of the searched string.</para> <para>The expression <userinput>^Peter</userinput> will match at <quote>Peter</quote> in the string <quote>Peter, hey!</quote> but not in <quote>Hey, Peter!</quote> </para> </listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->$</userinput
-> (end of string)</term>
-<listitem
-><para
->Matches the end of the searched string.</para>
-
-<para
->The expression <userinput
->you\?$</userinput
-> will match at the last you in the string <quote
->You didn't do that, did you?</quote
-> but nowhere in <quote
->You didn't do that, right?</quote
-></para>
+<term><userinput>$</userinput> (end of string)</term>
+<listitem><para>Matches the end of the searched string.</para>
+
+<para>The expression <userinput>you\?$</userinput> will match at the last you in the string <quote>You didn't do that, did you?</quote> but nowhere in <quote>You didn't do that, right?</quote></para>
</listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\b</userinput
-> (word boundary)</term>
-<listitem
-><para
->Matches if there is a word character at one side and not a word character at the other.</para>
-<para
->This is useful to find word ends, for example both ends to find a whole word. The expression <userinput
->\bin\b</userinput
-> will match at the separate <quote
->in</quote
-> in the string <quote
->He came in through the window</quote
->, but not at the <quote
->in</quote
-> in <quote
->window</quote
->.</para
-></listitem>
+<term><userinput>\b</userinput> (word boundary)</term>
+<listitem><para>Matches if there is a word character at one side and not a word character at the other.</para>
+<para>This is useful to find word ends, for example both ends to find a whole word. The expression <userinput>\bin\b</userinput> will match at the separate <quote>in</quote> in the string <quote>He came in through the window</quote>, but not at the <quote>in</quote> in <quote>window</quote>.</para></listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->\B</userinput
-> (non word boundary)</term>
-<listitem
-><para
->Matches wherever <quote
->\b</quote
-> does not.</para>
-<para
->That means that it will match for example within words: The expression <userinput
->\Bin\B</userinput
-> will match at in <quote
->window</quote
-> but not in <quote
->integer</quote
-> or <quote
->I'm in love</quote
->.</para>
+<term><userinput>\B</userinput> (non word boundary)</term>
+<listitem><para>Matches wherever <quote>\b</quote> does not.</para>
+<para>That means that it will match for example within words: The expression <userinput>\Bin\B</userinput> will match at in <quote>window</quote> but not in <quote>integer</quote> or <quote>I'm in love</quote>.</para>
</listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->(?=PATTERN)</userinput
-> (Positive lookahead)</term>
-<listitem
-><para
->A lookahead assertion looks at the part of the string following a possible match. The positive lookahead will prevent the string from matching if the text following the possible match does not match the <emphasis
->PATTERN</emphasis
-> of the assertion, but the text matched by that will not be included in the result.</para>
-<para
->The expression <userinput
->handy(?=\w)</userinput
-> will match at <quote
->handy</quote
-> in <quote
->handyman</quote
-> but not in <quote
->That came in handy!</quote
-></para>
+<term><userinput>(?=PATTERN)</userinput> (Positive lookahead)</term>
+<listitem><para>A lookahead assertion looks at the part of the string following a possible match. The positive lookahead will prevent the string from matching if the text following the possible match does not match the <emphasis>PATTERN</emphasis> of the assertion, but the text matched by that will not be included in the result.</para>
+<para>The expression <userinput>handy(?=\w)</userinput> will match at <quote>handy</quote> in <quote>handyman</quote> but not in <quote>That came in handy!</quote></para>
</listitem>
</varlistentry>
<varlistentry>
-<term
-><userinput
->(?!PATTERN)</userinput
-> (Negative lookahead)</term>
-
-<listitem
-><para
->The negative lookahead prevents a possible match to be acknowledged if the following part of the searched string does match its <emphasis
->PATTERN</emphasis
->.</para>
-<para
->The expression <userinput
->const \w+\b(?!\s*&amp;)</userinput
-> will match at <quote
->const char</quote
-> in the string <quote
->const char* foo</quote
-> while it can not match <quote
->const QString</quote
-> in <quote
->const QString&amp; bar</quote
-> because the <quote
->&amp;</quote
-> matches the negative lookahead assertion pattern.</para>
+<term><userinput>(?!PATTERN)</userinput> (Negative lookahead)</term>
+
+<listitem><para>The negative lookahead prevents a possible match to be acknowledged if the following part of the searched string does match its <emphasis>PATTERN</emphasis>.</para>
+<para>The expression <userinput>const \w+\b(?!\s*&amp;)</userinput> will match at <quote>const char</quote> in the string <quote>const char* foo</quote> while it can not match <quote>const QString</quote> in <quote>const QString&amp; bar</quote> because the <quote>&amp;</quote> matches the negative lookahead assertion pattern.</para>
</listitem>
</varlistentry>
@@ -1208,11 +477,9 @@ expressions of perl, nor with those of for example
<!-- TODO sect1 id="backreferences">
-<title
->Back References</title>
+<title>Back References</title>
-<para
-></para>
+<para></para>
</sect1 -->