diff options
Diffstat (limited to 'doc/KRegExpEditor/index.docbook')
-rw-r--r-- | doc/KRegExpEditor/index.docbook | 652 |
1 files changed, 652 insertions, 0 deletions
diff --git a/doc/KRegExpEditor/index.docbook b/doc/KRegExpEditor/index.docbook new file mode 100644 index 0000000..fda544f --- /dev/null +++ b/doc/KRegExpEditor/index.docbook @@ -0,0 +1,652 @@ +<?xml version="1.0" ?> +<!DOCTYPE book PUBLIC "-//KDE//DTD DocBook XML V4.2-Based Variant V1.1//EN" "dtd/kdex.dtd" [ + <!ENTITY % English "INCLUDE"> + <!ENTITY % addindex "IGNORE"> +]> + +<book lang="&language;"> + + <bookinfo> + <title>The Regular Expression Editor Manual</title> + + <authorgroup> + <author> + <firstname>Jesper K.</firstname> + <surname>Pedersen</surname> + <affiliation><address><email>[email protected]</email></address></affiliation> + </author> + </authorgroup> + + <date>2001-07-03</date> + <releaseinfo>0.1</releaseinfo> + + <legalnotice>&underFDL;</legalnotice> + + <copyright> + <year>2001</year> + <holder>Jesper K. Pedersen</holder> + </copyright> + + <abstract> + <para>This Handbook describes the Regular Expression Editor widget</para> + </abstract> + + <keywordset> + <keyword>KDE</keyword> + <keyword>regular expression</keyword> + </keywordset> + </bookinfo> + + <!-- ====================================================================== --> + <!-- Introduction --> + <!-- ====================================================================== --> + <chapter id="introduction"> + <title>Introduction</title> + + + <para> + The regular expression editor is an editor for editing regular expression + in a graphical style (in contrast to the <acronym>ASCII</acronym> syntax). Traditionally + regular expressions have been typed in the <acronym>ASCII</acronym> syntax, which for example + looks like <literal>^.*kde\b</literal>. The major drawbacks of + this style are: + <itemizedlist> + <listitem><para>It is hard to understand for + non-programmers.</para></listitem> + + <listitem><para>It requires that you <emphasis>escape</emphasis> + certain symbols (to match a star for example, you need to type + <literal>\*</literal>). </para></listitem> + + <listitem><para>It requires that you remember rules for + <emphasis>precedence</emphasis> (What does <literal>x|y*</literal> + match? a single <literal>x</literal> or a number of + <literal>y</literal>, <emphasis>OR</emphasis> a number of + <literal>x</literal> and <literal>y</literal>'s mixed?)</para></listitem> + </itemizedlist> + </para> + + <para> + The regular expression editor, on the other hand, lets you + <emphasis>draw</emphasis> your regular expression in an unambiguous + way. The editor solves at least item two and three above. It might not make + regular expressions available for the non-programmers, though only tests by + users can tell that. So, if are you a non programmer, who has gained the + power of regular expression from this editor, then please + <ulink url="mailto:[email protected]">let me know</ulink>. + </para> + + </chapter> + + <!-- ====================================================================== --> + <!-- What is a Regular Expression --> + <!-- ====================================================================== --> + <chapter id="whatIsARegExp"> + <title>What is a Regular Expression</title> + + <para>A regular expression is a way to specify + <emphasis>conditions</emphasis> to be fulfilled for a situation + in mind. Normally when you search in a text editor you specify + the text to search for <emphasis>literally</emphasis>, using a + regular expression, on the other hand, you tell what a given + match would look like. Examples of this include <emphasis>I'm + searching for the word KDE, but only at the beginning of the + line</emphasis>, or <emphasis>I'm searching for the word + <literal>the</literal>, but it must stand on its own</emphasis>, + or <emphasis>I'm searching for files starting with the word + <literal>test</literal>, followed by a number of digits, for + example <literal>test12</literal>, <literal>test107</literal> + and <literal>test007</literal></emphasis></para> + + <para>You build regular expressions from smaller regular + expressions, just like you build large Lego toys from smaller + subparts. As in the Lego world, there are a number of basic + building blocks. In the following I will describe each of these + basic building blocks using a number of examples.</para> + + <example> + <title>Searching for normal text.</title> + <para>If you just want to search for a given text, a then regular + expression is definitely not a good choice. The reason for this is that + regular expressions assign special meaning to some characters. This + includes the following characters: <literal>.*|$</literal>. Thus if you want to + search for the text <literal>kde.</literal> (i.e. the characters + <literal>kde</literal> followed by a period), then you would need to + specify this as <literal>kde\.</literal><footnote><para>The regular + expression editor solves this problem by taking care of escape rules for + you.</para></footnote> Writing <literal>\.</literal> rather than just + <literal>.</literal> is called <emphasis>escaping</emphasis>. + </para> + </example> + + <example id="positionregexp"> + <title>Matching URLs</title> + <para>When you select something looking like a URL in KDE, then the + program <command>klipper</command> will offer to start + <command>konqueror</command> with the selected URL.</para> + + <para><command>Klipper</command> does this by matching the selection + against several different regular expressions, when one of the regular + expressions matches, the accommodating command will be offered.</para> + + <para>The regular expression for URLs says (among other things), that the + selection must start with the text <literal>http://</literal>. This is + described using regular expressions by prefixing the text + <literal>http://</literal> with a hat (the <literal>^</literal> + character).</para> + + <para>The above is an example of matching positions using regular + expressions. Similar, the position <emphasis>end-of-line</emphasis> can + be matched using the character <literal>$</literal> (i.e. a dollar + sign).</para> + </example> + + <example id="boundaryregexp"> + <title>Searching for the word <literal>the</literal>, but not + <emphasis>the</emphasis><literal>re</literal>, + <literal>brea</literal><emphasis>the</emphasis> or + <literal>ano</literal><emphasis>the</emphasis><literal>r</literal></title> + <para>Two extra position types can be matches in the above way, + namely <emphasis>the position at a word boundary</emphasis>, and + <emphasis>the position at a <emphasis>non</emphasis>-word + boundary</emphasis>. The positions are specified using the text + <literal>\b</literal> (for word-boundary) and <literal>\B</literal> (for + non-word boundary)<emphasis></emphasis></para> + + <para>Thus, searching for the word <literal>the</literal> can be done + using the regular expression <literal>\bthe\b</literal>. This specifies + that we are searching for <literal>the</literal> with no letters on each + side of it (i.e. with a word boundary on each side)</para> + + <para>The four position matching regular expressions are inserted in the + regular expression editor using <link linkend="positiontool">four + different positions tool</link></para> + </example> + + <example id="altnregexp"> + <title>Searching for either <literal>this</literal> or <literal>that</literal></title> + <para>Imagine that you want to run through your document searching for + either the word <literal>this</literal> or the word + <literal>that</literal>. With a normal search method you could do this in + two sweeps, the first time around, you would search for + <literal>this</literal>, and the second time around you would search for + <literal>that</literal>.</para> + + <para>Using regular expression searches you would search for both in the + same sweep. You do this by searching for + <literal>this|that</literal>. I.e. separating the two words with a + vertical bar.<footnote><para>Note on each side of the vertical bar is a + regular expression, so this feature is not only for searching for two + different pieces of text, but for searching for two different regular + expressions.</para></footnote></para> + + <para>In the regular expression editor you do not write the vertical bar + yourself, but instead select the <link linkend="altntool">alternative + tool</link>, and insert the smaller regular expressions above each other.</para> + </example> + + <example id="repeatregexp"> + <title>Matching anything</title> + <para>Regular expressions are often compared to wildcard matching in the + shell - that is the capability to specify a number of files using the + asterisk. You will most likely recognize wildcard matching from the + following examples: + <itemizedlist> + <listitem><para><literal>ls *.txt</literal> - here <literal>*.txt</literal> is + the shell wildcard matching every file ending with the + <literal>.txt</literal> extension.</para></listitem> + <listitem><para><literal>cat test??.res</literal> - matching every file starting with + <literal>test</literal> followed by two arbitrary characters, and finally + followed by the test <literal>.res</literal></para></listitem> + </itemizedlist> + </para> + + <para>In the shell the asterisk matches any character any number of + times. In other words, the asterisk matches <emphasis>anything</emphasis>. + This is written like <literal>.*</literal> with regular expression + syntax. The dot matches any single character, i.e. just + <emphasis>one</emphasis> character, and the asterisk, says that the + regular expression prior to it should be matched any number of + times. Together this says any single character any number of + times.</para> + + <para>This may seem overly complicated, but when you get the larger + picture you will see the power. Let me show you another basic regular + expression: <literal>a</literal>. The letter <literal>a</literal> on its + own is a regular expression that matches a single letter, namely the + letter <literal>a</literal>. If we combine this with the asterisk, + i.e. <literal>a*</literal>, then we have a regular expression matching + any number of a's.</para> + + <para>We can combine several regular expression after each + other, for example <literal>ba(na)*</literal>. + <footnote><para><literal>(na)*</literal> just says that what is inside + the parenthesis is repeated any number of times.</para></footnote> + Imagine you had typed this regular expression into the search field in a + text editor, then you would have found the following words (among + others): <literal>ba</literal>, <literal>bana</literal>, + <literal>banana</literal>, <literal>bananananananana</literal> + </para> + + <para>Given the information above, it hopefully isn't hard for you to write the + shell wildcard <literal>test??.res</literal> as a regular expression + Answer: <literal>test..\.res</literal>. The dot on its own is any + character. To match a single dot you must write + <literal>\.</literal><footnote><para>This is called escaping</para></footnote>. In + other word, the regular expression <literal>\.</literal> matches a dot, + while a dot on its own matches any character. </para> + + <para>In the regular expression editor, a repeated regular expression is + created using the <link linkend="repeattool">repeat tool</link> </para> + </example> + + <example id="lookaheadregexp"> + <title>Replacing <literal>&</literal> with + <literal>&amp;</literal> in a HTML document</title> <para>In + HTML the special character <literal>&</literal> must be + written as <literal>&amp;</literal> - this is similar to + escaping in regular expressions.</para> + + <para>Imagine that you have written an HTML document in a normal editor + (e.g. XEmacs or Kate), and you totally forgot about this rule. What you + would do when realized your mistake was to replace every occurrences of + <literal>&</literal> with <literal>&amp;</literal>.</para> + + <para>This can easily be done using normal search and replace, + there is, however, one glitch. Imagine that you did remember + this rule - <emphasis>just a bit</emphasis> - and did it right + in some places. Replacing unconditionally would result in + <literal>&amp;</literal> being replaced with + <literal>&amp;amp;</literal></para> + + <para>What you really want to say is that <literal>&</literal> should + only be replaced if it is <emphasis>not</emphasis> followed by the letters + <literal>amp;</literal>. You can do this using regular expressions using + <emphasis>positive lookahead</emphasis>. </para> + + <para>The regular expression, which only matches an ampersand if it is + not followed by the letters <literal>amp;</literal> looks as follows: + <literal>&(?!amp;)</literal>. This is, of course, easier to read using + the regular expression editor, where you would use the + <link linkend="lookaheadtools">lookahead tools</link>.</para> + </example> + + </chapter> + + <!-- ====================================================================== --> + <!-- Using the Regular Expression Editor --> + <!-- ====================================================================== --> + <chapter id="theEditor"> + <title>Using the Regular Expression Editor</title> + + <para> + This chapter will tell you about how the regular expression editor works. + </para> + + <!-- ====================================================================== --> + <!-- The organization of the screen --> + <!-- ====================================================================== --> + <sect1 id="screenorganization"> + <title>The organization of the screen</title> + + <mediaobject> + <imageobject><imagedata format="PNG" fileref="theEditor.png"/></imageobject> + </mediaobject> + + <para>The most important part of the editor is of course the editing + area, this is the area where you draw your regular expression. This + area is the larger gray one in the middle.</para> + + <para>Above the editing area you have two Toolbars, the first one + contains the <link linkend="editingtools">editing actions</link> - + much like drawing tools in a drawing program. The second Toolbar + contains the <emphasis>whats this</emphasis> button, and buttons + for undo and redo.</para> + + <para>Below the editing area you find the regular expression + currently build, in the so called ascii syntax. The ascii syntax + is updated while you edit the regular expression in the graphical + editor. If you rather want to update the ascii syntax then please + do, the graphical editor is updated on the fly to reflect your + changes.</para> + + <para>Finally to the left of the editor area you will find a number + of pre-built regular expressions. They serve two purposes: (1) When + you load the editor with a regular expression then this regular + expression is made <emphasis>nicer</emphasis> or more comprehensive + by replacing common regular expressions. In the screen dump above, + you can see how the ascii syntax ".*" have been replaced with a box + saying "anything". (2) When you insert regular expression you may + find building blocks for your own regular expression from the set of + pre build regular expressions. See the section on + <link linkend="userdefinedregexps">user defined regular + expressions</link> to learn how to save your own regular expressions.</para> + </sect1> + + <!-- ====================================================================== --> + <!-- Editing Tools --> + <!-- ====================================================================== --> + <sect1 id="editingtools"> + <title>Editing Tools</title> + <para>The text in this section expects that you have read the chapter + on <link linkend="whatIsARegExp">what a regular expression + is</link>, or have previous knowledge on this subject.</para> + + <para>All the editing tools are located in the tool bar above + editing area. Each of them will be described in the following.</para> + + + + <simplesect id="selecttool"> + <title>Selection Tool</title> + <mediaobject> + <imageobject><imagedata format="PNG" fileref="select.png"/> + </imageobject></mediaobject> + <para> The selection tool is used to + mark elements for cut-and-paste and drag-and-drop. This is very + similar to a selection tool in any drawing program.</para> + </simplesect> + + + + <simplesect id="texttool"><title>Text Tool</title> + <mediaobject> + <imageobject> + <imagedata format="PNG" fileref="text.png"/> + </imageobject></mediaobject> + + <para><inlinemediaobject><imageobject> + <imagedata format="PNG" fileref="texttool.png"/> + </imageobject></inlinemediaobject></para> + + <para>Using this tool you will insert normal text to match. The + text is matched literally, i.e. you do not have to worry about + escaping of special characters. In the example above the following + regular expression will be build: <literal>abc\*\\\)</literal></para> + </simplesect> + + + + <simplesect id="characterstool"><title>Character Tool</title> + <mediaobject><imageobject> + <imagedata format="PNG" fileref="characters.png"/> + </imageobject></mediaobject> + <para><inlinemediaobject><imageobject> + <imagedata format="PNG" fileref="charactertool.png"/> + </imageobject></inlinemediaobject></para> + + <para> Using this tool you insert + character ranges. Examples includes what in ASCII text says + <literal>[0-9]</literal>, <literal>[^a-zA-Z,_]</literal>. When + inserting an item with this tool a dialog will appear, in which + you specify the character ranges.</para> + + <para>See description of <link linkend="repeatregexp">repeated + regular expressions</link>.</para> + </simplesect> + + + + <simplesect id="anychartool"><title>Any Character Tool</title> + <mediaobject><imageobject> + <imagedata format="PNG" fileref="anychar.png"/> + </imageobject></mediaobject> + <para><inlinemediaobject><imageobject><imagedata format="PNG" fileref="anychartool.png"/> + </imageobject></inlinemediaobject></para> + + <para>This is the regular expression "dot" (.). It matches any + single character.</para> + + + + </simplesect> + + + + <simplesect id="repeattool"><title>Repeat Tool</title> + <mediaobject><imageobject> + <imagedata format="PNG" fileref="repeat.png"/> + </imageobject></mediaobject> + <para><inlinemediaobject><imageobject> + <imagedata format="PNG" fileref="repeattool.png"/> + </imageobject></inlinemediaobject></para> + + <para>This is the repeated + elements. This includes what in ASCII syntax is represented + using an asterix (*), a plus (+), a question mark (?), and + ranges ({3,5}). When you insert an item using this tool, a + dialog will appear asking for the number of times to + repeat.</para> + + <para>You specify what to repeat by drawing the repeated content + inside the box which this tool inserts.</para> + + <para>Repeated elements can both be built from the outside in and + the inside + out. That is you can first draw what to be repeated, select it + and use the repeat tool to repeat it. Alternatively you can + first insert the repeat element, and draw what is to be repeated + inside it.</para> + + <para>See description on the <link linkend="repeatregexp">repeated + regular expressions</link>.</para> + </simplesect> + + + + + <simplesect id="altntool"><title>Alternative Tool</title> + <mediaobject><imageobject> + <imagedata format="PNG" fileref="altn.png"/> + </imageobject></mediaobject> + <para><inlinemediaobject><imageobject><imagedata format="PNG" fileref="altntool.png"/> + </imageobject></inlinemediaobject></para> + + <para>This is the alternative regular expression (|). You specify + the alternatives by drawing each alternative on top of each other + inside the box that this tool inserts.</para> + + <para>See description on <link linkend="altnregexp">alternative + regular expressions</link></para> + </simplesect> + + + + + <simplesect id="compoundtool"><title>Compound Tool</title> + <mediaobject><imageobject> + <imagedata format="PNG" fileref="compound.png"/> + </imageobject></mediaobject> + <para><inlinemediaobject><imageobject><imagedata format="PNG" fileref="compoundtool.png"/> + </imageobject></inlinemediaobject></para> + + <para>The compound tool does not represent any regular + expressions. It is used to group other sub parts together in a + box, which easily can be collapsed to only its title. This can be + seen in the right part of the screen dump above.</para> + </simplesect> + + + + + + <simplesect id="positiontool"><title>Line Start/End Tools</title> + <mediaobject><imageobject> + <imagedata format="PNG" fileref="begline.png"/> + </imageobject></mediaobject> + <mediaobject><imageobject> + <imagedata format="PNG" fileref="endline.png"/> + </imageobject></mediaobject> + <para><inlinemediaobject><imageobject><imagedata format="PNG" fileref="linestartendtool.png"/> + </imageobject></inlinemediaobject></para> + + <para>The line start and line end tools matches the start of the + line, and the end of the line respectively. The regular + expression in the screen dump above thus matches lines only + matches spaces.</para> + + <para>See description of <link linkend="positionregexp">position + regular expressions</link>.</para> + </simplesect> + + + + + + <simplesect><title>Word (Non)Boundary Tools</title> + <mediaobject><imageobject> + <imagedata format="PNG" fileref="wordboundary.png"/> + </imageobject></mediaobject> + <mediaobject><imageobject><imagedata format="PNG" fileref="nonwordboundary.png"/> + </imageobject></mediaobject> + <para><inlinemediaobject><imageobject><imagedata format="PNG" fileref="boundarytools.png"/> + </imageobject></inlinemediaobject></para> + + <para>The boundary tools matches a word boundary respectively a + non-word boundary. The regular expression in the screen dump thus + matches any words starting with <literal>the</literal>. The word + <literal>the</literal> itself is, however, not matched.</para> + + <para>See description of <link linkend="boundaryregexp">boundary + regular expressions</link>.</para> + </simplesect> + + + + + + <simplesect id="lookaheadtools"><title>Positive/Negative Lookahead + Tools</title> + <mediaobject><imageobject> <imagedata format="PNG" fileref="poslookahead.png"/> + </imageobject></mediaobject> + <mediaobject><imageobject> <imagedata format="PNG" fileref="neglookahead.png"/> + </imageobject></mediaobject> + + <para><inlinemediaobject><imageobject> <imagedata format="PNG" fileref="lookaheadtools.png"/> + </imageobject></inlinemediaobject></para> + + <para>The look ahead tools either specify a positive or negative + regular expression to match. The match is, however, not part of + the total match.</para> + + <para>Note: You are only allowed to place lookaheads at the end + of the regular expressions. The Regular Expression Editor widget + does not enforce this.</para> + + <para>See description of <link linkend="lookaheadregexp">look ahead + regular expressions</link>.</para> + </simplesect> + </sect1> + + <!-- ====================================================================== --> + <!-- User Defined Regular Expressions --> + <!-- ====================================================================== --> + <sect1 id="userdefinedregexps"> + <title>User Defined Regular Expressions</title> + <para>Located at the left of the editing area is a list box + containing user defined regular expressions. Some regular + expressions are pre-installed with your KDE installation, while + others you can save yourself.</para> + + <para>These regular expression serves two purposes + (<link linkend="screenorganization">see detailed + description</link>), namely (1) to offer you a set of building + block and (2) to make common regular expressions prettier.</para> + + <para>You can save your own regular expressions by right clicking the + mouse button in the editing area, and choosing <literal>Save Regular + Expression</literal>.</para> + + <para>If the regular expression you save is within a + <link linkend="compoundtool">compound container</link> then the + regular expression will take part in making subsequent regular + expressions prettier.</para> + + <para>User defined regular expressions can be deleted or renamed by + pressing the right mouse button on top of the regular expression in + question in the list box.</para> + </sect1> + </chapter> + + <!-- ====================================================================== --> + <!-- Reporting a bug and Suggesting Features --> + <!-- ====================================================================== --> + <chapter id="bugreport"> + <title>Reporting bugs and Suggesting Features</title> + <para>Bug reports and feature requests should be submitted through the + <ulink url="http://bugs.kde.org/">KDE Bug Tracking System</ulink>. <emphasis + role="strong">Before</emphasis> you report a bug or suggest a feature, + please check that it hasn't already been + <ulink url="http://bugs.kde.org/simple_search.cgi?id=kregexpeditor">reported/suggested.</ulink></para> + </chapter> + + <!-- ====================================================================== --> + <!-- FAQ --> + <!-- ====================================================================== --> + <chapter id="faq"> + <title>Frequently Asked Questions</title> + <sect1 id="question1"> + <title>Does the regular expression editor support back references?</title> + <para>No currently this is not supported. It is planned for the next + version.</para> + </sect1> + + <sect1 id="question2"> + <title>Does the regular expression editor support showing matches?</title> + <para>No, hopefully this will be available in the next version.</para> + </sect1> + + <sect1 id="question3"> + <title>I'm the author of a KDE program, how can I use this widget in + my application?</title> + <para>See <ulink + url="http://developer.kde.org/documentation/library/cvs-api/classref/interfaces/KRegExpEditorInterface.html">The documentation for the class KRegExpEditorInterface</ulink>.</para> + </sect1> + + <sect1 id="question4"> + <title>I can't find the <emphasis>Edit Regular expression</emphasis> button in for example + konqueror on another KDE3 installation, why?</title> + <para>The regular expression widget is located in the package + KDE-utils. If you do not have this package installed, then the + <emphasis>edit regular expressions</emphasis> buttons will not + appear in the programs.</para> + </sect1> + </chapter> + + <!-- ====================================================================== --> + <!-- Credits and Licenses --> + <!-- ====================================================================== --> + <chapter id="credits-and-license"> + <title>Credits and Licenses</title> + + <para> + Documentation is copyright 2001, Jesper K. Pedersen + <email>[email protected]</email> + </para> + + + &underGPL; + &underFDL; + + </chapter> + + +</book> + +<!-- Keep this comment at the end of the file +Local variables: +mode: sgml +sgml-omittag:t +sgml-shorttag:t +sgml-namecase-general:t +sgml-general-insert-case:lower +sgml-minimize-attributes:nil +sgml-always-quote-attributes:t +sgml-indent-step:2 +sgml-indent-data:t +sgml-parent-document:nil +sgml-exposed-tags:nil +sgml-local-catalogs:nil +sgml-local-ecat-files:nil +End: +--> |