KPilot PalmDoc Conduit bookmark Specification ============================================= (c) 2003 Reinhold Kainhofer, reinhold@kainhofer.com This document is licensed under the FDL (Free Documentation License) as published by the FSF. Any version of the FDL can be applied at your convenience. The PalmDoc conduit has three ways to indicate bookmarks for a text: -) Inline tags of the form <* bookmarkname *> -) Endtags of the form <bookmarkname> at the end of the document -) Regular expressions in a separate textname.bmk file (textname.bmk ist the filename of the text with the .txt replaced by .bmk) In the design of the .bmk file, I tried to stay close to the syntac of MakeDocJ bookmark files, but it turned out that I needed to extend the syntax a little. Also, MakeDocJ uses Java RegExps, while the PalmDoc conduit uses the QRegExp, which have some slight differences (especially concerning the ^ and $ patterns as well as backreferences). So if you used MakeDocJ, the .bmk file syntax will be quite familiar, but you will still have to adapt your bookmark files for Qt regular expressions instead of Java regular expressions 1) INLINE TAGS Whenever a tag of the form <* someText *> appears in the text, this sequence is removed from the text, and a bookmark is set there with the bookmark name "someText" (the part between the <* and the *>). 2) ENDTAGS If the text ends with tags of the form <someText>, the string in braces is used as bookmark name, and wherever it appears in the text, a bookmark is set. After the > any number of whitespace is allowed, but no other characters like letters, numbers, or punctuation. Also, inside the braces no line break must occur. The conduit searches the text from the end and if it finds a line break inside a <...> sequence, the tag and everything before it is assumed to belong to the text and doesn't form a bookmark tag. Between endtags any number of whitespace (spaces, tabs, line feeds etc.) is allowed. As an example, assume you have a text ending in: ... the bad guy was punished, and they lived happily ever after! <Tag with line feed> <bad guy> <princess> <married> The conduit starts at the end, ignores all whitespace between the tags, so it finds the tags "married", "princess", and "bad guy". The "Tag with line feed" has a line feed, so it is assumed to belong to the text. Assume now you have a text ending in: ... the bad guy was punished, and they lived happily ever after! <bad guy> The End <princess> <married> Here, only "married" and "princess" are found as bookmarks. Because of the letters before the "princess" tags, the search for the bookmarks ends at the letter "d" of "The End" (the conduit starts from the end and moves backward until it finds some text which cannot be seen as a endtag. 3) REGULAR EXPRESSIONS IN A SEPARATE FILE This is by far the most complex way to specify bookmarks, but it is also the mose powerful. If you have a text with filename "My fairy tale.txt", the bookmarks will be specified in a file called "My fairy tale.bmk" (just the text filename with the .txt replaced by .bmk). This file contains the bookmark definitions, one in each line. Lines starting with a # are seen as comments, and empty lines are also ignored. In the .bmk file, each bookmark line has one of the following syntaces (I will explain all fields later on). Fields in [..] are optional: bmkName bmkPosition, bmkName +, bmkPatternRegExp[, bmkNameAsString[, firstIncludedBmk[, lastIncludedBmk]]] +, bmkPatternRegExp[, bmkNameIndexOfSubexpression[, firstIncludedBmk[, lastIncludedBmk]]] -, bmkPatternRegExp[, bmkNameAsString] -, bmkPatternRegExp[, bmkNameIndexOfSubexpression] If the first field is a string, it is used as the bookmark name and pattern to search for. If the first field is a number, it means the position of the bookmark, and the second field is the name of the bookmark. If the first field is either + or -, the second field gives a regular expression that is used to find the position of the bookmark. If the first field is a -, the search is done only once and only the first match will be added as bookmark. If the first field is a +, the search is done until the regular expression can no longer be found (the fourth and fifth fields can be used to include only a certain range of hits). If there is a third field, and it is a string, it gives the name of the bookmark as a regular expression (i.e. \1 are replaced by the first subexpression of the search, where subexpressions are specified by round brackets in the regexp of the second field). If there is a third field, and it is a number, it gives the index of the subexpression of bmkPatternRegExp that is used as the bookmark name. If there is no third field, the whole matched text will be used as bookmark name. The optional fourth and fifth fields can be used to set bookmarks only after the first few ocurrences of the regexp in the text, and to stop the search after the expression has been found a certain number of times. If the PDB->PC sync is set up to store the bookmarks in a bookmark file, it will create a file "My fairy tale.bm" (no "k") with entries of the form position,bmkName The .bmk file will be used if it exists, but if no .bmk file exists, the .bm file will be used. This way you can override the bookmark settings, while at the same time the PDB->TXT sync does not destroy your possibly existing .bmk file. Examples: 1) Imagine you have a line like: frog princess In this case, the text is searched for "frog princess", and a bookmark is set whenever "frog princess" occurs in the text. The name of each of these bookmarks will be "frog princess". 2) A bookmark line: 55, Bookmark at offset 55 Here, a bookmark will be set at offset 55 (55th character of the text), and it will have the name "Bookmark at offs" (truncated to 16 characters) 3) A bookmark line -,Chapter \d+ causes a bookmark to be set at the first ocurrence of "Chapter XXX", where XXX denotes one or more digits. The bookmark name will be "Chapter XXX" (XXX replaced by the actual digits). 4) A bookmark line +,Chapter \d+ causes bookmarks to be set wherever "Chapter XXX" (XXX being one or more digits) appears in the text. The bookmark name will again be "Chapter XXX", but the search does not stop after the first hit. 5) A bookmark line +,\n\s*(Chapter \d+)\D+, 1 causes a bookmark to be set whenever a new line starts with "Chapter XXX" (whitespace is allowed before the "Chapter"), and uses the first subexpression in (..) as the bookmark name. If you have a passage Chapter 15: here it starts The regular expression will match, so a bookmark will be set there and the subexpression "Chapter 15" (which matches the (Chapter \d+) ) will be used as bookmark text. 6) A bookmark line +,\n\s*Part (\d+),\1\. part sets a bookmark whenever a line starts with "Part XXX". The XXX will be stored as the first matched subexpression. The third field "\1\. part" is the regular expression for the bookmark name, where \1 is replaced by the first matched subexpression of the search (XXX in this case). So if a line starts with " Part 17: ", the bookmark name will be "17. part". 7) A bookmark line +,Table (\d+): ,\1\. Tabelle,5,25 will match whenever "Table XXX: " appears in the text, and the bookmark name will be "XXX. Tabelle". However, the fourth field means that the first four hits are ignored (the 5th hit is the first hit to be included as a bookmark), and the fifth field means that all further hits after the 25th will be ignored, too. 8) In law texts, I use a regular expression +,\n *(�\.? *\d+[a-z]?\.?) +, 1 to search for all paragraphs starting like "�. 15. " or " �23 ", and set a bookmark there using only the part from the � to the last digit or the full stop after the last digit (the pattern between the (), in our two cases the bookmark names will be "�. 15." and "�23" ).