Class TMultiPageTemplate
Unit
Declaration
type TMultiPageTemplate = class(TObject)
Description
A multi-page template, which defines which web pages are processed and how.
A multi-page template defines a list of actions, each action listing webpages to download and queries to run on those webpages.
You can then call an action, let it run its queries, and read the result as variables.
(In the past, patterns were called templates too, but they are very different from the multi-page template of this unit.
A multi-page template is a list of explicit actions that are performed in order, like an algorithm or script.
A pattern (single-page template) is an implicit pattern that is matched against the page, like a regular expression.)
The syntax of a multi-page template is inspired by the XSLT/XProc syntax and looks like this:
<actions>
  <action id="action-1">
    <variable name="foobar" value="xyz"/>
    <page url="url to send the request to">
      <header name="header name">value...</header>
      <post name="post variable name"> value... </post>
    </page>
    <pattern> ...to apply to the previous page (inline)... </pattern>
    <pattern href="to apply to the previous page (from a file)"/>
    ...
  </action>
  <action id="action-2">
    ...
  </action>
  ...
</actions>
<actions> contains a list/map of named actions. Each <action> can contain:
<page>
Downloads a webpage.
<json>
Same as <page>, but downloads JSON data.
<pattern>
Processes the last page with pattern matching.
<variable>
Sets a variable, either to a string value or to an evaluated XPath expression.
<loop>
Repeats the children of the loop element.
<call>
Calls another action.
<if>
Tests whether a condition is satisfied.
<choose><when><otherwise>
Switches depending on a value.
<s>
Evaluates an XPath/XQuery expression.
<try><catch>
Catches errors.
<include>
Includes template elements from another file.
Details for each element:
<page url="request url">
Specifies a page to download and process.
You can use
<post name="..name.." value="..value..">..value..</post>
child elements under <page> to add variables for a POST request to send to the URL.
If the name attribute exists, the content is URL-encoded, otherwise not.
(Currently, the value attribute and the contained text are treated as a string to send. In future versions, the contained text will be evaluated as an XPath expression.)
If no <post> children exist, a GET request is sent.
The patterns that should be applied to the downloaded page can be given directly in a <pattern> element or in a separate file linked by the pattern-href attribute (see THtmlTemplateParser for a description of the pattern-matching single-page template).
The attribute
test="xpath"
can be used to skip a page if the condition in the attribute evaluates to false().
<pattern href="file" name=".."> inline pattern </pattern>
This applies a pattern to the last page.
The pattern can be given inline or loaded from a file in the href attribute.
The name attribute is only used for debugging.
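As a rough illustration of how <page> and <pattern> fit together, the following sketch sends a POST request and then applies an inline pattern to the response. The URL, the header, the variable names, and the action id are hypothetical. The pattern body assumes the {$name := expr} notation of THtmlTemplateParser, which is not described on this page.

```xml
<action id="login">
  <variable name="user" value="alice"/>
  <!-- The page is skipped if the test expression evaluates to false(). -->
  <page url="https://example.org/login" test="$user != ''">
    <header name="Accept-Language">en</header>
    <!-- Both <post> children have a name, so their content is URL-encoded.
         Because <post> children exist, a POST request is sent instead of a GET. -->
    <post name="user" value="{$user}"/>
    <post name="password">secret</post>
  </page>
  <!-- Applied to the page downloaded above; name is only used for debugging. -->
  <pattern name="login-result">
    <title>{$page-title := .}</title>
  </pattern>
</action>
```

The same pattern could instead be loaded from a file with <pattern href="..."/>, which keeps long patterns out of the action list.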
<variable name="name" value="str value">xpath expression</variable>
This sets the value of the variable with the name $name.
If the value attribute is given, the variable is set to the string value of the attribute; otherwise, the XPath expression is evaluated and its result is used.
The last downloaded webpage is available as the root element in the XPath expression.
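The two forms of <variable> might look like this in practice. This is only a sketch: the URL, the variable names, and the action id are made up for illustration.

```xml
<action id="variables">
  <!-- value attribute given: the variable holds the literal string -->
  <variable name="base" value="https://example.org"/>
  <page url="{$base}/index.html"/>
  <!-- no value attribute: the content is evaluated as an XPath expression,
       with the last downloaded page as the root -->
  <variable name="headline">(//h1)[1]</variable>
  <variable name="link-count">count(//a)</variable>
</action>
```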
<loop var="variable name" list="list (xpath)" test="condition (xpath)">
Repeats the children of this element.
It can be used like a foreach loop by giving the var/list attributes, like a while loop by using test, or like a combination of both.
In the first case, the expression in list is evaluated, each element of the resulting sequence is assigned once to the variable with the name $var, and the loop body is evaluated each time.
In the second case, the loop is repeated until the expression in the test attribute evaluates to false.
<call action="name">
Calls the action of the given name.
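A sketch of <loop> in its foreach and while forms, combined with <call>. The action ids, the URL, and the variable names are hypothetical, and the example assumes that a variable set by the loop is visible inside the called action.

```xml
<actions>
  <action id="fetch-item">
    <page url="https://example.org/item/{$id}"/>
    <variable name="title">//title</variable>
  </action>
  <action id="fetch-all">
    <!-- foreach style: $id takes each value of the sequence in turn -->
    <loop var="id" list="(1, 2, 3)">
      <call action="fetch-item"/>
    </loop>
    <!-- while style: the body repeats until the test evaluates to false -->
    <variable name="n">0</variable>
    <loop test="$n lt 3">
      <s>$n := $n + 1</s>
    </loop>
  </action>
</actions>
```

Using lt instead of the less-than sign avoids having to escape the operator inside an XML attribute.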
<if test="...">
Evaluates the children of this element if the test evaluates to true().
<choose> <when test="..."/> <otherwise/> </choose>
Evaluates the tests of the when-elements and the children of the first <when> that is true.
If no test evaluates to true(), the children of <otherwise> are evaluated.
<s>...</s>
Evaluates an XPath/XQuery expression (which can set global variables with :=).
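The conditional elements and <s> could be combined along these lines. The URL, the thresholds, and the variable names are assumptions chosen for the example.

```xml
<action id="classify">
  <page url="https://example.org/list"/>
  <!-- <s> evaluates an expression; := sets a global variable -->
  <s>$count := count(//li)</s>
  <if test="$count eq 0">
    <variable name="status" value="empty"/>
  </if>
  <!-- Only the children of the first <when> whose test is true are evaluated. -->
  <choose>
    <when test="$count gt 100">
      <variable name="size" value="large"/>
    </when>
    <when test="$count gt 10">
      <variable name="size" value="medium"/>
    </when>
    <otherwise>
      <variable name="size" value="small"/>
    </otherwise>
  </choose>
</action>
```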
<try> ... <catch errors="...">...</catch> </try>
If an error occurs during the evaluation of the non-<catch> children of the <try> element, the children of the matching <catch> element are evaluated. This behaves similarly to the try-except statement in Pascal and <try><catch> in XSLT.
The errors attribute is a whitespace-separated list of error codes caught by that <catch> element. XPath/XQuery errors have the form err:*, with the value of * given in the XQuery standard.
HTTP errors have the internal form pxp:http123, where pxp: is the default prefix. Nevertheless, they can be matched using the namespace prefix http, as in http:123. Partial wildcards are accepted, like http:4* to match the range 400 to 499.
pxp:pattern is used for pattern matching failures.
<include href="filename">
Includes another XML file. It behaves as if the elements of the other file were copy-pasted here.
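The <try>/<catch> error matching described above can be sketched as follows. The URL and variable names are hypothetical, and the pattern body again assumes the {$name := expr} notation of THtmlTemplateParser.

```xml
<action id="safe-fetch">
  <try>
    <page url="https://example.org/maybe-missing"/>
    <pattern>
      <title>{$title := .}</title>
    </pattern>
    <!-- a single HTTP status code -->
    <catch errors="http:404">
      <variable name="title" value="not found"/>
    </catch>
    <!-- a partial wildcard covering the status codes 400 to 499 -->
    <catch errors="http:4*">
      <variable name="title" value="client error"/>
    </catch>
    <!-- the page was downloaded, but the pattern did not match it -->
    <catch errors="pxp:pattern">
      <variable name="title" value="unexpected page layout"/>
    </catch>
  </try>
</action>
```

The specific http:404 handler is listed before the http:4* wildcard on the assumption that the first matching <catch> is the one that is used.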
Within all string attributes, you can access the previously defined variables by writing {$variable}.
Within an XPath expression, you can access the variable with $variable.
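The difference between the two notations can be seen in a short sketch. The URL, the variable names, and the action id are invented for the example.

```xml
<action id="search">
  <variable name="query" value="pascal"/>
  <!-- string attribute: {$query} is replaced by the value of the variable -->
  <page url="https://example.org/search?q={$query}"/>
  <!-- XPath expression: the variable is referenced directly as $query -->
  <variable name="hits">count(//a[contains(., $query)])</variable>
</action>
```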
Hierarchy
- TObject
- TMultiPageTemplate
Overview
Fields
baseActions: TTemplateAction;
name:string;
Methods
constructor create();
procedure loadTemplateFromDirectory(_dataPath: string; aname: string = 'unknown');
procedure loadTemplateFromString(template: string; aname: string = 'unknown'; path: string = '');
procedure loadTemplateWithCallback(loadSomething: TLoadTemplateFile; _dataPath: string; aname: string = 'unknown');
destructor destroy; override;
function findAction(_name:string): TTemplateAction;
function findVariableValue(aname: string): string;
function clone: TMultiPageTemplate;
Description
Fields
baseActions: TTemplateAction;
The primary <actions> element (or the first <action> element, if only one exists).
name:string;
A name for the template, for debugging.
Methods
constructor create();
procedure loadTemplateFromDirectory(_dataPath: string; aname: string = 'unknown');
Loads a template from a directory.
procedure loadTemplateWithCallback(loadSomething: TLoadTemplateFile; _dataPath: string; aname: string = 'unknown');
Loads a template using a callback function. The callback function is called with different file names to load the corresponding files.
destructor destroy; override;
function findAction(_name:string): TTemplateAction;
Returns the <action> element with the given id.
function clone: TMultiPageTemplate;
Generated by PasDoc 0.16.0.