Unit extendedhtmlparser

Description

This unit contains a pattern-matching HTML parser named THtmlTemplateParser.

Overview

Classes, Interfaces, Objects and Records

Name Description
Class ETemplateParseException  
Class EHTMLParseException  
Class EHTMLParseMatchingException  
Class THtmlTemplateParser This is the pattern matching processor class which can apply a pattern to one or more HTML documents.
Class TXQTerm_VisitorFindWeirdGlobalVariableDeclarations  

Functions and Procedures

function guessExtractionKind(e: string): TExtractionKind;

Types

TTemplateElementType = (...);
TTemplateElementFlag = (...);
TTemplateElementFlags = set of TTemplateElementFlag;
TReplaceFunction = procedure (variable: string; var value:string) of object;
TStringAttributeList = tStringList;
TTrimTextNodes = (...);
TKeepPreviousVariables = (...);
TXQTermVariableArray = array of TXQTermVariable;
TExtractionKind = (...);

Constants

HTMLPARSER_NAMESPACE_URL = 'http://www.benibela.de/2011/templateparser';
rsPatternMatchingFailedS: string = 'Matching of pattern %s failed.';
rsPatternMatchingFailedDebugAtS: string = 'Couldn''t find a match for: %s';
rsPatternMatchingFailedDebugPreviousElementS: string = 'Previous element is: %s';
rsPatternMatchingFailedDebugLastMatchSS: string = 'Last match was: %s with %s';
rsPatternMatchingFailedDebugAllMatched: string = 'However, all elements have found their match.';

Description

Functions and Procedures

function guessExtractionKind(e: string): TExtractionKind;
 

Types

TTemplateElementType = (...);

These are all possible template commands. They are for internal use.

Values
  • tetIgnore: useless thing
  • tetHTMLOpen: normal html opening tag, searched in the processed document
  • tetHTMLClose: normal html closing tag, searched in the processed document
  • tetHTMLText: text node, searched in the processed document
  • tetMatchElementOpen
  • tetMatchElementClose
  • tetMatchText
  • tetCommandMeta: <template:meta> command to specify how strings are compared
  • tetCommandMetaAttribute: <template:meta-attribute> command to specify how attributes are compared
  • tetCommandRead: <template:read> command to set a variable
  • tetCommandShortRead: <template:s> command to execute an XQuery (xq) expression
  • tetCommandLoopOpen: <template:loop> command to repeat something as long as possible
  • tetCommandLoopClose
  • tetCommandIfOpen: <template:if> command to skip something
  • tetCommandIfClose
  • tetCommandElseOpen: <template:else> command to skip something
  • tetCommandElseClose
  • tetCommandSwitchOpen: <template:switch> command to branch
  • tetCommandSwitchClose
  • tetCommandSwitchPrioritizedOpen: <template:switch-prioritized> command to branch
  • tetCommandSwitchPrioritizedClose
  • tetCommandSiblingsOpen
  • tetCommandSiblingsClose
  • tetCommandSiblingsHeaderOpen
  • tetCommandSiblingsHeaderClose
TTemplateElementFlag = (...);
 
Values
  • tefOptional
  • tefSwitchChild
TTemplateElementFlags = set of TTemplateElementFlag;
 
TReplaceFunction = procedure (variable: string; var value:string) of object;

Possible callback for getting the value of a variable
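
Because the type is declared "of object", only a method of an object can be assigned to it, not a plain procedure. The following is a minimal sketch of such a callback. It redeclares the type locally so that it compiles without this unit. The class TDemoVariables, its Lookup method, and the variable name 'user' are hypothetical and only serve as illustration.

```pascal
program ReplaceCallbackSketch;
{$mode objfpc}{$H+}

type
  // Local copy of the callback type documented above, so the sketch
  // compiles without the extendedhtmlparser unit.
  TReplaceFunction = procedure (variable: string; var value: string) of object;

  // Hypothetical value provider; not part of the unit.
  TDemoVariables = class
    procedure Lookup(variable: string; var value: string);
  end;

procedure TDemoVariables.Lookup(variable: string; var value: string);
begin
  // Supply a value for a known name; leave value untouched otherwise.
  if variable = 'user' then
    value := 'alice';
end;

var
  provider: TDemoVariables;
  callback: TReplaceFunction;
  v: string;
begin
  provider := TDemoVariables.Create;
  try
    // "of object" requires a method pointer, not a plain procedure.
    callback := @provider.Lookup;
    v := '';
    callback('user', v);
    WriteLn('user = ', v);
  finally
    provider.Free;
  end;
end.
```

The value is passed as a var parameter, so the callback can either overwrite it or leave an existing value in place.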

TStringAttributeList = tStringList;
 
TTrimTextNodes = (...);

Specifies when the text of text nodes is trimmed. Each value removes strictly more whitespace than the previous ones.

Values
  • ttnNever: never, all whitespace is kept
  • ttnForMatching: When comparing two text nodes, whitespace is ignored; but all whitespace will be returned when reading text
  • ttnWhenLoadingEmptyOnly
  • ttnWhenLoading: All starting/ending whitespace is unconditionally removed from all text nodes
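
The ordering of the values can be illustrated with a small model. This sketch does not use the parser. StoredText and TextMatches are hypothetical helpers, "whitespace" is simplified to leading and trailing whitespace, and the behaviour shown for ttnWhenLoadingEmptyOnly (whitespace-only nodes become empty) is an assumption based on its name only. Applying the trimmed comparison to every mode from ttnForMatching upwards is likewise an assumption, derived from the statement that each value removes more whitespace than the previous ones.

```pascal
program TrimModeSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Local copy of the enumeration, in the documented order.
  TTrimTextNodes = (ttnNever, ttnForMatching, ttnWhenLoadingEmptyOnly, ttnWhenLoading);

// Hypothetical helper: the text that would be read back from a text node.
function StoredText(const s: string; mode: TTrimTextNodes): string;
begin
  Result := s;  // ttnNever and ttnForMatching return all whitespace
  case mode of
    ttnWhenLoadingEmptyOnly:
      begin
        // Assumption from the name: only whitespace-only nodes are emptied.
        if Trim(s) = '' then
          Result := '';
      end;
    ttnWhenLoading:
      begin
        Result := Trim(s);
      end;
  end;
end;

// Hypothetical helper: whether two text nodes compare equal under a mode.
function TextMatches(const a, b: string; mode: TTrimTextNodes): boolean;
begin
  if mode >= ttnForMatching then
    Result := Trim(a) = Trim(b)   // surrounding whitespace ignored
  else
    Result := a = b;              // ttnNever: exact comparison
end;

var
  m: TTrimTextNodes;
begin
  for m := Low(TTrimTextNodes) to High(TTrimTextNodes) do
    WriteLn(Ord(m), ': stored=[', StoredText('  hi  ', m), '] matches=',
      TextMatches('  hi  ', 'hi', m));
end.
```

Because the enumeration is ordered, a comparison such as mode >= ttnForMatching selects a value and every stricter one after it.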
TKeepPreviousVariables = (...);

This specifies how the variables read from the previous document are handled.

In every case, all node variables are converted to strings, because the nodes point to elements of the previous document and that document will be deleted.

Values
  • kpvForget: Old variables are deleted
  • kpvKeepValues: Old variables are moved from the property variableChangelog to the property oldVariableChangelog
  • kpvKeepInNewChangeLog: Old variables stay where they are (i.e. in the variableChangelog property merged with the new ones)
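
The three policies can be sketched with a toy model in which both change logs are plain string lists. This is not the parser's implementation. StartNextDocument is a hypothetical procedure, the real properties are not string lists, and replacing (rather than appending to) the old log under kpvKeepValues is an assumption.

```pascal
program KeepPreviousSketch;
{$mode objfpc}{$H+}

uses
  Classes;

type
  // Local copy of the enumeration documented above.
  TKeepPreviousVariables = (kpvForget, kpvKeepValues, kpvKeepInNewChangeLog);

// Toy model: applies a policy before the next document is processed.
procedure StartNextDocument(policy: TKeepPreviousVariables;
  variableChangelog, oldVariableChangelog: TStrings);
begin
  case policy of
    kpvForget:
      begin
        // Old variables are deleted.
        variableChangelog.Clear;
      end;
    kpvKeepValues:
      begin
        // Old variables move to the old log (assumed to replace its content).
        oldVariableChangelog.Assign(variableChangelog);
        variableChangelog.Clear;
      end;
    kpvKeepInNewChangeLog:
      begin
        // Nothing moves; new variables are added to the same log.
      end;
  end;
end;

var
  current, old: TStringList;
begin
  current := TStringList.Create;
  old := TStringList.Create;
  try
    current.Add('page=1');
    StartNextDocument(kpvKeepValues, current, old);
    current.Add('page=2');
    WriteLn('variableChangelog: ', current.CommaText);
    WriteLn('oldVariableChangelog: ', old.CommaText);
  finally
    old.Free;
    current.Free;
  end;
end.
```
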
TXQTermVariableArray = array of TXQTermVariable;
 
TExtractionKind = (...);
 
Values
  • ekAuto
  • ekDefault
  • ekXPath2
  • ekXPath3_0
  • ekXPath3_1
  • ekXPath4_0
  • ekXQuery1
  • ekXQuery3_0
  • ekXQuery3_1
  • ekXQuery4_0
  • ekPatternHTML
  • ekPatternXML
  • ekCSS
  • ekMultipage

Constants

HTMLPARSER_NAMESPACE_URL = 'http://www.benibela.de/2011/templateparser';

XML-compatible namespace URL used to define new template prefixes
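
As an illustration, a pattern could bind its own prefix to this namespace with an ordinary xmlns declaration. The sketch below only builds and prints such a pattern string. The prefix "my" and the list markup are hypothetical, and the loop command is taken from the TTemplateElementType list. The constant is redeclared locally so that the program compiles on its own.

```pascal
program NamespacePrefixSketch;
{$mode objfpc}{$H+}

const
  // Value copied from the constant documented above.
  HTMLPARSER_NAMESPACE_URL = 'http://www.benibela.de/2011/templateparser';

var
  pattern: string;
begin
  // Bind the hypothetical prefix "my" to the template namespace and use it
  // for the loop command instead of the "template" prefix.
  pattern :=
    '<ul xmlns:my="' + HTMLPARSER_NAMESPACE_URL + '">' +
      '<my:loop><li/></my:loop>' +
    '</ul>';
  WriteLn(pattern);
end.
```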

rsPatternMatchingFailedS: string = 'Matching of pattern %s failed.';
 
rsPatternMatchingFailedDebugAtS: string = 'Couldn''t find a match for: %s';
 
rsPatternMatchingFailedDebugPreviousElementS: string = 'Previous element is: %s';
 
rsPatternMatchingFailedDebugLastMatchSS: string = 'Last match was: %s with %s';
 
rsPatternMatchingFailedDebugAllMatched: string = 'However, all elements have found their match.';
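
The %s placeholders in these messages follow the conventions of SysUtils.Format. The sketch below fills two of them. The strings are copied from the constants above and redeclared locally, and the arguments are invented sample values, since the text the parser actually substitutes is not documented here.

```pascal
program FailureMessageSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // Local copies of two of the message strings documented above.
  rsPatternMatchingFailedS = 'Matching of pattern %s failed.';
  rsPatternMatchingFailedDebugLastMatchSS = 'Last match was: %s with %s';

begin
  // One placeholder: a sample pattern description.
  WriteLn(Format(rsPatternMatchingFailedS, ['<a/>']));
  // Two placeholders: a sample pattern element and a sample document element.
  WriteLn(Format(rsPatternMatchingFailedDebugLastMatchSS, ['<b>', '<b class="x">']));
end.
```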
 

Generated by PasDoc 0.16.0.