Unit simplehtmltreeparser


Description

This unit contains an HTML/XML to tree converter.

Overview

Classes, Interfaces, Objects and Records

Name Description
Class TXQHashmapStr  
Class TXQHashmapStrOwning  
Interface INamespace  
Class TNamespace  
Class TNamespaceList  
record TAttributeEnumerator  
Class TAttributeList A list of attributes.
Class TTreeNode This class represents an element of the HTML file.
Class TTreeAttribute  
Class TTreeDocument  
Class ETreeParseException  
Class TTreeParser This parses an HTML/SGML/XML file into a tree-like structure.

Functions and Procedures

function xmlStrEscape(s: string; attrib: boolean = false):string;
function xmlStrWhitespaceCollapse(const s: string):string;
function htmlStrEscape(s: string; attrib: boolean = false; encoding: TSystemCodePage = CP_NONE):string;
function equalNamespaces(const ans, bns: INamespace): boolean; inline;
function equalNamespaces(const ans, bns: string): boolean; inline;
function namespaceGetURL(const n: INamespace): string; inline;
function guessFormat(const data, uri, contenttype: string): TInternetToolsFormat;
function strEncodingFromContentType(const contenttype: string): TSystemCodePage;
function isInvalidUTF8(const s: string): boolean;
function nodeNameHash(const s: RawByteString): cardinal;

Types

TXQHashKeyString = TFLRERawByteString;
TXQHashmapStrOwningObject = specialize TXQHashmapStrOwning<TObject, TObjectList>;
TXQHashmapStrOwningInterface = specialize TXQHashmapStrOwning<IUnknown, TInterfaceList>;
TTreeNodeType = (...);
TTreeNodeTypes = set of TTreeNodeType;
TTreeNodeFindOptions = set of (tefoIgnoreType, tefoIgnoreText, tefoCaseSensitive, tefoNoChildren, tefoNoGrandChildren);
TStringComparisonFunc = function (const a,b: string): boolean of object;
TTreeNodeClass = class of TTreeNode;
TBasicParsingState = (...);
TParsingModel = (...);
TInternetToolsFormat = (...);

Constants

XMLNamespaceUrl_XML = 'http://www.w3.org/XML/1998/namespace';
XMLNamespaceUrl_XMLNS = 'http://www.w3.org/2000/xmlns/';
TreeNodesWithChildren = [tetOpen, tetDocument];

Variables

XMLNamespace_XMLNS: INamespace;
XMLNamespace_XML: INamespace;

Description

Functions and Procedures

function xmlStrEscape(s: string; attrib: boolean = false):string;
 
function xmlStrWhitespaceCollapse(const s: string):string;
 
function htmlStrEscape(s: string; attrib: boolean = false; encoding: TSystemCodePage = CP_NONE):string;
 
function equalNamespaces(const ans, bns: INamespace): boolean; inline;
 
function equalNamespaces(const ans, bns: string): boolean; inline;
 
function namespaceGetURL(const n: INamespace): string; inline;
 
function guessFormat(const data, uri, contenttype: string): TInternetToolsFormat;
 
function strEncodingFromContentType(const contenttype: string): TSystemCodePage;
 
function isInvalidUTF8(const s: string): boolean;
 
function nodeNameHash(const s: RawByteString): cardinal;
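
The signatures above give no behavior descriptions, so the following is only a minimal calling sketch. It assumes the unit is on the compiler's search path and uses nothing beyond the parameter lists shown here. The sample string is made up for illustration, and no particular output is claimed.

```pascal
program EscapeSketch;
{$mode objfpc}{$H+}
uses simplehtmltreeparser; // assumption: the unit is on the unit search path

const
  // Hypothetical sample text with characters that usually need escaping.
  sample = 'Tom & "Jerry" <cartoon>';
begin
  // Escape for element content (attrib defaults to false).
  writeln(xmlStrEscape(sample));
  // Escape for use inside an attribute value.
  writeln(xmlStrEscape(sample, true));
  // HTML variants; the encoding parameter keeps its default (CP_NONE).
  writeln(htmlStrEscape(sample));
  writeln(htmlStrEscape(sample, true));
  // Collapse whitespace in a string with repeated spaces and a line break.
  writeln(xmlStrWhitespaceCollapse('  several   spaces' + #10 + 'and a newline  '));
  // Check a plain ASCII string for invalid UTF-8.
  writeln(isInvalidUTF8(sample));
end.
```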
 

Types

TXQHashKeyString = TFLRERawByteString;
 
TXQHashmapStrOwningObject = specialize TXQHashmapStrOwning<TObject, TObjectList>;
 
TXQHashmapStrOwningInterface = specialize TXQHashmapStrOwning<IUnknown, TInterfaceList>;
 
TTreeNodeType = (...);

The type of a tree element, for example <open>, text, or </close>.

Values
  • tetOpen:  
  • tetClose:  
  • tetText:  
  • tetComment:  
  • tetProcessingInstruction:  
  • tetAttribute:  
  • tetDocument:  
  • tetInternalDoNotUseCDATAText:  
  • tetNamespace:  
TTreeNodeTypes = set of TTreeNodeType;
 
TTreeNodeFindOptions = set of (tefoIgnoreType, tefoIgnoreText, tefoCaseSensitive, tefoNoChildren, tefoNoGrandChildren);

Controls the search for a tree element.
  • ignore type: do not check for a matching type
  • ignore text: do not check for a matching text
  • case sensitive: do not ignore case
  • no descend: only check elements that are direct children of the current node
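
Because TTreeNodeFindOptions is a set type, options are combined with ordinary Pascal set syntax. The sketch below redeclares the type locally so that it compiles without the unit; real code would use simplehtmltreeparser instead. The helper describeOptions is hypothetical and exists only for this illustration.

```pascal
program FindOptionsSketch;
{$mode objfpc}{$H+}
uses SysUtils;

type
  // Local stand-in copied from the declaration in this unit.
  TTreeNodeFindOptions = set of (tefoIgnoreType, tefoIgnoreText,
    tefoCaseSensitive, tefoNoChildren, tefoNoGrandChildren);

// Hypothetical helper: lists the options contained in a set.
function describeOptions(const options: TTreeNodeFindOptions): string;
begin
  result := '';
  if tefoIgnoreType in options then result := result + 'ignore-type ';
  if tefoIgnoreText in options then result := result + 'ignore-text ';
  if tefoCaseSensitive in options then result := result + 'case-sensitive ';
  if tefoNoChildren in options then result := result + 'no-children ';
  if tefoNoGrandChildren in options then result := result + 'no-grandchildren ';
  result := Trim(result);
  if result = '' then result := '(defaults)';
end;

var
  options: TTreeNodeFindOptions;
begin
  // Build a set literal, then add and remove single options.
  options := [tefoIgnoreText, tefoCaseSensitive];
  writeln(describeOptions(options));
  Include(options, tefoNoGrandChildren);
  writeln(describeOptions(options));
  Exclude(options, tefoIgnoreText);
  writeln(describeOptions(options));
end.
```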

TStringComparisonFunc = function (const a,b: string): boolean of object;
 
TTreeNodeClass = class of TTreeNode;
 
TBasicParsingState = (...);
 
Values
  • bpmBeforeHtml:  
  • bpmBeforeHead:  
  • bpmInHead:  
  • bpmAfterHead:  
  • bpmInBody:  
  • bpmInFrameset:  
  • bpmAfterBody:  
  • bpmAfterAfterBody:  
TParsingModel = (...);

Parsing model used to interpret the document.
  • pmStrict: every tag must be closed explicitly (otherwise an exception is raised)
  • pmHTML: accepts everything and tries to create the best-fitting tree, using a heuristic to recover from faulty documents (no exceptions are raised); also detects the encoding

Values
  • pmStrict:  
  • pmHTML:  
  • pmUnstrictXML:  
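
The difference between the strict and HTML models could be tried out roughly as follows. This page does not list the members of TTreeParser, so the parameterless constructor, the parsingModel property and the parseTree method used here are assumptions about the class, not confirmed API. The malformed input string is made up.

```pascal
program ParsingModelSketch;
{$mode objfpc}{$H+}
uses SysUtils, simplehtmltreeparser; // assumption: unit is on the search path

const
  // Hypothetical input in which the <b> element is never closed.
  broken = '<a><b>text</a>';
var
  parser: TTreeParser;
begin
  parser := TTreeParser.Create; // assumption: parameterless constructor
  try
    // Assumption: parsingModel is a writable property of TTreeParser.
    parser.parsingModel := pmStrict;
    try
      // Assumption: parseTree takes the document text as first argument.
      parser.parseTree(broken);
      writeln('pmStrict: parsed without an exception');
    except
      on e: ETreeParseException do
        writeln('pmStrict: ', e.Message);
    end;

    // The HTML model is described as recovering instead of raising.
    parser.parsingModel := pmHTML;
    parser.parseTree(broken);
    writeln('pmHTML: parsed');
  finally
    parser.Free;
  end;
end.
```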
TInternetToolsFormat = (...);
 
Values
  • itfXML:  
  • itfHTML:  
  • itfJSON:  
  • itfXMLPreparsedEntity:  
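
A caller would typically branch on the TInternetToolsFormat value returned by guessFormat. The sketch below assumes the unit is on the search path and relies only on the signature and the enumeration values listed on this page. The helper formatName and the sample inputs are hypothetical, and no particular detection result is claimed.

```pascal
program GuessFormatSketch;
{$mode objfpc}{$H+}
uses simplehtmltreeparser; // assumption: the unit is on the unit search path

// Hypothetical helper: maps a detected format to a readable label.
function formatName(fmt: TInternetToolsFormat): string;
begin
  case fmt of
    itfXML: result := 'XML';
    itfHTML: result := 'HTML';
    itfJSON: result := 'JSON';
    itfXMLPreparsedEntity: result := 'XML preparsed entity';
    else result := 'unknown';
  end;
end;

begin
  // Arguments follow the declared order: data, uri, contenttype.
  writeln(formatName(guessFormat('<html><body/></html>',
    'http://example.org/index.html', 'text/html')));
  writeln(formatName(guessFormat('{"a": 1}',
    'http://example.org/data.json', 'application/json')));
  // Only the data is given here; uri and content type are left empty.
  writeln(formatName(guessFormat('<?xml version="1.0"?><r/>', '', '')));
end.
```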

Constants

XMLNamespaceUrl_XML = 'http://www.w3.org/XML/1998/namespace';
 
XMLNamespaceUrl_XMLNS = 'http://www.w3.org/2000/xmlns/';
 
TreeNodesWithChildren = [tetOpen, tetDocument];
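
As its name suggests, this constant can be used for a quick membership test on a node type. The sketch below redeclares TTreeNodeType and the constant locally, mirroring the values listed on this page, so that it compiles without the unit. The helper canHaveChildren is hypothetical.

```pascal
program NodeTypeSketch;
{$mode objfpc}{$H+}

type
  // Local stand-in mirroring the values listed for TTreeNodeType.
  TTreeNodeType = (tetOpen, tetClose, tetText, tetComment,
    tetProcessingInstruction, tetAttribute, tetDocument,
    tetInternalDoNotUseCDATAText, tetNamespace);

const
  // Local copy of the constant declared in this unit.
  TreeNodesWithChildren = [tetOpen, tetDocument];

// Hypothetical helper: true if the node type is in TreeNodesWithChildren.
function canHaveChildren(typ: TTreeNodeType): boolean;
begin
  result := typ in TreeNodesWithChildren;
end;

var
  typ: TTreeNodeType;
begin
  // Walk over every node type and report the membership test result.
  for typ := low(TTreeNodeType) to high(TTreeNodeType) do
    if canHaveChildren(typ) then
      writeln(typ, ': in TreeNodesWithChildren')
    else
      writeln(typ, ': not in TreeNodesWithChildren');
end.
```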
 

Variables

XMLNamespace_XMLNS: INamespace;
 
XMLNamespace_XML: INamespace;
 

Author


Generated by PasDoc 0.14.0.