Class TTreeParser
Unit
simplehtmltreeparser
Declaration
type TTreeParser = class(TObject)
Description
This class parses an HTML/SGML/XML document into a tree-like structure.
To use it, call parseTree with a string containing the document. It returns the root node of the new tree (a TTreeDocument), which you can also retrieve later by calling getLastTree.
The data structure is like a stream of annotated tokens with back links, so you can also traverse it like a tree.
If TargetEncoding is not CP_NONE, the parsed data is automatically converted to that encoding. The initial encoding is detected from the Unicode BOM, the XML declaration, the Content-Type header, the http-equiv meta tag, and invalid characters. You can change the class used for the tree elements with the field treeNodeClass.
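A minimal usage sketch in Free Pascal, assuming the simplehtmltreeparser unit is on the unit search path. It uses only members documented on this page. The comment about ownership (trees are released together with the parser) is an assumption based on clearTrees and destroy, not something this page states.

```pascal
program ParseExample;

{$mode objfpc}{$H+}

uses
  simplehtmltreeparser;

var
  parser: TTreeParser;
  doc: TTreeDocument;
  gotTree: boolean = false;
begin
  parser := TTreeParser.Create;
  try
    // parseTree builds a new tree from the string and returns its document node.
    doc := parser.parseTree('<html><body><p>Hello</p></body></html>');
    // getLastTree returns the most recently created tree, i.e. the same document.
    gotTree := Assigned(doc) and (parser.getLastTree = doc);
    WriteLn('tree created: ', gotTree);
  finally
    // Assumption: the parser owns the trees it created and frees them here.
    parser.Free;
  end;
end.
```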
Fields
treeNodeClass: TTreeNodeClass;
Class of the tree nodes. You can subclass TTreeNode if you need to store additional data at every node.
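The sketch below shows how a subclass could carry extra per-node data. TCountingNode, visitCount and MarkVisited are hypothetical names. Reaching individual nodes relies on TTreeNode's own navigation API, which this page does not document, so the sketch only shows setting the class, the type check and the cast.

```pascal
program CustomNodeExample;

{$mode objfpc}{$H+}

uses
  simplehtmltreeparser;

type
  // Hypothetical subclass that stores one extra value at every node.
  TCountingNode = class(TTreeNode)
  public
    visitCount: integer; // object fields are zero-initialized on creation
  end;

// Hypothetical helper: updates the extra field only for nodes of the custom class.
procedure MarkVisited(node: TTreeNode);
begin
  if node is TCountingNode then
    Inc(TCountingNode(node).visitCount);
end;

var
  parser: TTreeParser;
  doc: TTreeDocument;
  classSet: boolean = false;
begin
  parser := TTreeParser.Create;
  try
    // Set the node class before parsing so that newly created nodes use it.
    parser.treeNodeClass := TCountingNode;
    classSet := parser.treeNodeClass = TCountingNode;
    doc := parser.parseTree('<root><item/></root>');
    // The document node may itself be a TTreeDocument rather than a TCountingNode;
    // in that case the "is" check leaves it unchanged. Other nodes reached through
    // TTreeNode's navigation API can be passed to MarkVisited the same way.
    MarkVisited(doc);
  finally
    parser.Free;
  end;
end.
```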

allowTextAtRootLevel: boolean;

Methods
constructor Create;

destructor destroy; override;

procedure clearTrees;

function parseTree(html: string; uri: string = ''; contentType: string = ''): TTreeDocument; virtual;
Creates a new tree from the HTML document contained in html. contentType (for example, the value of an HTTP Content-Type header) is used to detect the encoding.
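A sketch of passing a Content-Type value so the parser can determine the input encoding, with TargetEncoding set so the parsed data is converted to UTF-8. The URI and the HTML body are made-up placeholders. CP_UTF8 is the code page constant from the Free Pascal system unit.

```pascal
program EncodingExample;

{$mode objfpc}{$H+}

uses
  simplehtmltreeparser;

var
  parser: TTreeParser;
  doc: TTreeDocument;
  html: string;
  gotTree: boolean = false;
begin
  // Placeholder page body containing one windows-1252 byte ($E9 = e acute).
  html := '<html><head><title>Caf'#$E9'</title></head><body></body></html>';
  parser := TTreeParser.Create;
  try
    // Any value other than CP_NONE makes the parser convert the parsed data.
    parser.TargetEncoding := CP_UTF8;
    // The header value tells the parser that the input is windows-1252.
    doc := parser.parseTree(html, 'http://example.com/page.html',
      'text/html; charset=windows-1252');
    gotTree := Assigned(doc);
  finally
    parser.Free;
  end;
end.
```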

function parseTreeFromFile(filename: string): TTreeDocument; virtual;

function getLastTree: TTreeDocument;
Returns the most recently created tree.

procedure removeEmptyTextNodes(const whenTrimmed: boolean);

Properties
property repairMissingStartTags: boolean read FrepairMissingStartTags write FrepairMissingStartTags;

property repairMissingEndTags: boolean read FRepairMissingEndTags write FRepairMissingEndTags;

property trimText: boolean read FTrimText write FTrimText;
If true (default: false), white space is trimmed from text nodes.

property readComments: boolean read FReadComments write FReadComments;
If true (default: false), comments are included in the generated tree.

property readProcessingInstructions: boolean read FReadProcessingInstructions write FReadProcessingInstructions;
If true (default: false), processing instructions are included in the generated tree.

property autoDetectHTMLEncoding: boolean read FAutoDetectHTMLEncoding write fautoDetectHTMLEncoding;
Determines whether the encoding is detected automatically (default: true).

property TargetEncoding: TSystemCodePage read FEncodingTarget write FEncodingTarget;
If this is not CP_NONE, the parsed data is converted to this encoding.

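A sketch of configuring the parser through the properties listed above before calling parseTree. The input string is an arbitrary example, and the sketch does not inspect the resulting nodes.

```pascal
program OptionsExample;

{$mode objfpc}{$H+}

uses
  simplehtmltreeparser;

var
  parser: TTreeParser;
  doc: TTreeDocument;
  optionsSet: boolean = false;
begin
  parser := TTreeParser.Create;
  try
    parser.trimText := true;                    // trim white space in text nodes
    parser.readComments := true;                // keep comments as tree nodes
    parser.readProcessingInstructions := false; // skip processing instructions (the default)
    parser.autoDetectHTMLEncoding := true;      // detect the input encoding (the default)
    optionsSet := parser.trimText and parser.readComments;
    doc := parser.parseTree('<root>  text  <!-- note --><?pi data?></root>');
    if Assigned(doc) then
      WriteLn('parsed with custom options');
  finally
    parser.Free;
  end;
end.
```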
Generated by PasDoc 0.16.0.