java.lang.Object
  javax.xml.parsers.DocumentBuilder
    nu.validator.htmlparser.dom.HtmlDocumentBuilder

public class HtmlDocumentBuilder
This class implements an HTML5 parser that exposes data through the DOM interface.
By default, when using the constructor without arguments,
this parser coerces XML 1.0-incompatible infosets into XML 1.0-compatible
infosets. This corresponds to ALTER_INFOSET as the general
XML violation policy. To make the parser support non-conforming HTML fully
per the HTML 5 spec, at the cost of potentially violating the DOM
API contract, set the general XML violation policy to ALLOW.
The ALLOW policy does not work with a standard DOM implementation.
It is possible to treat XML 1.0 infoset violations as fatal by setting
the general XML violation policy to FATAL.
The doctype is not represented in the tree.
The document mode is stored on the document node as user data:
a DocumentMode object under the key nu.validator.document-mode.
The form pointer is also stored as user data with the key
nu.validator.form-pointer.
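
Both values can be read back with the standard DOM Level 3 `Node.getUserData` call. The following is a minimal sketch, not code from the library: the class name `UserDataLookup` and its helper methods are hypothetical, and the document is a stand-in built with the JDK's JAXP factory that carries a placeholder string. A document produced by this parser would carry a `DocumentMode` object under the same key.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

class UserDataLookup {
    // Keys quoted from the class description above.
    static final String DOCUMENT_MODE_KEY = "nu.validator.document-mode";
    static final String FORM_POINTER_KEY = "nu.validator.form-pointer";

    /** Returns the user data stored on the document node under the key, or null if absent. */
    static Object lookup(Document document, String key) {
        return document.getUserData(key);
    }

    /** Builds an empty JAXP document carrying a placeholder value (stand-in only). */
    static Document standInDocument() {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            // Assumption for illustration: a String replaces the DocumentMode object.
            document.setUserData(DOCUMENT_MODE_KEY, "stand-in-mode", null);
            return document;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document document = standInDocument();
        System.out.println(lookup(document, DOCUMENT_MODE_KEY));
        // No form pointer was stored on the stand-in, so this lookup yields null.
        System.out.println(lookup(document, FORM_POINTER_KEY));
    }
}
```

Callers should expect `null` for either key when the tree did not come from this parser.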
| Field Summary | |
|---|---|
| private DOMTreeBuilder domTreeBuilder | The tree builder. |
| private EntityResolver entityResolver | The entity resolver. |
| private DOMImplementation implementation | The DOM impl. |
| private Driver tokenizer | The tokenizer. |
| Constructor Summary | |
|---|---|
| HtmlDocumentBuilder() | Instantiates the document builder with the JAXP DOM implementation and the infoset-altering XML violation policy. |
| HtmlDocumentBuilder(DOMImplementation implementation) | Instantiates the document builder with a specific DOM implementation and the infoset-altering XML violation policy. |
| HtmlDocumentBuilder(DOMImplementation implementation, XmlViolationPolicy xmlPolicy) | Instantiates the document builder with a specific DOM implementation and XML violation policy. |
| HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy) | Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy. |
| Method Summary | |
|---|---|
| DOMImplementation getDOMImplementation() | Returns the DOM implementation. |
| boolean isNamespaceAware() | Returns true. |
| boolean isValidating() | Returns false. |
| private static DOMImplementation jaxpDOMImplementation() | Returns the JAXP DOM implementation. |
| Document newDocument() | For API compatibility. |
| Document parse(InputSource is) | Parses a document from a SAX InputSource. |
| DocumentFragment parseFragment(InputSource is, String context) | Parses a document fragment from a SAX InputSource. |
| void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy) | Deprecated. |
| void setCheckingNormalization(boolean enable) | Toggles the checking of the NFC normalization of source. |
| void setCommentPolicy(XmlViolationPolicy commentPolicy) | Sets the policy for consecutive hyphens in comments. |
| void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) | Sets the policy for non-XML characters except white space. |
| void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy) | Sets the policy for non-XML white space. |
| void setDoctypeExpectation(DoctypeExpectation doctypeExpectation) | Sets the doctype expectation. |
| void setDocumentModeHandler(DocumentModeHandler documentModeHandler) | Sets the document mode handler. |
| void setEntityResolver(EntityResolver resolver) | Sets the entity resolver for URI-only inputs. |
| void setErrorHandler(ErrorHandler errorHandler) | Sets the error handler. |
| void setHeuristics(Heuristics heuristics) | Sets the encoding sniffing heuristics. |
| void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata) | Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
| void setIgnoringComments(boolean ignoreComments) | Sets whether comment nodes appear in the tree. |
| void setMappingLangToXmlLang(boolean mappingLangToXmlLang) | Whether to map the HTML lang attribute to xml:lang. |
| void setNamePolicy(XmlViolationPolicy namePolicy) | Sets the policy for dealing with names that aren't XML 1.0 4th ed. |
| void setScriptingEnabled(boolean scriptingEnabled) | Sets whether the parser considers scripting to be enabled for noscript treatment. |
| void setXmlPolicy(XmlViolationPolicy xmlPolicy) | This is a catch-all convenience method for setting name, content space, content non-XML char and comment policies in one go. |
| private void tokenize(InputSource is) | Tokenizes the input source. |
| Methods inherited from class javax.xml.parsers.DocumentBuilder |
|---|
| getSchema, isXIncludeAware, parse, parse, parse, parse, reset |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private final Driver tokenizer
private final DOMTreeBuilder domTreeBuilder
private final DOMImplementation implementation
private EntityResolver entityResolver
| Constructor Detail |
|---|
public HtmlDocumentBuilder(DOMImplementation implementation, XmlViolationPolicy xmlPolicy)
Parameters:
implementation - the DOM implementation
xmlPolicy - the policy

public HtmlDocumentBuilder(DOMImplementation implementation)
Parameters:
implementation - the DOM implementation

public HtmlDocumentBuilder()

public HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy)
Parameters:
xmlPolicy - the policy

| Method Detail |
|---|
private static DOMImplementation jaxpDOMImplementation()

public DOMImplementation getDOMImplementation()
getDOMImplementation in class DocumentBuilder
See Also:
DocumentBuilder.getDOMImplementation()

public boolean isNamespaceAware()
Returns true.
isNamespaceAware in class DocumentBuilder
Returns:
true
See Also:
DocumentBuilder.isNamespaceAware()

public boolean isValidating()
Returns false.
isValidating in class DocumentBuilder
Returns:
false
See Also:
DocumentBuilder.isValidating()

public Document newDocument()
newDocument in class DocumentBuilder
See Also:
DocumentBuilder.newDocument()
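
Because isNamespaceAware() returns true, lookups on the resulting tree would normally go through the namespace-aware DOM methods. The sketch below uses only the standard DOM API. It assumes, which this page does not state, that HTML elements are placed in the XHTML namespace `http://www.w3.org/1999/xhtml`. The class `NamespaceAwareLookup` and its stand-in document are hypothetical and built by hand rather than by this parser.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceAwareLookup {
    // Assumption: the namespace used for HTML elements; not stated on this page.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Counts elements with the given local name in the assumed XHTML namespace. */
    static int countElements(Document document, String localName) {
        return document.getElementsByTagNameNS(XHTML_NS, localName).getLength();
    }

    /** Hand-built stand-in for a parsed tree: html > body > two p elements. */
    static Document standInDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder().newDocument();
            Element html = document.createElementNS(XHTML_NS, "html");
            Element body = document.createElementNS(XHTML_NS, "body");
            body.appendChild(document.createElementNS(XHTML_NS, "p"));
            body.appendChild(document.createElementNS(XHTML_NS, "p"));
            html.appendChild(body);
            document.appendChild(html);
            return document;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements(standInDocument(), "p"));
    }
}
```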

public Document parse(InputSource is) throws SAXException, IOException
Parses a document from a SAX InputSource.
parse in class DocumentBuilder
Parameters:
is - the source
Throws:
SAXException - if stuff goes wrong
IOException - if IO goes wrong
See Also:
DocumentBuilder.parse(org.xml.sax.InputSource)
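
Since this class extends javax.xml.parsers.DocumentBuilder, calling code can be written against the superclass and handed whichever builder is available. The sketch below wraps a string in an InputSource and parses it. The class `ParseFromString` and its methods are hypothetical, and the JDK's own XML builder stands in so that the example runs without extra jars. With the parser library on the classpath, an HtmlDocumentBuilder instance would presumably be passed in its place.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseFromString {
    /** Parses markup held in a String with whichever DocumentBuilder is supplied. */
    static Document parse(DocumentBuilder builder, String markup) {
        try {
            InputSource source = new InputSource(new StringReader(markup));
            return builder.parse(source);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** Stand-in builder: the JDK's XML parser, which accepts well-formed input only. */
    static DocumentBuilder standInBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document document = parse(standInBuilder(),
                "<html><body><p>Hello</p></body></html>");
        System.out.println(document.getDocumentElement().getTagName());
    }
}
```

The stand-in rejects markup that is not well-formed XML, so the sample input is kept well-formed.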

public DocumentFragment parseFragment(InputSource is, String context) throws IOException, SAXException
Parses a document fragment from a SAX InputSource.
Parameters:
is - the source
context - the context element name
Throws:
SAXException - if stuff goes wrong
IOException - if IO goes wrong

public void setEntityResolver(EntityResolver resolver)
setEntityResolver in class DocumentBuilder
Parameters:
resolver - the resolver
See Also:
DocumentBuilder.setEntityResolver(org.xml.sax.EntityResolver)

public void setErrorHandler(ErrorHandler errorHandler)
setErrorHandler in class DocumentBuilder
Parameters:
errorHandler - the handler
See Also:
DocumentBuilder.setErrorHandler(org.xml.sax.ErrorHandler)

public void setIgnoringComments(boolean ignoreComments)
Parameters:
ignoreComments - true to ignore comments
See Also:
TreeBuilder.setIgnoringComments(boolean)

public void setScriptingEnabled(boolean scriptingEnabled)
Parameters:
scriptingEnabled - true to enable
See Also:
TreeBuilder.setScriptingEnabled(boolean)

public void setCheckingNormalization(boolean enable)
Parameters:
enable - true to check normalization
See Also:
nu.validator.htmlparser.impl.Tokenizer#setCheckingNormalization(boolean)

public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Parameters:
commentPolicy - the policy
See Also:
Tokenizer.setCommentPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Parameters:
contentNonXmlCharPolicy - the policy
See Also:
Tokenizer.setContentNonXmlCharPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Parameters:
contentSpacePolicy - the policy
See Also:
Tokenizer.setContentSpacePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Parameters:
html4ModeCompatibleWithXhtml1Schemata -

public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Whether to map the HTML lang attribute to xml:lang.
Parameters:
mappingLangToXmlLang - true to map lang to xml:lang
See Also:
Tokenizer.setMappingLangToXmlLang(boolean)

public void setNamePolicy(XmlViolationPolicy namePolicy)
Parameters:
namePolicy - the policy
See Also:
Tokenizer.setNamePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setXmlPolicy(XmlViolationPolicy xmlPolicy)
Parameters:
xmlPolicy - the policy

public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Deprecated.

public void setDoctypeExpectation(DoctypeExpectation doctypeExpectation)
Parameters:
doctypeExpectation - the doctypeExpectation to set
See Also:
TreeBuilder.setDoctypeExpectation(nu.validator.htmlparser.common.DoctypeExpectation)

public void setDocumentModeHandler(DocumentModeHandler documentModeHandler)
Parameters:
documentModeHandler -
See Also:
TreeBuilder.setDocumentModeHandler(nu.validator.htmlparser.common.DocumentModeHandler)

public void setHeuristics(Heuristics heuristics)
Parameters:
heuristics - the heuristics to set
See Also:
nu.validator.htmlparser.impl.Tokenizer#setHeuristics(nu.validator.htmlparser.common.Heuristics)

private void tokenize(InputSource is) throws SAXException, IOException, MalformedURLException
Parameters:
is - the source
Throws:
SAXException - if stuff goes wrong
IOException - if IO goes wrong
MalformedURLException - if the system ID is malformed and the entity resolver is null
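
The handler accepted by setErrorHandler(ErrorHandler) is the standard SAX org.xml.sax.ErrorHandler interface, so any implementation of that interface can be supplied. As a rough sketch, the hypothetical `CollectingErrorHandler` below records each reported problem with its position instead of printing or throwing. Which conditions this parser reports at which severity is not stated on this page, so the events in `main` are constructed by hand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) {
        // Design choice for this sketch: record and continue rather than rethrow.
        record("fatal", e);
    }

    /** Returns a read-only view of everything recorded so far. */
    List<String> messages() {
        return Collections.unmodifiableList(messages);
    }

    private void record(String severity, SAXParseException e) {
        messages.add(severity + " at " + e.getLineNumber() + ":"
                + e.getColumnNumber() + ": " + e.getMessage());
    }

    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        // Hand-made event standing in for one reported during a parse.
        handler.error(new SAXParseException("stray end tag", null, null, 3, 7));
        handler.messages().forEach(System.out::println);
    }
}
```

An instance would be passed to setErrorHandler before calling parse, and `messages()` inspected afterwards.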