java.lang.Object
  javax.xml.parsers.DocumentBuilder
    nu.validator.htmlparser.dom.HtmlDocumentBuilder
public class HtmlDocumentBuilder extends javax.xml.parsers.DocumentBuilder
This class implements an HTML5 parser that exposes data through the DOM interface.
By default, when using the constructor without arguments, this parser coerces XML 1.0-incompatible infosets into XML 1.0-compatible infosets. This corresponds to ALTER_INFOSET as the general XML violation policy. To make the parser support non-conforming HTML fully per the HTML 5 spec, at the cost of potentially violating the SAX2 API contract, set the general XML violation policy to ALLOW. This does not work with a standard DOM implementation. It is possible to treat XML 1.0 infoset violations as fatal by setting the general XML violation policy to FATAL.
The doctype is not represented in the tree.
The document mode is stored as a DocumentMode user data object with the key nu.validator.document-mode on the document node.
The form pointer is also stored as user data, with the key nu.validator.form-pointer.
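As a minimal sketch of how the two user data keys described above might be read back, the helper below uses only the standard DOM Level 3 `getUserData` call. The class and method names are hypothetical, and looking up the form pointer on the document node is an assumption, since the text names the node only for the document mode.

```java
import org.w3c.dom.Document;

/** Hypothetical helper; only the two key strings come from the class description. */
final class ParserUserData {
    static final String DOCUMENT_MODE_KEY = "nu.validator.document-mode";
    static final String FORM_POINTER_KEY = "nu.validator.form-pointer";

    private ParserUserData() {}

    /** Returns the object stored under the document mode key, or null if absent. */
    static Object documentMode(Document doc) {
        return doc.getUserData(DOCUMENT_MODE_KEY);
    }

    /** Returns the object stored under the form pointer key, or null if absent. */
    static Object formPointer(Document doc) {
        // Assumption: the form pointer is attached to the document node as well.
        return doc.getUserData(FORM_POINTER_KEY);
    }
}
```

With a document returned by `parse`, `ParserUserData.documentMode(doc)` would be expected to yield the `DocumentMode` value mentioned above; the return type is left as `Object` here so the sketch compiles without the parser library on the classpath.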
Constructor Summary | |
---|---|
HtmlDocumentBuilder() | Instantiates the document builder with the JAXP DOM implementation and the infoset-altering XML violation policy. |
HtmlDocumentBuilder(org.w3c.dom.DOMImplementation implementation) | Instantiates the document builder with a specific DOM implementation and the infoset-altering XML violation policy. |
HtmlDocumentBuilder(org.w3c.dom.DOMImplementation implementation, XmlViolationPolicy xmlPolicy) | Instantiates the document builder with a specific DOM implementation and XML violation policy. |
HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy) | Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy. |
Method Summary | | |
---|---|---|
void | addCharacterHandler(CharacterHandler characterHandler) | |
XmlViolationPolicy | getBogusXmlnsPolicy() | Deprecated. |
XmlViolationPolicy | getCommentPolicy() | Returns the commentPolicy. |
XmlViolationPolicy | getContentNonXmlCharPolicy() | Returns the contentNonXmlCharPolicy. |
XmlViolationPolicy | getContentSpacePolicy() | Returns the contentSpacePolicy. |
DoctypeExpectation | getDoctypeExpectation() | Returns the doctype expectation. |
org.xml.sax.Locator | getDocumentLocator() | Returns the Locator during parse. |
DocumentModeHandler | getDocumentModeHandler() | Returns the document mode handler. |
org.w3c.dom.DOMImplementation | getDOMImplementation() | Returns the DOM implementation. |
Heuristics | getHeuristics() | |
XmlViolationPolicy | getNamePolicy() | The policy for non-NCName element and attribute names. |
XmlViolationPolicy | getStreamabilityViolationPolicy() | Returns the streamabilityViolationPolicy. |
XmlViolationPolicy | getXmlnsPolicy() | Returns the xmlnsPolicy. |
boolean | isCheckingNormalization() | Indicates whether NFC normalization of source is being checked. |
boolean | isHtml4ModeCompatibleWithXhtml1Schemata() | Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
boolean | isMappingLangToXmlLang() | Whether lang is mapped to xml:lang. |
boolean | isNamespaceAware() | Returns true. |
boolean | isReportingDoctype() | Returns the reportingDoctype. |
boolean | isScriptingEnabled() | Whether the parser considers scripting to be enabled for noscript treatment. |
boolean | isValidating() | Returns false. |
org.w3c.dom.Document | newDocument() | For API compatibility. |
org.w3c.dom.Document | parse(org.xml.sax.InputSource is) | Parses a document from a SAX InputSource. |
org.w3c.dom.DocumentFragment | parseFragment(org.xml.sax.InputSource is, java.lang.String context) | Parses a document fragment from a SAX InputSource. |
void | setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy) | Deprecated. |
void | setCheckingNormalization(boolean enable) | Toggles the checking of the NFC normalization of source. |
void | setCommentPolicy(XmlViolationPolicy commentPolicy) | Sets the policy for consecutive hyphens in comments. |
void | setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) | Sets the policy for non-XML characters except white space. |
void | setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy) | Sets the policy for non-XML white space. |
void | setDoctypeExpectation(DoctypeExpectation doctypeExpectation) | Sets the doctype expectation. |
void | setDocumentModeHandler(DocumentModeHandler documentModeHandler) | Sets the document mode handler. |
void | setEntityResolver(org.xml.sax.EntityResolver resolver) | Sets the entity resolver for URI-only inputs. |
void | setErrorHandler(org.xml.sax.ErrorHandler errorHandler) | Sets the error handler. |
void | setHeuristics(Heuristics heuristics) | Sets the encoding sniffing heuristics. |
void | setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata) | Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
void | setIgnoringComments(boolean ignoreComments) | Sets whether comment nodes appear in the tree. |
void | setMappingLangToXmlLang(boolean mappingLangToXmlLang) | Whether lang is mapped to xml:lang. |
void | setNamePolicy(XmlViolationPolicy namePolicy) | The policy for non-NCName element and attribute names. |
void | setReportingDoctype(boolean reportingDoctype) | |
void | setScriptingEnabled(boolean scriptingEnabled) | Sets whether the parser considers scripting to be enabled for noscript treatment. |
void | setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy) | Sets the streamabilityViolationPolicy. |
void | setTransitionHander(TransitionHandler handler) | |
void | setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy) | Whether the xmlns attribute on the root element is passed through. |
void | setXmlPolicy(XmlViolationPolicy xmlPolicy) | This is a catch-all convenience method for setting name, xmlns, content space, content non-XML char and comment policies in one go. |
Methods inherited from class javax.xml.parsers.DocumentBuilder |
---|
getSchema, isXIncludeAware, parse, parse, parse, parse, reset |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
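public HtmlDocumentBuilder(org.w3c.dom.DOMImplementation implementation, XmlViolationPolicy xmlPolicy)
Instantiates the document builder with a specific DOM implementation and XML violation policy.
Parameters:
implementation - the DOM implementation
xmlPolicy - the policy
public HtmlDocumentBuilder(org.w3c.dom.DOMImplementation implementation)
Instantiates the document builder with a specific DOM implementation and the infoset-altering XML violation policy.
Parameters:
implementation - the DOM implementation
public HtmlDocumentBuilder()
Instantiates the document builder with the JAXP DOM implementation and the infoset-altering XML violation policy.
public HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy)
Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy.
Parameters:
xmlPolicy - the policy
Method Detail |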
---|
public org.w3c.dom.DOMImplementation getDOMImplementation()
Returns the DOM implementation.
Specified by: getDOMImplementation in class javax.xml.parsers.DocumentBuilder
See Also: DocumentBuilder.getDOMImplementation()
public boolean isNamespaceAware()
Returns true.
Specified by: isNamespaceAware in class javax.xml.parsers.DocumentBuilder
Returns: true
See Also: DocumentBuilder.isNamespaceAware()
public boolean isValidating()
Returns false.
Specified by: isValidating in class javax.xml.parsers.DocumentBuilder
Returns: false
See Also: DocumentBuilder.isValidating()
public org.w3c.dom.Document newDocument()
For API compatibility.
Specified by: newDocument in class javax.xml.parsers.DocumentBuilder
See Also: DocumentBuilder.newDocument()
public org.w3c.dom.Document parse(org.xml.sax.InputSource is) throws org.xml.sax.SAXException, java.io.IOException
Parses a document from a SAX InputSource.
Specified by: parse in class javax.xml.parsers.DocumentBuilder
Parameters:
is - the source
Throws:
org.xml.sax.SAXException - if parsing fails
java.io.IOException - if an I/O error occurs
See Also: DocumentBuilder.parse(org.xml.sax.InputSource)
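The `parse(InputSource)` entry above can be exercised from an in-memory string. The following is a sketch only: `ParseFromString` and `parseMarkup` are hypothetical names, and the helper is typed against the `javax.xml.parsers.DocumentBuilder` superclass so that it compiles with the JDK alone.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper that feeds in-memory markup to any DocumentBuilder. */
final class ParseFromString {
    private ParseFromString() {}

    static Document parseMarkup(DocumentBuilder builder, String markup)
            throws SAXException, IOException {
        // Wrapping the text in a character stream hands the parser
        // already-decoded characters instead of raw bytes.
        InputSource source = new InputSource(new StringReader(markup));
        return builder.parse(source);
    }
}
```

Since `HtmlDocumentBuilder` extends `DocumentBuilder`, a call such as `ParseFromString.parseMarkup(new HtmlDocumentBuilder(), "<p>Hello")` should be possible when the parser library is on the classpath.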
public org.w3c.dom.DocumentFragment parseFragment(org.xml.sax.InputSource is, java.lang.String context) throws java.io.IOException, org.xml.sax.SAXException
Parses a document fragment from a SAX InputSource.
Parameters:
is - the source
context - the context element name
Throws:
org.xml.sax.SAXException - if parsing fails
java.io.IOException - if an I/O error occurs
public void setEntityResolver(org.xml.sax.EntityResolver resolver)
Sets the entity resolver for URI-only inputs.
Specified by: setEntityResolver in class javax.xml.parsers.DocumentBuilder
Parameters:
resolver - the resolver
See Also: DocumentBuilder.setEntityResolver(org.xml.sax.EntityResolver)
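One way to picture a resolver for URI-only inputs is an `org.xml.sax.EntityResolver` that serves markup from memory, keyed by system identifier. This is an illustrative sketch: `InMemoryEntityResolver` is a hypothetical class, and how this parser treats a `null` return is an assumption based on the general SAX contract, not something stated here.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that maps system identifiers to in-memory markup. */
final class InMemoryEntityResolver implements EntityResolver {
    private final Map<String, String> pages;

    InMemoryEntityResolver(Map<String, String> pages) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.pages = new HashMap<>(pages);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String markup = pages.get(systemId);
        if (markup == null) {
            // Per the SAX contract, null asks the caller to use its default handling.
            return null;
        }
        InputSource source = new InputSource(new StringReader(markup));
        source.setSystemId(systemId);
        return source;
    }
}
```

An instance would be registered with `setEntityResolver` before parsing an `InputSource` that carries only a system identifier.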
public void setErrorHandler(org.xml.sax.ErrorHandler errorHandler)
Sets the error handler.
Specified by: setErrorHandler in class javax.xml.parsers.DocumentBuilder
Parameters:
errorHandler - the handler
See Also: DocumentBuilder.setErrorHandler(org.xml.sax.ErrorHandler)
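As an example of what might be passed to `setErrorHandler`, the sketch below collects warnings and errors in a list and rethrows fatal errors. `CollectingErrorHandler` is a hypothetical class built on the standard `org.xml.sax.ErrorHandler` interface; which conditions this parser reports at which severity is not specified here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical handler that records every reported problem as a text line. */
final class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        record("fatal", e);
        throw e; // stop processing once a fatal error has been recorded
    }

    /** Returns a read-only view of the messages recorded so far. */
    List<String> messages() {
        return Collections.unmodifiableList(messages);
    }

    private void record(String severity, SAXParseException e) {
        // Line and column are -1 when the parser supplied no location.
        messages.add(severity + " " + e.getLineNumber() + ":" + e.getColumnNumber()
                + " " + e.getMessage());
    }
}
```

After a parse, `messages()` can be inspected to decide whether the input was acceptable.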
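public void setTransitionHander(TransitionHandler handler)
public boolean isCheckingNormalization()
Indicates whether NFC normalization of source is being checked.
Returns: true if NFC normalization of source is being checked.
See Also: nu.validator.htmlparser.impl.Tokenizer#isCheckingNormalization()
public void setCheckingNormalization(boolean enable)
Toggles the checking of the NFC normalization of source.
Parameters:
enable - true to check normalization
See Also: nu.validator.htmlparser.impl.Tokenizer#setCheckingNormalization(boolean)
public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the policy for consecutive hyphens in comments.
Parameters:
commentPolicy - the policy
See Also: Tokenizer.setCommentPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the policy for non-XML characters except white space.
Parameters:
contentNonXmlCharPolicy - the policy
See Also: Tokenizer.setContentNonXmlCharPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the policy for non-XML white space.
Parameters:
contentSpacePolicy - the policy
See Also: Tokenizer.setContentSpacePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
public boolean isScriptingEnabled()
Whether the parser considers scripting to be enabled for noscript treatment.
Returns: true if enabled
See Also: TreeBuilder.isScriptingEnabled()
public void setScriptingEnabled(boolean scriptingEnabled)
Sets whether the parser considers scripting to be enabled for noscript treatment.
Parameters:
scriptingEnabled - true to enable
See Also: TreeBuilder.setScriptingEnabled(boolean)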
public DoctypeExpectation getDoctypeExpectation()
Returns the doctype expectation.
public void setDoctypeExpectation(DoctypeExpectation doctypeExpectation)
Sets the doctype expectation.
Parameters:
doctypeExpectation - the doctypeExpectation to set
See Also: TreeBuilder.setDoctypeExpectation(nu.validator.htmlparser.common.DoctypeExpectation)
public DocumentModeHandler getDocumentModeHandler()
Returns the document mode handler.
public void setDocumentModeHandler(DocumentModeHandler documentModeHandler)
Sets the document mode handler.
Parameters:
documentModeHandler - the documentModeHandler to set
See Also: TreeBuilder.setDocumentModeHandler(nu.validator.htmlparser.common.DocumentModeHandler)
public XmlViolationPolicy getStreamabilityViolationPolicy()
Returns the streamabilityViolationPolicy.
public void setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy)
Sets the streamabilityViolationPolicy.
Parameters:
streamabilityViolationPolicy - the streamabilityViolationPolicy to set
public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.
Parameters:
html4ModeCompatibleWithXhtml1Schemata
public org.xml.sax.Locator getDocumentLocator()
Returns the Locator during parse.
Returns: the Locator
public boolean isHtml4ModeCompatibleWithXhtml1Schemata()
Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.
public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Whether lang is mapped to xml:lang.
Parameters:
mappingLangToXmlLang
See Also: Tokenizer.setMappingLangToXmlLang(boolean)
public boolean isMappingLangToXmlLang()
Whether lang is mapped to xml:lang.
public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
Whether the xmlns attribute on the root element is passed through. (FATAL not allowed.)
Parameters:
xmlnsPolicy
See Also: Tokenizer.setXmlnsPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
public XmlViolationPolicy getXmlnsPolicy()
Returns the xmlnsPolicy.
public XmlViolationPolicy getCommentPolicy()
Returns the commentPolicy.
public XmlViolationPolicy getContentNonXmlCharPolicy()
Returns the contentNonXmlCharPolicy.
public XmlViolationPolicy getContentSpacePolicy()
Returns the contentSpacePolicy.
public void setReportingDoctype(boolean reportingDoctype)
Parameters:
reportingDoctype
See Also: TreeBuilder.setReportingDoctype(boolean)
public boolean isReportingDoctype()
Returns the reportingDoctype.
public void setNamePolicy(XmlViolationPolicy namePolicy)
The policy for non-NCName element and attribute names.
Parameters:
namePolicy
See Also: Tokenizer.setNamePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
public void setHeuristics(Heuristics heuristics)
Sets the encoding sniffing heuristics.
Parameters:
heuristics - the heuristics to set
See Also: nu.validator.htmlparser.impl.Tokenizer#setHeuristics(nu.validator.htmlparser.common.Heuristics)
public Heuristics getHeuristics()
public void setXmlPolicy(XmlViolationPolicy xmlPolicy)
This is a catch-all convenience method for setting name, xmlns, content space, content non-XML char and comment policies in one go.
Parameters:
xmlPolicy
public XmlViolationPolicy getNamePolicy()
The policy for non-NCName element and attribute names.
public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Deprecated.
public XmlViolationPolicy getBogusXmlnsPolicy()
Deprecated. Returns XmlViolationPolicy.ALTER_INFOSET.
Returns: XmlViolationPolicy.ALTER_INFOSET
public void addCharacterHandler(CharacterHandler characterHandler)
public void setIgnoringComments(boolean ignoreComments)
Sets whether comment nodes appear in the tree.
Parameters:
ignoreComments - true to ignore comments
See Also: TreeBuilder.setIgnoringComments(boolean)