java.lang.Object
  javax.xml.parsers.DocumentBuilder
    nu.validator.htmlparser.dom.HtmlDocumentBuilder

public class HtmlDocumentBuilder
extends DocumentBuilder
This class implements an HTML5 parser that exposes data through the DOM interface.

By default, when using the constructor without arguments, this parser treats XML 1.0-incompatible infosets as fatal errors. This corresponds to FATAL as the general XML violation policy.

To make the parser support non-conforming HTML fully per the HTML 5 spec, at the cost of potentially violating the DOM API contract, set the general XML violation policy to ALLOW. This does not work with a standard DOM implementation.

Handling all input without fatal errors and without violating the DOM API contract is possible by setting the general XML violation policy to ALTER_INFOSET. This makes the parser non-conforming but is probably the most useful setting for most applications.
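The remark that ALLOW does not work with a standard DOM implementation can be illustrated without this library. HTML tokenization can yield element names that are not XML names, and a standard DOM refuses to create nodes with such names. The sketch below assumes the JDK's bundled JAXP DOM; the class `DomNameCheck` and its methods are hypothetical helpers, not part of the parser's API.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

/** Hypothetical helper, not part of the htmlparser API. */
final class DomNameCheck {

    /** Creates an empty document with the JDK's default JAXP DOM implementation. */
    static Document newJaxpDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the DOM behind doc accepts name as an element name. */
    static boolean acceptsElementName(Document doc, String name) {
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            // A standard DOM reports names that are not XML names
            // with a DOMException (INVALID_CHARACTER_ERR).
            return false;
        }
    }
}
```

A name such as `div` is accepted, while a name containing a double quote is rejected. Under ALTER_INFOSET the parser avoids handing such names to the DOM, which is why that policy can be combined with a standard implementation.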
The doctype is not represented in the tree.

The document mode is represented as a user data DocumentMode object with the key nu.validator.document-mode on the document node.

The form pointer is also stored as user data with the key nu.validator.form-pointer.
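Reading these user data entries could look like the following minimal sketch. It relies only on the DOM Level 3 `getUserData` method; the class `ParserUserData` is a hypothetical helper. The values are returned as `Object` so the sketch does not depend on the parser's own types, and because the text above does not say which node carries the form pointer, the caller passes the node to inspect.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Hypothetical helper for reading the user data keys described above. */
final class ParserUserData {
    static final String DOCUMENT_MODE_KEY = "nu.validator.document-mode";
    static final String FORM_POINTER_KEY = "nu.validator.form-pointer";

    /** Returns the object stored under the document-mode key on the document node, or null. */
    static Object documentMode(Document doc) {
        return doc.getUserData(DOCUMENT_MODE_KEY);
    }

    /** Returns the object stored under the form-pointer key on the given node, or null. */
    static Object formPointer(Node node) {
        return node.getUserData(FORM_POINTER_KEY);
    }
}
```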
Field Summary | |
---|---|
private DOMTreeBuilder | domTreeBuilder |
private EntityResolver | entityResolver |
private DOMImplementation | implementation |
private Tokenizer | tokenizer |
Constructor Summary | |
---|---|
HtmlDocumentBuilder() | Instantiates the document builder with the JAXP DOM implementation and fatal XML violation policy. |
HtmlDocumentBuilder(DOMImplementation implementation) | Instantiates the document builder with a specific DOM implementation and fatal XML violation policy. |
HtmlDocumentBuilder(DOMImplementation implementation, XmlViolationPolicy xmlPolicy) | Instantiates the document builder with a specific DOM implementation and XML violation policy. |
HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy) | Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy. |
Method Summary | | |
---|---|---|
DOMImplementation | getDOMImplementation() | Returns the DOM implementation. |
boolean | isNamespaceAware() | Returns true. |
boolean | isValidating() | Returns false. |
private static DOMImplementation | jaxpDOMImplementation() | |
Document | newDocument() | For API compatibility. |
Document | parse(InputSource is) | Parses a document from a SAX InputSource. |
DocumentFragment | parseFragment(InputSource is, String context) | Parses a document fragment from a SAX InputSource. |
void | setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy) | Sets the policy for forbidden xmlns attributes. |
void | setCheckingNormalization(boolean enable) | Toggles the checking of the NFC normalization of source. |
void | setCommentPolicy(XmlViolationPolicy commentPolicy) | Sets the policy for consecutive hyphens in comments. |
void | setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) | Sets the policy for non-XML characters except white space. |
void | setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy) | Sets the policy for non-XML white space. |
void | setDoctypeExpectation(DoctypeExpectation doctypeExpectation) | Sets the doctype expectation. |
void | setDocumentModeHandler(DocumentModeHandler documentModeHandler) | Sets the document mode handler. |
void | setEntityResolver(EntityResolver resolver) | Sets the entity resolver for URI-only inputs. |
void | setErrorHandler(ErrorHandler errorHandler) | |
void | setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata) | Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
void | setIgnoringComments(boolean ignoreComments) | Sets whether comment nodes appear in the tree. |
void | setMappingLangToXmlLang(boolean mappingLangToXmlLang) | |
void | setNamePolicy(XmlViolationPolicy namePolicy) | |
void | setScriptingEnabled(boolean scriptingEnabled) | Sets whether the parser considers scripting to be enabled for noscript treatment. |
void | setXmlPolicy(XmlViolationPolicy xmlPolicy) | This is a catch-all convenience method for setting name, content space, content non-XML char and comment policies in one go. |
private void | tokenize(InputSource is) | |
Methods inherited from class javax.xml.parsers.DocumentBuilder |
---|
getSchema, isXIncludeAware, parse, parse, parse, parse, reset |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private final Tokenizer tokenizer
private final DOMTreeBuilder domTreeBuilder
private final DOMImplementation implementation
private EntityResolver entityResolver
Constructor Detail |
---|
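public HtmlDocumentBuilder(DOMImplementation implementation, XmlViolationPolicy xmlPolicy)
Instantiates the document builder with a specific DOM implementation and XML violation policy.
Parameters:
  implementation - the DOM implementation
  xmlPolicy - the policy

public HtmlDocumentBuilder(DOMImplementation implementation)
Instantiates the document builder with a specific DOM implementation and fatal XML violation policy.
Parameters:
  implementation - the DOM implementation

public HtmlDocumentBuilder()
Instantiates the document builder with the JAXP DOM implementation and fatal XML violation policy.

public HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy)
Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy.
Parameters:
  xmlPolicy - the policy

Method Detail |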
---|
private static DOMImplementation jaxpDOMImplementation()

public DOMImplementation getDOMImplementation()
Returns the DOM implementation.
Specified by:
  getDOMImplementation in class DocumentBuilder
See Also:
  DocumentBuilder.getDOMImplementation()

public boolean isNamespaceAware()
Returns true.
Specified by:
  isNamespaceAware in class DocumentBuilder
Returns:
  true
See Also:
  DocumentBuilder.isNamespaceAware()

public boolean isValidating()
Returns false.
Specified by:
  isValidating in class DocumentBuilder
Returns:
  false
See Also:
  DocumentBuilder.isValidating()

public Document newDocument()
For API compatibility.
Specified by:
  newDocument in class DocumentBuilder
See Also:
  DocumentBuilder.newDocument()

public Document parse(InputSource is) throws SAXException, IOException
Parses a document from a SAX InputSource.
Specified by:
  parse in class DocumentBuilder
Parameters:
  is - the source
Throws:
  SAXException
  IOException
See Also:
  DocumentBuilder.parse(org.xml.sax.InputSource)
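
Because `parse(InputSource)` is the inherited `DocumentBuilder` entry point, markup held in a string can be fed to it through a character-stream `InputSource`. The following is a minimal sketch written against the abstract `DocumentBuilder` type only; the class `ParseFromString` is a hypothetical helper and not part of this library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper, not part of the htmlparser API. */
final class ParseFromString {

    /** Wraps markup in a character-stream InputSource and passes it to builder.parse(InputSource). */
    static Document parse(DocumentBuilder builder, String markup)
            throws SAXException, IOException {
        InputSource source = new InputSource(new StringReader(markup));
        return builder.parse(source);
    }
}
```

With this class on the classpath, the `builder` argument could be an instance such as `new HtmlDocumentBuilder(XmlViolationPolicy.ALTER_INFOSET)`; the sketch itself depends only on the JAXP contract.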

public DocumentFragment parseFragment(InputSource is, String context) throws IOException, SAXException
Parses a document fragment from a SAX InputSource.
Parameters:
  is - the source
  context - the context element name
Throws:
  IOException
  SAXException

private void tokenize(InputSource is) throws SAXException, IOException, MalformedURLException
Parameters:
  is
Throws:
  SAXException
  IOException
  MalformedURLException

public void setEntityResolver(EntityResolver resolver)
Sets the entity resolver for URI-only inputs.
Specified by:
  setEntityResolver in class DocumentBuilder
Parameters:
  resolver - the resolver
See Also:
  DocumentBuilder.setEntityResolver(org.xml.sax.EntityResolver)
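
A resolver for inputs identified only by a URI might serve known system IDs from memory, as in the sketch below. `InMemoryResolver` is a hypothetical class. Returning `null` for unknown IDs follows the general SAX `EntityResolver` convention; how this parser reacts to a `null` result is not stated here and should be treated as an assumption to verify.

```java
import java.io.StringReader;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that serves documents from an in-memory map keyed by system ID. */
final class InMemoryResolver implements EntityResolver {
    private final Map<String, String> documentsBySystemId;

    InMemoryResolver(Map<String, String> documentsBySystemId) {
        this.documentsBySystemId = documentsBySystemId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String markup = documentsBySystemId.get(systemId);
        if (markup == null) {
            // Unknown system ID: decline to resolve and leave the decision to the caller.
            return null;
        }
        InputSource source = new InputSource(new StringReader(markup));
        source.setSystemId(systemId);
        return source;
    }
}
```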

public void setErrorHandler(ErrorHandler errorHandler)
Specified by:
  setErrorHandler in class DocumentBuilder
See Also:
  DocumentBuilder.setErrorHandler(org.xml.sax.ErrorHandler)
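
An error handler passed to `setErrorHandler` is a standard SAX `ErrorHandler`. A minimal sketch that collects recoverable problems and rethrows fatal ones is shown below. `CollectingErrorHandler` is a hypothetical class, and which conditions this parser reports as warnings, errors, or fatal errors is not specified in this page.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical handler that records warnings and errors and rethrows fatal errors. */
final class CollectingErrorHandler implements ErrorHandler {
    final List<SAXParseException> warnings = new ArrayList<>();
    final List<SAXParseException> errors = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        warnings.add(e);
    }

    @Override
    public void error(SAXParseException e) {
        errors.add(e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Propagate so that the caller of parse(...) sees the failure.
        throw e;
    }
}
```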
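
public void setIgnoringComments(boolean ignoreComments)
Sets whether comment nodes appear in the tree.
Parameters:
  ignoreComments - true to ignore comments
See Also:
  TreeBuilder.setIgnoringComments(boolean)

public void setScriptingEnabled(boolean scriptingEnabled)
Sets whether the parser considers scripting to be enabled for noscript treatment.
Parameters:
  scriptingEnabled - true to enable
See Also:
  TreeBuilder.setScriptingEnabled(boolean)

public void setCheckingNormalization(boolean enable)
Toggles the checking of the NFC normalization of source.
Parameters:
  enable - true to check normalization
See Also:
  Tokenizer.setCheckingNormalization(boolean)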
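
For readers unfamiliar with NFC, the following sketch shows what it means for text to be NFC-normalized, using the JDK's `java.text.Normalizer`. It only illustrates the property being checked; how the tokenizer performs its own check is not described in this page, and `NfcCheck` is a hypothetical helper.

```java
import java.text.Normalizer;

/** Hypothetical helper illustrating the NFC property with the JDK's Normalizer. */
final class NfcCheck {

    /** Returns true if text is already in Unicode Normalization Form C. */
    static boolean isNfc(CharSequence text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }
}
```

For example, the precomposed character U+00E9 is in NFC, while the sequence U+0065 U+0301 (letter e followed by a combining acute accent) is not.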

public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the policy for consecutive hyphens in comments.
Parameters:
  commentPolicy - the policy
See Also:
  Tokenizer.setCommentPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the policy for non-XML characters except white space.
Parameters:
  contentNonXmlCharPolicy - the policy
See Also:
  Tokenizer.setContentNonXmlCharPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the policy for non-XML white space.
Parameters:
  contentSpacePolicy - the policy
See Also:
  Tokenizer.setContentSpacePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
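
The distinction between the two content policies can be sketched with the XML 1.0 `Char` production and the HTML white space set. The reading that "non-XML white space" refers to form feed (U+000C), which HTML treats as white space but XML 1.0 does not allow, is an assumption; `XmlChars` is a hypothetical helper and does not reflect the tokenizer's actual code.

```java
/** Hypothetical helper classifying code points against XML 1.0 and HTML white space. */
final class XmlChars {

    /** XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** HTML white space: tab, line feed, form feed, carriage return, space. */
    static boolean isHtmlSpace(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xC || cp == 0xD || cp == 0x20;
    }

    /** White space in HTML terms that XML 1.0 forbids (assumed scope of the content space policy). */
    static boolean isNonXmlSpace(int cp) {
        return isHtmlSpace(cp) && !isXmlChar(cp);
    }

    /** Any other code point that XML 1.0 forbids (assumed scope of the non-XML char policy). */
    static boolean isNonXmlNonSpace(int cp) {
        return !isHtmlSpace(cp) && !isXmlChar(cp);
    }
}
```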

public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.
Parameters:
  html4ModeCompatibleWithXhtml1Schemata

public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Parameters:
  mappingLangToXmlLang
See Also:
  Tokenizer.setMappingLangToXmlLang(boolean)

public void setNamePolicy(XmlViolationPolicy namePolicy)
Parameters:
  namePolicy
See Also:
  Tokenizer.setNamePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setXmlPolicy(XmlViolationPolicy xmlPolicy)
This is a catch-all convenience method for setting name, content space, content non-XML char and comment policies in one go.
Parameters:
  xmlPolicy

public void setDoctypeExpectation(DoctypeExpectation doctypeExpectation)
Sets the doctype expectation.
Parameters:
  doctypeExpectation - the doctypeExpectation to set
See Also:
  TreeBuilder.setDoctypeExpectation(nu.validator.htmlparser.common.DoctypeExpectation)

public void setDocumentModeHandler(DocumentModeHandler documentModeHandler)
Sets the document mode handler.
Parameters:
  documentModeHandler
See Also:
  TreeBuilder.setDocumentModeHandler(nu.validator.htmlparser.common.DocumentModeHandler)

public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Sets the policy for forbidden xmlns attributes.
Parameters:
  bogusXmlnsPolicy - the policy
See Also:
  Tokenizer.setBogusXmlnsPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)