java.lang.Object
  nu.xom.Builder
    nu.validator.htmlparser.xom.HtmlBuilder

public class HtmlBuilder
extends nu.xom.Builder
This class implements an HTML5 parser that exposes data through the XOM interface.

By default, when using the constructor without arguments, this parser coerces XML 1.0-incompatible infosets into XML 1.0-compatible infosets. This corresponds to ALTER_INFOSET as the general XML violation policy. To treat XML 1.0 infoset violations as fatal instead, set the general XML violation policy to FATAL.
The doctype is not represented in the tree. The document mode is represented via the Mode interface on the Document node if the node implements that interface (this depends on the node factory used). The form pointer is stored if the node factory supports storing it.

This package has its own node factory class because the official XOM node factory may return multiple nodes instead of one, which would confuse the assumptions of the DOM-oriented HTML5 parsing algorithm.
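A minimal sketch of parsing an HTML string, assuming the htmlparser and XOM jars are on the classpath; `ParseStringExample`, `parseRoot` and the base URI `http://example.com/` are hypothetical names chosen for illustration. It shows that the result is an ordinary XOM `Document` whose root element sits in the XHTML namespace, even when the source omits the `html`, `head` and `body` tags.

```java
import nu.validator.htmlparser.xom.HtmlBuilder;
import nu.xom.Document;
import nu.xom.Element;

// Hypothetical example class for illustration; not part of the htmlparser API.
final class ParseStringExample {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Parses HTML source with the no-argument constructor and returns the root element. */
    static Element parseRoot(String html) throws Exception {
        HtmlBuilder builder = new HtmlBuilder();
        // build(String content, String uri): the second argument is the base URI.
        Document doc = builder.build(html, "http://example.com/");
        return doc.getRootElement();
    }

    public static void main(String[] args) throws Exception {
        // No doctype, html, head or body tags: the HTML5 algorithm synthesizes the elements.
        Element root = parseRoot("<title>Hello</title><p>World");
        System.out.println(root.getLocalName());    // html
        System.out.println(root.getNamespaceURI()); // http://www.w3.org/1999/xhtml
        System.out.println(root.getValue());        // HelloWorld
    }
}
```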
Constructor Summary

| Constructor | Description |
|---|---|
| HtmlBuilder() | Constructor with default node factory and fatal XML violation policy. |
| HtmlBuilder(SimpleNodeFactory nodeFactory) | Constructor with given node factory and fatal XML violation policy. |
| HtmlBuilder(SimpleNodeFactory nodeFactory, XmlViolationPolicy xmlPolicy) | Constructor with given node factory and given XML violation policy. |
| HtmlBuilder(XmlViolationPolicy xmlPolicy) | Constructor with default node factory and given XML violation policy. |
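The sketch below contrasts two of the constructors, assuming the htmlparser and XOM jars are on the classpath and that `SimpleNodeFactory` can be instantiated through a no-argument constructor; `ConstructorExample`, `strictBuilder` and `lenientBuilder` are hypothetical names. `FATAL` makes XML 1.0 infoset violations abort the parse, while `ALTER_INFOSET` coerces them into a compatible infoset.

```java
import nu.validator.htmlparser.common.XmlViolationPolicy;
import nu.validator.htmlparser.xom.HtmlBuilder;
import nu.validator.htmlparser.xom.SimpleNodeFactory;
import nu.xom.Document;

// Hypothetical helper class; the factory method names are illustrative.
final class ConstructorExample {

    /** Given node factory; XML 1.0 infoset violations abort the parse. */
    static HtmlBuilder strictBuilder() {
        return new HtmlBuilder(new SimpleNodeFactory(), XmlViolationPolicy.FATAL);
    }

    /** Default node factory; violations are coerced into an XML 1.0-compatible infoset. */
    static HtmlBuilder lenientBuilder() {
        return new HtmlBuilder(XmlViolationPolicy.ALTER_INFOSET);
    }

    public static void main(String[] args) throws Exception {
        Document doc = lenientBuilder().build("<!DOCTYPE html><p>ok</p>", "http://example.com/");
        System.out.println(doc.getRootElement().getValue()); // ok
    }
}
```

Either builder handles well-formed input identically; the policy only matters once the source contains something XML 1.0 cannot represent.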
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| void | addCharacterHandler(CharacterHandler characterHandler) | |
| nu.xom.Document | build(java.io.File file) | Parse from File. |
| nu.xom.Document | build(org.xml.sax.InputSource is) | Parse from SAX InputSource. |
| nu.xom.Document | build(java.io.InputStream stream) | Parse from InputStream. |
| nu.xom.Document | build(java.io.InputStream stream, java.lang.String uri) | Parse from InputStream with the given base URI. |
| nu.xom.Document | build(java.io.Reader stream) | Parse from Reader. |
| nu.xom.Document | build(java.io.Reader stream, java.lang.String uri) | Parse from Reader with the given base URI. |
| nu.xom.Document | build(java.lang.String uri) | Parse from URI. |
| nu.xom.Document | build(java.lang.String content, java.lang.String uri) | Parse from String with the given base URI. |
| nu.xom.Nodes | buildFragment(org.xml.sax.InputSource is, java.lang.String context) | Parse a fragment from SAX InputSource. |
| XmlViolationPolicy | getBogusXmlnsPolicy() | Deprecated. |
| XmlViolationPolicy | getCommentPolicy() | Returns the commentPolicy. |
| XmlViolationPolicy | getContentNonXmlCharPolicy() | Returns the contentNonXmlCharPolicy. |
| XmlViolationPolicy | getContentSpacePolicy() | Returns the contentSpacePolicy. |
| DoctypeExpectation | getDoctypeExpectation() | Returns the doctype expectation. |
| org.xml.sax.Locator | getDocumentLocator() | Returns the Locator during parse. |
| DocumentModeHandler | getDocumentModeHandler() | Returns the document mode handler. |
| Heuristics | getHeuristics() | Returns the encoding sniffing heuristics. |
| XmlViolationPolicy | getNamePolicy() | The policy for non-NCName element and attribute names. |
| SimpleNodeFactory | getSimpleNodeFactory() | Gets the node factory. |
| XmlViolationPolicy | getStreamabilityViolationPolicy() | Returns the streamabilityViolationPolicy. |
| XmlViolationPolicy | getXmlnsPolicy() | Returns the xmlnsPolicy. |
| boolean | isCheckingNormalization() | Indicates whether NFC normalization of source is being checked. |
| boolean | isHtml4ModeCompatibleWithXhtml1Schemata() | Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
| boolean | isMappingLangToXmlLang() | Whether lang is mapped to xml:lang. |
| boolean | isReportingDoctype() | Returns the reportingDoctype. |
| boolean | isScriptingEnabled() | Whether the parser considers scripting to be enabled for noscript treatment. |
| void | setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy) | Deprecated. |
| void | setCheckingNormalization(boolean enable) | Toggles the checking of the NFC normalization of source. |
| void | setCommentPolicy(XmlViolationPolicy commentPolicy) | Sets the policy for consecutive hyphens in comments. |
| void | setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) | Sets the policy for non-XML characters except white space. |
| void | setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy) | Sets the policy for non-XML white space. |
| void | setDoctypeExpectation(DoctypeExpectation doctypeExpectation) | Sets the doctype expectation. |
| void | setDocumentModeHandler(DocumentModeHandler documentModeHandler) | Sets the document mode handler. |
| void | setEntityResolver(org.xml.sax.EntityResolver resolver) | See XMLReader.setEntityResolver. |
| void | setErrorHandler(org.xml.sax.ErrorHandler handler) | See XMLReader.setErrorHandler. |
| void | setHeuristics(Heuristics heuristics) | Sets the encoding sniffing heuristics. |
| void | setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata) | Sets whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
| void | setIgnoringComments(boolean ignoreComments) | Sets whether comment nodes are omitted from the tree. |
| void | setMappingLangToXmlLang(boolean mappingLangToXmlLang) | Sets whether lang is mapped to xml:lang. |
| void | setNamePolicy(XmlViolationPolicy namePolicy) | Sets the policy for non-NCName element and attribute names. |
| void | setReportingDoctype(boolean reportingDoctype) | Sets the reportingDoctype. |
| void | setScriptingEnabled(boolean scriptingEnabled) | Sets whether the parser considers scripting to be enabled for noscript treatment. |
| void | setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy) | Sets the streamabilityViolationPolicy. |
| void | setTransitionHander(TransitionHandler handler) | |
| void | setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy) | Sets the policy for whether the xmlns attribute on the root element is passed through. |
| void | setXmlPolicy(XmlViolationPolicy xmlPolicy) | A catch-all convenience method for setting the name, xmlns, content space, content non-XML char and comment policies in one go. |
Methods inherited from class nu.xom.Builder |
---|
getNodeFactory |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
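Constructor Detail

public HtmlBuilder()
Constructor with default node factory and fatal XML violation policy.

public HtmlBuilder(SimpleNodeFactory nodeFactory)
Constructor with given node factory and fatal XML violation policy.
Parameters:
nodeFactory - the factory

public HtmlBuilder(XmlViolationPolicy xmlPolicy)
Constructor with default node factory and given XML violation policy.
Parameters:
xmlPolicy - the policy

public HtmlBuilder(SimpleNodeFactory nodeFactory, XmlViolationPolicy xmlPolicy)
Constructor with given node factory and given XML violation policy.
Parameters:
nodeFactory - the factory
xmlPolicy - the policy

Method Detail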

public nu.xom.Document build(org.xml.sax.InputSource is) throws nu.xom.ParsingException, java.io.IOException
Parse from SAX InputSource.
Parameters:
is - the InputSource
Throws:
nu.xom.ParsingException - in case of an XML violation
java.io.IOException - if an I/O error occurs

public nu.xom.Nodes buildFragment(org.xml.sax.InputSource is, java.lang.String context) throws java.io.IOException, nu.xom.ParsingException
Parse a fragment from SAX InputSource.
Parameters:
is - the InputSource
context - the name of the context element
Throws:
nu.xom.ParsingException - in case of an XML violation
java.io.IOException - if an I/O error occurs

public nu.xom.Document build(java.io.File file) throws nu.xom.ParsingException, nu.xom.ValidityException, java.io.IOException
Parse from File.
Overrides:
build in class nu.xom.Builder
Parameters:
file - the file
Throws:
nu.xom.ParsingException - in case of an XML violation
java.io.IOException - if an I/O error occurs
nu.xom.ValidityException
See Also:
Builder.build(java.io.File)
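`buildFragment`, documented above, parses markup as the content of a named context element and returns the resulting top-level nodes as `nu.xom.Nodes` rather than a whole `Document`. A minimal sketch, assuming the htmlparser and XOM jars are on the classpath; `FragmentExample`, the markup and the `div` context are illustrative choices.

```java
import java.io.StringReader;
import nu.validator.htmlparser.xom.HtmlBuilder;
import nu.xom.Element;
import nu.xom.Node;
import nu.xom.Nodes;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the htmlparser API.
final class FragmentExample {

    /** Parses markup as if it were the content of an element named {@code context}. */
    static Nodes parseFragment(String markup, String context) throws Exception {
        HtmlBuilder builder = new HtmlBuilder();
        InputSource source = new InputSource(new StringReader(markup));
        return builder.buildFragment(source, context);
    }

    public static void main(String[] args) throws Exception {
        Nodes nodes = parseFragment("<p>one</p><p>two</p>", "div");
        for (int i = 0; i < nodes.size(); i++) {
            Node node = nodes.get(i);
            // Print each top-level element with its text content.
            if (node instanceof Element) {
                Element element = (Element) node;
                System.out.println(element.getLocalName() + ": " + element.getValue());
            }
        }
    }
}
```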

public nu.xom.Document build(java.io.InputStream stream, java.lang.String uri) throws nu.xom.ParsingException, nu.xom.ValidityException, java.io.IOException
Parse from InputStream.
Overrides:
build in class nu.xom.Builder
Parameters:
stream - the stream
uri - the base URI
Throws:
nu.xom.ParsingException - in case of an XML violation
java.io.IOException - if an I/O error occurs
nu.xom.ValidityException
See Also:
Builder.build(java.io.InputStream, java.lang.String)
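When the input is a byte stream, decoding is left to the parser, which sniffs the encoding (in this sketch, from a `<meta charset>` declaration, per the HTML5 prescan rules) instead of taking a charset from the caller. A minimal sketch under that assumption, with the htmlparser and XOM jars on the classpath; `ByteStreamExample` and the base URI are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import nu.validator.htmlparser.xom.HtmlBuilder;
import nu.xom.Document;

// Hypothetical example class; not part of the htmlparser API.
final class ByteStreamExample {

    /** Parses raw bytes; the parser decides how to decode them. */
    static Document parseBytes(byte[] bytes, String baseUri) throws Exception {
        HtmlBuilder builder = new HtmlBuilder();
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            return builder.build(in, baseUri);
        }
    }

    public static void main(String[] args) throws Exception {
        // UTF-8 bytes labelled by a meta element; no charset is passed in from Java.
        String html = "<!DOCTYPE html><meta charset=utf-8><title>caf\u00e9</title>";
        Document doc = parseBytes(html.getBytes(StandardCharsets.UTF_8), "http://example.com/");
        System.out.println(doc.getRootElement().getValue());
    }
}
```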

public nu.xom.Document build(java.io.InputStream stream) throws nu.xom.ParsingException, nu.xom.ValidityException, java.io.IOException
Parse from InputStream.
Overrides:
build in class nu.xom.Builder
Parameters:
stream - the stream
Throws:
nu.xom.ParsingException - in case of an XML violation
java.io.IOException - if an I/O error occurs
nu.xom.ValidityException
See Also:
Builder.build(java.io.InputStream)

public nu.xom.Document build(java.io.Reader stream, java.lang.String uri) throws nu.xom.ParsingException, nu.xom.ValidityException, java.io.IOException
Parse from Reader.
Overrides:
build in class nu.xom.Builder
Parameters:
stream - the reader
uri - the base URI
Throws:
nu.xom.ParsingException - in case of an XML violation
java.io.IOException - if an I/O error occurs
nu.xom.ValidityException
See Also:
Builder.build(java.io.Reader, java.lang.String)

public nu.xom.Document build(java.io.Reader stream) throws nu.xom.ParsingException, nu.xom.ValidityException, java.io.IOException
Parse from Reader.
Overrides:
build in class nu.xom.Builder
Parameters:
stream - the reader
Throws:
nu.xom.ParsingException - in case of an XML violation
java.io.IOException - if an I/O error occurs
nu.xom.ValidityException
See Also:
Builder.build(java.io.Reader)
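A `Reader` supplies characters that have already been decoded, so no encoding sniffing is involved and the caller is responsible for choosing the charset. A minimal sketch, assuming the htmlparser and XOM jars are on the classpath; `ReaderExample` and `parseText` are hypothetical names.

```java
import java.io.Reader;
import java.io.StringReader;
import nu.validator.htmlparser.xom.HtmlBuilder;
import nu.xom.Document;

// Hypothetical example class; not part of the htmlparser API.
final class ReaderExample {

    /** Parses already-decoded characters and returns the document's text content. */
    static String parseText(String html) throws Exception {
        HtmlBuilder builder = new HtmlBuilder();
        try (Reader reader = new StringReader(html)) {
            Document doc = builder.build(reader);
            return doc.getRootElement().getValue();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseText("<!DOCTYPE html><title>na\u00efve</title>"));
    }
}
```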

public nu.xom.Document build(java.lang.String content, java.lang.String uri) throws nu.xom.ParsingException, nu.xom.ValidityException, java.io.IOException
Parse from String.
Overrides:
build in class nu.xom.Builder
Parameters:
content - the HTML source as a string
uri - the base URI
Throws:
nu.xom.ParsingException - in case of an XML violation
java.io.IOException - if an I/O error occurs
nu.xom.ValidityException
See Also:
Builder.build(java.lang.String, java.lang.String)

public nu.xom.Document build(java.lang.String uri) throws nu.xom.ParsingException, nu.xom.ValidityException, java.io.IOException
Parse from URI.
Overrides:
build in class nu.xom.Builder
Parameters:
uri - the URI of the document
Throws:
nu.xom.ParsingException - in case of an XML violation
java.io.IOException - if an I/O error occurs
nu.xom.ValidityException
See Also:
Builder.build(java.lang.String)

public SimpleNodeFactory getSimpleNodeFactory()
Gets the node factory.

public void setEntityResolver(org.xml.sax.EntityResolver resolver)
See Also:
XMLReader.setEntityResolver(org.xml.sax.EntityResolver)

public void setErrorHandler(org.xml.sax.ErrorHandler handler)
See Also:
XMLReader.setErrorHandler(org.xml.sax.ErrorHandler)
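Registering an `ErrorHandler` is how a caller observes HTML parse errors. The sketch below collects recoverable errors in a list instead of aborting, assuming the htmlparser and XOM jars are on the classpath; `ErrorHandlerExample` and `CollectingHandler` are hypothetical names, and the expectation that a missing doctype is reported through `error` comes from the HTML5 parsing rules rather than from this page.

```java
import java.util.ArrayList;
import java.util.List;
import nu.validator.htmlparser.xom.HtmlBuilder;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical example class; not part of the htmlparser API.
final class ErrorHandlerExample {

    /** Records recoverable errors so the parse can run to completion. */
    static final class CollectingHandler implements ErrorHandler {
        final List<String> errors = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            // Warnings are ignored in this sketch.
        }

        @Override
        public void error(SAXParseException e) {
            errors.add(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXParseException {
            throw e; // Fatal errors still stop the parse.
        }
    }

    /** Parses the source and returns the collected error messages. */
    static List<String> parseErrors(String html) throws Exception {
        CollectingHandler handler = new CollectingHandler();
        HtmlBuilder builder = new HtmlBuilder();
        builder.setErrorHandler(handler);
        builder.build(html, "http://example.com/");
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        for (String message : parseErrors("<p>no doctype")) {
            System.out.println(message);
        }
    }
}
```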
public void setTransitionHander(TransitionHandler handler)

public boolean isCheckingNormalization()
Indicates whether NFC normalization of source is being checked.
Returns:
true if NFC normalization of source is being checked
See Also:
nu.validator.htmlparser.impl.Tokenizer#isCheckingNormalization()

public void setCheckingNormalization(boolean enable)
Toggles the checking of the NFC normalization of source.
Parameters:
enable - true to check normalization
See Also:
nu.validator.htmlparser.impl.Tokenizer#setCheckingNormalization(boolean)

public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the policy for consecutive hyphens in comments.
Parameters:
commentPolicy - the policy
See Also:
Tokenizer.setCommentPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the policy for non-XML characters except white space.
Parameters:
contentNonXmlCharPolicy - the policy
See Also:
Tokenizer.setContentNonXmlCharPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the policy for non-XML white space.
Parameters:
contentSpacePolicy - the policy
See Also:
Tokenizer.setContentSpacePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
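
public boolean isScriptingEnabled()
Whether the parser considers scripting to be enabled for noscript treatment.
Returns:
true if enabled
See Also:
TreeBuilder.isScriptingEnabled()

public void setScriptingEnabled(boolean scriptingEnabled)
Sets whether the parser considers scripting to be enabled for noscript treatment.
Parameters:
scriptingEnabled - true to enable
See Also:
TreeBuilder.setScriptingEnabled(boolean)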
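With scripting enabled, the HTML5 algorithm treats the content of a `noscript` element in the body as raw text; with it disabled, that content is parsed as markup. A minimal sketch that makes the difference visible, assuming the htmlparser and XOM jars are on the classpath; `ScriptingExample` and `noscriptChildElements` are hypothetical names.

```java
import nu.validator.htmlparser.xom.HtmlBuilder;
import nu.xom.Document;
import nu.xom.Element;

// Hypothetical example class; not part of the htmlparser API.
final class ScriptingExample {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Counts the element children the parser creates inside body > noscript. */
    static int noscriptChildElements(boolean scriptingEnabled) throws Exception {
        HtmlBuilder builder = new HtmlBuilder();
        builder.setScriptingEnabled(scriptingEnabled);
        Document doc = builder.build(
                "<!DOCTYPE html><body><noscript><p>fallback</p></noscript>",
                "http://example.com/");
        Element body = doc.getRootElement().getFirstChildElement("body", XHTML_NS);
        Element noscript = body.getFirstChildElement("noscript", XHTML_NS);
        return noscript.getChildElements().size();
    }

    public static void main(String[] args) throws Exception {
        // Enabled: the p markup stays as text. Disabled: a p element is created.
        System.out.println("scripting on:  " + noscriptChildElements(true));
        System.out.println("scripting off: " + noscriptChildElements(false));
    }
}
```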

public DoctypeExpectation getDoctypeExpectation()
Returns the doctype expectation.

public void setDoctypeExpectation(DoctypeExpectation doctypeExpectation)
Sets the doctype expectation.
Parameters:
doctypeExpectation - the doctypeExpectation to set
See Also:
TreeBuilder.setDoctypeExpectation(nu.validator.htmlparser.common.DoctypeExpectation)

public DocumentModeHandler getDocumentModeHandler()
Returns the document mode handler.

public void setDocumentModeHandler(DocumentModeHandler documentModeHandler)
Sets the document mode handler.
Parameters:
documentModeHandler - the documentModeHandler to set
See Also:
TreeBuilder.setDocumentModeHandler(nu.validator.htmlparser.common.DocumentModeHandler)
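
public XmlViolationPolicy getStreamabilityViolationPolicy()
Returns the streamabilityViolationPolicy.

public void setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy)
Sets the streamabilityViolationPolicy.
Parameters:
streamabilityViolationPolicy - the streamabilityViolationPolicy to set

public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Sets whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.
Parameters:
html4ModeCompatibleWithXhtml1Schemata - true to repeat the attribute name in the value of boolean attributes

public org.xml.sax.Locator getDocumentLocator()
Returns the Locator during parse.
Returns:
the Locator

public boolean isHtml4ModeCompatibleWithXhtml1Schemata()
Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.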

public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Sets whether lang is mapped to xml:lang.
Parameters:
mappingLangToXmlLang - true to map lang to xml:lang
See Also:
Tokenizer.setMappingLangToXmlLang(boolean)

public boolean isMappingLangToXmlLang()
Whether lang is mapped to xml:lang.

public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
Sets the policy for whether the xmlns attribute on the root element is passed through. (FATAL is not allowed.)
Parameters:
xmlnsPolicy - the policy
See Also:
Tokenizer.setXmlnsPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public XmlViolationPolicy getXmlnsPolicy()
Returns the xmlnsPolicy.

public XmlViolationPolicy getCommentPolicy()
Returns the commentPolicy.

public XmlViolationPolicy getContentNonXmlCharPolicy()
Returns the contentNonXmlCharPolicy.

public XmlViolationPolicy getContentSpacePolicy()
Returns the contentSpacePolicy.
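
public void setReportingDoctype(boolean reportingDoctype)
Sets the reportingDoctype.
Parameters:
reportingDoctype - the reportingDoctype to set
See Also:
TreeBuilder.setReportingDoctype(boolean)

public boolean isReportingDoctype()
Returns the reportingDoctype.

public void setNamePolicy(XmlViolationPolicy namePolicy)
Sets the policy for non-NCName element and attribute names.
Parameters:
namePolicy - the policy
See Also:
Tokenizer.setNamePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setHeuristics(Heuristics heuristics)
Sets the encoding sniffing heuristics.
Parameters:
heuristics - the heuristics to set
See Also:
nu.validator.htmlparser.impl.Tokenizer#setHeuristics(nu.validator.htmlparser.common.Heuristics)

public Heuristics getHeuristics()
Returns the encoding sniffing heuristics.

public void setXmlPolicy(XmlViolationPolicy xmlPolicy)
This is a catch-all convenience method for setting the name, xmlns, content space, content non-XML char and comment policies in one go.
Parameters:
xmlPolicy - the policy

public XmlViolationPolicy getNamePolicy()
The policy for non-NCName element and attribute names.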

public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Deprecated.

public XmlViolationPolicy getBogusXmlnsPolicy()
Deprecated. Returns XmlViolationPolicy.ALTER_INFOSET.
Returns:
XmlViolationPolicy.ALTER_INFOSET
public void addCharacterHandler(CharacterHandler characterHandler)

public void setIgnoringComments(boolean ignoreComments)
Sets whether comment nodes are omitted from the tree.
Parameters:
ignoreComments - true to ignore comments
See Also:
TreeBuilder.setIgnoringComments(boolean)
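A minimal sketch of `setIgnoringComments`, assuming the htmlparser and XOM jars are on the classpath; `IgnoreCommentsExample` and `paragraphChildCount` are hypothetical names. The paragraph holds a comment node and a text node unless comments are ignored, in which case only the text node remains.

```java
import nu.validator.htmlparser.xom.HtmlBuilder;
import nu.xom.Document;
import nu.xom.Element;

// Hypothetical example class; not part of the htmlparser API.
final class IgnoreCommentsExample {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Returns the number of child nodes (of any type) of the first paragraph. */
    static int paragraphChildCount(boolean ignoreComments) throws Exception {
        HtmlBuilder builder = new HtmlBuilder();
        builder.setIgnoringComments(ignoreComments);
        Document doc = builder.build("<!DOCTYPE html><p><!--note-->text</p>", "http://example.com/");
        Element body = doc.getRootElement().getFirstChildElement("body", XHTML_NS);
        Element p = body.getFirstChildElement("p", XHTML_NS);
        return p.getChildCount();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("comments kept:    " + paragraphChildCount(false));
        System.out.println("comments ignored: " + paragraphChildCount(true));
    }
}
```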