java.lang.Object
  nu.xom.Builder
    nu.validator.htmlparser.xom.HtmlBuilder
public class HtmlBuilder extends nu.xom.Builder
This class implements an HTML5 parser that exposes data through the XOM interface.
By default, when the constructor without arguments is used, this parser treats XML 1.0-incompatible infosets as fatal errors. This corresponds to FATAL as the general XML violation policy. Handling all input without fatal errors and without violating the XOM API contract is possible by setting the general XML violation policy to ALTER_INFOSET. This makes the parser non-conforming but is probably the most useful setting for most applications.
The doctype is not represented in the tree.
The document mode is represented via the Mode interface on the Document node if the node implements that interface (this depends on the node factory used).
The form pointer is stored if the node factory supports storing it.
This package has its own node factory class because the official XOM node factory may return multiple nodes instead of one, which would confuse the assumptions of the DOM-oriented HTML5 parsing algorithm.
Field Summary | |
---|---|
private EntityResolver | entityResolver |
private SimpleNodeFactory | simpleNodeFactory |
private Tokenizer | tokenizer |
private XOMTreeBuilder | xomTreeBuilder |
Constructor Summary | |
---|---|
HtmlBuilder() | Constructor with default node factory and fatal XML violation policy. |
HtmlBuilder(SimpleNodeFactory nodeFactory) | Constructor with given node factory and fatal XML violation policy. |
HtmlBuilder(SimpleNodeFactory nodeFactory, XmlViolationPolicy xmlPolicy) | Constructor with given node factory and given XML violation policy. |
HtmlBuilder(XmlViolationPolicy xmlPolicy) | Constructor with default node factory and given XML violation policy. |
Method Summary | | |
---|---|---|
nu.xom.Document | build(File file) | Parse from File. |
nu.xom.Document | build(InputSource is) | Parse from SAX InputSource. |
nu.xom.Document | build(InputStream stream) | Parse from InputStream. |
nu.xom.Document | build(InputStream stream, String uri) | Parse from InputStream. |
nu.xom.Document | build(Reader stream) | Parse from Reader. |
nu.xom.Document | build(Reader stream, String uri) | Parse from Reader. |
nu.xom.Document | build(String uri) | Parse from URI. |
nu.xom.Document | build(String content, String uri) | Parse from String. |
nu.xom.Nodes | buildFragment(InputSource is, String context) | Parse a fragment from SAX InputSource. |
SimpleNodeFactory | getSimpleNodeFactory() | Gets the node factory. |
void | setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy) | Sets the policy for forbidden xmlns attributes. |
void | setCheckingNormalization(boolean enable) | Toggles the checking of the NFC normalization of source. |
void | setCommentPolicy(XmlViolationPolicy commentPolicy) | Sets the policy for consecutive hyphens in comments. |
void | setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) | Sets the policy for non-XML characters except white space. |
void | setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy) | Sets the policy for non-XML white space. |
void | setDoctypeExpectation(DoctypeExpectation doctypeExpectation) | Sets the doctype expectation. |
void | setDocumentModeHandler(DocumentModeHandler documentModeHandler) | Sets the document mode handler. |
void | setEntityResolver(EntityResolver resolver) | Sets the entity resolver for URI-only inputs. |
void | setErrorHandler(ErrorHandler errorHandler) | |
void | setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata) | Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
void | setIgnoringComments(boolean ignoreComments) | Sets whether comment nodes appear in the tree. |
void | setMappingLangToXmlLang(boolean mappingLangToXmlLang) | |
void | setNamePolicy(XmlViolationPolicy namePolicy) | |
void | setScriptingEnabled(boolean scriptingEnabled) | Sets whether the parser considers scripting to be enabled for noscript treatment. |
void | setXmlPolicy(XmlViolationPolicy xmlPolicy) | This is a catch-all convenience method for setting name, content space, content non-XML char and comment policies in one go. |
private void | tokenize(InputSource is) | |
Methods inherited from class nu.xom.Builder |
---|
getNodeFactory |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private final Tokenizer tokenizer
private final XOMTreeBuilder xomTreeBuilder
private final SimpleNodeFactory simpleNodeFactory
private EntityResolver entityResolver
Constructor Detail |
---|
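public HtmlBuilder()
Constructor with default node factory and fatal XML violation policy.
public HtmlBuilder(SimpleNodeFactory nodeFactory)
Constructor with given node factory and fatal XML violation policy.
Parameters:
nodeFactory - the factory
public HtmlBuilder(XmlViolationPolicy xmlPolicy)
Constructor with default node factory and given XML violation policy.
Parameters:
xmlPolicy - the policy
public HtmlBuilder(SimpleNodeFactory nodeFactory, XmlViolationPolicy xmlPolicy)
Constructor with given node factory and given XML violation policy.
Parameters:
nodeFactory - the factory
xmlPolicy - the policy
Method Detail |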
---|
private void tokenize(InputSource is) throws nu.xom.ParsingException, IOException, MalformedURLException
Throws:
nu.xom.ParsingException
IOException
MalformedURLException
public nu.xom.Document build(InputSource is) throws nu.xom.ParsingException, IOException
Parse from SAX InputSource.
Parameters:
is - the InputSource
Throws:
nu.xom.ParsingException - in case of an XML violation
IOException - if IO goes wrong
public nu.xom.Nodes buildFragment(InputSource is, String context) throws IOException, nu.xom.ParsingException
Parse a fragment from SAX InputSource.
Parameters:
is - the InputSource
context - the name of the context element
Throws:
nu.xom.ParsingException - in case of an XML violation
IOException - if IO goes wrong
public nu.xom.Document build(File file) throws nu.xom.ParsingException, nu.xom.ValidityException, IOException
Parse from File.
Overrides:
build in class nu.xom.Builder
Parameters:
file - the file
Throws:
nu.xom.ParsingException - in case of an XML violation
IOException - if IO goes wrong
nu.xom.ValidityException
See Also:
Builder.build(java.io.File)
public nu.xom.Document build(InputStream stream, String uri) throws nu.xom.ParsingException, nu.xom.ValidityException, IOException
Parse from InputStream.
Overrides:
build in class nu.xom.Builder
Parameters:
stream - the stream
uri - the base URI
Throws:
nu.xom.ParsingException - in case of an XML violation
IOException - if IO goes wrong
nu.xom.ValidityException
See Also:
Builder.build(java.io.InputStream, java.lang.String)
public nu.xom.Document build(InputStream stream) throws nu.xom.ParsingException, nu.xom.ValidityException, IOException
Parse from InputStream.
Overrides:
build in class nu.xom.Builder
Parameters:
stream - the stream
Throws:
nu.xom.ParsingException - in case of an XML violation
IOException - if IO goes wrong
nu.xom.ValidityException
See Also:
Builder.build(java.io.InputStream)
public nu.xom.Document build(Reader stream, String uri) throws nu.xom.ParsingException, nu.xom.ValidityException, IOException
Parse from Reader.
Overrides:
build in class nu.xom.Builder
Parameters:
stream - the reader
uri - the base URI
Throws:
nu.xom.ParsingException - in case of an XML violation
IOException - if IO goes wrong
nu.xom.ValidityException
See Also:
Builder.build(java.io.Reader, java.lang.String)
public nu.xom.Document build(Reader stream) throws nu.xom.ParsingException, nu.xom.ValidityException, IOException
Parse from Reader.
Overrides:
build in class nu.xom.Builder
Parameters:
stream - the reader
Throws:
nu.xom.ParsingException - in case of an XML violation
IOException - if IO goes wrong
nu.xom.ValidityException
See Also:
Builder.build(java.io.Reader)
public nu.xom.Document build(String content, String uri) throws nu.xom.ParsingException, nu.xom.ValidityException, IOException
Parse from String.
Overrides:
build in class nu.xom.Builder
Parameters:
content - the HTML source as string
uri - the base URI
Throws:
nu.xom.ParsingException - in case of an XML violation
IOException - if IO goes wrong
nu.xom.ValidityException
See Also:
Builder.build(java.lang.String, java.lang.String)
public nu.xom.Document build(String uri) throws nu.xom.ParsingException, nu.xom.ValidityException, IOException
Parse from URI.
Overrides:
build in class nu.xom.Builder
Parameters:
uri - the URI of the document
Throws:
nu.xom.ParsingException - in case of an XML violation
IOException - if IO goes wrong
nu.xom.ValidityException
See Also:
Builder.build(java.lang.String)
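The build overloads above differ mainly in how the input is supplied. As a rough sketch of preparing input for build(InputSource), the following uses only the standard org.xml.sax.InputSource class. The class name HtmlInputSources, its two methods, and the example.org URI are hypothetical names chosen for illustration, and treating the system ID as the document's base URI is an assumption about how the parser uses it.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the parser library.
final class HtmlInputSources {

    /** Wraps already-decoded HTML text in a SAX InputSource. */
    static InputSource fromString(String html, String baseUri) {
        InputSource source = new InputSource(new StringReader(html));
        // Assumption: the parser reads the system ID as the base URI.
        source.setSystemId(baseUri);
        return source;
    }

    /** Wraps raw bytes and declares the character encoding up front. */
    static InputSource fromBytes(byte[] bytes, String encoding, String baseUri) {
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setEncoding(encoding);
        source.setSystemId(baseUri);
        return source;
    }

    public static void main(String[] args) {
        InputSource source = fromString("<p>Hello", "http://example.org/doc.html");
        System.out.println(source.getSystemId());
    }
}
```

Assuming the parser library is on the classpath, such a source could then be passed to build(InputSource) on an HtmlBuilder instance, for example one created with new HtmlBuilder(XmlViolationPolicy.ALTER_INFOSET).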
public SimpleNodeFactory getSimpleNodeFactory()
Gets the node factory.
public void setEntityResolver(EntityResolver resolver)
Sets the entity resolver for URI-only inputs.
Parameters:
resolver - the resolver
See Also:
DocumentBuilder.setEntityResolver(org.xml.sax.EntityResolver)
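To illustrate what a resolver for URI-only inputs might look like, here is a minimal sketch built on the standard org.xml.sax.EntityResolver interface. The class InMemoryResolver and its add method are hypothetical, as are the example.org URIs. Returning null for unknown URIs follows the general SAX contract; how this parser reacts to a null result is not stated in this page.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver that serves HTML from memory instead of the network.
final class InMemoryResolver implements EntityResolver {
    private final Map<String, String> pages = new HashMap<>();

    /** Registers HTML text under a system ID; returns this for chaining. */
    InMemoryResolver add(String systemId, String html) {
        pages.put(systemId, html);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String html = pages.get(systemId);
        if (html == null) {
            return null; // SAX contract: null asks the caller for default handling
        }
        InputSource source = new InputSource(new StringReader(html));
        source.setSystemId(systemId);
        return source;
    }

    public static void main(String[] args) {
        InMemoryResolver resolver =
                new InMemoryResolver().add("http://example.org/a.html", "<p>A");
        System.out.println(resolver.resolveEntity(null, "http://example.org/a.html") != null);
    }
}
```

Such a resolver would be installed with setEntityResolver(resolver) before calling build(String uri), which can be handy for tests that must not touch the network.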
public void setErrorHandler(ErrorHandler errorHandler)
See Also:
DocumentBuilder.setErrorHandler(org.xml.sax.ErrorHandler)
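This page gives no description for setErrorHandler, so the following is only a sketch of one handler that could be passed to it. It implements the standard org.xml.sax.ErrorHandler interface and collects messages instead of throwing. The class CollectingErrorHandler and its message format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical handler that records every reported problem as a string.
final class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) {
        record("fatal", e);
    }

    // Format: level, then line:column, then the message text.
    private void record(String level, SAXParseException e) {
        messages.add(level + " " + e.getLineNumber() + ":" + e.getColumnNumber()
                + " " + e.getMessage());
    }

    /** Returns the recorded messages in the order they were reported. */
    List<String> messages() {
        return messages;
    }

    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        handler.error(new SAXParseException("stray end tag", null, "http://example.org/", 3, 7));
        System.out.println(handler.messages());
    }
}
```

After a call to one of the build methods, messages() could be inspected to see what the parser reported, assuming the parser routes its diagnostics through the installed handler.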
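public void setIgnoringComments(boolean ignoreComments)
Sets whether comment nodes appear in the tree.
Parameters:
ignoreComments - true to ignore comments
See Also:
TreeBuilder.setIgnoringComments(boolean)
public void setScriptingEnabled(boolean scriptingEnabled)
Sets whether the parser considers scripting to be enabled for noscript treatment.
Parameters:
scriptingEnabled - true to enable
See Also:
TreeBuilder.setScriptingEnabled(boolean)
public void setCheckingNormalization(boolean enable)
Toggles the checking of the NFC normalization of source.
Parameters:
enable - true to check normalization
See Also:
Tokenizer.setCheckingNormalization(boolean)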
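For readers unfamiliar with NFC, the sketch below shows what it means for text to be in that normalization form, using the standard java.text.Normalizer class. It is an illustration of the concept only and makes no claim about how the tokenizer performs its check; the class name NfcCheck is hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper illustrating Unicode Normalization Form C (NFC).
final class NfcCheck {

    /** True if the text is already in NFC. */
    static boolean isNfc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    /** Returns the NFC form of the text. */
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "caf\u00e9";    // e with acute as one code point
        String decomposed = "cafe\u0301"; // e followed by a combining acute accent
        System.out.println(isNfc(composed));
        System.out.println(isNfc(decomposed));
        System.out.println(toNfc(decomposed).equals(composed));
    }
}
```

Source text like the decomposed string here is the kind of input a normalization check would presumably flag.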
public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the policy for consecutive hyphens in comments.
Parameters:
commentPolicy - the policy
See Also:
Tokenizer.setCommentPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
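The reason a comment policy exists is that XML 1.0 forbids two consecutive hyphens inside a comment and a hyphen directly before the closing delimiter, while HTML input may contain both. The sketch below checks that constraint in isolation. It is not the tokenizer's code, and the class name XmlCommentCheck is hypothetical.

```java
// Hypothetical helper illustrating the XML 1.0 hyphen rule for comments.
final class XmlCommentCheck {

    /**
     * True if the text satisfies the XML 1.0 hyphen rule for comment content:
     * no "--" anywhere and no trailing "-". Other XML character
     * restrictions are not checked here.
     */
    static boolean satisfiesHyphenRule(String commentText) {
        return !commentText.contains("--") && !commentText.endsWith("-");
    }

    public static void main(String[] args) {
        System.out.println(satisfiesHyphenRule(" a - b "));
        System.out.println(satisfiesHyphenRule(" a -- b "));
    }
}
```

A comment that fails this check is the kind of input the configured policy decides how to handle.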
public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the policy for non-XML characters except white space.
Parameters:
contentNonXmlCharPolicy - the policy
See Also:
Tokenizer.setContentNonXmlCharPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
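As background for what counts as a non-XML character, the following sketch encodes the Char production of XML 1.0 and scans a string for the first code point outside it. It illustrates the XML rule only, not the tokenizer's implementation; the class XmlChars and its methods are hypothetical. Note that form feed (U+000C), which HTML treats as white space, falls outside the production, so it is presumably the kind of character that the separate white space policy addresses.

```java
// Hypothetical helper illustrating the XML 1.0 Char production.
final class XmlChars {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first non-XML character, or -1 if there is none. */
    static int firstNonXmlChar(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstNonXmlChar("plain text"));
        System.out.println(firstNonXmlChar("ab\fcd"));
    }
}
```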
public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the policy for non-XML white space.
Parameters:
contentSpacePolicy - the policy
See Also:
Tokenizer.setContentSpacePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.
Parameters:
html4ModeCompatibleWithXhtml1Schemata
public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Parameters:
mappingLangToXmlLang
See Also:
Tokenizer.setMappingLangToXmlLang(boolean)
public void setNamePolicy(XmlViolationPolicy namePolicy)
Parameters:
namePolicy
See Also:
Tokenizer.setNamePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
public void setXmlPolicy(XmlViolationPolicy xmlPolicy)
This is a catch-all convenience method for setting name, content space, content non-XML char and comment policies in one go.
Parameters:
xmlPolicy
public void setDoctypeExpectation(DoctypeExpectation doctypeExpectation)
Sets the doctype expectation.
Parameters:
doctypeExpectation - the doctypeExpectation to set
See Also:
TreeBuilder.setDoctypeExpectation(nu.validator.htmlparser.common.DoctypeExpectation)
public void setDocumentModeHandler(DocumentModeHandler documentModeHandler)
Sets the document mode handler.
Parameters:
documentModeHandler
See Also:
TreeBuilder.setDocumentModeHandler(nu.validator.htmlparser.common.DocumentModeHandler)
public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Sets the policy for forbidden xmlns attributes.
Parameters:
bogusXmlnsPolicy - the policy
See Also:
Tokenizer.setBogusXmlnsPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)