java.lang.Object
  nu.validator.htmlparser.sax.HtmlParser
public class HtmlParser
This class implements an HTML5 parser that exposes data through the SAX2 interface.
By default, when using the constructor without arguments, this parser
 coerces XML 1.0-incompatible infosets into XML 1.0-compatible
 infosets. This corresponds to ALTER_INFOSET as the general
 XML violation policy. To make the parser support non-conforming HTML fully
 per the HTML 5 spec, at the cost of potentially violating the SAX2
 API contract, set the general XML violation policy to ALLOW.
 It is possible to treat XML 1.0 infoset violations as fatal by setting
 the general XML violation policy to FATAL.
 
 
By default, this parser doesn't do true streaming but buffers everything 
 first. The parser can be made truly streaming by calling 
 setStreamabilityViolationPolicy(XmlViolationPolicy.FATAL). This 
 has the consequence that errors that require non-streamable recovery are 
 treated as fatal.
 
 
By default, in order to make the parse events emulate the parse events 
 for a DTDless XML document, the parser does not report the doctype through 
 LexicalHandler. Doctype reporting through 
 LexicalHandler can be turned on by calling 
 setReportingDoctype(true).
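
The doctype reporting described above can be observed with a handler that implements LexicalHandler. The following is a minimal sketch, not part of this API: the class name DoctypeRecorder and its methods are hypothetical, and the code is written against the JDK's org.xml.sax.XMLReader interface only, on the assumption that an HtmlParser on which setReportingDoctype(true) has been called is passed in as the reader. The property URI is the one listed for setLexicalHandler in the setProperty mapping on this page.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical handler: remembers the doctype name, if one is reported. */
final class DoctypeRecorder extends DefaultHandler2 {

    /** SAX2 property URI for the lexical handler. */
    static final String LEXICAL_HANDLER =
            "http://xml.org/sax/properties/lexical-handler";

    // Stays null when the reader never reports a doctype.
    private String doctypeName;

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        doctypeName = name;
    }

    String doctypeName() {
        return doctypeName;
    }

    /** Registers this recorder as content handler and lexical handler. */
    void installOn(XMLReader reader)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        reader.setContentHandler(this);
        reader.setProperty(LEXICAL_HANDLER, this);
    }
}
```

With the default setting (no doctype reporting), doctypeName() would be expected to stay null after a parse, since the parser then emulates a DTDless XML document.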
| Constructor Summary | |
|---|---|
| HtmlParser() | Instantiates the parser with a fatal XML violation policy. |
| HtmlParser(XmlViolationPolicy xmlPolicy) | Instantiates the parser with a specific XML violation policy. |
| Method Summary | | |
|---|---|---|
| void | addCharacterHandler(CharacterHandler characterHandler) | |
| XmlViolationPolicy | getBogusXmlnsPolicy() | Deprecated. |
| XmlViolationPolicy | getCommentPolicy() | Returns the commentPolicy. |
| org.xml.sax.ContentHandler | getContentHandler() | |
| XmlViolationPolicy | getContentNonXmlCharPolicy() | Returns the contentNonXmlCharPolicy. |
| XmlViolationPolicy | getContentSpacePolicy() | Returns the contentSpacePolicy. |
| DoctypeExpectation | getDoctypeExpectation() | Returns the doctype expectation. |
| org.xml.sax.Locator | getDocumentLocator() | Returns the Locator during parse. |
| DocumentModeHandler | getDocumentModeHandler() | Returns the document mode handler. |
| org.xml.sax.DTDHandler | getDTDHandler() | |
| org.xml.sax.EntityResolver | getEntityResolver() | |
| org.xml.sax.ErrorHandler | getErrorHandler() | |
| boolean | getFeature(java.lang.String name) | Exposes the configuration of the emulated XML parser as well as boolean-valued configuration without using non-XMLReader getters directly. |
| Heuristics | getHeuristics() | |
| org.xml.sax.ext.LexicalHandler | getLexicalHandler() | Returns the lexicalHandler. |
| XmlViolationPolicy | getNamePolicy() | The policy for non-NCName element and attribute names. |
| java.lang.Object | getProperty(java.lang.String name) | Allows XMLReader-level access to non-boolean valued getters. |
| XmlViolationPolicy | getStreamabilityViolationPolicy() | Returns the streamabilityViolationPolicy. |
| XmlViolationPolicy | getXmlnsPolicy() | Returns the xmlnsPolicy. |
| boolean | isCheckingNormalization() | Indicates whether NFC normalization of source is being checked. |
| boolean | isHtml4ModeCompatibleWithXhtml1Schemata() | Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
| boolean | isMappingLangToXmlLang() | Whether lang is mapped to xml:lang. |
| boolean | isReportingDoctype() | Returns the reportingDoctype. |
| boolean | isScriptingEnabled() | Whether the parser considers scripting to be enabled for noscript treatment. |
| void | parse(org.xml.sax.InputSource input) | |
| void | parse(java.lang.String systemId) | |
| void | parseFragment(org.xml.sax.InputSource input, java.lang.String context) | Parses a fragment. |
| void | setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy) | Deprecated. |
| void | setCheckingNormalization(boolean enable) | Toggles the checking of the NFC normalization of source. |
| void | setCommentPolicy(XmlViolationPolicy commentPolicy) | Sets the policy for consecutive hyphens in comments. |
| void | setContentHandler(org.xml.sax.ContentHandler handler) | |
| void | setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) | Sets the policy for non-XML characters except white space. |
| void | setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy) | Sets the policy for non-XML white space. |
| void | setDoctypeExpectation(DoctypeExpectation doctypeExpectation) | Sets the doctype expectation. |
| void | setDocumentModeHandler(DocumentModeHandler documentModeHandler) | Sets the document mode handler. |
| void | setDTDHandler(org.xml.sax.DTDHandler handler) | |
| void | setEntityResolver(org.xml.sax.EntityResolver resolver) | |
| void | setErrorHandler(org.xml.sax.ErrorHandler handler) | |
| void | setErrorProfile(java.util.HashMap<java.lang.String,java.lang.String> errorProfileMap) | |
| void | setFeature(java.lang.String name, boolean value) | Sets a boolean feature without having to use non-XMLReader setters directly. |
| void | setHeuristics(Heuristics heuristics) | Sets the encoding sniffing heuristics. |
| void | setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata) | Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
| void | setLexicalHandler(org.xml.sax.ext.LexicalHandler handler) | Sets the lexical handler. |
| void | setMappingLangToXmlLang(boolean mappingLangToXmlLang) | Whether lang is mapped to xml:lang. |
| void | setNamePolicy(XmlViolationPolicy namePolicy) | The policy for non-NCName element and attribute names. |
| void | setProperty(java.lang.String name, java.lang.Object value) | Sets a non-boolean property without having to use non-XMLReader setters directly. |
| void | setReportingDoctype(boolean reportingDoctype) | |
| void | setScriptingEnabled(boolean scriptingEnabled) | Sets whether the parser considers scripting to be enabled for noscript treatment. |
| void | setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy) | Sets the streamabilityViolationPolicy. |
| void | setTransitionHandler(TransitionHandler handler) | |
| void | setTreeBuilderErrorHandlerOverride(org.xml.sax.ErrorHandler handler) | Deprecated. For Validator.nu internal use |
| void | setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy) | Whether the xmlns attribute on the root element is passed through. |
| void | setXmlPolicy(XmlViolationPolicy xmlPolicy) | This is a catch-all convenience method for setting name, xmlns, content space, content non-XML char and comment policies in one go. |
| Methods inherited from class java.lang.Object | 
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
| Constructor Detail | 
|---|
public HtmlParser()
public HtmlParser(XmlViolationPolicy xmlPolicy)
Parameters: xmlPolicy - the policy

| Method Detail | 
|---|
public org.xml.sax.ContentHandler getContentHandler()
Specified by: getContentHandler in interface org.xml.sax.XMLReader
See Also: XMLReader.getContentHandler()

public org.xml.sax.DTDHandler getDTDHandler()
Specified by: getDTDHandler in interface org.xml.sax.XMLReader
See Also: XMLReader.getDTDHandler()

public org.xml.sax.EntityResolver getEntityResolver()
Specified by: getEntityResolver in interface org.xml.sax.XMLReader
See Also: XMLReader.getEntityResolver()

public org.xml.sax.ErrorHandler getErrorHandler()
Specified by: getErrorHandler in interface org.xml.sax.XMLReader
See Also: XMLReader.getErrorHandler()

public boolean getFeature(java.lang.String name)
                   throws org.xml.sax.SAXNotRecognizedException,
                          org.xml.sax.SAXNotSupportedException
Exposes the configuration of the emulated XML parser as well as boolean-valued configuration without using non-XMLReader
 getters directly.

| Feature URI | Value |
|---|---|
| http://xml.org/sax/features/external-general-entities | false |
| http://xml.org/sax/features/external-parameter-entities | false |
| http://xml.org/sax/features/is-standalone | true |
| http://xml.org/sax/features/lexical-handler/parameter-entities | false |
| http://xml.org/sax/features/namespaces | true |
| http://xml.org/sax/features/namespace-prefixes | false |
| http://xml.org/sax/features/resolve-dtd-uris | true |
| http://xml.org/sax/features/string-interning | false |
| http://xml.org/sax/features/unicode-normalization-checking | isCheckingNormalization |
| http://xml.org/sax/features/use-attributes2 | false |
| http://xml.org/sax/features/use-locator2 | false |
| http://xml.org/sax/features/use-entity-resolver2 | false |
| http://xml.org/sax/features/validation | false |
| http://xml.org/sax/features/xmlns-uris | false |
| http://xml.org/sax/features/xml-1.1 | false |
| http://validator.nu/features/html4-mode-compatible-with-xhtml1-schemata | isHtml4ModeCompatibleWithXhtml1Schemata |
| http://validator.nu/features/mapping-lang-to-xml-lang | isMappingLangToXmlLang |
| http://validator.nu/features/scripting-enabled | isScriptingEnabled |

Specified by: getFeature in interface org.xml.sax.XMLReader
Parameters: name - feature URI string
Throws:
  org.xml.sax.SAXNotRecognizedException
  org.xml.sax.SAXNotSupportedException
See Also: XMLReader.getFeature(java.lang.String)
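
Because getFeature declares two checked exceptions, callers that only want to inspect configuration may prefer a lookup that does not throw. The sketch below is one possible way to do that; FeatureProbe and query are hypothetical names, and the code depends only on the JDK's XMLReader interface, assuming an HtmlParser instance is supplied as the reader.

```java
import java.util.Optional;

import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper for reading feature flags without checked exceptions. */
final class FeatureProbe {

    private FeatureProbe() {
    }

    /**
     * Returns the value of the feature, or an empty Optional when the
     * reader does not recognize or does not support the given URI.
     */
    static Optional<Boolean> query(XMLReader reader, String featureUri) {
        try {
            return Optional.of(reader.getFeature(featureUri));
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return Optional.empty();
        }
    }
}
```

Going by the table above, FeatureProbe.query(parser, "http://validator.nu/features/scripting-enabled") would be expected to mirror isScriptingEnabled().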
public java.lang.Object getProperty(java.lang.String name)
                             throws org.xml.sax.SAXNotRecognizedException,
                                    org.xml.sax.SAXNotSupportedException
Allows XMLReader-level access to non-boolean valued
 getters.

 The properties are mapped as follows:

| Property URI | Value |
|---|---|
| http://xml.org/sax/properties/document-xml-version | "1.0" |
| http://xml.org/sax/properties/lexical-handler | getLexicalHandler |
| http://validator.nu/properties/content-space-policy | getContentSpacePolicy |
| http://validator.nu/properties/content-non-xml-char-policy | getContentNonXmlCharPolicy |
| http://validator.nu/properties/comment-policy | getCommentPolicy |
| http://validator.nu/properties/xmlns-policy | getXmlnsPolicy |
| http://validator.nu/properties/name-policy | getNamePolicy |
| http://validator.nu/properties/streamability-violation-policy | getStreamabilityViolationPolicy |
| http://validator.nu/properties/document-mode-handler | getDocumentModeHandler |
| http://validator.nu/properties/doctype-expectation | getDoctypeExpectation |
| http://xml.org/sax/features/unicode-normalization-checking | |

Specified by: getProperty in interface org.xml.sax.XMLReader
Parameters: name - property URI string
Throws:
  org.xml.sax.SAXNotRecognizedException
  org.xml.sax.SAXNotSupportedException
See Also: XMLReader.getProperty(java.lang.String)
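
Since getProperty returns java.lang.Object, callers have to cast the result to the type implied by the mapping above. A small typed wrapper is one way to keep that cast in a single place. This is a sketch under assumptions: TypedProperty and get are hypothetical names, and the reader argument is assumed to be an HtmlParser used through the XMLReader interface.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper that casts property values to an expected type. */
final class TypedProperty {

    private TypedProperty() {
    }

    /**
     * Looks up a property and casts it to the requested type.
     * A null value passes through unchanged; a value of another type
     * results in a ClassCastException from Class.cast.
     */
    static <T> T get(XMLReader reader, String propertyUri, Class<T> type)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        Object value = reader.getProperty(propertyUri);
        return type.cast(value);
    }
}
```

For example, TypedProperty.get(parser, "http://xml.org/sax/properties/document-xml-version", String.class) would be expected to yield "1.0" according to the table above.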
public void parse(org.xml.sax.InputSource input)
           throws java.io.IOException,
                  org.xml.sax.SAXException
Specified by: parse in interface org.xml.sax.XMLReader
Throws:
  java.io.IOException
  org.xml.sax.SAXException
See Also: XMLReader.parse(org.xml.sax.InputSource)
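
As an illustration of driving parse(InputSource) through the SAX2 interface, the sketch below registers a ContentHandler and collects the local names of the elements reported during a parse. ElementNameCollector and collect are hypothetical names. The code uses only JDK types; it assumes the caller passes an instance of nu.validator.htmlparser.sax.HtmlParser (for example one created with new HtmlParser(XmlViolationPolicy.ALTER_INFOSET)) as the XMLReader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: records the local name of every reported element. */
final class ElementNameCollector {

    private ElementNameCollector() {
    }

    /** Parses the markup with the given reader and returns element names in document order. */
    static List<String> collect(XMLReader reader, String markup)
            throws IOException, SAXException {
        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes atts) {
                names.add(localName);
            }
        });
        // A character stream is used, so no encoding sniffing is needed here.
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }
}
```

The helper replaces any content handler previously set on the reader, so it is best suited to a reader that is not shared with other code.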
public void parseFragment(org.xml.sax.InputSource input,
                          java.lang.String context)
                   throws java.io.IOException,
                          org.xml.sax.SAXException
Parses a fragment.
Parameters:
  input - the input to parse
  context - the name of the context element
Throws:
  java.io.IOException
  org.xml.sax.SAXException

public void parse(java.lang.String systemId)
           throws java.io.IOException,
                  org.xml.sax.SAXException
Specified by: parse in interface org.xml.sax.XMLReader
Throws:
  java.io.IOException
  org.xml.sax.SAXException
See Also: XMLReader.parse(java.lang.String)

public void setContentHandler(org.xml.sax.ContentHandler handler)
Specified by: setContentHandler in interface org.xml.sax.XMLReader
See Also: XMLReader.setContentHandler(org.xml.sax.ContentHandler)

public void setLexicalHandler(org.xml.sax.ext.LexicalHandler handler)
Sets the lexical handler.
Parameters: handler - the handler.

public void setDTDHandler(org.xml.sax.DTDHandler handler)
Specified by: setDTDHandler in interface org.xml.sax.XMLReader
See Also: XMLReader.setDTDHandler(org.xml.sax.DTDHandler)

public void setEntityResolver(org.xml.sax.EntityResolver resolver)
Specified by: setEntityResolver in interface org.xml.sax.XMLReader
See Also: XMLReader.setEntityResolver(org.xml.sax.EntityResolver)

public void setErrorHandler(org.xml.sax.ErrorHandler handler)
Specified by: setErrorHandler in interface org.xml.sax.XMLReader
See Also: XMLReader.setErrorHandler(org.xml.sax.ErrorHandler)

public void setTransitionHandler(TransitionHandler handler)

public void setTreeBuilderErrorHandlerOverride(org.xml.sax.ErrorHandler handler)
Deprecated. For Validator.nu internal use
See Also: XMLReader.setErrorHandler(org.xml.sax.ErrorHandler)

public void setFeature(java.lang.String name,
                       boolean value)
                throws org.xml.sax.SAXNotRecognizedException,
                       org.xml.sax.SAXNotSupportedException
Sets a boolean feature without having to use non-XMLReader
 setters directly.

 The supported features are:

| Feature URI | Setter |
|---|---|
| http://xml.org/sax/features/unicode-normalization-checking | setCheckingNormalization |
| http://validator.nu/features/html4-mode-compatible-with-xhtml1-schemata | setHtml4ModeCompatibleWithXhtml1Schemata |
| http://validator.nu/features/mapping-lang-to-xml-lang | setMappingLangToXmlLang |
| http://validator.nu/features/scripting-enabled | setScriptingEnabled |

Specified by: setFeature in interface org.xml.sax.XMLReader
Throws:
  org.xml.sax.SAXNotRecognizedException
  org.xml.sax.SAXNotSupportedException
See Also: XMLReader.setFeature(java.lang.String, boolean)
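
Code that only holds an XMLReader reference can apply the supported features listed above through setFeature. The following sketch shows one way to apply several at once. FeatureToggles, its constants, and apply are hypothetical names; the URI strings are taken from the list above, and the reader is assumed to be an HtmlParser.

```java
import java.util.Map;

import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper for applying several feature flags in one call. */
final class FeatureToggles {

    static final String SCRIPTING_ENABLED =
            "http://validator.nu/features/scripting-enabled";
    static final String MAPPING_LANG_TO_XML_LANG =
            "http://validator.nu/features/mapping-lang-to-xml-lang";

    private FeatureToggles() {
    }

    /**
     * Applies each entry with setFeature, in the map's iteration order.
     * Stops at the first URI the reader rejects and propagates the exception.
     */
    static void apply(XMLReader reader, Map<String, Boolean> features)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        for (Map.Entry<String, Boolean> entry : features.entrySet()) {
            reader.setFeature(entry.getKey(), entry.getValue());
        }
    }
}
```

Features that getFeature reports with a fixed value, such as http://xml.org/sax/features/namespaces, do not appear in the supported list above, so this sketch does not attempt to set them.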
public void setProperty(java.lang.String name,
                        java.lang.Object value)
                 throws org.xml.sax.SAXNotRecognizedException,
                        org.xml.sax.SAXNotSupportedException
Sets a non-boolean property without having to use non-XMLReader
 setters directly.

| Property URI | Setter |
|---|---|
| http://xml.org/sax/properties/lexical-handler | setLexicalHandler |
| http://validator.nu/properties/content-space-policy | setContentSpacePolicy |
| http://validator.nu/properties/content-non-xml-char-policy | setContentNonXmlCharPolicy |
| http://validator.nu/properties/comment-policy | setCommentPolicy |
| http://validator.nu/properties/xmlns-policy | setXmlnsPolicy |
| http://validator.nu/properties/name-policy | setNamePolicy |
| http://validator.nu/properties/streamability-violation-policy | setStreamabilityViolationPolicy |
| http://validator.nu/properties/document-mode-handler | setDocumentModeHandler |
| http://validator.nu/properties/doctype-expectation | setDoctypeExpectation |
| http://validator.nu/properties/xml-policy | setXmlPolicy |

Specified by: setProperty in interface org.xml.sax.XMLReader
Throws:
  org.xml.sax.SAXNotRecognizedException
  org.xml.sax.SAXNotSupportedException
See Also: XMLReader.setProperty(java.lang.String, java.lang.Object)

public boolean isCheckingNormalization()
Indicates whether NFC normalization of source is being checked.
Returns: true if NFC normalization of source is being checked.
See Also: nu.validator.htmlparser.impl.Tokenizer#isCheckingNormalization()

public void setCheckingNormalization(boolean enable)
Toggles the checking of the NFC normalization of source.
Parameters: enable - true to check normalization
See Also: nu.validator.htmlparser.impl.Tokenizer#setCheckingNormalization(boolean)

public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the policy for consecutive hyphens in comments.
Parameters: commentPolicy - the policy
See Also: Tokenizer.setCommentPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the policy for non-XML characters except white space.
Parameters: contentNonXmlCharPolicy - the policy
See Also: Tokenizer.setContentNonXmlCharPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the policy for non-XML white space.
Parameters: contentSpacePolicy - the policy
See Also: Tokenizer.setContentSpacePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public boolean isScriptingEnabled()
Whether the parser considers scripting to be enabled for noscript treatment.
Returns: true if enabled
See Also: TreeBuilder.isScriptingEnabled()

public void setScriptingEnabled(boolean scriptingEnabled)
Sets whether the parser considers scripting to be enabled for noscript treatment.
Parameters: scriptingEnabled - true to enable
See Also: TreeBuilder.setScriptingEnabled(boolean)

public DoctypeExpectation getDoctypeExpectation()
Returns the doctype expectation.

public void setDoctypeExpectation(DoctypeExpectation doctypeExpectation)
Sets the doctype expectation.
Parameters: doctypeExpectation - the doctypeExpectation to set
See Also: TreeBuilder.setDoctypeExpectation(nu.validator.htmlparser.common.DoctypeExpectation)

public DocumentModeHandler getDocumentModeHandler()
Returns the document mode handler.

public void setDocumentModeHandler(DocumentModeHandler documentModeHandler)
Sets the document mode handler.
Parameters: documentModeHandler - the documentModeHandler to set
See Also: TreeBuilder.setDocumentModeHandler(nu.validator.htmlparser.common.DocumentModeHandler)

public XmlViolationPolicy getStreamabilityViolationPolicy()
Returns the streamabilityViolationPolicy.

public void setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy)
Sets the streamabilityViolationPolicy.
Parameters: streamabilityViolationPolicy - the streamabilityViolationPolicy to set

public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.
Parameters: html4ModeCompatibleWithXhtml1Schemata - 

public org.xml.sax.Locator getDocumentLocator()
Returns the Locator during parse.
Returns: Locator

public boolean isHtml4ModeCompatibleWithXhtml1Schemata()
Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.

public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Whether lang is mapped to xml:lang.
Parameters: mappingLangToXmlLang - 
See Also: Tokenizer.setMappingLangToXmlLang(boolean)

public boolean isMappingLangToXmlLang()
Whether lang is mapped to xml:lang.

public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
Whether the xmlns attribute on the root element is
 passed through. (FATAL not allowed.)
Parameters: xmlnsPolicy - 
See Also: Tokenizer.setXmlnsPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public XmlViolationPolicy getXmlnsPolicy()
Returns the xmlnsPolicy.

public org.xml.sax.ext.LexicalHandler getLexicalHandler()
Returns the lexicalHandler.

public XmlViolationPolicy getCommentPolicy()
Returns the commentPolicy.

public XmlViolationPolicy getContentNonXmlCharPolicy()
Returns the contentNonXmlCharPolicy.

public XmlViolationPolicy getContentSpacePolicy()
Returns the contentSpacePolicy.

public void setReportingDoctype(boolean reportingDoctype)
Parameters: reportingDoctype - 
See Also: TreeBuilder.setReportingDoctype(boolean)

public boolean isReportingDoctype()
Returns the reportingDoctype.

public void setErrorProfile(java.util.HashMap<java.lang.String,java.lang.String> errorProfileMap)
Parameters: errorProfile - 
See Also: nu.validator.htmlparser.impl.errorReportingTokenizer#setErrorProfile(set)

public void setNamePolicy(XmlViolationPolicy namePolicy)
The policy for non-NCName element and attribute names.
Parameters: namePolicy - 
See Also: Tokenizer.setNamePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setHeuristics(Heuristics heuristics)
Sets the encoding sniffing heuristics.
Parameters: heuristics - the heuristics to set
See Also: nu.validator.htmlparser.impl.Tokenizer#setHeuristics(nu.validator.htmlparser.common.Heuristics)

public Heuristics getHeuristics()

public void setXmlPolicy(XmlViolationPolicy xmlPolicy)
This is a catch-all convenience method for setting name, xmlns, content space, content non-XML char and comment policies in one go.
Parameters: xmlPolicy - 

public XmlViolationPolicy getNamePolicy()
The policy for non-NCName element and attribute names.

public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Deprecated.

public XmlViolationPolicy getBogusXmlnsPolicy()
Deprecated. Returns XmlViolationPolicy.ALTER_INFOSET.
Returns: XmlViolationPolicy.ALTER_INFOSET

public void addCharacterHandler(CharacterHandler characterHandler)