nu.validator.htmlparser.impl
Class Tokenizer

java.lang.Object
  extended by nu.validator.htmlparser.impl.Tokenizer
All Implemented Interfaces:
Locator

public final class Tokenizer
extends Object
implements Locator

An implementation of http://www.whatwg.org/specs/web-apps/current-work/multipage/section-tokenisation.html This class implements the Locator interface. This is not an incidental implementation detail: users of this class are encouraged to make use of the Locator nature. By default, the tokenizer may report data that XML 1.0 bans. The tokenizer can be configured to treat these conditions as fatal or to coerce the infoset to something that XML 1.0 allows.
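Because the tokenizer is a Locator, any code that accepts an org.xml.sax.Locator can read the current position from it. The following is a minimal sketch of such a consumer. The class LocatorSketch and its describe method are hypothetical, and a LocatorImpl stands in for a Tokenizer instance so that the example depends only on the JDK.

```java
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

final class LocatorSketch {
    /** Formats "systemId:line:column" from any SAX Locator. */
    static String describe(Locator locator) {
        String id = locator.getSystemId();
        return (id == null ? "(unknown)" : id)
                + ":" + locator.getLineNumber()
                + ":" + locator.getColumnNumber();
    }

    public static void main(String[] args) {
        // Stand-in for a Tokenizer; a real caller would pass the tokenizer itself.
        LocatorImpl standIn = new LocatorImpl();
        standIn.setSystemId("http://example.com/doc.html");
        standIn.setLineNumber(3);
        standIn.setColumnNumber(14);
        System.out.println(describe(standIn));
    }
}
```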

Version:
$Id: Tokenizer.java 166 2007-10-14 19:42:57Z hsivonen $
Author:
hsivonen

Nested Class Summary
private static class Tokenizer.CommentState
           
 
Field Summary
private  boolean alreadyComplainedAboutNonAscii
          Used together with nonAsciiProhibited.
private  boolean alreadyWarnedAboutPrivateUseCharacters
          Keeps track of PUA warnings.
private  char[] astralChar
          Buffer for expanding astral NCRs.
private  String attributeName
          The current attribute name.
private  AttributesImpl attributes
          The attribute holder.
private  char[] bmpChar
          Buffer for expanding NCRs falling into the Basic Multilingual Plane.
private  XmlViolationPolicy bogusXmlnsPolicy
           
private  char[] buf
          The main input buffer that the tokenizer reads from.
private static int BUFFER_GROW_BY
          Buffer growth parameter.
private  int bufLen
          The number of chars in buf that have meaning.
private  CharacterHandler[] characterHandlers
          Used for NFC checking if non-null, source code capture, etc.
private  int col
          The current column number in the current resource being tokenized.
private  int colPrev
           
private  XmlViolationPolicy commentPolicy
          The policy for comments.
private  String contentModelElement
          The element whose end tag closes the current CDATA or RCDATA element.
private  ContentModelFlag contentModelFlag
          http://www.whatwg.org/specs/web-apps/current-work/#content2
private  XmlViolationPolicy contentNonXmlCharPolicy
          The policy for non-space non-XML characters.
private  XmlViolationPolicy contentSpacePolicy
          The policy for vertical tab and form feed.
private  int cstart
          The index of the first char in buf that is part of a coalesced run of character tokens or -1 if there is not a current run being coalesced.
private  String doctypeName
          The name of the current doctype token.
private  boolean endTag
          true if tokenizing an end tag
private  ErrorHandler errorHandler
          The error handler.
private  boolean escapeFlag
          http://www.whatwg.org/specs/web-apps/current-work/#escape
private  boolean html4
          true when HTML4-specific additional errors are requested.
private  boolean html4ModeCompatibleWithXhtml1Schemata
           
private  boolean inContent
          true when in text content or in attribute value.
private static int LEAD_OFFSET
          Magic value for UTF-16 operations.
private static char[] LF
          Array version of line feed.
private  int line
          The current line number in the current resource being parsed.
private  int linePrev
           
private  char[] longStrBuf
          Buffer for long strings.
private  int longStrBufLen
          Number of significant chars in longStrBuf.
private  char longStrBufPending
          If not U+0000, a pending code unit to be appended to longStrBuf.
private static char[] LT_GT
          UTF-16 code unit array containing less than and greater than for emitting those characters on certain parse errors.
private static char[] LT_SOLIDUS
          UTF-16 code unit array containing less than and solidus for emitting those characters on certain parse errors.
private  boolean mappingLangToXmlLang
           
private  boolean metaBoundaryPassed
          Whether the stream is past the first 512 bytes.
private  XmlViolationPolicy namePolicy
           
private static Pattern NCNAME_PATTERN
           
private  boolean nextCharOnNewLine
           
private  boolean nonAsciiProhibited
          Whether non-ASCII causes an error.
private static char[] OCTYPE
          "octype" as char[]
private  int pos
          The index of the last char read from buf.
private  char prev
          The previous char read from the buffer with infoset alteration applied except for CR.
private  char[] prevFour
          Lookbehind buffer for magic RCDATA/CDATA escaping.
private  int prevFourPtr
          Points to the last char written to prevFour.
private  String publicId
          The SAX public id for the resource being tokenized.
private  String publicIdentifier
          The public id of the current doctype token.
private  Reader reader
          The input UTF-16 code unit stream.
private static char[] REPLACEMENT_CHARACTER
          Array version of U+FFFD.
private  boolean shouldAddAttributes
          If false, addAttribute*() are no-ops.
private static char[] SPACE
          Array version of space.
private  char[] strBuf
          Buffer for short identifiers.
private  int strBufLen
          Number of significant chars in strBuf.
private static int SURROGATE_OFFSET
          Magic value for UTF-16 operations.
private  boolean swallowBom
           
private  String systemId
          The SAX system id for the resource being tokenized.
private  String systemIdentifier
          The system id of the current doctype token.
private  String tagName
          The current tag token name.
private  TokenHandler tokenHandler
          The token handler.
private static char[] UBLIC
          "ublic" as char[]
private  int unreadBuffer
          Single code unit buffer for reconsuming an input character.
private static String[] VOID_ELEMENTS
          Lexically sorted void element names
private  boolean wantsComments
          Whether comment tokens are emitted.
private  XmlViolationPolicy xmlnsPolicy
           
private static char[] YSTEM
          "ystem" as char[]
 
Constructor Summary
Tokenizer(TokenHandler tokenHandler)
          The constructor.
 
Method Summary
private  void addAttributeWithoutValue()
           
private  void addAttributeWithValue()
           
 void addCharacterHandler(CharacterHandler characterHandler)
           
private  boolean afterAttributeNameState()
          After attribute name state
private  void afterDoctypeNameState()
          After DOCTYPE name state
private  void afterDoctypePublicIdentifierState()
          After DOCTYPE public identifier state
private  void afterDoctypeSystemIdentifierState()
          After DOCTYPE system identifier state
private  void appendLongStrBuf(char c)
          Appends to the larger buffer.
private  void appendLongStrBuf(char[] arr)
          Appends to the larger buffer.
private  void appendStrBuf(char c)
          Appends to the smaller buffer.
private  void appendStrBufToLongStrBuf()
          Append the contents of the smaller buffer to the larger one.
private  void appendToComment(char c)
          Appends to the larger buffer when it is used to buffer a comment.
private  void attributeNameComplete()
           
private  boolean attributeNameState()
          Attribute name state
private  boolean attributeValueDoubleQuotedState()
          Attribute value (double-quoted) state
private  boolean attributeValueSingleQuotedState()
          Attribute value (single-quoted) state
private  boolean attributeValueUnquotedState()
          Attribute value (unquoted) state
private  void beforeAttributeNameState()
          This method implements a wrapper loop for the attribute-related states to avoid recursion to an arbitrary depth.
private  boolean beforeAttributeNameStateImpl()
          Before attribute name state
private  boolean beforeAttributeValueState()
          Before attribute value state
private  void beforeDoctypeNameState()
          Before DOCTYPE name state
private  void beforeDoctypePublicIdentifierState()
          Before DOCTYPE public identifier state
private  void beforeDoctypeSystemIdentifierState()
          Before DOCTYPE system identifier state
private  void bogusCommentState()
          Bogus comment state
private  void bogusDoctypeState()
          Bogus DOCTYPE state
private  void clearLongStrBuf()
          Clears the larger buffer.
private  void clearStrBuf()
          Clears the smaller buffer.
private  void closeTagOpenState()
          Close tag open state
private  void commentStates()
          Comment start state, Comment start dash state, Comment state, Comment end dash state and Comment end state
private  void consumeEntity(boolean inAttribute)
          Consume entity. Unlike the definition in the spec, this method does not return a value and never requires the caller to backtrack.
private  void consumeNCR(boolean inAttribute)
           
private  boolean currentIsVoid()
           
private  void dataState()
          Data state
private  CharsetDecoder decoderFromExternalDeclaration(String encoding)
          Initializes a decoder from an external encoding declaration.
private  void doctypeNameState()
          DOCTYPE name state
private  void doctypePublicIdentifierDoubleQuotedState()
          DOCTYPE public identifier (double-quoted) state
private  void doctypePublicIdentifierSingleQuotedState()
          DOCTYPE public identifier (single-quoted) state
private  void doctypeState()
          DOCTYPE state
private  void doctypeSystemIdentifierDoubleQuotedState()
          DOCTYPE system identifier (double-quoted) state
private  void doctypeSystemIdentifierSingleQuotedState()
          DOCTYPE system identifier (single-quoted) state
(package private)  void dontSwallowBom()
           
private  void emitComment()
          Emits the current comment token.
private  void emitCurrentTagToken()
           
private  void emitOrAppend(char[] val, boolean inAttribute)
           
private  void emitStrBuf()
          Emits the smaller buffer as character tokens.
private  void entityDataState()
          Entity data state
private  void entityInAttributeValueState()
          Entity in attribute value state
private  void err(String message)
          Reports a Parse Error.
private  void fatal(String message)
          Reports a condition that would make the infoset incompatible with XML 1.0 as fatal.
private  void flushChars()
          Flushes coalesced character tokens.
 int getColumnNumber()
           
 XmlViolationPolicy getCommentPolicy()
          Returns the commentPolicy.
 XmlViolationPolicy getContentNonXmlCharPolicy()
          Returns the contentNonXmlCharPolicy.
 XmlViolationPolicy getContentSpacePolicy()
          Returns the contentSpacePolicy.
 int getLineNumber()
           
 String getPublicId()
           
 String getSystemId()
           
private  void handleNCRValue(int value, boolean inAttribute)
           
private  boolean isAstralPrivateUse(int c)
          Tells if the argument is an astral PUA character.
 boolean isCheckingNormalization()
          Query if checking normalization.
 boolean isMappingLangToXmlLang()
          Returns the mappingLangToXmlLang.
private  boolean isNcname(String str)
           
private  boolean isNonCharacter(int c)
          Tells if the argument is a non-character (works for BMP and astral).
private  boolean isPrivateUse(char c)
          Tells if the argument is a BMP PUA character.
private  boolean lastHyphHyph()
           
private  boolean lastLtExclHyph()
           
private  String longStrBufToString()
          The larger buffer as a string.
private  void markupDeclarationOpenState()
          Markup declaration open state
(package private)  AttributesImpl newAttributes()
           
(package private)  void noEncodingDeclared()
           
(package private)  void notifyAboutMetaBoundary()
           
private  void parseErrorUnlessPermittedSlash()
           
private  char read()
          Reads the next UTF-16 code unit.
private  void resetAttributes()
           
 void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
          Sets the bogusXmlnsPolicy.
 void setCheckingNormalization(boolean enable)
          Turns NFC checking on or off.
 void setCommentPolicy(XmlViolationPolicy commentPolicy)
          Sets the commentPolicy.
 void setContentModelFlag(ContentModelFlag contentModelFlag, String contentModelElement)
          Sets the content model flag and the associated element name.
 void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
          Sets the contentNonXmlCharPolicy.
 void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
          Sets the contentSpacePolicy.
 void setErrorHandler(ErrorHandler eh)
          Sets the error handler.
 void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
          Sets the html4ModeCompatibleWithXhtml1Schemata.
 void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
          Sets the mappingLangToXmlLang.
 void setNamePolicy(XmlViolationPolicy namePolicy)
           
 void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
          Sets the xmlnsPolicy.
private  String strBufToElementNameString()
           
private  String strBufToString()
          The smaller buffer as a string.
private  void tagNameState()
          Tag name state
private  void tagOpenState()
          Tag open state
private  String toAsciiLowerCase(String str)
           
 void tokenize(InputSource is)
          Runs the tokenization.
(package private)  void turnOnAdditionalHtml4Errors()
           
private  void unread(char c)
          Unreads a code unit so that it is returned the next time read() is called.
private  void warn(String message)
          Reports a warning
private  void warnAboutPrivateUseChar()
          Emits a warning about private use characters if the warning has not been emitted yet.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NCNAME_PATTERN

private static final Pattern NCNAME_PATTERN

LEAD_OFFSET

private static final int LEAD_OFFSET
Magic value for UTF-16 operations.

See Also:
Constant Field Values

SURROGATE_OFFSET

private static final int SURROGATE_OFFSET
Magic value for UTF-16 operations.

See Also:
Constant Field Values
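
The values of LEAD_OFFSET and SURROGATE_OFFSET are not shown on this page. The sketch below assumes they match the constants of the same names in the Unicode FAQ on UTF-16 conversion; the class Utf16Sketch and its methods are hypothetical and are not part of Tokenizer.

```java
final class Utf16Sketch {
    // Assumed values, taken from the Unicode FAQ rather than from Tokenizer.
    static final int LEAD_OFFSET = 0xD800 - (0x10000 >> 10);
    static final int SURROGATE_OFFSET = 0x10000 - (0xD800 << 10) - 0xDC00;

    /** Splits an astral code point (U+10000..U+10FFFF) into a surrogate pair. */
    static char[] toSurrogates(int codePoint) {
        char lead = (char) (LEAD_OFFSET + (codePoint >> 10));
        char trail = (char) (0xDC00 + (codePoint & 0x3FF));
        return new char[] { lead, trail };
    }

    /** Recombines a lead and a trail surrogate into a code point. */
    static int toCodePoint(char lead, char trail) {
        return (lead << 10) + trail + SURROGATE_OFFSET;
    }
}
```

With these constants, both directions reduce to a shift and an addition, which is probably why such values are precomputed.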

LT_GT

private static final char[] LT_GT
UTF-16 code unit array containing less than and greater than for emitting those characters on certain parse errors.


LT_SOLIDUS

private static final char[] LT_SOLIDUS
UTF-16 code unit array containing less than and solidus for emitting those characters on certain parse errors.


REPLACEMENT_CHARACTER

private static final char[] REPLACEMENT_CHARACTER
Array version of U+FFFD.


SPACE

private static final char[] SPACE
Array version of space.


LF

private static final char[] LF
Array version of line feed.


BUFFER_GROW_BY

private static final int BUFFER_GROW_BY
Buffer growth parameter.

See Also:
Constant Field Values

VOID_ELEMENTS

private static final String[] VOID_ELEMENTS
Lexically sorted void element names
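
Keeping the names sorted allows a binary search instead of a linear scan. The sketch below assumes that this is how the array is consulted; the class VoidElementSketch is hypothetical, and its array holds an illustrative subset of names rather than the actual field value.

```java
import java.util.Arrays;

final class VoidElementSketch {
    // Illustrative subset only; it must stay sorted for binarySearch to work.
    static final String[] VOID_ELEMENTS = {
        "area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"
    };

    /** Returns true if the given lower-case element name is in the sorted array. */
    static boolean isVoid(String name) {
        return Arrays.binarySearch(VOID_ELEMENTS, name) >= 0;
    }
}
```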


OCTYPE

private static final char[] OCTYPE
"octype" as char[]


UBLIC

private static final char[] UBLIC
"ublic" as char[]


YSTEM

private static final char[] YSTEM
"ystem" as char[]


tokenHandler

private final TokenHandler tokenHandler
The token handler.


errorHandler

private ErrorHandler errorHandler
The error handler.


reader

private Reader reader
The input UTF-16 code unit stream. If a byte stream was given, this object is an instance of HtmlInputStreamReader.


buf

private char[] buf
The main input buffer that the tokenizer reads from. Filled from reader.


pos

private int pos
The index of the last char read from buf.


cstart

private int cstart
The index of the first char in buf that is part of a coalesced run of character tokens or -1 if there is not a current run being coalesced.


bufLen

private int bufLen
The number of chars in buf that have meaning. (The rest of the array is garbage and should not be examined.)


prev

private char prev
The previous char read from the buffer with infoset alteration applied except for CR. Used for CRLF normalization and surrogate pair checking.


prevFour

private final char[] prevFour
Lookbehind buffer for magic RCDATA/CDATA escaping.


prevFourPtr

private int prevFourPtr
Points to the last char written to prevFour.
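
A four-slot lookbehind of this kind can be kept as a small ring buffer. The sketch below is one possible arrangement, not the class's actual code: the class LookbehindSketch and the methods record, lookBack and lastTwoAreHyphens are hypothetical.

```java
final class LookbehindSketch {
    private final char[] prevFour = new char[4];
    private int prevFourPtr = 0; // index of the last char written

    /** Stores a char, overwriting the oldest of the four slots. */
    void record(char c) {
        prevFourPtr = (prevFourPtr + 1) % 4;
        prevFour[prevFourPtr] = c;
    }

    /** Returns the char written 'back' steps ago (0 = most recent, at most 3). */
    char lookBack(int back) {
        return prevFour[(prevFourPtr - back + 4) % 4];
    }

    /** True if the two most recent chars are both hyphens. */
    boolean lastTwoAreHyphens() {
        return lookBack(0) == '-' && lookBack(1) == '-';
    }
}
```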


unreadBuffer

private int unreadBuffer
Single code unit buffer for reconsuming an input character. If -1 the next read() returns from the real buffer, otherwise from here.
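
The interaction between unreadBuffer, read() and unread(char) can be sketched as follows. This is a simplified illustration under stated assumptions: the class PushbackSketch is hypothetical, the input is a fixed string rather than a Reader, and returning U+0000 at end of input is a convention of this sketch only.

```java
final class PushbackSketch {
    private final char[] buf;
    private int pos = -1;          // index of the last char read from buf
    private int unreadBuffer = -1; // -1 means there is nothing to reconsume

    PushbackSketch(String input) {
        this.buf = input.toCharArray();
    }

    /** Returns the unread code unit if there is one, otherwise the next one from buf. */
    char read() {
        if (unreadBuffer != -1) {
            char c = (char) unreadBuffer;
            unreadBuffer = -1;
            return c;
        }
        pos++;
        return pos < buf.length ? buf[pos] : '\u0000';
    }

    /** Stores a single code unit so that the next read() returns it. */
    void unread(char c) {
        unreadBuffer = c;
    }
}
```

Using an int for the slot leaves -1 free as the "empty" marker, since every char value is a legitimate code unit.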


line

private int line
The current line number in the current resource being parsed. (First line is 1.) Passed on as locator data.


linePrev

private int linePrev

col

private int col
The current column number in the current resource being tokenized. (First column is 1, counted by UTF-16 code units.) Passed on as locator data.
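
One way to maintain 1-based line and column counters while treating CRLF as a single line break is sketched below. The class PositionSketch and its advance method are hypothetical, and the exact roles of nextCharOnNewLine and prev in Tokenizer are assumed rather than documented here.

```java
final class PositionSketch {
    int line = 1;
    int col = 0; // becomes 1 when the first code unit is read
    private boolean nextCharOnNewLine = false;
    private char prev = '\u0000';

    /** Updates line and col for one UTF-16 code unit of input. */
    void advance(char c) {
        if (c == '\n' && prev == '\r') {
            // LF of a CRLF pair: the line break is already pending.
            prev = c;
            return;
        }
        if (nextCharOnNewLine) {
            line++;
            col = 0;
            nextCharOnNewLine = false;
        }
        col++;
        if (c == '\n' || c == '\r') {
            nextCharOnNewLine = true;
        }
        prev = c;
    }
}
```

Deferring the line increment until the next code unit arrives means the line terminator itself is still reported on the line it ends.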


colPrev

private int colPrev

nextCharOnNewLine

private boolean nextCharOnNewLine

publicId

private String publicId
The SAX public id for the resource being tokenized. (Only passed back as part of locator data.)


systemId

private String systemId
The SAX system id for the resource being tokenized. (Only passed back as part of locator data.)


strBuf

private char[] strBuf
Buffer for short identifiers.


strBufLen

private int strBufLen
Number of significant chars in strBuf.


longStrBuf

private char[] longStrBuf
Buffer for long strings.


longStrBufLen

private int longStrBufLen
Number of significant chars in longStrBuf.


longStrBufPending

private char longStrBufPending
If not U+0000, a pending code unit to be appended to longStrBuf.


attributes

private AttributesImpl attributes
The attribute holder.


bmpChar

private final char[] bmpChar
Buffer for expanding NCRs falling into the Basic Multilingual Plane.


astralChar

private final char[] astralChar
Buffer for expanding astral NCRs.


alreadyWarnedAboutPrivateUseCharacters

private boolean alreadyWarnedAboutPrivateUseCharacters
Keeps track of PUA warnings.


contentModelFlag

private ContentModelFlag contentModelFlag
http://www.whatwg.org/specs/web-apps/current-work/#content2


escapeFlag

private boolean escapeFlag
http://www.whatwg.org/specs/web-apps/current-work/#escape


contentModelElement

private String contentModelElement
The element whose end tag closes the current CDATA or RCDATA element.


endTag

private boolean endTag
true if tokenizing an end tag


tagName

private String tagName
The current tag token name.


attributeName

private String attributeName
The current attribute name.


wantsComments

private boolean wantsComments
Whether comment tokens are emitted.


shouldAddAttributes

private boolean shouldAddAttributes
If false, addAttribute*() are no-ops.


inContent

private boolean inContent
true when in text content or in attribute value.


html4

private boolean html4
true when HTML4-specific additional errors are requested.


nonAsciiProhibited

private boolean nonAsciiProhibited
Whether non-ASCII causes an error.


alreadyComplainedAboutNonAscii

private boolean alreadyComplainedAboutNonAscii
Used together with nonAsciiProhibited.


metaBoundaryPassed

private boolean metaBoundaryPassed
Whether the stream is past the first 512 bytes.


doctypeName

private String doctypeName
The name of the current doctype token.


publicIdentifier

private String publicIdentifier
The public id of the current doctype token.


systemIdentifier

private String systemIdentifier
The system id of the current doctype token.


characterHandlers

private CharacterHandler[] characterHandlers
Used for NFC checking if non-null, source code capture, etc.


contentSpacePolicy

private XmlViolationPolicy contentSpacePolicy
The policy for vertical tab and form feed.


contentNonXmlCharPolicy

private XmlViolationPolicy contentNonXmlCharPolicy
The policy for non-space non-XML characters.


commentPolicy

private XmlViolationPolicy commentPolicy
The policy for comments.


xmlnsPolicy

private XmlViolationPolicy xmlnsPolicy

namePolicy

private XmlViolationPolicy namePolicy

swallowBom

private boolean swallowBom

html4ModeCompatibleWithXhtml1Schemata

private boolean html4ModeCompatibleWithXhtml1Schemata

mappingLangToXmlLang

private boolean mappingLangToXmlLang

bogusXmlnsPolicy

private XmlViolationPolicy bogusXmlnsPolicy
Constructor Detail

Tokenizer

public Tokenizer(TokenHandler tokenHandler)
The constructor.

Parameters:
tokenHandler - the handler for receiving tokens
Method Detail

setCheckingNormalization

public void setCheckingNormalization(boolean enable)
Turns NFC checking on or off.

Parameters:
enable - true to turn checking on

addCharacterHandler

public void addCharacterHandler(CharacterHandler characterHandler)

isCheckingNormalization

public boolean isCheckingNormalization()
Query if checking normalization.

Returns:
true if checking is on

setErrorHandler

public void setErrorHandler(ErrorHandler eh)
Sets the error handler.

See Also:
XMLReader.setErrorHandler(org.xml.sax.ErrorHandler)

getCommentPolicy

public XmlViolationPolicy getCommentPolicy()
Returns the commentPolicy.

Returns:
the commentPolicy

setCommentPolicy

public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the commentPolicy.

Parameters:
commentPolicy - the commentPolicy to set

getContentNonXmlCharPolicy

public XmlViolationPolicy getContentNonXmlCharPolicy()
Returns the contentNonXmlCharPolicy.

Returns:
the contentNonXmlCharPolicy

setContentNonXmlCharPolicy

public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the contentNonXmlCharPolicy.

Parameters:
contentNonXmlCharPolicy - the contentNonXmlCharPolicy to set

getContentSpacePolicy

public XmlViolationPolicy getContentSpacePolicy()
Returns the contentSpacePolicy.

Returns:
the contentSpacePolicy

setContentSpacePolicy

public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the contentSpacePolicy.

Parameters:
contentSpacePolicy - the contentSpacePolicy to set

setXmlnsPolicy

public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
Sets the xmlnsPolicy.

Parameters:
xmlnsPolicy - the xmlnsPolicy to set

setNamePolicy

public void setNamePolicy(XmlViolationPolicy namePolicy)

setBogusXmlnsPolicy

public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Sets the bogusXmlnsPolicy.

Parameters:
bogusXmlnsPolicy - the bogusXmlnsPolicy to set

setHtml4ModeCompatibleWithXhtml1Schemata

public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Sets the html4ModeCompatibleWithXhtml1Schemata.

Parameters:
html4ModeCompatibleWithXhtml1Schemata - the html4ModeCompatibleWithXhtml1Schemata to set

tokenize

public void tokenize(InputSource is)
              throws SAXException,
                     IOException
Runs the tokenization. This is the main entry point.

Parameters:
is - the input source
Throws:
SAXException - on fatal error (if configured to treat XML violations as fatal) or if the token handler threw
IOException - if the stream threw

setContentModelFlag

public void setContentModelFlag(ContentModelFlag contentModelFlag,
                                String contentModelElement)
Sets the content model flag and the associated element name.

Parameters:
contentModelFlag - the flag
contentModelElement - the element causing the flag to be set

getPublicId

public String getPublicId()
Specified by:
getPublicId in interface Locator
See Also:
Locator.getPublicId()

getSystemId

public String getSystemId()
Specified by:
getSystemId in interface Locator
See Also:
Locator.getSystemId()

getLineNumber

public int getLineNumber()
Specified by:
getLineNumber in interface Locator
See Also:
Locator.getLineNumber()

getColumnNumber

public int getColumnNumber()
Specified by:
getColumnNumber in interface Locator
See Also:
Locator.getColumnNumber()

notifyAboutMetaBoundary

void notifyAboutMetaBoundary()

turnOnAdditionalHtml4Errors

void turnOnAdditionalHtml4Errors()

dontSwallowBom

void dontSwallowBom()

noEncodingDeclared

void noEncodingDeclared()

newAttributes

AttributesImpl newAttributes()

clearStrBuf

private void clearStrBuf()
Clears the smaller buffer.


appendStrBuf

private void appendStrBuf(char c)
Appends to the smaller buffer.

Parameters:
c - the UTF-16 code unit to append

strBufToString

private String strBufToString()
The smaller buffer as a string.

Returns:
the smaller buffer as a string
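
The clear, append and to-string operations on the smaller buffer can be sketched as a char array with a separate length counter. The class GrowableBufferSketch is hypothetical, and both the initial capacity and the BUFFER_GROW_BY value used here are assumptions chosen for illustration.

```java
final class GrowableBufferSketch {
    static final int BUFFER_GROW_BY = 1024; // assumed value, for illustration only
    private char[] strBuf = new char[64];   // assumed initial capacity
    private int strBufLen = 0;              // number of significant chars

    /** Appends one code unit, enlarging the array by a fixed amount when full. */
    void append(char c) {
        if (strBufLen == strBuf.length) {
            char[] bigger = new char[strBuf.length + BUFFER_GROW_BY];
            System.arraycopy(strBuf, 0, bigger, 0, strBuf.length);
            strBuf = bigger;
        }
        strBuf[strBufLen++] = c;
    }

    /** Clearing only resets the length; the array is reused. */
    void clear() {
        strBufLen = 0;
    }

    @Override
    public String toString() {
        return new String(strBuf, 0, strBufLen);
    }
}
```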

emitStrBuf

private void emitStrBuf()
                 throws SAXException
Emits the smaller buffer as character tokens.

Throws:
SAXException - if the token handler threw

isNcname

private boolean isNcname(String str)

clearLongStrBuf

private void clearLongStrBuf()
Clears the larger buffer.


appendLongStrBuf

private void appendLongStrBuf(char c)
Appends to the larger buffer.

Parameters:
c - the UTF-16 code unit to append

appendToComment

private void appendToComment(char c)
                      throws SAXException
Appends to the larger buffer when it is used to buffer a comment. Checks for two consecutive hyphens.

Parameters:
c - the UTF-16 code unit to append
Throws:
SAXException
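
XML 1.0 does not allow two consecutive hyphens inside a comment, which is why the append step needs a check. The sketch below shows one way a policy could be applied at that point. The class CommentBufferSketch is hypothetical; its Policy enum is a stand-in for XmlViolationPolicy, whose constants are not listed on this page; inserting a space between the hyphens is an assumed coercion; and an unchecked exception replaces SAXException to keep the example short.

```java
final class CommentBufferSketch {
    enum Policy { ALLOW, FATAL, ALTER_INFOSET } // hypothetical stand-in

    private final StringBuilder comment = new StringBuilder();
    private final Policy policy;

    CommentBufferSketch(Policy policy) {
        this.policy = policy;
    }

    /** Appends a code unit, applying the policy when it would complete "--". */
    void append(char c) {
        int len = comment.length();
        boolean doubleHyphen = c == '-' && len > 0 && comment.charAt(len - 1) == '-';
        if (doubleHyphen && policy == Policy.FATAL) {
            throw new IllegalStateException("Consecutive hyphens in comment");
        }
        if (doubleHyphen && policy == Policy.ALTER_INFOSET) {
            comment.append(' '); // separate the hyphens so the comment is XML-safe
        }
        comment.append(c);
    }

    @Override
    public String toString() {
        return comment.toString();
    }
}
```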

appendLongStrBuf

private void appendLongStrBuf(char[] arr)
Appends to the larger buffer.

Parameters:
arr - the UTF-16 code units to append

appendStrBufToLongStrBuf

private void appendStrBufToLongStrBuf()
Append the contents of the smaller buffer to the larger one.


longStrBufToString

private String longStrBufToString()
The larger buffer as a string.

Returns:
the larger buffer as a string

emitComment

private void emitComment()
                  throws SAXException
Emits the current comment token.

Throws:
SAXException

unread

private void unread(char c)
Unreads a code unit so that it is returned the next time read() is called.

Parameters:
c - the code unit to unread

read

private char read()
           throws SAXException,
                  IOException
Reads the next UTF-16 code unit.

Returns:
the next code unit
Throws:
SAXException
IOException

warnAboutPrivateUseChar

private void warnAboutPrivateUseChar()
                              throws SAXException
Emits a warning about private use characters if the warning has not been emitted yet.

Throws:
SAXException

isPrivateUse

private boolean isPrivateUse(char c)
Tells if the argument is a BMP PUA character.

Parameters:
c - the UTF-16 code unit to check
Returns:
true if PUA character

isAstralPrivateUse

private boolean isAstralPrivateUse(int c)
Tells if the argument is an astral PUA character.

Parameters:
c - the code point to check
Returns:
true if astral private use

isNonCharacter

private boolean isNonCharacter(int c)
Tells if the argument is a non-character (works for BMP and astral).

Parameters:
c - the code point to check
Returns:
true if non-character
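
The three character-class checks above can be sketched from the ranges defined by the Unicode Standard. The class CodePointClassSketch is hypothetical, and whether Tokenizer tests exactly these ranges is an assumption.

```java
final class CodePointClassSketch {
    /** BMP Private Use Area: U+E000..U+F8FF. */
    static boolean isPrivateUse(char c) {
        return c >= '\uE000' && c <= '\uF8FF';
    }

    /** Supplementary Private Use Areas A and B (planes 15 and 16). */
    static boolean isAstralPrivateUse(int c) {
        return (c >= 0xF0000 && c <= 0xFFFFD) || (c >= 0x100000 && c <= 0x10FFFD);
    }

    /** Noncharacters: U+FDD0..U+FDEF and the last two code points of every plane. */
    static boolean isNonCharacter(int c) {
        return (c & 0xFFFE) == 0xFFFE || (c >= 0xFDD0 && c <= 0xFDEF);
    }
}
```

The mask test works for both BMP and astral input because every plane ends in xFFFE and xFFFF.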

flushChars

private void flushChars()
                 throws SAXException,
                        IOException
Flushes coalesced character tokens.

Throws:
SAXException
IOException

fatal

private void fatal(String message)
            throws SAXException
Reports a condition that would make the infoset incompatible with XML 1.0 as fatal.

Parameters:
message - the message
Throws:
SAXException
SAXParseException

err

private void err(String message)
          throws SAXException
Reports a Parse Error.

Parameters:
message - the message
Throws:
SAXException

warn

private void warn(String message)
           throws SAXException
Reports a warning

Parameters:
message - the message
Throws:
SAXException

decoderFromExternalDeclaration

private CharsetDecoder decoderFromExternalDeclaration(String encoding)
                                               throws SAXException
Initializes a decoder from an external encoding declaration.

Throws:
SAXException

currentIsVoid

private boolean currentIsVoid()

dataState

private void dataState()
                throws SAXException,
                       IOException
Data state

Throws:
IOException
SAXException

lastHyphHyph

private boolean lastHyphHyph()

lastLtExclHyph

private boolean lastLtExclHyph()

entityDataState

private void entityDataState()
                      throws SAXException,
                             IOException
Entity data state

Throws:
IOException
SAXException

tagOpenState

private void tagOpenState()
                   throws SAXException,
                          IOException
Tag open state

Throws:
IOException
SAXException

closeTagOpenState

private void closeTagOpenState()
                        throws SAXException,
                               IOException
Close tag open state

Throws:
IOException
SAXException

tagNameState

private void tagNameState()
                   throws SAXException,
                          IOException
Tag name state

Throws:
IOException
SAXException

strBufToElementNameString

private String strBufToElementNameString()

beforeAttributeNameState

private void beforeAttributeNameState()
                               throws SAXException,
                                      IOException
This method implements a wrapper loop for the attribute-related states to avoid recursion to an arbitrary depth.

Throws:
IOException
SAXException
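
The general idea of a wrapper loop is that a state method returns to its caller after each unit of work instead of calling the next state directly, so stack depth stays constant however many attributes a tag has. The sketch below illustrates that pattern only. The class StateLoopSketch and its methods are hypothetical, the input handling is greatly simplified, and reading the boolean return value as "keep looping" is an assumption about the attribute-related state methods.

```java
final class StateLoopSketch {
    private final String input;
    private int pos = 0;
    int attributesSeen = 0;

    StateLoopSketch(String input) {
        this.input = input;
    }

    /** Wrapper loop: each iteration handles one attribute without growing the stack. */
    void run() {
        while (oneAttribute()) {
            // Keep looping; the state method signals when the tag ends.
        }
    }

    /** Consumes one space-separated word; returns false at '>' or at end of input. */
    private boolean oneAttribute() {
        while (pos < input.length() && input.charAt(pos) == ' ') {
            pos++;
        }
        if (pos >= input.length() || input.charAt(pos) == '>') {
            return false;
        }
        while (pos < input.length() && input.charAt(pos) != ' ' && input.charAt(pos) != '>') {
            pos++;
        }
        attributesSeen++;
        return true;
    }
}
```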

resetAttributes

private void resetAttributes()

beforeAttributeNameStateImpl

private boolean beforeAttributeNameStateImpl()
                                      throws SAXException,
                                             IOException
Before attribute name state

Throws:
IOException
SAXException

parseErrorUnlessPermittedSlash

private void parseErrorUnlessPermittedSlash()
                                     throws SAXException,
                                            IOException
Throws:
SAXException
IOException

emitCurrentTagToken

private void emitCurrentTagToken()
                          throws SAXException
Throws:
SAXException

attributeNameState

private boolean attributeNameState()
                            throws SAXException,
                                   IOException
Attribute name state

Throws:
IOException
SAXException

attributeNameComplete

private void attributeNameComplete()
                            throws SAXException
Throws:
SAXException

addAttributeWithoutValue

private void addAttributeWithoutValue()
                               throws SAXException
Throws:
SAXException

addAttributeWithValue

private void addAttributeWithValue()
                            throws SAXException
Throws:
SAXException

toAsciiLowerCase

private String toAsciiLowerCase(String str)
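
This method has no description. Judging by its name, it lower-cases only the letters A to Z and leaves all other characters alone, unlike the locale-sensitive String.toLowerCase(). A minimal sketch under that assumption follows; the class AsciiCaseSketch is hypothetical.

```java
final class AsciiCaseSketch {
    /** Lower-cases A..Z only; every other code unit is copied unchanged. */
    static String toAsciiLowerCase(String str) {
        char[] chars = str.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c >= 'A' && c <= 'Z') {
                chars[i] = (char) (c + 0x20);
            }
        }
        return new String(chars);
    }
}
```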

afterAttributeNameState

private boolean afterAttributeNameState()
                                 throws SAXException,
                                        IOException
After attribute name state

Throws:
IOException
SAXException

beforeAttributeValueState

private boolean beforeAttributeValueState()
                                   throws SAXException,
                                          IOException
Before attribute value state

Throws:
IOException
SAXException

attributeValueDoubleQuotedState

private boolean attributeValueDoubleQuotedState()
                                         throws SAXException,
                                                IOException
Attribute value (double-quoted) state

Throws:
IOException
SAXException

attributeValueSingleQuotedState

private boolean attributeValueSingleQuotedState()
                                         throws SAXException,
                                                IOException
Attribute value (single-quoted) state

Throws:
SAXException
IOException

attributeValueUnquotedState

private boolean attributeValueUnquotedState()
                                     throws SAXException,
                                            IOException
Attribute value (unquoted) state

Throws:
IOException
SAXException

entityInAttributeValueState

private void entityInAttributeValueState()
                                  throws SAXException,
                                         IOException
Entity in attribute value state

Throws:
IOException
SAXException

bogusCommentState

private void bogusCommentState()
                        throws SAXException,
                               IOException
Bogus comment state

Throws:
IOException
SAXException

markupDeclarationOpenState

private void markupDeclarationOpenState()
                                 throws SAXException,
                                        IOException
Markup declaration open state

Throws:
IOException
SAXException

commentStates

private void commentStates()
                    throws SAXException,
                           IOException
Comment start state, Comment start dash state, Comment state, Comment end dash state and Comment end state

Throws:
IOException
SAXException

doctypeState

private void doctypeState()
                   throws SAXException,
                          IOException
DOCTYPE state

Throws:
IOException
SAXException

beforeDoctypeNameState

private void beforeDoctypeNameState()
                             throws SAXException,
                                    IOException
Before DOCTYPE name state

Throws:
IOException
SAXException

doctypeNameState

private void doctypeNameState()
                       throws SAXException,
                              IOException
DOCTYPE name state

Throws:
IOException
SAXException

afterDoctypeNameState

private void afterDoctypeNameState()
                            throws SAXException,
                                   IOException
After DOCTYPE name state

Throws:
IOException
SAXException

beforeDoctypePublicIdentifierState

private void beforeDoctypePublicIdentifierState()
                                         throws SAXException,
                                                IOException
Before DOCTYPE public identifier state

Throws:
IOException
SAXException

doctypePublicIdentifierDoubleQuotedState

private void doctypePublicIdentifierDoubleQuotedState()
                                               throws SAXException,
                                                      IOException
DOCTYPE public identifier (double-quoted) state

Throws:
IOException
SAXException

doctypePublicIdentifierSingleQuotedState

private void doctypePublicIdentifierSingleQuotedState()
                                               throws SAXException,
                                                      IOException
DOCTYPE public identifier (single-quoted) state

Throws:
IOException
SAXException

afterDoctypePublicIdentifierState

private void afterDoctypePublicIdentifierState()
                                        throws SAXException,
                                               IOException
After DOCTYPE public identifier state

Throws:
IOException
SAXException

beforeDoctypeSystemIdentifierState

private void beforeDoctypeSystemIdentifierState()
                                         throws SAXException,
                                                IOException
Before DOCTYPE system identifier state

Throws:
IOException
SAXException

doctypeSystemIdentifierDoubleQuotedState

private void doctypeSystemIdentifierDoubleQuotedState()
                                               throws SAXException,
                                                      IOException
DOCTYPE system identifier (double-quoted) state

Throws:
IOException
SAXException

doctypeSystemIdentifierSingleQuotedState

private void doctypeSystemIdentifierSingleQuotedState()
                                               throws SAXException,
                                                      IOException
DOCTYPE system identifier (single-quoted) state

Throws:
IOException
SAXException

afterDoctypeSystemIdentifierState

private void afterDoctypeSystemIdentifierState()
                                        throws SAXException,
                                               IOException
After DOCTYPE system identifier state

Throws:
IOException
SAXException

bogusDoctypeState

private void bogusDoctypeState()
                        throws SAXException,
                               IOException
Bogus DOCTYPE state

Throws:
IOException
SAXException

consumeEntity

private void consumeEntity(boolean inAttribute)
                    throws SAXException,
                           IOException
Consume entity. Unlike the definition in the spec, this method does not return a value and never requires the caller to backtrack. It takes care of emitting characters or appending them to the current attribute value, and it does so even when consuming the entity fails.

Throws:
IOException
SAXException
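
The description above implies a routing pattern: the entity consumer delivers its result directly, either as character data or into the current attribute value, instead of returning it to the caller. A minimal sketch of that pattern follows. The class EmitOrAppendSketch, its two StringBuilder sinks, and the method consumeFailedEntity are hypothetical stand-ins; the real class reports to a token handler and its internals are not shown in this listing.

```java
final class EmitOrAppendSketch {

    // Hypothetical sinks standing in for the attribute buffer and the
    // character-token output of a real tokenizer.
    private final StringBuilder attributeValue = new StringBuilder();
    private final StringBuilder emittedCharacters = new StringBuilder();

    /** Routes val to the attribute value or to the character output. */
    void emitOrAppend(char[] val, boolean inAttribute) {
        if (inAttribute) {
            attributeValue.append(val);
        } else {
            emittedCharacters.append(val);
        }
    }

    /**
     * Assumed failure path: the raw text that did not form an entity is
     * delivered to the same destination, so the caller never backtracks.
     */
    void consumeFailedEntity(String rawText, boolean inAttribute) {
        emitOrAppend(rawText.toCharArray(), inAttribute);
    }

    String attributeValue() {
        return attributeValue.toString();
    }

    String emittedCharacters() {
        return emittedCharacters.toString();
    }

    public static void main(String[] args) {
        EmitOrAppendSketch sketch = new EmitOrAppendSketch();
        sketch.emitOrAppend(new char[] { '<' }, true);  // e.g. "&lt;" inside an attribute
        sketch.consumeFailedEntity("&bogus", false);    // unrecognized text in content
        System.out.println(sketch.attributeValue());
        System.out.println(sketch.emittedCharacters());
    }
}
```

Because both the success path and the failure path end in the same routing call, the calling state method does not need to inspect a return value or rewind its input position.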

consumeNCR

private void consumeNCR(boolean inAttribute)
                 throws SAXException,
                        IOException
Throws:
SAXException
IOException

handleNCRValue

private void handleNCRValue(int value,
                            boolean inAttribute)
                     throws SAXException,
                            IOException
Throws:
SAXException
IOException
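
The field summary lists two reusable buffers, bmpChar for numeric character references (NCRs) in the Basic Multilingual Plane and astralChar for astral ones. As an illustration of how a code point value could be expanded into such buffers, here is a minimal sketch. The class NcrExpansionSketch and the method expand are hypothetical, the buffer sizes are assumptions, and the sketch leaves out whatever error reporting or remapping of problematic values the real method performs.

```java
final class NcrExpansionSketch {

    // Reused buffers, mirroring the bmpChar and astralChar fields
    // described in the field summary (sizes are assumptions).
    private final char[] bmpChar = new char[1];
    private final char[] astralChar = new char[2];

    /**
     * Expands a Unicode code point into UTF-16 code units.
     * Returns a shared buffer, so callers must copy it if they keep it.
     */
    char[] expand(int value) {
        if (value < 0 || value > 0x10FFFF) {
            throw new IllegalArgumentException("Not a Unicode code point: " + value);
        }
        if (value <= 0xFFFF) {
            bmpChar[0] = (char) value;
            return bmpChar;
        }
        // Astral plane: split into a high and a low surrogate.
        int offset = value - 0x10000;
        astralChar[0] = (char) (0xD800 + (offset >> 10));
        astralChar[1] = (char) (0xDC00 + (offset & 0x3FF));
        return astralChar;
    }

    public static void main(String[] args) {
        NcrExpansionSketch sketch = new NcrExpansionSketch();
        // &#x1D11E; (MUSICAL SYMBOL G CLEF) needs a surrogate pair.
        char[] units = sketch.expand(0x1D11E);
        System.out.println(Integer.toHexString(units[0]) + " " + Integer.toHexString(units[1]));
    }
}
```

Reusing preallocated arrays avoids creating a new object for every reference, at the cost of the returned array being overwritten by the next call.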

emitOrAppend

private void emitOrAppend(char[] val,
                          boolean inAttribute)
                   throws SAXException,
                          IOException
Parameters:
val - the characters to emit or to append to the current attribute value
Throws:
SAXException
IOException

isMappingLangToXmlLang

public boolean isMappingLangToXmlLang()
Returns whether the lang attribute is being mapped to xml:lang.

Returns:
true if lang is mapped to xml:lang

setMappingLangToXmlLang

public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Sets whether the lang attribute is mapped to xml:lang.

Parameters:
mappingLangToXmlLang - true to map lang to xml:lang