public interface XmlPullParser
The kind of parser obtained (for example validating or non-validating) depends on which features are set, such as FEATURE_VALIDATION and FEATURE_PROCESS_DOCDECL.
There are two key methods: next() and nextToken(). While next() provides access to high level parsing events, nextToken() allows access to lower level tokens.
The current event state of the parser can be determined by calling the getEventType() method. Initially, the parser is in the START_DOCUMENT state.
The method next() advances the parser to the next event. The int value returned from next() identifies the current parser state and is identical to the value returned by subsequent calls to getEventType().
The following event types are seen by next(): START_TAG, TEXT, END_TAG, and END_DOCUMENT.
After the first call to next() or nextToken() (or any other next*() method), the application can obtain the XML version, standalone flag, and encoding from the XML declaration: the input encoding is returned by getInputEncoding(), and the rest of the declaration's content is available through optional properties.
A minimal example of using this API:

    import java.io.IOException;
    import java.io.StringReader;

    import org.xmlpull.v1.XmlPullParser;
    import org.xmlpull.v1.XmlPullParserException;
    import org.xmlpull.v1.XmlPullParserFactory;

    public class SimpleXmlPullApp {

        public static void main(String[] args) throws XmlPullParserException, IOException {
            XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XmlPullParser xpp = factory.newPullParser();

            xpp.setInput(new StringReader("<foo>Hello World!</foo>"));
            int eventType = xpp.getEventType();
            // Pull events one at a time until the end of the document.
            while (eventType != XmlPullParser.END_DOCUMENT) {
                if (eventType == XmlPullParser.START_DOCUMENT) {
                    System.out.println("Start document");
                } else if (eventType == XmlPullParser.END_DOCUMENT) {
                    // Not reached: the loop exits as soon as END_DOCUMENT is seen.
                    System.out.println("End document");
                } else if (eventType == XmlPullParser.START_TAG) {
                    System.out.println("Start tag " + xpp.getName());
                } else if (eventType == XmlPullParser.END_TAG) {
                    System.out.println("End tag " + xpp.getName());
                } else if (eventType == XmlPullParser.TEXT) {
                    System.out.println("Text " + xpp.getText());
                }
                eventType = xpp.next();
            }
        }
    }
The above example will generate the following output:
    Start document
    Start tag foo
    Text Hello World!
    End tag foo
For more details on API usage, please refer to the quick Introduction available at http://www.xmlpull.org
See Also: XmlPullParserFactory, defineEntityReplacementText(java.lang.String, java.lang.String), getName(), getNamespace(java.lang.String), getText(), next(), nextToken(), setInput(java.io.Reader), FEATURE_PROCESS_DOCDECL, FEATURE_VALIDATION, START_DOCUMENT, START_TAG, TEXT, END_TAG, END_DOCUMENT
Modifier and Type | Field and Description |
---|---|
static int | CDSECT: A CDATA section was just read; this token is available only from calls to nextToken(). |
static int | COMMENT: An XML comment was just read. |
static int | DOCDECL: An XML document type declaration was just read. |
static int | END_DOCUMENT: Logical end of the XML document. |
static int | END_TAG: Returned from getEventType(), next(), or nextToken() when an end tag was read. |
static int | ENTITY_REF: An entity reference was just read; this token is available from nextToken() only. |
static String | FEATURE_PROCESS_DOCDECL: This feature determines whether the document declaration is processed. |
static String | FEATURE_PROCESS_NAMESPACES: This feature determines whether the parser processes namespaces. |
static String | FEATURE_REPORT_NAMESPACE_ATTRIBUTES: This feature determines whether namespace attributes are exposed via the attribute access methods. |
static String | FEATURE_VALIDATION: If this feature is activated, all validation errors as defined in the XML 1.0 specification are reported. |
static int | IGNORABLE_WHITESPACE: Ignorable whitespace was just read. |
static String | NO_NAMESPACE: This constant represents the default namespace (empty string ""). |
static int | PROCESSING_INSTRUCTION: An XML processing instruction declaration was just read. |
static int | START_DOCUMENT: Signals that the parser is at the very beginning of the document and nothing was read yet. |
static int | START_TAG: Returned from getEventType(), next(), or nextToken() when a start tag was read. |
static int | TEXT: Character data was read and is available by calling getText(). |
static String[] | TYPES: This array can be used to convert the event type integer constants such as START_TAG or TEXT to a string. |
Modifier and Type | Method and Description |
---|---|
void | defineEntityReplacementText(String entityName, String replacementText): Sets a new value for entity replacement text as defined in XML 1.0 Section 4.5 Construction of Internal Entity Replacement Text. |
int | getAttributeCount(): Returns the number of attributes of the current start tag, or -1 if the current event type is not START_TAG. |
String | getAttributeName(int index): Returns the local name of the specified attribute if namespaces are enabled, or just the attribute name if namespaces are disabled. |
String | getAttributeNamespace(int index): Returns the namespace URI of the attribute with the given index (starts from 0). |
String | getAttributePrefix(int index): Returns the prefix of the specified attribute. Returns null if the attribute has no prefix. |
String | getAttributeType(int index): Returns the type of the specified attribute. If the parser is non-validating it MUST return CDATA. |
String | getAttributeValue(int index): Returns the given attribute's value. |
String | getAttributeValue(String namespace, String name): Returns the attribute's value identified by namespace URI and local name. |
int | getColumnNumber(): Returns the current column number, starting from 0. |
int | getDepth(): Returns the current depth of the element. |
int | getEventType(): Returns the type of the current event (START_TAG, END_TAG, TEXT, etc.). |
boolean | getFeature(String name): Returns the current value of the given feature. |
String | getInputEncoding(): Returns the input encoding if known, null otherwise. |
int | getLineNumber(): Returns the current line number, starting from 1. |
String | getName(): For START_TAG or END_TAG events, the (local) name of the current element is returned when namespaces are enabled. |
String | getNamespace(): Returns the namespace URI of the current element. |
String | getNamespace(String prefix): Returns the URI corresponding to the given prefix, depending on the current state of the parser. |
int | getNamespaceCount(int depth): Returns the number of elements in the namespace stack for the given depth. |
String | getNamespacePrefix(int pos): Returns the namespace prefix for the given position in the namespace stack. |
String | getNamespaceUri(int pos): Returns the namespace URI for the given position in the namespace stack. If the position is out of range, an exception is thrown. |
String | getPositionDescription(): Returns a short text describing the current parser state, including the position, a description of the current event and the data source if known. |
String | getPrefix(): Returns the prefix of the current element. |
Object | getProperty(String name): Looks up the value of a property. |
String | getText(): Returns the text content of the current event as a String. |
char[] | getTextCharacters(int[] holderForStartAndLength): Returns the buffer that contains the text of the current event, as well as the start offset and length relevant for the current event. |
boolean | isAttributeDefault(int index): Returns whether the specified attribute was not in the input but was declared in XML. |
boolean | isEmptyElementTag(): Returns true if the current event is START_TAG and the tag is an empty-element tag such as <tag/>. |
boolean | isWhitespace(): Checks whether the current TEXT event contains only whitespace characters. |
int | next(): Gets the next parsing event. Element content will be coalesced and only one TEXT event must be returned for the whole element content (comments and processing instructions will be ignored, and entity references must be expanded or an exception must be thrown if an entity reference cannot be expanded). |
int | nextTag(): Calls next() and returns the event if it is START_TAG or END_TAG; otherwise throws an exception. |
String | nextText(): If the current event is START_TAG, then the element content is returned if the next event is TEXT, or an empty string is returned if the next event is END_TAG; otherwise an exception is thrown. |
int | nextToken(): This method works similarly to next() but will expose additional event types (COMMENT, CDSECT, DOCDECL, ENTITY_REF, PROCESSING_INSTRUCTION, or IGNORABLE_WHITESPACE) if they are available in the input. |
void | require(int type, String namespace, String name): Tests if the current event is of the given type and if the namespace and name match. |
void | setFeature(String name, boolean state): Use this call to change the general behaviour of the parser, such as namespace processing or doctype declaration handling. |
void | setInput(InputStream inputStream, String inputEncoding): Sets the input stream the parser is going to process. |
void | setInput(Reader in): Sets the input source for the parser to the given reader and resets the parser. |
void | setProperty(String name, Object value): Sets the value of a property. |
static final int CDSECT
See Also: nextToken(), getText(), Constant Field Values
static final int COMMENT
See Also: nextToken(), getText(), Constant Field Values
static final int DOCDECL
See Also: nextToken(), getText(), Constant Field Values
static final int END_DOCUMENT
NOTE: calling next() or nextToken() again will result in an exception being thrown.
See Also: next(), nextToken(), Constant Field Values
static final int END_TAG
static final int ENTITY_REF
See Also: nextToken(), getText(), Constant Field Values
static final String FEATURE_PROCESS_DOCDECL
Please note: If the document type declaration was ignored, entity references may cause exceptions later in the parsing process. The default value of this feature is false. It cannot be changed during parsing.
static final String FEATURE_PROCESS_NAMESPACES
NOTE: The value cannot be changed during parsing and must be set before parsing.
static final String FEATURE_REPORT_NAMESPACE_ATTRIBUTES
static final String FEATURE_VALIDATION
Please Note: This feature can not be changed during parsing. The default value is false.
static final int IGNORABLE_WHITESPACE
NOTE: this is different from calling the isWhitespace() method, since text content may be whitespace but not ignorable. Ignorable whitespace is skipped by next() automatically; this event type is never returned from next().
See Also: nextToken(), getText(), Constant Field Values
static final String NO_NAMESPACE
static final int PROCESSING_INSTRUCTION
See Also: nextToken(), getText(), Constant Field Values
static final int START_DOCUMENT
See Also: next(), nextToken(), Constant Field Values
static final int START_TAG
static final int TEXT
Please note: next() will accumulate multiple events into one TEXT event, skipping IGNORABLE_WHITESPACE, PROCESSING_INSTRUCTION and COMMENT events. In contrast, nextToken() will stop reading text when any other event is observed. Also, when the state was reached by calling next(), the text value will be normalized, whereas getText() will return unnormalized content in the case of nextToken(). This allows an exact roundtrip without changing line ends when examining low level events, whereas for high level applications the text is normalized appropriately.
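The difference between the two views can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `CoalesceSketch`, `Kind`, `Token`, and `coalesce` are hypothetical names that are not part of the XmlPull API, and the model ignores text normalization and entity references. It takes a list of low-level tokens, as nextToken() might deliver them, and produces the event sequence that next() would be expected to report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a token stream; none of these names belong to the XmlPull API.
class CoalesceSketch {
    enum Kind { START_TAG, END_TAG, TEXT, COMMENT, PROCESSING_INSTRUCTION, IGNORABLE_WHITESPACE }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Skips comments, processing instructions and ignorable whitespace,
    // and merges the text around them into a single TEXT event.
    static List<Token> coalesce(List<Token> tokens) {
        List<Token> events = new ArrayList<>();
        StringBuilder pending = null; // text collected since the last tag
        for (Token t : tokens) {
            switch (t.kind) {
                case COMMENT:
                case PROCESSING_INSTRUCTION:
                case IGNORABLE_WHITESPACE:
                    break; // never surfaced in the high-level view
                case TEXT:
                    if (pending == null) {
                        pending = new StringBuilder();
                    }
                    pending.append(t.text);
                    break;
                default:
                    // A tag ends the current run of text.
                    if (pending != null) {
                        events.add(new Token(Kind.TEXT, pending.toString()));
                        pending = null;
                    }
                    events.add(t);
            }
        }
        if (pending != null) {
            events.add(new Token(Kind.TEXT, pending.toString()));
        }
        return events;
    }
}
```

In this model, the tokens for `<a>Hello <!--x-->World</a>` (two TEXT tokens separated by a COMMENT) collapse into a single TEXT event between the start and end tags.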
See Also: next(), nextToken(), getText(), Constant Field Values
static final String[] TYPES
void defineEntityReplacementText(String entityName, String replacementText) throws XmlPullParserException
The motivation for this function is to allow very small implementations of XMLPULL that will work in J2ME environments. Though these implementations may not be able to process the document type declaration, they still can work with known DTDs by using this function.
Please note: The given value is used literally as replacement text and it corresponds to declaring an entity in the DTD that has all special characters escaped: the left angle bracket is replaced with &lt;, the ampersand with &amp; and so on.
Note: The given value is the literal replacement text and must not contain any other entity reference (if it contains any entity reference there will be no further replacement).
Note: The list of pre-defined entity names will always contain standard XML entities such as amp (&), lt (<), gt (>), quot ("), and apos ('). Those cannot be redefined by this method!
Throws: XmlPullParserException
See Also: setInput(java.io.Reader), FEATURE_PROCESS_DOCDECL, FEATURE_VALIDATION
int getAttributeCount()
String getAttributeName(int index)
Parameters: index - zero-based index of attribute
String getAttributeNamespace(int index)
NOTE: if FEATURE_REPORT_NAMESPACE_ATTRIBUTES is set then namespace attributes (xmlns:ns='...') must be reported with namespace http://www.w3.org/2000/xmlns/ (visit this URL for description!). The default namespace attribute (xmlns="...") will be reported with empty namespace.
NOTE: The xml prefix is bound, as defined in the Namespaces in XML specification, to "http://www.w3.org/XML/1998/namespace".
Parameters: index - zero-based index of attribute
String getAttributePrefix(int index)
Parameters: index - zero-based index of attribute
String getAttributeType(int index)
Parameters: index - zero-based index of attribute
String getAttributeValue(int index)
NOTE: attribute value must be normalized (including entity replacement text if PROCESS_DOCDECL is false) as described in XML 1.0 section 3.3.3 Attribute-Value Normalization
Parameters: index - zero-based index of attribute
See Also: defineEntityReplacementText(java.lang.String, java.lang.String)
String getAttributeValue(String namespace, String name)
NOTE: attribute value must be normalized (including entity replacement text if PROCESS_DOCDECL is false) as described in XML 1.0 section 3.3.3 Attribute-Value Normalization
Parameters: namespace - Namespace of the attribute if namespaces are enabled; otherwise must be null
name - If namespaces are enabled, the local name of the attribute; otherwise just the attribute name
See Also: defineEntityReplacementText(java.lang.String, java.lang.String)
int getColumnNumber()
int getDepth()

    <!-- outside -->     0
    <root>               1
      sometext           1
        <foobar>         2
        </foobar>        2
    </root>              1
    <!-- outside -->     0
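The depth values listed above follow a simple rule: the depth goes up when a start tag is read, and it goes back down only after the matching end tag has been reported, so an end tag reports the same depth as its start tag. The following is a minimal sketch of that rule, not parser code; `DepthSketch`, `Kind`, and `depths` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the depth rule; not part of the XmlPull API.
class DepthSketch {
    enum Kind { START_TAG, END_TAG, OTHER }

    // Returns the depth that would be reported at each event in the sequence.
    static List<Integer> depths(List<Kind> events) {
        List<Integer> result = new ArrayList<>();
        int depth = 0;
        boolean leavingElement = false; // true if the previous event was an END_TAG
        for (Kind kind : events) {
            if (leavingElement) {
                depth--; // the element is only left once its END_TAG has been reported
                leavingElement = false;
            }
            if (kind == Kind.START_TAG) {
                depth++;
            }
            if (kind == Kind.END_TAG) {
                leavingElement = true;
            }
            result.add(depth);
        }
        return result;
    }
}
```

Feeding it the seven events of the listing (comment, start tag, text, start tag, end tag, end tag, comment) yields the depths 0, 1, 1, 2, 2, 1, 0.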
int getEventType() throws XmlPullParserException
Throws: XmlPullParserException
See Also: next(), nextToken()
boolean getFeature(String name)
Please note: unknown features are always returned as false.
Parameters: name - The name of feature to be retrieved.
Throws: IllegalArgumentException - if the feature name string is null
String getInputEncoding()
int getLineNumber()
String getName()
Please note: To reconstruct the raw element name when namespaces are enabled and the prefix is not null, you will need to add the prefix and a colon to localName.
String getNamespace()
String getNamespace(String prefix)
If the prefix was not declared in the current scope, null is returned. The default namespace is included in the namespace table and is available via getNamespace (null).
This method is a convenience method for

    for (int i = getNamespaceCount(getDepth()) - 1; i >= 0; i--) {
        if (getNamespacePrefix(i).equals(prefix)) {
            return getNamespaceUri(i);
        }
    }
    return null;

Please note: parser implementations may provide more efficient lookup, e.g. using a Hashtable. The 'xml' prefix is bound to "http://www.w3.org/XML/1998/namespace", as defined in the Namespaces in XML specification. Analogously, the 'xmlns' prefix is resolved to http://www.w3.org/2000/xmlns/
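The lookup described above can be tried out on a plain array-backed namespace stack. The sketch below is an illustration only: `NamespaceLookupSketch` and `lookup` are hypothetical names, the stack is modelled as two parallel arrays, and it assumes the default namespace is stored with a null prefix (as the getNamespace(null) remark suggests), so it compares prefixes with Objects.equals rather than calling equals on a possibly null prefix.

```java
import java.util.Objects;

// Hypothetical model of the namespace stack; not part of the XmlPull API.
class NamespaceLookupSketch {
    // Entry i declares prefixes[i] -> uris[i]; `count` is the number of entries
    // in scope. Later entries shadow earlier ones, so the search runs backwards.
    static String lookup(String[] prefixes, String[] uris, int count, String prefix) {
        for (int i = count - 1; i >= 0; i--) {
            if (Objects.equals(prefixes[i], prefix)) {
                return uris[i];
            }
        }
        // Fixed bindings named in the note above.
        if ("xml".equals(prefix)) {
            return "http://www.w3.org/XML/1998/namespace";
        }
        if ("xmlns".equals(prefix)) {
            return "http://www.w3.org/2000/xmlns/";
        }
        return null; // prefix not declared in the current scope
    }
}
```

With prefixes `{null, "a", "a"}` and URIs `{"urn:default", "urn:outer", "urn:inner"}` (made-up values), looking up "a" with all three entries in scope returns the innermost binding, while looking it up with only the first two in scope returns the outer one.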
int getNamespaceCount(int depth) throws XmlPullParserException
NOTE: when parser is on END_TAG then it is allowed to call this function with getDepth()+1 argument to retrieve position of namespace prefixes and URIs that were declared on corresponding START_TAG.
NOTE: to retrieve the list of namespaces declared in the current element:

    XmlPullParser pp = ...
    int nsStart = pp.getNamespaceCount(pp.getDepth() - 1);
    int nsEnd = pp.getNamespaceCount(pp.getDepth());
    for (int i = nsStart; i < nsEnd; i++) {
        String prefix = pp.getNamespacePrefix(i);
        String ns = pp.getNamespaceUri(i);
        // ...
    }

Throws: XmlPullParserException
See Also: getNamespacePrefix(int), getNamespaceUri(int), getNamespace(), getNamespace(String)
String getNamespacePrefix(int pos) throws XmlPullParserException
Please note: when the parser is on an END_TAG, namespace prefixes that were declared in the corresponding START_TAG are still accessible although they are no longer in scope.
Throws: XmlPullParserException
String getNamespaceUri(int pos) throws XmlPullParserException
NOTE: when the parser is on END_TAG, namespace prefixes that were declared in the corresponding START_TAG are still accessible even though they are not in scope.
Throws: XmlPullParserException
String getPositionDescription()
String getPrefix()
Object getProperty(String name)
NOTE: unknown properties are always returned as null.
Parameters: name - The name of property to be retrieved.
String getText()
NOTE: in case of ENTITY_REF, this method returns the entity replacement text (or null if not available). This is the only case where getText() and getTextCharacters() return different values.
See Also: getEventType(), next(), nextToken()
char[] getTextCharacters(int[] holderForStartAndLength)
Please note: this buffer must not be modified and its content MAY change after a call to next() or nextToken(). This method will always return the same value as getText(), except for ENTITY_REF. In the case of ENTITY_REF, getText() returns the replacement text and this method returns the actual input buffer containing the entity name. If getText() returns null, this method returns null as well and the values returned in the holder array MUST be -1 (both start and length).
Parameters: holderForStartAndLength - Must hold a 2-element int array into which the start offset and length values will be written.
See Also: getText(), next(), nextToken()
boolean isAttributeDefault(int index)
Parameters: index - zero-based index of attribute
boolean isEmptyElementTag() throws XmlPullParserException
NOTE: if the parser is not on START_TAG, an exception will be thrown.
Throws: XmlPullParserException
boolean isWhitespace() throws XmlPullParserException
Please note: non-validating parsers are not able to distinguish whitespace and ignorable whitespace, except from whitespace outside the root element. Ignorable whitespace is reported as separate event, which is exposed via nextToken only.
Throws: XmlPullParserException
int next() throws XmlPullParserException, IOException
NOTE: an empty element (such as <tag/>) will be reported with two separate events: START_TAG, END_TAG. This is necessary to preserve parsing equivalency of an empty element to <tag></tag> (see isEmptyElementTag()).
Throws: XmlPullParserException, IOException
See Also: isEmptyElementTag(), START_TAG, TEXT, END_TAG, END_DOCUMENT
int nextTag() throws XmlPullParserException, IOException
Essentially it does this:

    int eventType = next();
    if (eventType == TEXT && isWhitespace()) { // skip whitespace
        eventType = next();
    }
    if (eventType != START_TAG && eventType != END_TAG) {
        throw new XmlPullParserException("expected start or end tag", this, null);
    }
    return eventType;

Throws: XmlPullParserException, IOException
String nextText() throws XmlPullParserException, IOException
The motivation for this function is to allow consistent parsing of both empty elements and elements that have non-empty content. Both kinds of input can be parsed with the same code:

    p.nextTag();
    p.require(p.START_TAG, "", "tag");
    String content = p.nextText();
    p.require(p.END_TAG, "", "tag");

This function together with nextTag() makes it very easy to parse XML that has no mixed content.
Essentially it does this:

    if (getEventType() != START_TAG) {
        throw new XmlPullParserException("parser must be on START_TAG to read next text", this, null);
    }
    int eventType = next();
    if (eventType == TEXT) {
        String result = getText();
        eventType = next();
        if (eventType != END_TAG) {
            throw new XmlPullParserException("event TEXT it must be immediately followed by END_TAG", this, null);
        }
        return result;
    } else if (eventType == END_TAG) {
        return "";
    } else {
        throw new XmlPullParserException("parser must be on START_TAG or TEXT to read text", this, null);
    }

Throws: XmlPullParserException, IOException
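The nextText() logic can be exercised without a real parser by running it against a fixed list of events. The sketch below is only a model under stated assumptions: `NextTextSketch` is a hypothetical in-memory stand-in, its event constants carry illustrative values rather than the real XmlPullParser ones, and it throws IllegalStateException where a real parser would throw XmlPullParserException.

```java
// Hypothetical in-memory stand-in for a pull parser; not part of the XmlPull API.
class NextTextSketch {
    // Illustrative values only; the real constants live in XmlPullParser.
    static final int START_TAG = 0;
    static final int TEXT = 1;
    static final int END_TAG = 2;

    private final int[] kinds;    // event type at each position
    private final String[] texts; // text for TEXT events, null otherwise
    private int pos = 0;

    NextTextSketch(int[] kinds, String[] texts) {
        this.kinds = kinds;
        this.texts = texts;
    }

    int getEventType() {
        return kinds[pos];
    }

    String getText() {
        return texts[pos];
    }

    int next() {
        return kinds[++pos];
    }

    // Same control flow as the nextText() description above.
    String nextText() {
        if (getEventType() != START_TAG) {
            throw new IllegalStateException("parser must be on START_TAG to read next text");
        }
        int eventType = next();
        if (eventType == TEXT) {
            String result = getText();
            if (next() != END_TAG) {
                throw new IllegalStateException("TEXT must be immediately followed by END_TAG");
            }
            return result;
        } else if (eventType == END_TAG) {
            return ""; // empty element: no TEXT event between the tags
        }
        throw new IllegalStateException("element content must be text only");
    }
}
```

In this model, an element with text content returns that text, an empty element returns the empty string, and in both cases the parser is left on END_TAG. An element that contains a nested start tag is rejected.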
int nextToken() throws XmlPullParserException, IOException
If the special feature FEATURE_XML_ROUNDTRIP (identified by the URI http://xmlpull.org/v1/doc/features.html#xml-roundtrip) is enabled, it is possible to round-trip an XML document, i.e. to reproduce the XML input exactly on output using getText(): the returned content is always unnormalized (exactly as in the input). Otherwise the returned content is end-of-line normalized as described in XML 1.0 End-of-Line Handling. Also, when this feature is enabled, the exact content of START_TAG, END_TAG, DOCDECL and PROCESSING_INSTRUCTION is available.
Here is the list of tokens that can be returned from nextToken() and what getText() and getTextCharacters() returns:
For DOCDECL, if FEATURE_XML_ROUNDTRIP is true or PROCESS_DOCDECL is false, the text inside the document type declaration is returned, for example:
" titlepage SYSTEM "http://www.foo.bar/dtds/typo.dtd" [<!ENTITY % active.links "INCLUDE">]"
for an input document that contained:
<!DOCTYPE titlepage SYSTEM "http://www.foo.bar/dtds/typo.dtd" [<!ENTITY % active.links "INCLUDE">]>
Otherwise, if FEATURE_XML_ROUNDTRIP is false and PROCESS_DOCDECL is true, what is returned is undefined (it may even be null).
NOTE: there is no guarantee that there will be only one TEXT or IGNORABLE_WHITESPACE event from nextToken(), as the parser may choose to deliver element content in multiple tokens (dividing element content into chunks).
NOTE: whether the returned text of a token is end-of-line normalized depends on FEATURE_XML_ROUNDTRIP.
NOTE: XMLDecl (<?xml ...?>) is not reported but its content is available through optional properties (see class description above).
Throws: XmlPullParserException, IOException
See Also: next(), START_TAG, TEXT, END_TAG, END_DOCUMENT, COMMENT, DOCDECL, PROCESSING_INSTRUCTION, ENTITY_REF, IGNORABLE_WHITESPACE
void require(int type, String namespace, String name) throws XmlPullParserException, IOException
Essentially it does this:

    if (type != getEventType()
            || (namespace != null && !namespace.equals(getNamespace()))
            || (name != null && !name.equals(getName()))) {
        throw new XmlPullParserException("expected " + TYPES[type] + getPositionDescription());
    }

Throws: XmlPullParserException, IOException
void setFeature(String name, boolean state) throws XmlPullParserException
Example: call setFeature(FEATURE_PROCESS_NAMESPACES, true) in order to switch on namespace processing. The initial settings correspond to the properties requested from the XML Pull Parser factory. If none were requested, all features are deactivated by default.
Throws: XmlPullParserException - If the feature is not supported or cannot be set
IllegalArgumentException - If the string with the feature name is null
void setInput(InputStream inputStream, String inputEncoding) throws XmlPullParserException
NOTE: If an input encoding string is passed, it MUST be used. Otherwise, if inputEncoding is null, the parser SHOULD try to determine the input encoding following the XML 1.0 specification (see below). If encoding detection is supported, then the feature http://xmlpull.org/v1/doc/features.html#detect-encoding MUST be true; otherwise it MUST be false.
Parameters: inputStream - contains a raw byte input stream of possibly unknown encoding (when inputEncoding is null).
inputEncoding - if not null it MUST be used as encoding for inputStream
Throws: XmlPullParserException
void setInput(Reader in) throws XmlPullParserException
Throws: XmlPullParserException
void setProperty(String name, Object value) throws XmlPullParserException
Throws: XmlPullParserException - If the property is not supported or cannot be set
IllegalArgumentException - If the string with the property name is null