XQuery is a strongly typed functional language for processing real or virtual XML data. Its rich data model and ergonomic expressions provide an environment where most programmers would feel at home taking apart and piecing together XML.
The XQuery specifications are prepared by the W3C XML Query Working Group, and are in the Last Call Working Draft stage. Hopefully they will achieve recommendation status in 2004.
In this article, I try to get you started exploring XQuery to see if you can take advantage of it today.
The W3C XML Query Working Group page has a list of available XQuery implementations.
Saxon by Michael Kay and GNU Qexo by Per Bothner are two open source implementations that I find helpful. Saxon implements most of the mandatory features of the November 12, 2003 XQuery working draft. GNU Qexo, based on the GNU Kawa framework for programming languages for the JVM, supports compiling XQuery programs into Java classes. It also supports interactive sessions, which is great for learning.
To follow along with this article, download saxon7-8.zip and kawa-1.7.90.jar,
unzip saxon7-8.zip, and add saxon7.jar and kawa-1.7.90.jar to the CLASSPATH.
I use these shell scripts to save some typing:
xquery:
java net.sf.saxon.Query "$@"
qexo:
java kawa.repl --xquery "$@"
[weiqi@gao] $ qexo
(: 1 :) "Hello, World!"
Hello, World!
(: 2 :) 1024, 3.1416, 2.9979e8
1024 3.1416 2.9979E8
(: 3 :) <greeting from="weiqi">Hi</greeting>
<greeting from="weiqi">Hi</greeting>
The smileys are delimiters of XQuery comments. They nest. Qexo uses them as part of the prompt. Cheers you up, doesn't it?
We first entered a string "Hello, World!", and Qexo responded by printing it out. Then we entered a sequence of three numbers: an integer, a decimal, and a double. Finally we created a piece of new XML data: an element with one attribute and some content.
All data in XQuery are sequences. A sequence is made up of items. An item can be either an atomic value or a node. A sequence with one item is the same as that item.
The comma operator combines sequences and flattens the result. You cannot create a sequence of sequences.
(: 4 :) (1024, 3.1416, 2.9979e8), ("Hello, World!", <greeting/>)
1024 3.1416 2.9979E8 Hello, World! <greeting></greeting>
Notice that parentheses are needed around a sequence wherever a single expression is expected, as in the case above or in the arguments of a function call.
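As a small sketch of these rules (not part of the original session), each of the following three comparisons should evaluate to true. It relies only on the built-in count() function and the instance of operator.

```xquery
(: Nested parentheses do not create nested sequences. :)
count((1, (2, 3), ((4)))) = 4,
(: The empty sequence disappears when sequences are combined. :)
count((1, (), 2)) = 2,
(: A sequence with one item is the same as that item. :)
(42) instance of xs:integer
```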
XQuery also defines the union,
intersect and except operators for
sequences of nodes. Their behavior depends on the concept of node
identity, which we'll cover in a later section.
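A minimal sketch of the three operators follows. It borrows the let clause and path expressions that are introduced later in this article, and the greetings element is made up for illustration. Because the operators compare nodes by identity, a node that appears in both operands is counted only once in the union.

```xquery
let $doc := <greetings>
              <greeting from="weiqi"/>
              <greeting from="brian"/>
            </greetings>
let $all := $doc/greeting
let $first := $all[1]
return (
  count($all union $first),      (: 2: the shared node is counted once :)
  count($all intersect $first),  (: 1: only the first greeting is in both :)
  count($all except $first)      (: 1: only the second greeting remains :)
)
```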
Strings, integers, decimals, and doubles are the only data types whose values can be entered into an XQuery program as literals.
They are examples of built-in W3C XML Schema types. XQuery
supports all built-in W3C XML Schema types plus a few additional
types. The W3C XML Schema types have names that start with
xs:, for example, xs:string,
xs:integer, xs:decimal,
xs:double, xs:boolean,
xs:float, etc. The XQuery defined types have names
that start with xdt:, for example,
xdt:dayTimeDuration,
xdt:yearMonthDuration. Note that xs and
xdt are predeclared namespace prefixes for
http://www.w3.org/2001/XMLSchema, and
http://www.w3.org/2003/11/xpath-datatypes
respectively.
The instance of operator tests the type of a
value:
(: instance-of.xq :)
3.1416 instance of xs:decimal
[weiqi@gao] $ xquery instance-of.xq
true
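The same operator can be used to check which type each kind of literal gets. The sketch below is an illustration rather than output captured from Saxon. Note that type derivation counts: an xs:integer is also an instance of xs:decimal.

```xquery
"Hello" instance of xs:string,   (: true :)
1024 instance of xs:integer,     (: true :)
1024 instance of xs:decimal,     (: true: xs:integer is derived from xs:decimal :)
3.1416 instance of xs:double,    (: false: without an exponent the literal is an xs:decimal :)
2.9979e8 instance of xs:double   (: true :)
```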
A constructor function exists for each built-in atomic type. It converts a string or other value into a value of that type:
(: constructor.xq :)
xs:date("2003-12-31") instance of xs:date
[weiqi@gao] $ xquery constructor.xq
true
The cast as operator works exactly like a
constructor function:
(: cast-as.xq :)
1 cast as xs:boolean,
0 cast as xs:boolean,
"true" cast as xs:boolean,
"false" cast as xs:boolean
[weiqi@gao] $ xquery cast-as.xq
true
false
true
false
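To see the two forms side by side, the following sketch compares the result of a constructor function with the result of the matching cast. Each comparison should be true on a conforming processor.

```xquery
xs:boolean("true") = ("true" cast as xs:boolean),
xs:integer("42") = ("42" cast as xs:integer),
xs:date("2003-12-31") = ("2003-12-31" cast as xs:date)
```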
XML data appear in XQuery programs as nodes. Nodes can be either created anew or selected from existing nodes using XPath expressions.
[weiqi@gao] $ qexo
(: 1 :) <greeting from="weiqi">Hello, World!</greeting>
<greeting from="weiqi">Hello, World!</greeting>
(: 2 :) document {
(: 3{:) element { "greeting" } {
(: 4{:) attribute { "from" } { "weiqi" },
(: 5{:) "Hello, World!"
(: 6{:) }
(: 7{:) }
<greeting from="weiqi">Hello, World!</greeting>
Here we created an element literally, and then created an XML
document using the document, element,
and attribute constructors. XQuery uses
{} to surround enclosed expressions. (Notice how
Qexo's prompt changes to indicate the current expression
nesting.)
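The difference an enclosed expression makes is easiest to see with a small sketch. The sum element is an invented example. The expected result of each constructor is given in the comment above it.

```xquery
(: Without braces the content is literal text: <sum>1 + 2</sum> :)
<sum>1 + 2</sum>,
(: With braces the content is evaluated first: <sum>3</sum> :)
<sum>{ 1 + 2 }</sum>,
(: Names can be computed as well: <greeting from="weiqi">Hi</greeting> :)
element { concat("gree", "ting") } { attribute from { "weiqi" }, "Hi" }
```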
Nodes have types. Types exist for six kinds of nodes in XML:
document-node(), element(),
attribute(), processing-instruction(),
comment(), text(). The
node() type represents all kinds of nodes. Namespace
nodes are handled through namespace declarations. The parentheses
are part of the type name, not function calls.
(: node-types.xq :)
document { <greeting/> } instance of document-node(),
element greeting { "Hello" } instance of element(),
attribute from { "weiqi" } instance of attribute()
[weiqi@gao] $ xquery node-types.xq
true
true
true
Nodes have identities. Two nodes have the same identity if and
only if they are selected from the same spot in the same XML
document. Newly constructed nodes always have a new identity.
Identities can be tested with the is operator:
(: 8 :) <greeting/> is <greeting/>
false
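For contrast, here is a sketch in which the comparison succeeds. It uses a let clause (covered later in this article) to hold on to one constructed node, so both operands of is refer to the same node.

```xquery
let $g := <greetings><greeting/></greetings>
return (
  $g is $g,                       (: true: the same node on both sides :)
  $g/greeting is $g/greeting[1],  (: true: both paths select the same child :)
  $g/greeting is <greeting/>      (: false: the right side is a new node :)
)
```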
XQuery provides the doc() and
collection() functions to bring external XML data
into a program. The doc() function takes a URI and
returns a document node. The collection() function
takes a URI and returns a sequence of nodes. The
collection() function interprets the URI in an
implementation-specific way.
We can use doc() to input an XML document
greeting.xml that contains:
<greeting from = "weiqi">Hell&#111;, World!</greeting>
[weiqi@gao] $ qexo
(: 1 :) doc("greeting.xml")
<greeting from="weiqi">Hello, World!</greeting>
Notice how the spaces surrounding the equal sign disappeared
and how a character reference has been resolved (111 is the
ASCII code for 'o'). XQuery works on the infoset of
XML documents, where insignificant whitespace, entity
references, and CDATA sections have already been resolved.
XQuery includes XPath 2.0 as a sublanguage. XPath expressions produce new node sequences out of old ones.
An XPath expression consists of one or more steps separated by / or //. Each step has an axis, a node test, and optional predicates.
Each step works on the result of the previous steps and produces its own results for the next step. A step goes through each node in the input sequence to generate partial results, which are then put together to form the output sequence.
Let's look at a few XPath expressions as they are applied to
the XML document greetings.xml:
<?xml version="1.0" encoding="UTF-8"?>
<greetings>
<greeting from="weiqi">Nihao!</greeting>
<greeting from="brian">Hi!</greeting>
<greeting from="luc">Bonjour!</greeting>
</greetings>
[weiqi@gao] $ qexo
(: 1 :) doc("greetings.xml")/greetings
<greetings>
<greeting from="weiqi">Nihao!</greeting>
<greeting from="brian">Hi!</greeting>
<greeting from="luc">Bonjour!</greeting>
</greetings>
(: 2 :) doc("greetings.xml")//greeting
<greeting from="weiqi">Nihao!</greeting><greeting
from="brian">Hi!</greeting><greeting from="luc">Bonjour!</greeting>
(: 3 :) doc("greetings.xml")//greeting[@from="weiqi"]
<greeting from="weiqi">Nihao!</greeting>
(: 4 :) doc("greetings.xml")//greeting/@from
from="weiqi" from="brian" from="luc"
(: 5 :) doc("greetings.xml")//greeting[1]
<greeting from="weiqi">Nihao!</greeting>
Here we selected the greetings element from the
child axis of the XML document, the greeting elements
from the descendant axis, greeting descendants whose
from attribute has the value "weiqi",
the from attributes of greeting
descendants, and the first greeting descendant.
There is a lot more to XPath expressions that we cannot cover
here. For example, you can use the wild card character
* in place of element and attribute names. You can
also select nodes by their types rather than names.
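A few of these forms are sketched below against the same greetings.xml. The three expressions are illustrations, not a transcript. How a processor prints free-standing attribute nodes varies between implementations.

```xquery
(: Wildcards: every attribute of every child element of the root element. :)
doc("greetings.xml")/*/*/@*,
(: Selecting by node kind: the text nodes inside the greeting elements. :)
doc("greetings.xml")//greeting/text(),
(: Unabbreviated axes: same result as /greetings/greeting[@from = "brian"]. :)
doc("greetings.xml")/child::greetings/child::greeting[attribute::from = "brian"]
```

The last expression spells out the child and attribute axes that the abbreviated syntax leaves implicit.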
FLWOR, pronounced flower, stands for "for,
let, where, order by,
return", after the five clauses of the expression.
The for and let clauses introduce
variables and bind them to values. The optional
where clause filters those bindings, keeping only the ones that
satisfy its condition. The optional
order by clause sorts the bindings that remain.
The return clause builds the result sequence. Notice
that the use of return in XQuery is quite different
from Java: it specifies the result of the expression and does
not imply returning from a function.
[weiqi@gao] $ qexo
(: 1 :) for $x in (1, 2, 3)
(: 2f:) return <number>{ $x }</number>
<number>1</number><number>2</number><number>3</number>
The for clause binds the variable $x
(variable names always start with a dollar sign), to each item of
(1, 2, 3) in turn. The element constructor in the
return clause is evaluated three times. The value of
the expression is a sequence of three elements. (Qexo's prompt
changes to reflect the clauses we are in.)
(: 3 :) let $a := (1, 2, 3)
(: 4l:) return <numbers>{ $a }</numbers>
<numbers>1 2 3</numbers>
The let clause binds $a to the whole sequence (1,
2, 3). The element constructor in the return clause
is evaluated only once. The content of the numbers
element is the string value of (1, 2, 3).
(: 5 :) for $x in (1, 2, 3)
(: 6f:) where $x >= 2
(: 7w:) return <number>{ $x }</number>
<number>2</number><number>3</number>
The effect of the where clause is obvious here.
(: order-by.xq :)
for $x in (<greeting/>, <greeting from="weiqi"/>, <greeting from="brian"/>)
order by $x/@from ascending empty least
return $x
[weiqi@gao-2001 junk]$ xquery order-by.xq
<?xml version="1.0" encoding="UTF-8"?>
<greeting/>
<?xml version="1.0" encoding="UTF-8"?>
<greeting from="brian"/>
<?xml version="1.0" encoding="UTF-8"?>
<greeting from="weiqi"/>
Here we sorted a sequence of greeting elements by
their from attribute in ascending order where a
missing attribute is considered to be less than others. (Saxon
puts an XML declaration in front of every document node or top
level element in the sequence when they are printed. But Saxon's
output format is highly configurable.) You can also specify
descending or empty greatest. The
default order by direction is ascending.
The default empty item treatment is implementation-defined.
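To tie the five clauses together, here is a sketch of a single FLWOR expression that uses all of them on the greetings.xml document from the XPath section. The from element in the result is an invented name.

```xquery
for $g in doc("greetings.xml")//greeting
let $who := string($g/@from)
where $who != "brian"
order by $who descending
return <from>{ $who }</from>
```

With the three greetings shown earlier, this should yield `<from>weiqi</from>` followed by `<from>luc</from>`: the where clause drops brian, and the remaining names are sorted in descending order.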
Quantifier expressions test for a condition for all or some
items in a sequence. The existential quantifier
(some) tests if some member satisfies the condition;
the universal quantifier (every) tests if all members
satisfy the condition.
[weiqi@gao] $ qexo
(: 1 :) some $x in (1, 2, 3) satisfies $x >= 2
true
(: 2 :) every $x in (1, 2, 3) satisfies $x >= 2
false
(: 3 :) some $x in (1, 2, 3), $y in (3, 4, 5) satisfies $x = $y
true
(: 4 :) every $x in (1, 2, 3), $y in (3, 4, 5) satisfies $x = $y
false
In XQuery's if expression, the else
clause is mandatory. The empty sequence () can be
used as the else branch to return nothing.
(: 5 :) for $x in (-1.5, 0.4, 1.7)
(: 6f:) return <amount> {
(: 7{:) if ($x < 0)
(: 8i:) then
(: 9i:) concat("(", -$x, ")")
(: 10i:) else
(: 11i:) $x
(: 12{:) } </amount>
<amount>(1.5)</amount><amount>0.4</amount><amount>1.7</amount>
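The empty sequence as an else branch can be sketched with a variation of the same loop. Here the negative amount is dropped instead of being reformatted.

```xquery
for $x in (-1.5, 0.4, 1.7)
return
  if ($x > 0)
  then <amount>{ $x }</amount>
  else ()  (: contributes nothing to the result :)
```

Only `<amount>0.4</amount>` and `<amount>1.7</amount>` should appear in the result, because the empty sequence vanishes when the partial results are combined.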
XQuery provides a rich set of built-in functions and operators. These include functions and operators on strings, numbers, dates, times, durations, booleans, nodes, and various other kinds of data encountered in XML.
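A handful of these functions are sketched below with the results I would expect in the comments. The names follow the XQuery Functions and Operators specification. An implementation that tracks an older working draft may differ in places.

```xquery
upper-case("hello"),                (: HELLO :)
string-length("Hello, World!"),     (: 13 :)
substring("Hello, World!", 8, 5),   (: World :)
string-join(("a", "b", "c"), "-"),  (: a-b-c :)
sum((1, 2, 3)),                     (: 6 :)
count(("x", "y")),                  (: 2 :)
not(true())                         (: false :)
```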
XQuery also supports user-defined functions and variables:
(: fib.xq :)
declare namespace jnb = "http://ociweb.com/jnb";
declare variable $jnb:pi as xs:decimal { 3.1416 };
declare function jnb:fib($i as xs:integer) as xs:integer {
if ($i = 0 or $i = 1)
then 1
else jnb:fib($i - 1) + jnb:fib($i - 2)
};
jnb:fib(3), jnb:fib(4), jnb:fib(5), $jnb:pi
[weiqi@gao] $ xquery fib.xq
3
5
8
3.1416
Here we declared jnb as a namespace prefix bound to the URI
http://ociweb.com/jnb, declared a variable named
$jnb:pi, declared a function named
jnb:fib that calculates the $i-th
Fibonacci number, called the function three times, and printed
the value of $jnb:pi. We specified the type of
$jnb:pi as xs:decimal. We specified
both the parameter type and the return type of the function as
xs:integer.
You can append the familiar ?, *, and
+ occurrence indicators to the type specifiers. Thus
a parameter of type xs:integer? accepts either an
integer or the empty sequence (). A return type of
node()* indicates that the function returns a
(possibly empty) sequence of nodes.
Multiple function parameters are separated by commas.
Parameter or return type specifications can be omitted, in which
case they default to item()*, the type of any XQuery
sequence.
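The occurrence indicators and multiple parameters can be sketched as follows. The function names jnb:or-zero and jnb:range are made up for this illustration, and the declaration syntax simply follows fib.xq above.

```xquery
declare namespace jnb = "http://ociweb.com/jnb";

(: xs:integer? accepts a single integer or the empty sequence. :)
declare function jnb:or-zero($i as xs:integer?) as xs:integer {
  if (empty($i)) then 0 else $i
};

(: Two parameters separated by a comma; xs:integer* allows any number of results. :)
declare function jnb:range($from as xs:integer, $to as xs:integer) as xs:integer* {
  $from to $to
};

jnb:or-zero(()), jnb:or-zero(7), jnb:range(1, 3)
```

The query should produce 0, 7, and then 1, 2, 3 as one flat sequence.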
XQuery variables are read-only as there is no way to assign new
values to a variable after its declaration. However they may be
shadowed temporarily by variable bindings introduced with
for or let clauses.
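Shadowing can be seen in a short sketch. The inner for clause binds its own $x for the duration of its return clause, and the outer binding is untouched afterwards.

```xquery
let $x := 1
return (
  $x,                                  (: 1 :)
  (for $x in (10, 20) return $x + 1),  (: 11 21: the inner $x shadows the outer one :)
  $x                                   (: 1 again :)
)
```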
You can put function and variable declarations into library modules. A library module is a file that starts with a module namespace declaration and contains declarations of functions, variables, etc., but does not contain an expression at the end. A main module contains an expression at the end. Both library modules and main modules can import other library modules to access the variables and functions declared in them.
(: libfib.xq :)
module namespace jnb = "http://ociweb.com/jnb";
declare function jnb:fib($i as xs:integer) as xs:integer {
if ($i <= 1)
then 1
else jnb:fib($i - 1) + jnb:fib($i - 2)
};
(: mainfib.xq :)
import module namespace jnb = "http://ociweb.com/jnb" at "libfib.xq";
jnb:fib(6)
[weiqi@gao] $ xquery mainfib.xq # Saxon
13
Qexo supports compiled modules. A library module is compiled to a Java class whose name is derived from the module namespace URI. A main module is compiled to a Java class whose name is derived from the module file name.
[weiqi@gao] $ qexo -C libfib.xq # Compile to Java class com.ociweb.jnb
(compiling libfib.xq)
[weiqi@gao] $ qexo --main -C mainfib.xq # Compile to Java class mainfib
(compiling mainfib.xq)
[weiqi@gao] $ java mainfib
13
An XQuery API for Java is being developed as JSR 225. Few details are available now.
For the time being, implementation-specific Java APIs can be used to embed XQuery into Java programs. Both Saxon and Qexo provide easy-to-use Java APIs to execute XQuery programs inside a Java process. They also provide ways to call Java methods from XQuery programs.
We covered the very basics of the XQuery language. There are more features to XQuery than what is presented here. We did not cover W3C XML Schema imports, user defined types from schemas, static type checking, validation and integration with SQL databases and XML databases.
As the W3C XQuery specifications progress toward recommendation status and beyond, and as more open source and commercial products become available and more robust, XQuery will become another useful and versatile tool in the Java programmer's toolbox.