Using XML and XQuery in SQL Server 2005
XML in a Relational Database?
Since it appeared on the world scene in 1998, eXtensible Markup Language (XML) has become quite a useful standard for exchanging data and documents between different systems. XML is a markup language (derived from Standard Generalized Markup Language, or SGML) for documents that contain structured information.
You might be wondering why you would want to store and work with XML in the database
You might be wondering why you would want to store and work with XML in the database. Isn’t that what XML parsers and applications are for? Up until now, XML would usually be derived from database tables and queries, and programmatically placed into a transport format (such as a physical document or a Web service), parsed and read on the receiving end, and then placed into another database.
On one end of the spectrum, a database purist might say that you should never store XML in the database, because it is a transfer mechanism, not a storage mechanism. They would argue that you should always use XML to transport data from one database or application to another, deconstruct the XML on import and store it in relational tables, and reconstruct it on export from the relational tables back to XML for transport.
On the other end of the spectrum, an XML junkie, especially a blogger, might argue that the database is dead—that you don’t need databases anymore because XML provides a nice data storage mechanism with data integrity (XML Schema or XML Schema Definition [XSD]) and a query engine (XQuery). They view the world as just a bunch of XML files and will hardly use a relational database and use XML technologies to store and manipulate their data.
As with the most extreme sides of any argument, however, both camps are wrong. Each has a good argument and valid points. A relational database has features such as primary keys, indexes, and referential integrity, which makes it a far superior storage and querying mechanism for raw data.
Some applications, or even databases themselves, will shred XML data into relational data in order to store it in the database and compose XML when data is retrieved. Other times, the XML data is simply persisted as [unstructured] text in the database. Microsoft SQL Server 2000 offered both of these options, yet neither is necessarily the desirable solution. It would be much better if XML could be stored in a relational database in its native format.
So which do you use, a true, “pure” relational approach or a hybrid where you store XML in the database and work with it there? The answer as with so much in SQL Server is “it depends.”
When you are architecting a highly transaction application (traditionally referred to Online Transaction Processing or OLTP) system where there are many reads and writes simultaneously by users, you are best suited to use the full relational database technology like primary key, referential integrity and transactions. In a massive a data warehouse where you want to provide the user access to trend analysis and data mining algorithms you most likely will want to use the traditional relational model in conjunction with the OLAP (Online Analytical Processing) technology. Most of the time where you are doing a lot of database reads and writes simultaneously by many users you want to use the traditional relational approach.
I am not going to try to convince you to go out and convert all of your data to XML just for the sake of using a new technology
I am not going to try to convince you to go out and convert all of your data to XML just for the sake of using a new technology, but there are definitely times where you want to consider using XML in your database. An instance where you may want to store and use XML in a relational database is when you are transforming data out of an OLTP database to another “flatter” less relational database that is used for simple queries and display on web pages. For example, let’s assume that you have a highly transactional real time database with many users and the data is stored in fifth normal form. At the same time this data also powers your web site, but the web site can tolerate a level of latency in terms of the data being refreshed from the real time application (an example is a traditional banking application, it usually take a period of time from 2 minutes to 1 day for an ATM transaction to appear on your online banking site depending on the bank). A good architecture is to then transform on a regular basis (as your business needs require) the OLTP data to a more denormalized or flat model used for web access and reports. Your data model is then tailored to the web pages access and queries that power the pages and you can build an index strategy around this process. At this point you may want to store some data for XML since you can easily transform it or bind it or do whatever you want to it on the client side.
SQL Server XML Data Type
SQL Server 2005 introduces a new data type for working with XML data: the XML data type. Using this new data type you can store XML in its native format, query the data within the XML, efficiently and easily modify data within the XML without having to replace the entire contents, and index the data in the XML. And, it can be used in any of the following ways:
- As a variable
- As a parameter in a stored procedure or a user-defined function (UDF)
- As a return value from a UDF
- As a column in a table
SQL Server 2005 introduces a new data type for working with XML data: the XML data type
There are some limitations of the XML data type that you should be aware of. Although an XML data type can contain and be checked for null values, unlike other native types, you cannot directly compare an instance of an XML data type to another instance of an XML data type. Any such equality comparisons require casting the XML type to a character type first. This limitation also means that you cannot use ORDER BY or GROUP BY with an XML data type. There are several other restrictions, but I will discuss these restrictions in more detail later in a more appropriate section.
Now these may seem like some pretty severe restrictions, but in all actuality, they really don’t affect the XML data type when it is used in an appropriate manner. The XML data type also has a rich feature-set that more than compensates for these supposed limitations.
Let’s create a table and add some XML data to an XML column. You can create the XML column in SQL Server Management Studio or via TSQL. Here is an example:
USE AdventureWorks
GO
--create the table with the XML Datatype
CREATE TABLE OrdersXML
(OrderDocID INT PRIMARY KEY,
xOrders XML NOT NULL)
As I also stated earlier, however, there are a few other restrictions of the XML data type, in this case, when used as a column in a table:
- It cannot be used as a primary key.
- It cannot be used as a foreign key.
- It cannot be declared with a UNIQUE constraint.
- It cannot be declared with the COLLATE keyword.
Getting some data in there is easy, let’s just assign some XML to a variable and insert it into our new table:
--Insert Static XML via a variable
DECLARE @xmlData AS XML
SET @xmlData = ’
5
65
25
’
--insert into the table
INSERT INTO OrdersXML (OrderDocID, xOrders) Values (1, @xmlData)
Its that simple! Now let’s make it more useful and query the data.
Storing XML in the database is one thing, and querying it efficiently is another. With SQL Server 2000, which has no XML data type, you have to deconstruct the XML and move element and attribute data into relational columns in order to perform a query on the XML data residing in the text column; or some other searching mechanism had to be used, such as full-text search.
XQuery Defined
XQuery is a language used to query and process XML data. It is a W3C standard whose specification is located at http://www.w3.org/TR/xquery/. The XQuery specification contains several descriptions of requirements, use cases, and data models. We encourage you to go to the specification and read “XQuery 1.0, an XML Query Language” and “XQuery 1.0 and XPath 2.0 Functions and Operators” to get an full understanding of what XQuery is all about. However, we will explain enough here for you to get the basics done. After reading this section, you will be able to select, filter, and update XML data using XQuery.
Understanding XQuery Expressions and XPath
The first thing to remember is that XQuery is an XML language, so all the rules of XML apply: It is case sensitive and its keywords are all in lowercase. While XQuery has some powerful formatting and processing commands, XQuery is primarily a query language (as its name suggests), so we will focus on writing queries. The body of a query consists of two parts: an XPath expression and a FLWOR expression. Let’s start by discussing XPath expressions.
The body of a query consists of two parts: an XPath expression and a FLWOR expression
XPath 2.0
XPath, another W3C standard (http://www.w3.org/TR/xpath), uses path expressions to identify specific nodes in an XML document. These path expressions are similar to the syntax you see when you work with a computer file system (for example, c:\folder\myfile.doc). Take a look at the following XML document.
Remi Caron
Bill Zack
49.99
Developing ADO .NET
Joop Pecht
39.93
Windows Cluster Server
Stephen Forte
59.99
The following XPath expression selects the root element catalog:
/catalog
The following XPath expression selects all the book elements of the catalog root element:
/catalog/book
The following XPath expression selects all the author elements of all the book elements of the catalog root element:
/catalog/book/author
One last note: If the path starts with a slash (/), it represents an absolute path to an element.
Summing things up, XPath enables you to specify a subset of data within the XML (via its location with the XML structure) with which you want to work. Using XPath will get you to the data you want to process, and essentially is a very basic form of querying against XML data since it allows you select a subset of data; XQuery, however, is more robust and allows you to perform more complex queries against the XML data via FLOWR expressions.
Selection Logic: FLWOR Expressions
Just as SELECT, FROM, WHERE, GROUP BY, and ORDER BY form the basis of SQL’s selection logic, the for, let, where, order by, and return (FLWOR) keywords form the basic of every XQuery query you write. The for and let keywords are used to assign variables and iterate through the data. (The let keyword is not supported in the SQL Server 2005 implementation of XQuery; SQL Server assigns the variable automatically without a let.) The where keyword works as a restriction and outputs the value of the variable. For example, this basic XQuery uses the XPath expression /catalog/book to obtain a reference to all the nodes and the for keyword initiates a loop, but only of elements where the category attribute is equal to ITPro.
for $b in /catalog/book
where $b/@category="ITPro"
return ($b)
Armed with this basic knowledge of XQuery expressions, you are ready to see it in action.
SQL Server 2005 XQuery in Action
SQL Server 2005 has a standards-based implementation of XQuery that directly supports XQuery functions on the XML data type. It supports XQuery by using five methods of the XML data type:
- xml.exist(): uses XQuery input to returns 0, 1, or NULL, depending on the result of the query. It returns 0 if no elements match, 1 is if there is a match, and NULL if there is no XML data on which to query. This method is often used for query predicates.
- xml.value(): accepts an XQuery as input and returns a SQL Server scalar type.
- xml.query(): accepts an XQuery as input and returns an XML data type stream as output.
- xml.nodes(): accepts an XQuery as input and returns a single-column rowset from the XML document. In essence, it shreds XML into multiple smaller XML results.
- xml.modify(): allows you to insert, delete, or modify nodes or sequences of nodes in an XML data type instance using XQuery Data Manipulation Language.
SQL Server 2005 has a standards-based implementation of XQuery that directly supports XQuery functions on the XML data type
We will discuss the exist and value methods of the XML data type shortly. But first we have to create some sample data. We will create a simple table that contains speakers at a software developers conference and their corresponding classes. Usually you normalize the data and have a one-to-many relationship between a speakers table and the classes table. Instead of using an additional normalized table, we will model this as one table with the speaker’s information and one XML column with the speaker’s classes. (You might want to consider this in a “published” database on the Web that was transformed from normalized data and use the XML column for easy HTML display.) Here is the table and some data:
Create Table tblSpeakers
(
Speaker_ID Integer Primary Key Identity,
Speaker_NM nVarChar(50),
Speaker_Country nVarChar(25),
Speaker_XML XML (classes_xsd) Not Null)
Insert into tblSpeakers Values('Stephen Forte', 'USA',
'
)
Insert into tblSpeakers Values('Richard Campbell', 'Canada',
'
'
)
Insert into tblSpeakers Values('Remi Caron’, 'NL',
'
'
)
Think about it: you can run a T-SQL query and restrict the query on a value of a particular XML element!
xml.exist()
The true usefulness of XQuery comes true when you can use it to search based on the values of a particular element or attribute. The xml.exist() function accepts an XQuery as input and returns 0, 1, or NULL, depending on the result of the query; 0 is returned if no elements match, 1 is returned if there is a match, and NULL is returned if there is no data to query on.
The most useful way to use xml.exist() is to use its 0, 1, or NULL as part of a T-SQL WHERE clause. Think about it: you can run a T-SQL query and restrict the query on a value of a particular XML element! Going back to our main example, let’s look for the value of ‘SQL Server and Oracle Working Together‘ in the element. Here is the XQuery expression to do this:
/classes/class[@name="SQL Server and Oracle Working Together"]
This is how you put it to work:
Select * From tblSpeakers Where Speaker_XML.exist(
/classes/class[@name="SQL Server and Oracle Working Together"]')=1
The results look like this:
1 Stephen Forte USA data
2 Richard Campbell Canada data
The XML returned in these results look like this for Stephen:
xml.value()
The xml.value() function takes a valid XQuery expression and returns a SQL Server scalar value that you specify. Here’s an example of the syntax against our current XML document in the database:
xml.value('/classes[1]/class[1]/@name', 'varchar(40)')
This XQuery contains an XPath expression that navigates the first class’s name attribute and a cast to varchar(40).
You have to perform an XQuery expression on an XML column as part of a regular T-SQL query, as shown here; what is cool is that SQL Server combines both relational queries and XQuery in one query because in this example we use a traditional T-SQL WHERE clause to show only speakers from the USA.
select Speaker_ID, Speaker_NM, Speaker_Country,
Speaker_XML.value('/classes[1]/class[1]/@name', 'varchar(40)') as Sessions
from tblSpeakers
where speaker_country ='USA'
The results are shown here:
Speaker_ID Speaker_NM Speaker_Country Sessions
1 Stephen Forte USA Writing Secure Code for ASP .NET
As you can see, using XML in SQL Server 2005 has come compelling features: you can store XML natively, you can use XML as a variable and you can query it. Since we are limited on space, I will let you investigate the other XML features of SQL Server 2005 including the FOR XML features, XSD Schema, XML indexes and bulk load.
Happy XMLing!