Lesson 6 - Introduction to the XML File Format in Java
In the previous lesson, Solved tasks for Java Files Lessons 1-5, we focused on text files. Today, we're going to focus on the XML format. First we're going to describe it, then show the classes that Java provides for reading and writing it.
The XML Format
We're about to go over lots of terms. If you don't understand any of them, don't worry, we'll go into as much detail as possible in further lessons.
XML (eXtensible Markup Language) is a markup language developed by the W3C (the organization responsible for web standards). XML is very universal and is supported by a number of languages and applications. The word extensible indicates the ability to create your own language using XML; one such language is XHTML, used for creating websites. XML is a self-describing language, meaning that it has a structure from which we can determine what each value means. In a CSV file, we can only guess what the number eight in the third column means, whereas in XML it'd be immediately clear that it's, for example, the number of articles the user has written. The disadvantage is that XML files are larger, but that's not a problem for us in most cases. Personally, I almost always choose the XML format. It's a good choice for saving a program's configuration, high scores for game players, or a small user database. Thanks to XSD schemas, we can also validate XML files, so that bad data is caught early instead of causing errors at run time.
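To make the XSD remark more concrete, here is a minimal sketch of validating a document with the standard javax.xml.validation API. The schema, the class name XsdValidationSketch, and the isValid method are assumptions made up for this illustration; the element names follow the users example used later in this lesson.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationSketch {

    // Hypothetical schema: a <users> root holding <user> elements,
    // each with a required integer "age" attribute.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"users\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"user\" maxOccurs=\"unbounded\"><xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"registered\" type=\"xs:string\"/>"
        + "</xs:sequence>"
        + "<xs:attribute name=\"age\" type=\"xs:int\" use=\"required\"/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the XML text conforms to the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document is malformed or violates the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<users><user age=\"22\"><name>John Smith</name>"
                    + "<registered>3/21/2000</registered></user></users>";
        String bad = good.replace("age=\"22\"", "age=\"old\""); // age is not an integer
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

In a real program the schema would normally live in its own .xsd file; it is inlined here only to keep the sketch self-contained.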
XML can be processed in different ways, usually either by continuously reading/writing it or by using a DOM object structure. Things have come so far that some tools allow us to work with XML much like a database and run queries on it (the XPath or XQuery languages are used to do that). As you can imagine, this saves a lot of work.
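As a rough illustration of querying XML, the sketch below evaluates XPath expressions with the javax.xml.xpath API that ships with Java. The class name XPathQuerySketch, the evaluate helper, and the inline sample document are assumptions made for this example only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathQuerySketch {

    // Evaluates an XPath expression against XML text and returns the result as a string.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        // Sample data made up for the illustration.
        String xml = "<users>"
                   + "<user age=\"22\"><name>John Smith</name></user>"
                   + "<user age=\"31\"><name>James Brown</name></user>"
                   + "</users>";
        // Name of the first user in the document.
        System.out.println(evaluate(xml, "/users/user[1]/name"));
        // Name of the first user older than 30.
        System.out.println(evaluate(xml, "/users/user[@age > 30]/name"));
    }
}
```

The expression reads a bit like a file path with a filter in square brackets, which is what makes this style of access feel similar to querying a database.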
XML competes with JSON, which is simpler but less common in business applications. Unlike XML, JSON can easily be used for logging, since new records can be appended to the end of a file (for example, by writing one JSON object per line) without loading the entire document.
XML is very often used to exchange data between different systems (e.g. desktop applications and web applications on a server). Therefore, as we've already mentioned, there are many libraries for it, and practically every tool knows the format and is able to work with it. This includes web services, SOAP, and so on. However, we won't deal with any of them now.
Last time, we saved a list of users to a CSV file. We saved their name, age, and date of registration. The values were next to each other, separated by semicolons. Each line represented a user. The file's contents looked like this:
John Smith;22;3/21/2000
James Brown;31;10/30/2012
Anyone who isn't directly involved wouldn't know what any of that means, would they? Here is the equivalent to that file in the XML format:
<?xml version="1.0" encoding="UTF-8" ?>
<users>
    <user age="22">
        <name>John Smith</name>
        <registered>3/21/2000</registered>
    </user>
    <user age="31">
        <name>James Brown</name>
        <registered>10/30/2012</registered>
    </user>
</users>
Now everyone can tell what is stored in the file. I saved age as an attribute just to demonstrate that XML is able to do things like that. Otherwise, it'd be saved as an element along with the name and registration date. Individual items are called elements. I'm sure you're all familiar with HTML, which is based on the same fundamentals as XML. The elements are usually paired, meaning that we write the opening tag followed by the value and then the closing tag with a slash. Elements can contain other elements, so it has a tree structure. Furthermore, we're able to save an entire hierarchy of objects into a single XML document.
At the beginning of an XML file, there's a header. The document has to contain exactly one root element in order to be well-formed. Here, it's the <users> element, which contains the other nested elements.
Attribute values are written after the attribute name and an equals sign, in quotation marks.
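These rules can be checked mechanically. The following minimal sketch uses Java's built-in SAX parser to test whether a piece of text is well-formed XML; the class name WellFormedSketch and the isWellFormed helper are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedSketch {

    // Returns true if the parser can read the whole text without a fatal error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // e.g. two root elements or an unquoted attribute value
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<users><user age=\"22\"/></users>")); // one root, quoted attribute
        System.out.println(isWellFormed("<users/><users/>"));                  // two root elements
        System.out.println(isWellFormed("<users><user age=22/></users>"));     // attribute value not quoted
    }
}
```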
As you can probably tell, the file got bigger, which is the price paid for it being readable. If the user had more than three fields, you'd be able to see just how messy the CSV format can get, and how worthwhile the XML format is. Personally, as I gain more and more experience, I prefer solutions that are clear and simple, even if that means they take up more space. This applies not only to files but to source code as well. There is nothing worse than when a programmer looks at their code after a year and has no idea what the eighth parameter in a CSV file is when there are 100 numbers per line. Even worse is finding a five-dimensional array, which is super fast, when an object structure designed from the start would have spared them from rewriting the whole functionality now. However, let's get back to today's topic.
XML in Java
We'll focus on two fundamental approaches to working with XML files: the continuous approach (the SAX parser) and the object-oriented approach (DOM). Today's lesson and the next ones will be dedicated to SAX, after which we'll get to DOM. Again, there are more ways to work with XML files in Java and there are lots of classes for that. I try to show the most modern approaches and simple constructs.
Parsing XML via SAX
SAX (Simple API for XML) can be thought of as a simple extension of a text file reader. Writing is relatively straightforward: we write the elements and attributes one after another, in the same order in which they appear in the file (the tree structure is ignored in this approach). Java provides the XMLStreamWriter class, instances of which are created by the XMLOutputFactory class. (Strictly speaking, both belong to Java's streaming API, StAX, in the javax.xml.stream package, which follows the same continuous approach.) This relieves us from having to deal with the fact that XML is a text file. We only work with elements or, more accurately, nodes (more on that later).
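The writing process described above might look roughly like the sketch below. It assumes the writer is obtained from XMLOutputFactory, the standard factory in javax.xml.stream; the class name StreamWriterSketch and the writeUser helper are made up for the illustration, and the output goes to a string rather than a file to keep the example self-contained.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StreamWriterSketch {

    // Writes a <users> document with a single <user> and returns it as text.
    static String writeUser(String name, int age, String registered) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xsw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xsw.writeStartDocument();            // the XML header
            xsw.writeStartElement("users");      // root element
            xsw.writeStartElement("user");
            xsw.writeAttribute("age", Integer.toString(age));
            xsw.writeStartElement("name");
            xsw.writeCharacters(name);           // special characters are escaped for us
            xsw.writeEndElement();               // closes name
            xsw.writeStartElement("registered");
            xsw.writeCharacters(registered);
            xsw.writeEndElement();               // closes registered
            xsw.writeEndElement();               // closes user
            xsw.writeEndElement();               // closes users
            xsw.writeEndDocument();
            xsw.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeUser("John Smith", 22, "3/21/2000"));
    }
}
```

Note that the calls mirror the order of the tags in the file exactly. The writer keeps track of which element is open, so writeEndElement() needs no name.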
Reading works much like writing. We read the XML like a text file, sequentially from top to bottom. SAX gives us what are known as nodes, which it gets while reading. A node can be an element, an attribute, or a value. We receive the nodes in a loop, in the same order in which they're written in the file. We use the XMLStreamReader class to read XML files.
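The reading loop could be sketched as follows. It assumes the reader is created by XMLInputFactory, the standard factory in javax.xml.stream; the class name StreamReaderSketch, the describeUsers helper, and the sample data are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StreamReaderSketch {

    // Returns one line of text per user, in the form "name (age)".
    static List<String> describeUsers(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String age = null;
            while (xsr.hasNext()) {
                // next() moves to the following node and tells us its type
                if (xsr.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                if (xsr.getLocalName().equals("user")) {
                    age = xsr.getAttributeValue(null, "age"); // remember it for the nested name
                } else if (xsr.getLocalName().equals("name")) {
                    result.add(xsr.getElementText() + " (" + age + ")");
                }
            }
            xsr.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read the document", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<users>"
                   + "<user age=\"22\"><name>John Smith</name></user>"
                   + "<user age=\"31\"><name>James Brown</name></user>"
                   + "</users>";
        describeUsers(xml).forEach(System.out::println);
    }
}
```

Because the nodes arrive one at a time, the age has to be remembered in a variable until the nested name element shows up. This bookkeeping is typical for the continuous approach.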
The advantage of the SAX approach is its high speed and low memory requirements. We'll see the disadvantages once we compare this approach to the object-oriented DOM approach later on. In the next lesson, Writing XML Files via the SAX Approach in Java, we'll create an XML file using the SAX approach.