Lesson 4 - Working with XML files in PHP
In the previous lesson, Working with CSV files in PHP, we learned to work with files using resources, looked at CSV files, and applied our knowledge in a practical example.
In this lesson, we will take a look at working with another file format that we mentioned in the introductory lesson of this course - XML. Here are two ways to access XML files:
- the SAX method (Simple API for XML), using the XMLWriter and XMLReader classes
- the SimpleXML class.
Unlike the previous lessons, in which we only worked with functions, we will now move to object-oriented programming.
XML generation by SAX method - XMLWriter
We are getting inspiration from the article Introduction to XML and writing by SAX in the C# course and we will write a similar script in PHP. We will also use the same input data:
$users = new Users();
$users->Add(new User("Paul Goodman", 22, new OurDate(2000, 3, 21)));
$users->Add(new User("John Newlands", 31, new OurDate(2012, 10, 30)));
$users->Add(new User("Thomas Heart", 16, new OurDate(2011, 1, 12)));
We use the XMLWriter class to generate XML using the SAX method:
$out = new XMLWriter();
$out->openMemory();
$out->startDocument('1.0', 'UTF-8');
$out->setIndent(TRUE);
$users->exportXML($out);
$out->endDocument();
echo $out->outputMemory(TRUE);
The most interesting method is probably exportXML() of the $users object. The other commands only set output parameters (XML version and encoding, indentation, etc.).
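To see what these output settings do on their own, here is a minimal sketch that writes a single element without any of our classes. The element name greeting, its lang attribute, and the text are made up for illustration only; they are not part of the lesson's data.

```php
<?php
// Minimal XMLWriter sketch: one element with an attribute and text.
// The names 'greeting' and 'lang' are hypothetical examples.
$out = new XMLWriter();
$out->openMemory();                    // build the document in memory
$out->startDocument('1.0', 'UTF-8');   // XML declaration: version and encoding
$out->setIndent(true);                 // put nested elements on separate lines

$out->startElement('greeting');
$out->writeAttribute('lang', 'en');
$out->text('Hello & welcome');         // special characters are escaped for us
$out->endElement();

$out->endDocument();
$xml = $out->outputMemory(true);       // fetch the result and flush the buffer
echo $xml;
```

Note that the ampersand in the text is escaped automatically, so we do not have to call htmlspecialchars() ourselves.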
The Users class has two methods:
- the first adds another user and
- the second exports the resulting XML:
class Users
{
    private $list = array();

    public function Add($user)
    {
        $this->list[] = $user;
    }

    public function exportXML($out)
    {
        $out->startElement('users');
        foreach($this->list as $user) {
            $user->exportXML($out);
        }
        $out->endElement();
    }
}
The User class contains three properties, but it's no problem to add more as needed:
class User
{
    private $name;
    private $age;
    private $registration_date;

    public function __construct($name, $age, $registration_date)
    {
        $this->name = $name;
        $this->age = $age;
        $this->registration_date = $registration_date;
    }

    public function exportXML($out)
    {
        $out->startElement('user');
        $out->writeAttribute('age', $this->age);
        $out->writeElement('name', $this->name);
        $this->registration_date->exportXML($out);
        $out->endElement();
    }
}
We can see that $name is written as an element, while $age is written only as an attribute. In practice, it would be better to store the year of birth instead of the age, but I wanted the data structure to be similar to the referenced article.
The private object $registration_date stores the date as three numbers. It's just an example of how to work with nested objects; in practice, it would be easier to store the whole date as a single value:
class OurDate
{
    private $day, $month, $year;

    public function __construct($year, $month, $day)
    {
        $this->day = $day;
        $this->month = $month;
        $this->year = $year;
    }

    public function exportXML($out)
    {
        $out->writeElement('registration_date', "$this->month/$this->day/$this->year");
    }
}
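As a sketch of what storing the date as one value could look like, the following variant wraps PHP's built-in DateTimeImmutable. The class name IsoDate and the ISO 8601 output format (Y-m-d) are assumptions made for this illustration; the lesson's own examples keep the month/day/year format.

```php
<?php
// Hypothetical alternative to OurDate: the date is held as a single value.
class IsoDate
{
    private $date;

    public function __construct($year, $month, $day)
    {
        // checkdate() rejects impossible dates such as February 30
        if (!checkdate($month, $day, $year)) {
            throw new InvalidArgumentException("Invalid date: $year-$month-$day");
        }
        $this->date = new DateTimeImmutable(sprintf('%04d-%02d-%02d', $year, $month, $day));
    }

    public function exportXML(XMLWriter $out)
    {
        // Assumption: ISO 8601 output instead of the lesson's m/d/Y format
        $out->writeElement('registration_date', $this->date->format('Y-m-d'));
    }
}

$writer = new XMLWriter();
$writer->openMemory();
(new IsoDate(2000, 3, 21))->exportXML($writer);
$fragment = $writer->outputMemory(true);
echo $fragment, "\n";
```

A side benefit of this design is that invalid dates are caught in the constructor instead of ending up in the XML.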
That is all. After running the script, we get the following result:
<?xml version="1.0" encoding="UTF-8"?>
<users>
 <user age="22">
  <name>Paul Goodman</name>
  <registration_date>3/21/2000</registration_date>
 </user>
 <user age="31">
  <name>John Newlands</name>
  <registration_date>10/30/2012</registration_date>
 </user>
 <user age="16">
  <name>Thomas Heart</name>
  <registration_date>1/12/2011</registration_date>
 </user>
</users>
SAX is a fast and memory-efficient method for directly generating an XML or XHTML document. However, if we want to further process the output with a template system, it will be more advantageous to use a DOM with which the template systems work directly.
Note: The input data for the XMLWriter class must be in the UTF-8 encoding. Otherwise, it will not work properly with the special characters that occur in many human languages. The output encoding can be selected as required.
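If the input arrives in another encoding, it has to be converted before it is handed to XMLWriter. The sketch below assumes the mbstring extension is available and that the source data is in ISO-8859-2; both the sample name and the source encoding are assumptions chosen for illustration.

```php
<?php
// Assumption: the input is ISO-8859-2 (Latin-2) and mbstring is installed.
$latin2Name = "Ji\xF8\xED";   // the bytes of "Jiří" in ISO-8859-2

// Convert to UTF-8 before passing the value to XMLWriter
$utf8Name = mb_convert_encoding($latin2Name, 'UTF-8', 'ISO-8859-2');

$out = new XMLWriter();
$out->openMemory();
$out->startDocument('1.0', 'UTF-8');
$out->writeElement('name', $utf8Name);
$out->endDocument();
$xml = $out->outputMemory(true);
echo $xml;
```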
XML reading by SAX method - XMLReader
Again, we are getting inspiration from the article Reading XML SAX in C# and writing a similar application in PHP. We will use the XML data we generated a while ago:
<?xml version="1.0" encoding="UTF-8"?>
<users>
 <user age="22">
  <name>Paul Goodman</name>
  <registration_date>3/21/2000</registration_date>
 </user>
 <user age="31">
  <name>John Newlands</name>
  <registration_date>10/30/2012</registration_date>
 </user>
 <user age="16">
  <name>Thomas Heart</name>
  <registration_date>1/12/2011</registration_date>
 </user>
</users>
This time, the task is much more difficult than generating XML using the SAX method. We have to read the individual tokens one by one and decide where to store the data according to their order. I recommend using this method only in specific cases, for example when we only need to select some data from a huge XML file. In other cases, it is more convenient to use the DOM method. The main script looks like this:
$data = new XMLReader();
$data->open('data.xml');
while($data->read()) {
    switch($data->name) {
        case 'users':
            $users = new Users($data);
            break;
    }
}
echo $users, "\n";
More attentive programmers will have noticed that the switch block contains only one case, where the shorter if notation would do. This is because we read the document in the manner of a finite-state automaton, for which a switch is the usual notation. When parsing a more complex document, we will certainly appreciate the easy extensibility:
class Users
{
    private $list = array();

    public function __construct($data)
    {
        while($data->read()) {
            switch($data->name) {
                case 'user':
                    if($data->nodeType == XMLReader::ELEMENT) {
                        $this->list[] = new User($data);
                    }
                    break;
                case 'users':
                    return;
            }
        }
    }

    public function __toString()
    {
        $out = array();
        foreach($this->list as $user) {
            $out[] = $user->__toString();
        }
        return implode("\n", $out);
    }
}
Here, too, the use of a switch does not look very attractive, but when using a more complex XML structure, we will certainly appreciate the ease of adding other elements:
class User
{
    private $name;
    private $age;
    private $registration_date;

    public function __construct($data)
    {
        $this->age = $data->getAttribute('age');
        while($data->read()) {
            switch($data->name) {
                case 'name':
                    $data->read();
                    $this->name = $data->value;
                    $data->read();
                    break;
                case 'registration_date':
                    $data->read();
                    $this->registration_date = $data->value;
                    $data->read();
                    break;
                case '#text':
                    break;
                default:
                    return;
            }
        }
    }

    public function __toString()
    {
        return sprintf("%-20s %2d %10s", $this->name, $this->age, $this->registration_date);
    }
}
In the end, the most complicated class is User. First, we load the age attribute of the user element and store it in a property. Then we do the same with the contents of the name and registration_date elements. If the parser encounters any other node name, including the closing tag of the user element, the constructor returns. The pseudo-element #text represents the whitespace between the elements of the source XML, which we need to skip.
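To make the token stream more tangible, the following sketch walks a small fragment and sorts the nodes by their nodeType instead of by name. The inline XML string and the variable names are assumptions for this illustration; the node type constants are those of the XMLReader class.

```php
<?php
// A small, indented fragment similar to one user from our data
$xml = "<user age=\"22\">\n  <name>Paul Goodman</name>\n</user>";

$reader = new XMLReader();
$reader->XML($xml);            // read from a string instead of a file

$elements = [];                // names of opening tags, in document order
$texts = [];                   // contents of real text nodes
$whitespaceNodes = 0;          // whitespace-only nodes between elements

while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            $elements[] = $reader->name;
            break;
        case XMLReader::TEXT:
            $texts[] = $reader->value;
            break;
        case XMLReader::WHITESPACE:
        case XMLReader::SIGNIFICANT_WHITESPACE:
            $whitespaceNodes++;    // the indentation and line breaks
            break;
    }
}
$reader->close();

print_r($elements);
print_r($texts);
echo "whitespace nodes: $whitespaceNodes\n";
```

Checking nodeType is one possible way to tell indentation apart from real text, since both kinds of node report the name #text.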
The __toString() methods are for diagnostic output. After running the script, we get the following result:
Paul Goodman         22  3/21/2000
John Newlands        31 10/30/2012
Thomas Heart         16  1/12/2011
This example is not programmed as cleanly as it could be. A combination of valid input data that would not pass this process could certainly be found. It was supposed to be just a demonstration that even in PHP it is possible to use the SAX method for reading XML documents.
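One such input is an empty element like <name/>, which has no text node and no closing tag, so the read-value-read sequence above would get out of step. The sketch below is one possible way to be more tolerant, not a rewrite of the lesson's classes: the function readUsers() and the array layout are hypothetical, and it relies on the readString() method and the isEmptyElement property of XMLReader.

```php
<?php
// Hypothetical helper: collects users into plain arrays.
function readUsers($xml)
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $users = [];
    $current = null;           // the user currently being read, if any

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            switch ($reader->name) {
                case 'user':
                    $current = [
                        'age' => (int) $reader->getAttribute('age'),
                        'name' => '',
                        'registration_date' => '',
                    ];
                    // <user/> produces no END_ELEMENT, so store it right away
                    if ($reader->isEmptyElement) {
                        $users[] = $current;
                        $current = null;
                    }
                    break;
                case 'name':
                case 'registration_date':
                    if ($current !== null) {
                        // readString() returns '' for an empty element
                        $current[$reader->name] = $reader->readString();
                    }
                    break;
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT
            && $reader->name === 'user' && $current !== null) {
            $users[] = $current;
            $current = null;
        }
    }
    $reader->close();
    return $users;
}

$users = readUsers(
    '<users><user age="22"><name>Paul Goodman</name>'
    . '<registration_date>3/21/2000</registration_date></user>'
    . '<user age="31"><name/></user></users>'
);
print_r($users);
```

Because only element nodes are inspected, whitespace between the tags needs no special case here.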
Reading XML with SimpleXML
We said that reading XML using the SAX method is not suitable for normal use. Now let's see a better way: the SimpleXML class.
The SimpleXML extension, with its SimpleXMLElement class, is intended for easy conversion of an XML document into objects in PHP. Unlike with the XMLReader class, however, we do not read the document in a loop one element at a time, but load it whole into the object structure. This is very convenient because the slowest operation, the parsing itself, is performed by a standard library that is optimized for this purpose.
We will use the same data again:
<?xml version="1.0" encoding="UTF-8"?>
<users>
 <user age="22">
  <name>Paul Goodman</name>
  <registration_date>3/21/2000</registration_date>
 </user>
 <user age="31">
  <name>John Newlands</name>
  <registration_date>10/30/2012</registration_date>
 </user>
 <user age="16">
  <name>Thomas Heart</name>
  <registration_date>1/12/2011</registration_date>
 </user>
</users>
The script listing the data is very short. Here, in contrast to the SAX method, we only need to create one Users class:
<?php
$data = new Users('data.xml');
echo $data, "\n";

class Users
{
    private $list;

    public function __construct($xmlFile)
    {
        // Second argument: libxml options (0 = none).
        // Third argument TRUE: $xmlFile is a path or URL, not an XML string.
        $this->list = new SimpleXMLElement($xmlFile, 0, TRUE);
    }

    public function __toString()
    {
        $out = array();
        foreach($this->list as $user) {
            $out[] = sprintf("%-20s %2d %10s", $user->name, $user['age'], $user->registration_date);
        }
        return implode("\n", $out);
    }
}
If necessary, we can, of course, add methods for searching for users, password verification, etc. For our purpose, a simple list of users will suffice. After running the script, the following will appear in the browser:
Paul Goodman         22  3/21/2000
John Newlands        31 10/30/2012
Thomas Heart         16  1/12/2011
As we can see, reading a document with the SimpleXMLElement class is much easier than reading with the SAX method. This class is also about 10 times faster than XMLReader and better documented. It is therefore much more convenient for processing common XML documents.
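As a sketch of how searching for users might look, the following example uses the xpath() method of SimpleXMLElement to select only adult users. The inline XML string, the age limit of 18, and the variable names are assumptions for this illustration.

```php
<?php
// Shortened copy of our data, loaded from a string for convenience
$xml = <<<XML
<users>
 <user age="22"><name>Paul Goodman</name></user>
 <user age="31"><name>John Newlands</name></user>
 <user age="16"><name>Thomas Heart</name></user>
</users>
XML;

$users = simplexml_load_string($xml);

// Select all user elements whose age attribute is at least 18
$adults = [];
foreach ($users->xpath('/users/user[@age >= 18]') as $user) {
    $adults[] = (string) $user->name;   // cast the element to a plain string
}

print_r($adults);
```

The explicit (string) cast matters when the values are stored or compared later, because SimpleXML otherwise hands back SimpleXMLElement objects.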
The complete code of all the examples from this lesson can be downloaded as always at the bottom of the article:-)
In the next lesson, Working with INI files in PHP, we'll learn about text files in the INI format and how to work with them in PHP.