Managing XML in PHP 5: Using the DOM

Warnings
This tutorial assumes that you have at least a basic knowledge of XML.

The sample XML documents used in this tutorial should not be regarded as models of good data structuring. They were chosen to illustrate specific points of the DOM API. We will work on an XML document consisting of a list of a few countries, grouped by continent.

The information in this tutorial applies to PHP version 5.0.0. In this version, the DOM XML functions do not throw a DomException when an error occurs. Ordinary PHP errors (of various levels) are raised instead.

This tutorial is meant to evolve, and you are encouraged to take part by contacting the author to report any mistakes you find, to suggest new sections you would like to see, and more generally to propose any changes that could be made.

1. The DOM objects
Unlike the PHP 4 extension, which was fairly procedural, the PHP 5 DOM extension is fully object-oriented. The main classes are as follows:
* DomNode – node object: documents, elements, text nodes, and so on
* DomDocument – document object (inherits from DomNode)
* DomElement – element object (inherits from DomNode)
* DomAttr – attribute object (inherits from DomNode)
* DomNodeList – list of DomNode objects (this is not a PHP array!)
There is also a DomException class, which derives from the PHP 5 Exception class, but the current version of the extension does not use it.
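
The class relationships listed above can be checked with a few instanceof tests. This is only a minimal sketch: the element name "country" and the attribute "regime" are illustrative choices borrowed from the examples used later in this tutorial.

```php
<?php
// Build a tiny document in memory (names are illustrative).
$dom = new DomDocument();
$element = $dom->createElement('country');
$element->setAttribute('regime', 'republic');
$attribute = $element->getAttributeNode('regime');

// Documents, elements and attributes are all DomNode objects.
$documentIsNode  = $dom instanceof DomNode;
$elementIsNode   = $element instanceof DomNode;
$attributeIsNode = $attribute instanceof DomNode;

// A DomNodeList is an object, not a PHP array.
$list = $dom->getElementsByTagName('country');
$listIsArray = is_array($list);

var_dump($documentIsNode, $elementIsNode, $attributeIsNode, $listIsArray);
```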

2. The DomDocument document
Any XML processing should begin with the following line, which instantiates the DomDocument object we will work on:
Initialization
<?php
$dom = new DomDocument();
?>

2.1. Loading
After calling the constructor, we have an empty XML document with no root element. You can build new content from scratch, or load an existing XML document from a file on the local file system or from a string variable. To load a file, pass the name of the XML file, with its absolute or relative path in the file system. Let us load the contents of such a file into our $dom object:

Loading an XML file
<?php
$dom->load('file.xml');
?>
If we had wanted to load the XML document from a variable (or a literal string) containing the XML tree:
Loading from an XML string
<?php
$dom->loadXML($chaineXML);
?>
You can also open an HTML document (using the DomDocument::loadHTMLFile method) or import a document from SimpleXML (through the dom_import_simplexml function, which is not a DomDocument method).
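
As a rough illustration of these two alternatives, the sketch below imports a SimpleXML tree and parses a small piece of HTML. To stay self-contained it uses DomDocument::loadHTML, the string counterpart of loadHTMLFile, and the sample markup is invented for the occasion.

```php
<?php
// Import from SimpleXML: the function returns the DOM element that
// corresponds to the SimpleXML node; its ownerDocument is a DomDocument.
$sxe = simplexml_load_string('<continents><europe/></continents>');
$importedRoot = dom_import_simplexml($sxe);
$domFromSimpleXml = $importedRoot->ownerDocument;
echo $domFromSimpleXml->documentElement->nodeName, "\n";

// Parse HTML from a string (loadHTMLFile does the same from a file).
$html = new DomDocument();
$html->loadHTML('<html><body><p>Hello</p></body></html>');
$paragraphText = $html->getElementsByTagName('p')->item(0)->nodeValue;
echo $paragraphText, "\n";
```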

2.2. Saving
It is also useful to be able to save our XML document to the file system. Simply proceed as follows:

Saving an XML document

<?php
$dom->save('nouveauFichier.xml');
?>
The DomDocument::saveXML method returns the document as a string, which makes it possible to retrieve the XML document in a PHP variable.
Saving to a variable
<?php
$chaineXML = $dom->saveXML();
?>
Note that in this case you can pass a reference to a DomNode object as a parameter, so that only the subtree rooted at that node is returned.
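
A short sketch of the difference between the two calls, using a small in-memory document invented for the example:

```php
<?php
$dom = new DomDocument();
$dom->loadXML('<continents><europe><country>France</country></europe><asia/></continents>');

$europe = $dom->getElementsByTagName('europe')->item(0);

// Whole document, XML declaration included.
$whole = $dom->saveXML();

// Only the subtree rooted at <europe>, without the XML declaration.
$fragment = $dom->saveXML($europe);

echo $fragment, "\n";
```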

2.3. Validation

The DOM extension makes it very simple to validate a document against the DTD specified in the XML document:

Validation of an XML document

<?php
$dom->validate();
?>
Also worth noting are the methods DomDocument::schemaValidate, DomDocument::schemaValidateSource, DomDocument::relaxNGValidate and DomDocument::relaxNGValidateSource. Each takes one parameter (a file path for the first and third, a string for the second and fourth) and validates the document against, respectively, an XML Schema on the file system, an XML Schema in a string, a RELAX NG document on the file system, and a RELAX NG document in a string.
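
Here is a minimal sketch of validation against an XML Schema held in a string. The schema is a deliberately small one written for this example, and libxml_use_internal_errors (available from PHP 5.1, so later than the version this tutorial targets) is used only to keep the warnings from being printed.

```php
<?php
// A small illustrative schema: <continents> may contain one <europe>,
// which may contain any number of <country> elements.
$xsd = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="continents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="europe" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="country" type="xs:string"
                          minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$valid = new DomDocument();
$valid->loadXML('<continents><europe><country>France</country></europe></continents>');
$validResult = $valid->schemaValidateSource($xsd);

$invalid = new DomDocument();
$invalid->loadXML('<continents><europe><country3>France</country3></europe></continents>');
$invalidResult = $invalid->schemaValidateSource($xsd);

libxml_clear_errors();
var_dump($validResult, $invalidResult);
```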

All these functions return true if validation succeeds and false if it fails. On failure, PHP errors of level Warning are generated, describing the deviations from the reference document. Let us try, for example, to validate the following document against the DTD that accompanies it:
test.xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE continents SYSTEM "test.dtd">
<continents>
<europe>
<country3>France</country3>
<country>Belgium</country>
<country>Spain</country>
</europe>
<asia>
<country>Japan</country>
<country>India</country>
</asia>
<asia />
</continents>

test.dtd
<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT continents (europe?, asia?, america?, africa?, oceania?, antarctic?)>
<!ELEMENT europe (country)*>
<!ELEMENT country (#PCDATA)>
<!ELEMENT asia (country)*>
<!ELEMENT america (country)*>
<!ELEMENT africa (country)*>
<!ELEMENT oceania (country)*>
<!ELEMENT antarctic (country)*>

This produces the following result:
Warning: file:///c%3A/test.xml:0: Element continents content does not follow the DTD
Expecting (europe?, asia?, america?, africa?, oceania?, antarctic?), got (europe asia asia) in c:\siteroot\index.php on line 6

Warning: file:///c%3A/test.xml:0: Element europe content does not follow the DTD
Expecting (country)*, got (country3 country country) in c:\siteroot\index.php on line 6
Warning: file:///c%3A/test.xml:0: No declaration for element country3 in c:\siteroot\index.php on line 6

In the XML file, the DTD can be included directly, or referenced with SYSTEM using a relative or absolute path on the file system, or an HTTP URL. In the PHP 4 version of the extension (php_domxml), a reference to the file system was not accepted, and validation was not performed correctly (in particular, the occurrence indicators +, ? and * giving the number of elements were not checked).
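
To illustrate the first option, a DTD included directly in the document, here is a self-contained sketch. The DTD is a shortened version of the one above, and libxml_use_internal_errors and libxml_get_errors (PHP 5.1 and later, so an assumption with respect to the version this tutorial targets) are used to collect the messages instead of letting PHP print warnings.

```php
<?php
// The DTD is embedded in the document itself (internal subset).
$xml = '<!DOCTYPE continents [
  <!ELEMENT continents (europe?, asia?)>
  <!ELEMENT europe (country)*>
  <!ELEMENT asia (country)*>
  <!ELEMENT country (#PCDATA)>
]>
<continents>
  <europe>
    <country>France</country>
  </europe>
</continents>';

libxml_use_internal_errors(true);

$dom = new DomDocument();
$dom->loadXML($xml);
$isValid = $dom->validate();

// Same document with an undeclared element name.
$broken = new DomDocument();
$broken->loadXML(str_replace('country>France</country', 'country3>France</country3', $xml));
$isBrokenValid = $broken->validate();

// Print the messages that validation collected for the broken document.
$messages = array();
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();

var_dump($isValid, $isBrokenValid);
echo implode("\n", $messages), "\n";
```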

3. Reading a document
3.1. The DomNodeList object

All multiple results (in particular lists of nodes) that DOM returns to you take the form of a DomNodeList object. Keep in mind that a DomNodeList is not an array: you cannot access its members with an index in brackets.

The DomNodeList class implements PHP's Traversable interface. You do not call any iteration methods on it directly, but it means that a DomNodeList can be walked in a foreach loop. This is one way to get a reference to an object in a DomNodeList.

The other way is a method defined by DomNodeList itself: item. It takes a single parameter, a numeric index. Thus the following code:
Retrieving a reference from a DomNodeList
<?php
$element = $listeElements->item(0);
?>
stores in $element a reference to the first object of the DomNodeList $listeElements. If we provide an invalid index, the method returns null. If the result is then used without taking precautions, one gets an error such as:
Notice: Trying to get property of non-object
3.2. Searching for and retrieving an element
There are several ways to find elements. You can retrieve the root element of the document (in this case you get a DomElement object and not a DomNodeList, since there is only ever one root element):
Retrieving the root element
<?php
$root = $dom->documentElement;
echo $root->nodeName;
?>
Note that DomNode objects (and therefore DomElement objects) have a nodeName property that returns the name of the node. For an element, this is the name of the tag. As the counterpart of the documentElement property of DomDocument objects, elements have an ownerDocument property, which is a reference to the document they belong to.
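
A small sketch of both properties. It uses DomNode::isSameNode to compare the two references, and the sample document is illustrative.

```php
<?php
$dom = new DomDocument();
$dom->loadXML('<continents><europe/></continents>');

$root = $dom->documentElement;
$europe = $root->firstChild;

// nodeName gives the tag name of an element.
$rootName = $root->nodeName;
$childName = $europe->nodeName;

// ownerDocument leads from any node back to its document.
$backToDocument = $europe->ownerDocument->isSameNode($dom);

echo $rootName, ' ', $childName, "\n";
```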

You can also search for an element by the value of its attribute of type ID, provided that this attribute is declared in the associated DTD and that the document has been validated. (If you just want to look up an element by the value of an attribute named id, you have to go through a DomXPath object, which may be covered in a forthcoming version of this tutorial.)

Searching for an element
<?php
$target = $dom->getElementById("target");
?>
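
The sketch below shows both situations described above, under a few assumptions: the DTD is embedded in the document, the ID attribute is called "code" (an invented name), and validation is triggered at load time through the validateOnParse property. The second half uses DomXPath on a document without any DTD.

```php
<?php
// Case 1: an attribute declared with type ID in the DTD.
$xml = '<!DOCTYPE continents [
  <!ELEMENT continents (country)*>
  <!ELEMENT country (#PCDATA)>
  <!ATTLIST country code ID #REQUIRED>
]>
<continents><country code="fr">France</country><country code="be">Belgium</country></continents>';

$dom = new DomDocument();
$dom->validateOnParse = true;   // validate against the DTD while loading
$dom->loadXML($xml);

$byId = $dom->getElementById('be');
$nameById = $byId->nodeValue;

// Case 2: a plain attribute named id, no DTD: go through DomXPath.
$plain = new DomDocument();
$plain->loadXML('<continents><country id="target">Spain</country></continents>');
$xpath = new DomXPath($plain);
$byXPath = $xpath->query('//*[@id="target"]')->item(0);
$nameByXPath = $byXPath->nodeValue;

echo $nameById, ' ', $nameByXPath, "\n";
```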
If you want to search by tag name, you can use DomDocument::getElementsByTagName() or DomElement::getElementsByTagName(). The first searches the whole document, the second only the descendants of the element. Both return a DomNodeList object.

test.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<continents>
<europe>
<country>France</country>
<country>Belgium</country>
<country>Spain</country>
</europe>
<asia>
<country>Japan</country>
<country>India</country>
</asia>
</continents>

Searching for elements
<?php
$dom = new DomDocument;
$dom->load("test.xml");
// Search the whole document
$listePays = $dom->getElementsByTagName('country');
foreach ($listePays as $country)
    echo $country->firstChild->nodeValue . "<br />";
echo "---<br />";
// Search only the descendants of the first <europe> element
$europe = $dom->getElementsByTagName('europe')->item(0);
$listePaysEurope = $europe->getElementsByTagName('country');
foreach ($listePaysEurope as $country)
    echo $country->firstChild->nodeValue . "<br />";
?>

Script output
France
Belgium
Spain
Japan
India
---
France
Belgium
Spain
Note the nodeValue property of DomNode objects: here, applied to the firstChild of each of our DomElement objects, it retrieves the value of the child text node.

3.3. Reading attributes
We will now modify our XML file a little (by hand) to add attributes giving the political regime of the countries listed (assuming that the DTD has been amended accordingly, if you want to keep the benefit of validation):
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE continents SYSTEM "test.dtd">
<continents>
<europe>
<country regime="republic">France</country>
<country regime="constitutional monarchy">Belgium</country>
<country regime="constitutional monarchy">Spain</country>
</europe>
<asia>
<country regime="empire">Japan</country>
<country>India</country>
</asia>
</continents>
Once we hold the XML element object that interests us, we can read its attributes through DomElement::getAttribute(). This is the simplest way: we pass the name of the attribute to retrieve as a parameter, and we get its value back. A good habit for avoiding mistakes is to check that the attribute exists with DomElement::hasAttribute(), which also takes the name of the attribute as a parameter and returns a boolean telling whether the attribute is present.

<?php
$listePays = $dom->getElementsByTagName("country");
foreach ($listePays as $country)
{
    echo $country->nodeValue;
    // Only print the regime when the attribute is present
    if ($country->hasAttribute("regime")) {
        echo " - " . $country->getAttribute("regime");
    }
    echo "<br />";
}
?>
This gives us the following output:

Script output

France - republic
Belgium - constitutional monarchy
Spain - constitutional monarchy
Japan - empire
India

3.4. Reading text nodes
As we have already seen, you can retrieve the value of a text node with the nodeValue property. More generally, nodeValue gives the value of a node, namely the content of a text node or the value of an attribute.
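
As a quick sketch of this, nodeValue can be read on a text node and on an attribute node alike. The attribute node is obtained here with DomElement::getAttributeNode, and the one-line document is invented for the example.

```php
<?php
$dom = new DomDocument();
$dom->loadXML('<country regime="republic">France</country>');
$country = $dom->documentElement;

// Value of a text node: its content.
$textNode = $country->firstChild;
$textValue = $textNode->nodeValue;

// Value of an attribute node: the attribute value.
$attributeNode = $country->getAttributeNode('regime');
$attributeValue = $attributeNode->nodeValue;

echo $textValue, ' / ', $attributeValue, "\n";
```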

We know that a text node itself ("France", for example) is the first child of the element that contains it (<country>). However, as a curiosity of DOM, reading nodeValue on the parent element of a text node is equivalent to reading it on the text node itself:
<?php
$countries = $dom->getElementsByTagName("country");
foreach ($countries as $c)
{
    // Value read on the element, then on its text node
    echo $c->nodeValue . " " . $c->firstChild->nodeValue;
    echo "<br />";
}
?>

Script output
France France
Belgium Belgium
Spain Spain
Japan Japan
India India
Note that if the XML is indented, the spaces, tabs and line breaks generate text nodes of their own. Take care to apply the trim function (or an equivalent) to your text values, and/or to check whether your text node is empty.
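
The effect of indentation can be observed with a short sketch. The first part filters on nodeType to skip the whitespace text nodes. The second part relies on the DomDocument preserveWhiteSpace property, an option this tutorial does not otherwise cover, to have the parser drop them at load time.

```php
<?php
// An indented document: line breaks and spaces sit between the elements.
$xml = "<europe>\n  <country>France</country>\n  <country>Belgium</country>\n</europe>";

$dom = new DomDocument();
$dom->loadXML($xml);
$withWhitespace = $dom->documentElement->childNodes->length;

// Keep only the element children, skipping the whitespace text nodes.
$names = array();
foreach ($dom->documentElement->childNodes as $child) {
    if ($child->nodeType == XML_ELEMENT_NODE) {
        $names[] = trim($child->nodeValue);
    }
}

// Alternative: ask the parser to drop whitespace-only text nodes.
$compact = new DomDocument();
$compact->preserveWhiteSpace = false;   // must be set before loading
$compact->loadXML($xml);
$withoutWhitespace = $compact->documentElement->childNodes->length;

echo $withWhitespace, ' ', $withoutWhitespace, "\n";
```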

4. Editing a document
Now let us see how to modify the various parts of an existing XML document.

4.1. Creating a node
The DomDocument::createElement method lets you create XML elements very simply, by passing the name of the node as a parameter.
Creating an element
<?php
$nouveauPays = $dom->createElement("country");
?>
Note that although $nouveauPays now points to a new "country" element, that element is not yet positioned in the XML tree. It has been created, but not inserted into the document.

If we want to add a text node to this element (to give the country a name, following our example), we call DomDocument::createTextNode to create the text node. The method takes the text to insert as its parameter. Again, the text node is created, but it is neither inserted into the document nor even attached to our new element.
Creating a text node
<?php
$nomPays = $dom->createTextNode("United Kingdom");
?>
Also worth noting is the DomNode::cloneNode method, which creates a new node (of any type) by copying an existing node:
Creating a node by copy
<?php
$paysIdentique = $country->cloneNode();
?>

This method accepts an optional argument, a boolean (FALSE by default). If it is TRUE, all the child nodes are copied as well, so that a whole part of the tree can be duplicated in this way.
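
The difference between the two modes can be sketched as follows, on a small document invented for the example. In both cases the copy is a detached node that still has to be inserted somewhere.

```php
<?php
$dom = new DomDocument();
$dom->loadXML('<europe><country regime="republic">France</country></europe>');
$country = $dom->getElementsByTagName('country')->item(0);

// Default (shallow) copy: the element and its attributes, no children.
$shallow = $country->cloneNode();
$shallowHasChildren = $shallow->hasChildNodes();
$shallowRegime = $shallow->getAttribute('regime');

// Deep copy: the child text node is duplicated as well.
$deep = $country->cloneNode(true);
$deepText = $deep->nodeValue;

// Neither copy is part of the tree yet.
$deepIsDetached = ($deep->parentNode === null);

var_dump($shallowHasChildren, $shallowRegime, $deepText, $deepIsDetached);
```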

4.2. Modifying an attribute
We must now add an attribute to our new node to specify its political regime, in keeping with the rest of the XML document that serves as our example. For this we use DomElement::setAttribute, which serves both to create an attribute and to change its value. The first parameter is the name of the attribute and the second is its value.

Creating or modifying an attribute
<?php
$nouveauPays->setAttribute("regime", "constitutional monarchy");
?>
You can delete an attribute with DomElement::removeAttribute (with the attribute name as a parameter).
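
A compact sketch of the life cycle of an attribute (creation, modification, deletion), on a detached element created for the occasion:

```php
<?php
$dom = new DomDocument();
$nouveauPays = $dom->createElement('country');

// The first call creates the attribute...
$nouveauPays->setAttribute('regime', 'republic');
// ...the second one changes its value.
$nouveauPays->setAttribute('regime', 'constitutional monarchy');
$regime = $nouveauPays->getAttribute('regime');

// Deleting the attribute.
$nouveauPays->removeAttribute('regime');
$stillThere = $nouveauPays->hasAttribute('regime');

var_dump($regime, $stillThere);
```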

4.3. Inserting a node in the document
We have seen how to create elements and text nodes, but we still need to place them in the XML document, and in the right place. Insertion is done with the DomNode::appendChild method, which adds the node passed as a parameter to the end of the list of children of the node on which it is called. The following script appends the text node $nomPays to our new node $nouveauPays, and then appends that node to the "europe" node.

Inserting the new elements
<?php
$nouveauPays->appendChild($nomPays);
$europe = $dom->getElementsByTagName("europe")->item(0);
$europe->appendChild($nouveauPays);
?>
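
Put end to end, the steps of sections 4.1 to 4.3 can be sketched as a single self-contained script. The starting document is reduced to one continent, and saveXML is called on the "europe" node to display the result.

```php
<?php
$dom = new DomDocument();
$dom->loadXML('<continents><europe><country>France</country></europe></continents>');

// Create the element and its text node; both are still detached.
$nouveauPays = $dom->createElement('country');
$nomPays = $dom->createTextNode('United Kingdom');
$detachedBefore = ($nouveauPays->parentNode === null);

// Add the attribute, attach the text, then insert the element.
$nouveauPays->setAttribute('regime', 'constitutional monarchy');
$nouveauPays->appendChild($nomPays);
$europe = $dom->getElementsByTagName('europe')->item(0);
$europe->appendChild($nouveauPays);

$result = $dom->saveXML($europe);
echo $result, "\n";
```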

4.4. Removing a node
Now for the saddest part: you no longer want your node, and rather than abandoning it at the side of the highway at the start of the summer vacation, you want to put it down. For this you use the terrible DomNode::removeChild method, calling it on the parent of the node to remove and passing a reference to the node to remove as a parameter. Of course, all the descendants of the node are exterminated along with it. As an exercise, let us remove the node that we worked so hard to create!

Obliterating our beautiful brand new node
<?php
$europe->removeChild($nouveauPays);
?>
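
A self-contained sketch of a removal, on a two-country document invented for the example. It also shows that removeChild returns the removed node, which remains usable as a detached node as long as a variable refers to it.

```php
<?php
$dom = new DomDocument();
$dom->loadXML('<europe><country>France</country><country>Belgium</country></europe>');
$europe = $dom->documentElement;

// removeChild is called on the parent, with the child as a parameter.
$belgium = $europe->getElementsByTagName('country')->item(1);
$removed = $europe->removeChild($belgium);

// The element and its text node have left the tree together.
$remaining = $dom->saveXML($europe);
$removedText = $removed->nodeValue;
$removedIsDetached = ($removed->parentNode === null);

echo $remaining, "\n";
```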

5. A simple (and useless) example: XML / PHP object conversion
The following function takes as a parameter the name of an accessible, valid XML file, and extracts from it a PHP 5 object mirroring the structure of the XML document. Each element object has four members: its name (string), its text value (string, empty if there is none), an associative array "attributes" holding the name/value pairs of its attributes, and an array "children" holding its child elements. The comments and a reading of this tutorial should be enough to understand the code. Note that this function is of limited value: because it is meant to adapt to any kind of document, it is little more than a poor overlay on DomDocument->load. On the other hand, a function of this type, specialized for a particular class, can be extremely useful in a PHP 5 application that uses XML files as data sources or settings.

XML to object conversion
<?php
function fileToObject($fileName) {

    // Create the new document object
    $dom = new DomDocument();

    // Load from the file
    $dom->load($fileName);

    // Validate against the DTD referenced in the document.
    // If validation fails, we go no further
    if (!@$dom->validate()) {
        return false;
    }

    // Create the result object
    $object = new stdClass();

    // Keep the path of the source file
    $object->source = $fileName;

    // Retrieve the root element and store it in a member
    // of the object called "root"
    $root = $dom->documentElement;
    $object->root = new stdClass();

    // Call a recursive function that mirrors the XML element
    // and then hands over to its children, walking the XML tree.
    getElement($root, $object->root);

    return $object;
}
?>

And here is the recursive function that walks the XML tree:
XML to object conversion – walking the tree

<?php
function getElement($dom_element, $object_element) {

    // Get the name of the element
    $object_element->name = $dom_element->nodeName;

    // Retrieve the text value, stripping formatting whitespace.
    // An empty element has no first child, hence the check.
    $first = $dom_element->firstChild;
    $object_element->textValue = $first ? trim($first->nodeValue) : '';

    // Retrieve the attributes
    if ($dom_element->hasAttributes()) {
        $object_element->attributes = array();
        foreach ($dom_element->attributes as $attName => $dom_attribute) {
            $object_element->attributes[$attName] = $dom_attribute->value;
        }
    }
    // Retrieve the children, walking the XML tree.
    // We require length > 1 because, in an indented document,
    // the first child is always a text node
    if ($dom_element->childNodes->length > 1) {
        $object_element->children = array();
        foreach ($dom_element->childNodes as $dom_child) {
            if ($dom_child->nodeType == XML_ELEMENT_NODE) {
                $child_object = new stdClass();
                getElement($dom_child, $child_object);
                array_push($object_element->children, $child_object);
            }
        }
    }
}
?>
Let us now look at the object created by this function when it is applied to the XML file we have been working on:
Displaying the resulting object
<?php
echo "<pre>";
print_r(fileToObject("test.xml"));
echo "</pre>";
?>
Execution output
stdClass Object
(
    [source] => test.xml
    [root] => stdClass Object
        (
            [name] => continents
            [textValue] =>
            [children] => Array
                (
                    [0] => stdClass Object
                        (
                            [name] => europe
                            [textValue] =>
                            [children] => Array
                                (
                                    [0] => stdClass Object
                                        (
                                            [name] => country
                                            [textValue] => France
                                            [attributes] => Array
                                                (
                                                    [regime] => republic
                                                )
                                        )
                                    [1] => stdClass Object
                                        (
                                            [name] => country
                                            [textValue] => Belgium
                                            [attributes] => Array
                                                (
                                                    [regime] => constitutional monarchy
                                                )
                                        )
                                    [2] => stdClass Object
                                        (
                                            [name] => country
                                            [textValue] => Spain
                                            [attributes] => Array
                                                (
                                                    [regime] => constitutional monarchy
                                                )
                                        )
                                )
                        )
                    [1] => stdClass Object
                        (
                            [name] => asia
                            [textValue] =>
                            [children] => Array
                                (
                                    [0] => stdClass Object
                                        (
                                            [name] => country
                                            [textValue] => Japan
                                            [attributes] => Array
                                                (
                                                    [regime] => empire
                                                )
                                        )
                                    [1] => stdClass Object
                                        (
                                            [name] => country
                                            [textValue] => India
                                        )
                                )
                        )
                )
        )
)
The second function we will look at does the reverse: it takes as a parameter an object built by the previous function and writes it to disk as an XML document (assuming, of course, that write permissions are granted on the server).
Object to XML conversion
<?php
function objectToFile($xmlObject) {

    // Create a new document object
    $dom = new DomDocument();

    // Create the root element
    $root = $dom->createElement($xmlObject->root->name);
    $dom->appendChild($root);

    // Call a recursive function that builds the XML element
    // from the object, walking the entire object tree.
    setElement($dom, $xmlObject->root, $root);

    // Overwrite the original source file
    $dom->save($xmlObject->source);
    echo $xmlObject->source;
}
?>
As before, this function relies on a recursive function that walks the tree depth-first:
Object to XML conversion – walking the tree
<?php
function setElement($dom_document, $object_element, $dom_element) {

    // Restore the text value of the element
    if (isset($object_element->textValue)) {
        $cdata = $dom_document->createTextNode($object_element->textValue);
        $dom_element->appendChild($cdata);
    }
    // Restore the attributes
    if (isset($object_element->attributes)) {
        foreach ($object_element->attributes as $attName => $attValue) {
            $dom_element->setAttribute($attName, $attValue);
        }
    }
    // Build the child elements, walking the tree
    if (isset($object_element->children)) {
        foreach ($object_element->children as $childObject) {
            $child = $dom_document->createElement($childObject->name);
            setElement($dom_document, $childObject, $child);
            $dom_element->appendChild($child);
        }
    }
}
?>
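
Section 5 remarks that a function of this kind becomes really useful once it is specialized for a particular class of documents. As a rough sketch of what that could look like for our continents document, the hypothetical function countriesByContinent below (a name and a return format invented for this example) returns a plain array indexed by continent. To stay self-contained it reads the XML from a string rather than from a file, and it skips DTD validation.

```php
<?php
// Hypothetical specialized reader for the "continents" document.
// Returns array(continent => list of array('name' => ..., 'regime' => ...)),
// or false if the XML cannot be parsed.
function countriesByContinent($xml) {
    $dom = new DomDocument();
    if (!@$dom->loadXML($xml)) {
        return false;
    }

    $result = array();
    foreach ($dom->documentElement->childNodes as $continent) {
        // Skip the whitespace text nodes produced by indentation
        if ($continent->nodeType != XML_ELEMENT_NODE) {
            continue;
        }
        $result[$continent->nodeName] = array();
        foreach ($continent->getElementsByTagName('country') as $country) {
            $result[$continent->nodeName][] = array(
                'name'   => trim($country->nodeValue),
                'regime' => $country->hasAttribute('regime')
                    ? $country->getAttribute('regime')
                    : null,
            );
        }
    }
    return $result;
}

$data = countriesByContinent('<continents>
  <europe><country regime="republic">France</country></europe>
  <asia><country>India</country></asia>
</continents>');
print_r($data);
```

Compared with the generic fileToObject, the calling code no longer has to walk "children" arrays: it can read $data['europe'][0]['name'] directly.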