PHP provides multiple ways to handle XML, and one of the most efficient for large or streamed XML files is the Expat parser. Unlike SimpleXML or DOM, which load the entire XML document into memory, the Expat parser is event-driven. It reads XML data sequentially, in whatever chunks you feed it, and triggers an event each time it encounters the start of a tag, the end of a tag, or character data.
This tutorial explains how to use the Expat parser to read XML efficiently, handle events, and extract data in a controlled, memory-friendly way.
Expat is a low-level XML parser built into PHP. Instead of creating objects for each element, it notifies your script whenever it encounters:
The start of an element (start tag)
The end of an element (end tag)
Character data (text content inside tags)
You define handler functions to tell PHP what to do when these events occur.
We will use the following students.xml file for examples:
<students>
<student id="1">
<name>Riya Sharma</name>
<age>20</age>
<course>Web Development</course>
</student>
<student id="2">
<name>Ananya Singh</name>
<age>22</age>
<course>Data Science</course>
</student>
<student id="3">
<name>Meera Patel</name>
<age>21</age>
<course>UI/UX Design</course>
</student>
</students>
First, create an XML parser using xml_parser_create().
<?php
$parser = xml_parser_create();
?>
This initializes the parser and prepares it to read an XML file.
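The call above relies on the parser's defaults. One default worth knowing early is case folding: element and attribute names are reported to your handlers in uppercase, which is why later examples compare against "NAME" and "ID". The following is a minimal sketch, assuming you would rather see names exactly as they are written in the file; the inline XML string and the $seen variable are illustrative only.

```php
<?php
// Ask for UTF-8 output explicitly instead of relying on the default.
$parser = xml_parser_create('UTF-8');

// Turn case folding off so <student> is reported as "student", not "STUDENT".
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

$seen = [];
xml_set_element_handler(
    $parser,
    // Start handler: record each element name as the parser reports it.
    function ($parser, $name, $attrs) use (&$seen) {
        $seen[] = $name;
    },
    // End handler: nothing to do in this sketch.
    function ($parser, $name) {
    }
);

xml_parse($parser, '<students><student id="1"/></students>', true);
print_r($seen);
```

If you leave case folding on, as the rest of this tutorial does, the same run would record the names in uppercase.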
You need to define functions that handle start tags, end tags, and character data.
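<?php
// Called for every opening tag; $attrs holds its attributes as name => value.
function startTag($parser, $name, $attrs) {
    echo "Start Tag: $name<br>";
    foreach ($attrs as $key => $value) {
        echo "Attribute: $key = $value<br>";
    }
}
// Called for every closing tag.
function endTag($parser, $name) {
    echo "End Tag: $name<br>";
}
// Called for text between tags, including whitespace-only runs.
function characterData($parser, $data) {
    $data = trim($data);
    // Compare with '' rather than empty(), which would also discard the text "0".
    if ($data !== '') {
        echo "Data: $data<br>";
    }
}
?>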
startTag() is called whenever a start tag is found.
endTag() is called whenever an end tag is encountered.
characterData() is called for the text content inside tags.
The $attrs parameter in startTag() contains any attributes of the tag.
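One detail the handlers above gloss over: the parser may deliver the text of a single element in several pieces, for example around an entity such as &amp;amp; or where a chunk boundary falls. A common way to cope is to buffer the pieces and only use the text when the end tag arrives. The sketch below assumes that approach; collectText() is a hypothetical helper, not part of PHP, and it ignores text that appears in a parent element before a child tag.

```php
<?php
// Hypothetical helper: returns [tagName, text] pairs for every element
// whose own text content is not just whitespace.
function collectText(string $xml): array
{
    $parser = xml_parser_create();
    $buffer = '';
    $pairs  = [];

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$buffer) {
            $buffer = '';                    // new element: start from an empty buffer
        },
        function ($p, $name) use (&$buffer, &$pairs) {
            $text = trim($buffer);
            if ($text !== '') {              // '' check keeps values such as "0"
                $pairs[] = [$name, $text];
            }
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$buffer) {
            $buffer .= $data;                // append every piece; do not print yet
        }
    );

    xml_parse($parser, $xml, true);
    return $pairs;
}

$pairs = collectText('<student><name>Tom &amp; Jerry</name><age>0</age></student>');
print_r($pairs);
```

Because the text is assembled before it is used, the name comes out as one value even though the parser may report it in parts.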
Next, tell the parser which functions to use for each event:
<?php
xml_set_element_handler($parser, "startTag", "endTag");
xml_set_character_data_handler($parser, "characterData");
?>
This links the parser events to your custom functions.
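The handler arguments are ordinary PHP callables, so function names are not the only option. As a hedged sketch, closures can carry their own state through use (&$var), which avoids name clashes when several parsers live in one script. The inline XML and the $studentCount variable below are assumptions made for illustration.

```php
<?php
$parser = xml_parser_create();
$studentCount = 0;

xml_set_element_handler(
    $parser,
    // Start handler: count <student> elements (names arrive in uppercase by default).
    function ($parser, string $name, array $attrs) use (&$studentCount): void {
        if ($name === 'STUDENT') {
            $studentCount++;
        }
    },
    // End handler: required by the function signature, but unused here.
    function ($parser, string $name): void {
    }
);

xml_parse($parser, '<students><student id="1"/><student id="2"/></students>', true);
echo "Students: $studentCount\n";
```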
You can read the XML file in chunks to efficiently handle large files:
<?php
$fp = fopen("students.xml", "r") or die("Could not open XML file");
while ($data = fread($fp, 4096)) {
    if (!xml_parse($parser, $data, feof($fp))) {
        die("XML Error: " . xml_error_string(xml_get_error_code($parser)));
    }
}
fclose($fp);
xml_parser_free($parser);
?>
fread() reads the file in chunks of 4096 bytes.
xml_parse() processes each chunk. The third parameter feof($fp) tells the parser if it’s the last piece of the file.
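xml_parser_free() releases the parser once parsing is complete. Since PHP 8.0 the parser is an object that is freed automatically, so the call has no effect there; it matters only on PHP 7 and earlier.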
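The loop above depends on feof() turning true during the same iteration that reads the last bytes. If end-of-file is only noticed on a following, empty read, the loop can exit without ever passing true as the final flag, and an unclosed tag might go unreported. The sketch below shows one way to make the final call explicit. parseStream() is a hypothetical helper, and an in-memory stream with deliberately tiny chunks stands in for students.xml so that tags get split across chunk boundaries.

```php
<?php
// Hypothetical helper: feeds an open stream to $parser in fixed-size chunks.
function parseStream($parser, $fp, int $chunkSize = 4096): bool
{
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false) {
            return false;                      // read error
        }
        if ($chunk !== '' && !xml_parse($parser, $chunk, false)) {
            return false;                      // input is not well-formed
        }
    }
    // Signal end of input explicitly so incomplete documents are reported.
    return (bool) xml_parse($parser, '', true);
}

$names  = [];
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$names) {
        $names[] = $name;
    },
    function ($p, $name) {
    }
);

// 8-byte chunks force tags such as <student id="1"> to be split mid-tag.
$fp = fopen('php://memory', 'w+');
fwrite($fp, '<students><student id="1"><name>Riya Sharma</name></student></students>');
rewind($fp);

$ok = parseStream($parser, $fp, 8);
fclose($fp);
print_r($names);
```

The parser keeps incomplete tags in its own internal buffer between calls, so the handlers still see whole elements regardless of where the chunk boundaries fall.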
Memory-efficient: Does not load the entire file into memory.
Fast: Processes data as it is read.
Real-time processing: Can handle XML streams or large files.
Flexible: You define exactly how to handle each element or data piece.
Here’s a simple example to print all student names:
<?php
$parser = xml_parser_create();
$insideName = false; // true while the parser is between <name> and </name>
function startTag($parser, $name, $attrs) {
    global $insideName;
    if ($name == "NAME") {
        $insideName = true;
    }
}
function endTag($parser, $name) {
    global $insideName;
    if ($name == "NAME") {
        $insideName = false;
    }
}
function characterData($parser, $data) {
    global $insideName;
    $data = trim($data);
    // Compare with '' rather than empty(), which would also discard the text "0".
    if ($insideName && $data !== '') {
        echo "Student Name: $data<br>";
    }
}
xml_set_element_handler($parser, "startTag", "endTag");
xml_set_character_data_handler($parser, "characterData");
$fp = fopen("students.xml", "r") or die("Could not open XML file");
while ($data = fread($fp, 4096)) {
    xml_parse($parser, $data, feof($fp)) or
        die("XML Error: " . xml_error_string(xml_get_error_code($parser)));
}
fclose($fp);
xml_parser_free($parser);
?>
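Here, the global variable $insideName tracks whether the parser is currently inside a <name> tag. The handlers compare against "NAME" in uppercase because PHP's XML parser applies case folding by default, so element names reach the handlers upper-cased regardless of how they are written in the file.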
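Global variables work for a short script but become awkward as the parser grows. As one possible alternative, the state can live in an object whose methods serve as the handlers. NameCollector below is a hypothetical class written for this sketch (typed properties assume PHP 7.4 or later), and the inline XML is a shortened stand-in for students.xml.

```php
<?php
// Hypothetical class: keeps the parser state in properties instead of globals.
final class NameCollector
{
    public array $names = [];
    private bool $insideName = false;
    private string $buffer = '';

    public function start($parser, string $tag, array $attrs): void
    {
        if ($tag === 'NAME') {
            $this->insideName = true;
            $this->buffer = '';
        }
    }

    public function end($parser, string $tag): void
    {
        if ($tag === 'NAME') {
            $this->names[] = trim($this->buffer);
            $this->insideName = false;
        }
    }

    public function text($parser, string $data): void
    {
        if ($this->insideName) {
            $this->buffer .= $data;   // text may arrive in several pieces
        }
    }
}

$collector = new NameCollector();
$parser = xml_parser_create();
// [object, 'method'] arrays are regular PHP callables.
xml_set_element_handler($parser, [$collector, 'start'], [$collector, 'end']);
xml_set_character_data_handler($parser, [$collector, 'text']);

xml_parse(
    $parser,
    '<students><student><name>Riya Sharma</name></student>'
    . '<student><name>Ananya Singh</name></student></students>',
    true
);
print_r($collector->names);
```

Collecting the names into an array rather than echoing them also makes the result easier to reuse or test.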
Expat lets you access attributes when a start tag is found. For example, to print student IDs:
function startTag($parser, $name, $attrs) {
    if ($name == "STUDENT") {
        echo "Student ID: " . $attrs['ID'] . "<br>";
    }
}
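The $attrs array contains all attributes of the current tag, keyed by attribute name. With the default case folding the keys are upper-cased as well, which is why the id attribute is read as $attrs['ID'].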
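Reading $attrs['ID'] directly raises a warning when an element has no id attribute. A small sketch of a more defensive lookup, assuming a placeholder string is acceptable for missing values (the '(no id)' text and the inline XML are made up for this example):

```php
<?php
$ids = [];
$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$ids) {
        if ($name === 'STUDENT') {
            // ?? supplies a fallback instead of triggering an undefined-key warning.
            $ids[] = $attrs['ID'] ?? '(no id)';
        }
    },
    function ($p, $name) {
    }
);

xml_parse($parser, '<students><student id="1"/><student/><student id="3"/></students>', true);
print_r($ids);
```

Attribute values always arrive as strings, so the ids here are '1' and '3' rather than integers.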
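It’s important to check for XML errors: xml_parse() does not crash on malformed input, it simply returns a false value, so a script that ignores the result can silently stop processing partway through the file.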
if (!xml_parse($parser, $data, feof($fp))) {
    $code = xml_get_error_code($parser);
    $line = xml_get_current_line_number($parser);
    die("XML Error $code at line $line: " . xml_error_string($code));
}
This provides the error code and line number where parsing failed.
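Calling die() is fine for a demo, but it ends the whole script. As a hedged alternative, the same information can be wrapped in an exception so the caller decides what to do. parseOrThrow() is a hypothetical helper; it also reports the column via xml_get_current_column_number(). The exact error code and message text depend on the underlying XML library, so none are shown here.

```php
<?php
// Hypothetical helper: parses a complete XML string and throws on the first error.
function parseOrThrow(string $xml): void
{
    $parser = xml_parser_create();
    if (!xml_parse($parser, $xml, true)) {
        $code = xml_get_error_code($parser);
        throw new RuntimeException(sprintf(
            'XML error %d (%s) at line %d, column %d',
            $code,
            xml_error_string($code),
            xml_get_current_line_number($parser),
            xml_get_current_column_number($parser)
        ));
    }
}

$message = null;
try {
    // <student> is never closed, so this input is not well-formed.
    parseOrThrow("<students>\n<student>\n</students>");
} catch (RuntimeException $e) {
    $message = $e->getMessage();
    echo $message, "\n";
}
```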
| Feature | SimpleXML | DOM | Expat |
|---|---|---|---|
| Load entire file | Yes | Yes | No |
| Memory usage | Moderate to high (grows with file size) | High | Low |
| Event-driven | No | No | Yes |
| Modify XML | Yes | Yes | No (read-only) |
| Best for | Small files | Editing/creating XML | Large/streamed XML |
Expat is ideal when performance and memory efficiency are crucial, like reading logs, streaming data, or very large XML files.
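To make the comparison concrete, here is a small side-by-side sketch that extracts student names twice: once with SimpleXML, which builds the whole tree first, and once with Expat, which picks the names out of the event stream. It assumes the SimpleXML extension is enabled; the inline XML and variable names are illustrative.

```php
<?php
$xml = '<students><student id="1"><name>Riya Sharma</name></student>'
     . '<student id="2"><name>Ananya Singh</name></student></students>';

// SimpleXML: the full document becomes an object tree, then you navigate it.
$treeNames = [];
foreach (simplexml_load_string($xml)->student as $student) {
    $treeNames[] = (string) $student->name;
}

// Expat: no tree is built; state is tracked by hand as events arrive.
$eventNames = [];
$inName = false;
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $tag, $attrs) use (&$inName, &$eventNames) {
        if ($tag === 'NAME') {
            $inName = true;
            $eventNames[] = '';          // start a new, empty name
        }
    },
    function ($p, $tag) use (&$inName) {
        if ($tag === 'NAME') {
            $inName = false;
        }
    }
);
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$inName, &$eventNames) {
        if ($inName) {
            $eventNames[count($eventNames) - 1] .= $data;
        }
    }
);
xml_parse($parser, $xml, true);

print_r($treeNames);
print_r($eventNames);
```

The SimpleXML version is shorter, which is the convenience the table refers to; the Expat version needs more code but never holds more than the current piece of the document.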
The Expat parser in PHP is an event-driven XML parser that processes files efficiently, one piece at a time as the data is read. You learned:
How to create an Expat parser with xml_parser_create()
How to define start tag, end tag, and character data handler functions
How to attach handlers with xml_set_element_handler() and xml_set_character_data_handler()
How to read large XML files in chunks
How to handle attributes and errors
Expat is a powerful choice for scenarios where memory usage and speed are more important than the convenience of object-based XML access.
Create an Expat parser that prints all start and end tags from students.xml.
Modify the parser to print only the <name> values of all students.
Using Expat, print the id attribute of each <student> element.
Write a parser that prints both the tag name and its text content for all elements in students.xml.
Create handler functions to detect when the parser enters a <course> tag and print its value.
Read students.xml in chunks of 2048 bytes using Expat and print each <student> name.
Write an Expat parser that counts the total number of <student> elements in the XML file.
Handle missing or malformed XML gracefully, printing the error code and line number.
Modify the parser to print all attributes of any element dynamically, not just id.
Create a parser that prints the text inside nested tags, such as <details> or <address>, if present in the XML.