Parsing XML With PHP SimpleXML

I needed to stay busy (a long story) and figured I’d kill two birds with one stone, so I parsed some XML for a friend I used to work with. PHP is my favorite scripting language, so of course I opted to use that. While researching this on the internet, I came across SimpleXML. Unfortunately, I can be slow at times, and I had to look at a lot of articles across the interweb before it sank in. The one that finally made things click was this page: Reading XML with PHP

Anyway, I was having issues with namespaces (perhaps another blog entry) and cheated by stripping the XML of all namespace references. So the sample XML has no namespaces, and it has been rewritten to protect the innocent (and maybe to help someone else). Also, the sample is many nodes deep, something that seems to be lacking in most examples I’ve run across.
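
For what it’s worth, SimpleXML can read namespaced elements without stripping anything, by passing the namespace URI to children(). Here is a minimal sketch of that idea. The ex prefix, the http://example.com/ns URI, and the tiny inline document are made up for illustration and are not from the original XML.

```php
<?php
// Hypothetical two-element document: one namespaced child, one plain child.
$xmlString = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<topLevel xmlns:ex="http://example.com/ns">
  <ex:troll>SUBMARINE</ex:troll>
  <statement>HELLO</statement>
</topLevel>
XML;

$xml = simplexml_load_string($xmlString);
if ($xml === false) {
    throw new RuntimeException('Error reading XML string');
}

// children() with a namespace URI returns only the children in that namespace.
$namespaced = $xml->children('http://example.com/ns');
$troll = (string) $namespaced->troll;

// Elements without a namespace are still reachable the usual way.
$statement = (string) $xml->statement;

echo $troll . "\t" . $statement . "\n";
```

The casts to string matter here: without them, $troll and $statement would be SimpleXMLElement objects rather than plain text.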

Sample XML:

<?xml version="1.0" encoding="UTF-8"?>
<topLevel>
  <somethingHere>text,text,text</somethingHere>
  <troll>SUBMARINE</troll>
  <statement>HELLO</statement>
  <secondLevel>
    <thirdLevelA>
      <fourthLevelA>
        <firstNumber>1234567</firstNumber>
        <whatIsTruth>true</whatIsTruth>
        <statement>GiveMeLiberty</statement>
        <fifthLevelA>
          <sixthLevelA>
            <number>7654321</number>
            <statement>or</statement>
            <someLevelDate>05-26-2012</someLevelDate>
            <seventhLevelA>
              <eighthLevelA>
                <animal>ELEPHANT</animal>
                <minorType>1ACB</minorType>
                <troubleOptionCode>LAMP</troubleOptionCode>
                <!-- Linux - Thanks Linus -->
                <statement>OrGiveMe</statement>
                <ninthLevelA>
                  <ninthLevelChild>9LC_A1</ninthLevelChild>
                  <ninthLevelChild>9LC_A2</ninthLevelChild>
                  <ninthLevelChild>9LC_A3</ninthLevelChild>
                  <ninthLevelChild>9LC_A4</ninthLevelChild>
                  <ninthLevelChild>9LC_A5</ninthLevelChild>
                  <ninthLevelChild>9LC_A6</ninthLevelChild>
                </ninthLevelA>
              </eighthLevelA>
              <eighthLevelA>
                <animal>FISH</animal>
                <minorType>2DEF</minorType>
                <troubleOptionCode>WAMP</troubleOptionCode>
                <!-- Windows -->
                <statement>Death</statement>
                <ninthLevelA>
                  <ninthLevelChild>9LC_B1</ninthLevelChild>
                  <ninthLevelChild>9LC_B2</ninthLevelChild>
                  <ninthLevelChild>9LC_B3</ninthLevelChild>
                </ninthLevelA>
              </eighthLevelA>
              <eighthLevelA>
                <animal>BIRD</animal>
                <minorType>UCLA</minorType>
                <troubleOptionCode>MAMP</troubleOptionCode>
                <!-- Mac -->
                <statement>3ACTIVE3</statement>
                <ninthLevelA>
                  <ninthLevelChild>9LC_C1</ninthLevelChild>
                  <ninthLevelChild>9LC_C2</ninthLevelChild>
                </ninthLevelA>
              </eighthLevelA>
            </seventhLevelA>
          </sixthLevelA>
        </fifthLevelA>
        <topLevelAddresses>
          <address>
            <city>Colchester</city>
            <addressChild>CT</addressChild>
            <postalCode>06415</postalCode>
          </address>
        </topLevelAddresses>
      </fourthLevelA>
    </thirdLevelA>
    <thirdLevelB>
      <fourthLevelB>
        <fourthLevelBChild>4child_1</fourthLevelBChild>
        <notDate>05-26-2012</notDate>
      </fourthLevelB>
      <fourthLevelB>
        <fourthLevelBChild>4child_2</fourthLevelBChild>
        <notDate>05-26-2012</notDate>
      </fourthLevelB>
      <fourthLevelB>
        <fourthLevelBChild>4child_3</fourthLevelBChild>
        <notDate>05-26-2012</notDate>
      </fourthLevelB>
      <fourthLevelB>
        <fourthLevelBChild>4child_4</fourthLevelBChild>
        <notDate>05-26-2012</notDate>
      </fourthLevelB>
    </thirdLevelB>
  </secondLevel>
</topLevel>
And now the PHP with comments (NOTE: My coding is not bulletproof. Use at your own risk. Also, as with many languages, there are many ways to do the same thing in PHP. I’m not saying I did it the best or most efficient way, just that it worked for me. If it helps you, all the better.):
<?php
$file_location = 'idz/test/';

$csvList = "xmlName\tTrans\tStatement\tFirst#\tsixthLevelAChild#\tAnimalTypes\tDiffTypes\t4thLevelBChildren\tninthLevel\r\n"; // write column names - tab delimited list

$idArray = array("1xml.xml"); // I saved the XML above as 1xml.xml in the idz/test/ folder relative to the script
$idArrayCount = count($idArray);

foreach($idArray as $value){ // for every value in the array - just in case I have a list
    $file_name = str_replace('.xml', '', $value); // strip off .xml
    $url = $file_location . $value; // concat file location (what folder the files are in) and file

    $csvList .= $file_name . "\t";

    $xml = simplexml_load_file($url); // try to load file
    if($xml === false){ // strict check, because a loaded but empty element can also evaluate to false
        trigger_error('Error reading XML file',E_USER_ERROR); // if file fails to load outputs error message (on my computer in the php error file)
    } // end if not load file

    $level4A = $xml->secondLevel->thirdLevelA->fourthLevelA; // make a shortcut to fourthLevelA (to cut down on line length in some instances)
    $eighthLevelA = $level4A->fifthLevelA->sixthLevelA->seventhLevelA->eighthLevelA; // make a shortcut to eighthLevelA using $level4A and the remaining path

    $eighthLevelCount = count($eighthLevelA); // count occurrences of eighth level nodes
    $fourthLevelBCount = count($xml->secondLevel->thirdLevelB->fourthLevelB); // count occurrences of fourth level B nodes

    $csvList .= $xml->troll . "\t" . $level4A->statement . $eighthLevelA[0]->statement . $eighthLevelA[1]->statement . "\t" . $level4A->firstNumber . "\t" . $level4A->fifthLevelA->sixthLevelA->number . "\t";

    for($i = 0; $i < $eighthLevelCount; $i++){
        $csvList .= $eighthLevelA[$i]->animal . " ";
    } // end for i less than 8th level count
    $csvList .= "\t";

    for($i = 0; $i < $eighthLevelCount; $i++){
        $csvList .= $eighthLevelA[$i]->minorType . " ";
    } // end for i less than 8th level count
    $csvList .= "\t";

    $fourthLevelB = $xml->secondLevel->thirdLevelB->fourthLevelB;
    for($i = 0; $i < $fourthLevelBCount; $i++){
        $csvList .= $fourthLevelB[$i]->fourthLevelBChild . " ";
    } // end for i less than 4th level count
    $csvList .= "\t";

    // I used the var name troubleCount because it was a bit of trouble figuring out the embedded loop
    for($i = 0; $i < $eighthLevelCount; $i++){ // loop up to count of eighth level nodes
        $troubleCount[$i] = count($eighthLevelA[$i]->ninthLevelA->ninthLevelChild); // count ninth level children of this eighthLevelA node
        for($j = 0; $j < $troubleCount[$i]; $j++){ // loop up to count of ninth level children
            $csvList .= $eighthLevelA[$i]->ninthLevelA->ninthLevelChild[$j] . " "; // concatenate ninth level children to list
        } // end for j less than trouble count
    } // end for i less than 8th level count
    $csvList .= "\r\n"; // end of line (in Microsoft OS)
} // end foreach idArray

echo $csvList;
?>
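
As one of those other ways of doing the same thing, here is a minimal sketch that swaps the counted for loops for foreach and gathers the ninth level children with SimpleXML’s xpath() method. To keep it self-contained, it assumes a trimmed copy of the sample XML inlined as a string (the $xmlString, $animals, and $ninthChildren names are mine for illustration). It is not a drop-in replacement for the script above.

```php
<?php
// Trimmed copy of the sample XML, inlined so the sketch runs on its own.
$xmlString = <<<XML
<topLevel>
  <secondLevel>
    <thirdLevelA>
      <fourthLevelA>
        <fifthLevelA>
          <sixthLevelA>
            <seventhLevelA>
              <eighthLevelA>
                <animal>ELEPHANT</animal>
                <ninthLevelA>
                  <ninthLevelChild>9LC_A1</ninthLevelChild>
                  <ninthLevelChild>9LC_A2</ninthLevelChild>
                </ninthLevelA>
              </eighthLevelA>
              <eighthLevelA>
                <animal>FISH</animal>
                <ninthLevelA>
                  <ninthLevelChild>9LC_B1</ninthLevelChild>
                </ninthLevelA>
              </eighthLevelA>
            </seventhLevelA>
          </sixthLevelA>
        </fifthLevelA>
      </fourthLevelA>
    </thirdLevelA>
  </secondLevel>
</topLevel>
XML;

$xml = simplexml_load_string($xmlString);
if ($xml === false) {
    throw new RuntimeException('Error reading XML string');
}

$eighthLevelA = $xml->secondLevel->thirdLevelA->fourthLevelA
    ->fifthLevelA->sixthLevelA->seventhLevelA->eighthLevelA;

// foreach walks every <eighthLevelA> sibling, so no count() or index is needed.
$animals = [];
foreach ($eighthLevelA as $node) {
    $animals[] = (string) $node->animal;
}

// The XPath '//ninthLevelChild' matches that element at any depth, in document order.
$ninthChildren = array_map('strval', $xml->xpath('//ninthLevelChild'));

// implode() joins with single spaces and leaves no trailing space.
$row = implode(' ', $animals) . "\t" . implode(' ', $ninthChildren);
echo $row . "\r\n";
```

One difference from the original loops: implode() does not leave a trailing space after the last value in each column.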


Output when the code is run on the sample XML (tab delimited):

xmlName	Trans	Statement	First#	sixthLevelAChild#	AnimalTypes	DiffTypes	4thLevelBChildren	ninthLevel
1xml	SUBMARINE	GiveMeLibertyOrGiveMeDeath	1234567	7654321	ELEPHANT FISH BIRD 	1ACB 2DEF UCLA 	4child_1 4child_2 4child_3 4child_4 	9LC_A1 9LC_A2 9LC_A3 9LC_A4 9LC_A5 9LC_A6 9LC_B1 9LC_B2 9LC_B3 9LC_C1 9LC_C2
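
Since the output is just tab-delimited text, it can be split back into named fields. Here is a minimal sketch, assuming the header and data line above are pasted in as strings (the $header, $line, $record, and $animals names are my own for illustration):

```php
<?php
// Header and data line copied from the output above (tabs written as \t).
$header = "xmlName\tTrans\tStatement\tFirst#\tsixthLevelAChild#\tAnimalTypes\tDiffTypes\t4thLevelBChildren\tninthLevel";
$line = "1xml\tSUBMARINE\tGiveMeLibertyOrGiveMeDeath\t1234567\t7654321\tELEPHANT FISH BIRD \t1ACB 2DEF UCLA \t4child_1 4child_2 4child_3 4child_4 \t9LC_A1 9LC_A2 9LC_A3 9LC_A4 9LC_A5 9LC_A6 9LC_B1 9LC_B2 9LC_B3 9LC_C1 9LC_C2 ";

// Pair each column name with its value (both sides have nine fields).
$record = array_combine(explode("\t", $header), explode("\t", $line));

// Multi-value columns are space separated with a trailing space, so trim first.
$animals = explode(' ', trim($record['AnimalTypes']));

echo $record['Statement'] . "\n";
echo count($animals) . " animals: " . implode(', ', $animals) . "\n";
```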