I needed to stay busy (a long story) and figured I’d kill two birds with one stone, so I parsed some XML for a friend I used to work with. PHP is my favorite scripting language, so of course I opted to use that. Researching this on the internet, I came across SimpleXML. Unfortunately, I can be slow at times, and I had to look at a lot of articles across the interweb. The one that finally made things click was this page: Reading XML with PHP
Anyway, I was having issues with namespaces (perhaps another blog entry) and cheated by stripping the XML of all namespace references. So, the sample XML has no namespaces, and has been rewritten to protect the innocent (and maybe to help someone else). Also, the sample is many nodes deep, something that seems to be lacking in most examples I’ve run across.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<topLevel>
  <somethingHere>text,text,text</somethingHere>
  <troll>SUBMARINE</troll>
  <statement>HELLO</statement>
  <secondLevel>
    <thirdLevelA>
      <fourthLevelA>
        <firstNumber>1234567</firstNumber>
        <whatIsTruth>true</whatIsTruth>
        <statement>GiveMeLiberty</statement>
        <fifthLevelA>
          <sixthLevelA>
            <number>7654321</number>
            <statement>or</statement>
            <someLevelDate>05-26-2012</someLevelDate>
            <seventhLevelA>
              <eighthLevelA>
                <animal>ELEPHANT</animal>
                <minorType>1ACB</minorType>
                <troubleOptionCode>LAMP</troubleOptionCode> <!-- Linux - Thanks Linus -->
                <statement>OrGiveMe</statement>
                <ninthLevelA>
                  <ninthLevelChild>9LC_A1</ninthLevelChild>
                  <ninthLevelChild>9LC_A2</ninthLevelChild>
                  <ninthLevelChild>9LC_A3</ninthLevelChild>
                  <ninthLevelChild>9LC_A4</ninthLevelChild>
                  <ninthLevelChild>9LC_A5</ninthLevelChild>
                  <ninthLevelChild>9LC_A6</ninthLevelChild>
                </ninthLevelA>
              </eighthLevelA>
              <eighthLevelA>
                <animal>FISH</animal>
                <minorType>2DEF</minorType>
                <troubleOptionCode>WAMP</troubleOptionCode> <!-- Windows -->
                <statement>Death</statement>
                <ninthLevelA>
                  <ninthLevelChild>9LC_B1</ninthLevelChild>
                  <ninthLevelChild>9LC_B2</ninthLevelChild>
                  <ninthLevelChild>9LC_B3</ninthLevelChild>
                </ninthLevelA>
              </eighthLevelA>
              <eighthLevelA>
                <animal>BIRD</animal>
                <minorType>UCLA</minorType>
                <troubleOptionCode>MAMP</troubleOptionCode> <!-- Mac -->
                <statement>3ACTIVE3</statement>
                <ninthLevelA>
                  <ninthLevelChild>9LC_C1</ninthLevelChild>
                  <ninthLevelChild>9LC_C2</ninthLevelChild>
                </ninthLevelA>
              </eighthLevelA>
            </seventhLevelA>
          </sixthLevelA>
        </fifthLevelA>
        <topLevelAddresses>
          <address>
            <city>Colchester</city>
            <addressChild>CT</addressChild>
            <postalCode>06415</postalCode>
          </address>
        </topLevelAddresses>
      </fourthLevelA>
    </thirdLevelA>
    <thirdLevelB>
      <fourthLevelB>
        <fourthLevelBChild>4child_1</fourthLevelBChild>
        <notDate>05-26-2012</notDate>
      </fourthLevelB>
      <fourthLevelB>
        <fourthLevelBChild>4child_2</fourthLevelBChild>
        <notDate>05-26-2012</notDate>
      </fourthLevelB>
      <fourthLevelB>
        <fourthLevelBChild>4child_3</fourthLevelBChild>
        <notDate>05-26-2012</notDate>
      </fourthLevelB>
      <fourthLevelB>
        <fourthLevelBChild>4child_4</fourthLevelBChild>
        <notDate>05-26-2012</notDate>
      </fourthLevelB>
    </thirdLevelB>
  </secondLevel>
</topLevel>
PHP code:
<?php
$file_location = 'idz/test/';
// write column names - tab delimited list
$csvList = "xmlName\tTrans\tStatement\tFirst#\tsixthLevelAChild#\tAnimalTypes\tDiffTypes\t4thLevelBChildren\tninthLevel\r\n";
// I saved the XML above as 1xml.xml in the idz/test/ folder relative to the script
$idArray = array("1xml.xml");

foreach ($idArray as $value) { // for every value in the array - just in case I have a list
    $file_name = str_replace('.xml', '', $value); // strip off .xml
    $url = $file_location . $value; // concat file location (what folder the files are in) and file
    $csvList .= $file_name . "\t";

    $xml = simplexml_load_file($url); // try to load file
    if ($xml === false) {
        // if file fails to load, outputs error message (on my computer in the php error file)
        trigger_error('Error reading XML file', E_USER_ERROR);
    } // end if not load file

    // make a shortcut to fourthLevelA (to cut down on line length in some instances)
    $level4A = $xml->secondLevel->thirdLevelA->fourthLevelA;
    // make a shortcut to eighthLevelA using $level4A and the remaining path
    $eighthLevelA = $level4A->fifthLevelA->sixthLevelA->seventhLevelA->eighthLevelA;
    $eighthLevelCount = count($eighthLevelA); // count occurrences of eighth level nodes
    $fourthLevelBCount = count($xml->secondLevel->thirdLevelB->fourthLevelB); // count occurrences of fourth level B nodes

    $csvList .= $xml->troll . "\t"
        . $level4A->statement . $eighthLevelA[0]->statement . $eighthLevelA[1]->statement . "\t"
        . $level4A->firstNumber . "\t"
        . $level4A->fifthLevelA->sixthLevelA->number . "\t";

    for ($i = 0; $i < $eighthLevelCount; $i++) {
        $csvList .= $eighthLevelA[$i]->animal . " ";
    } // end for i less than 8th level count
    $csvList .= "\t";

    for ($i = 0; $i < $eighthLevelCount; $i++) {
        $csvList .= $eighthLevelA[$i]->minorType . " ";
    } // end for i less than 8th level count
    $csvList .= "\t";

    $fourthLevelB = $xml->secondLevel->thirdLevelB->fourthLevelB;
    for ($i = 0; $i < $fourthLevelBCount; $i++) {
        $csvList .= $fourthLevelB[$i]->fourthLevelBChild . " ";
    } // end for i less than 4th level count
    $csvList .= "\t";

    // I used the var name troubleCount because it was a bit of trouble figuring out the embedded loop
    $troubleCount = array();
    for ($i = 0; $i < $eighthLevelCount; $i++) { // loop up to count of eighth level nodes
        // count children of ninth level of this eighthLevelA node
        $troubleCount[$i] = count($eighthLevelA[$i]->ninthLevelA->ninthLevelChild);
        for ($j = 0; $j < $troubleCount[$i]; $j++) { // loop up to count of ninth level children
            // concatenate ninth level children to list
            $csvList .= $eighthLevelA[$i]->ninthLevelA->ninthLevelChild[$j] . " ";
        } // end for j less than trouble count
    } // end for i less than 8th level count

    $csvList .= "\r\n"; // end of line (in Microsoft OS)
} // end foreach idArray

echo $csvList;
?>
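The embedded loop was the tricky part of the script, so here is a minimal sketch of another way to write it, not a replacement for the code above. It relies on the fact that foreach over a SimpleXML property visits every sibling element with that name, so no count() or index is needed. The XML string is a trimmed-down stand-in for the sample (an assumption made to keep the sketch self-contained), and the variable names $animals and $ninthLevel are made up for illustration.

```php
<?php
// Trimmed-down stand-in for the sample XML (an assumption for this sketch;
// only a few eighth- and ninth-level nodes are kept).
$xmlString = <<<XML
<seventhLevelA>
  <eighthLevelA>
    <animal>ELEPHANT</animal>
    <ninthLevelA>
      <ninthLevelChild>9LC_A1</ninthLevelChild>
      <ninthLevelChild>9LC_A2</ninthLevelChild>
    </ninthLevelA>
  </eighthLevelA>
  <eighthLevelA>
    <animal>FISH</animal>
    <ninthLevelA>
      <ninthLevelChild>9LC_B1</ninthLevelChild>
    </ninthLevelA>
  </eighthLevelA>
</seventhLevelA>
XML;

$xml = simplexml_load_string($xmlString);
if ($xml === false) {
    exit("Could not parse XML\n");
}

$animals = array();    // hypothetical name: one entry per eighthLevelA node
$ninthLevel = array(); // hypothetical name: every ninthLevelChild, in document order

// Outer loop: each <eighthLevelA> sibling under the root
foreach ($xml->eighthLevelA as $eighth) {
    $animals[] = (string) $eighth->animal;
    // Inner loop: each <ninthLevelChild> under this node's <ninthLevelA>
    foreach ($eighth->ninthLevelA->ninthLevelChild as $child) {
        $ninthLevel[] = (string) $child;
    }
}

// Spaces inside a column, a tab between columns
echo implode(' ', $animals) . "\t" . implode(' ', $ninthLevel) . "\n";
```

Casting each element to a string copies the text out of the SimpleXML object, which makes the values safe to collect in a plain array and join later.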
Output when the code is run on the sample XML (tab delimited):
xmlName	Trans	Statement	First#	sixthLevelAChild#	AnimalTypes	DiffTypes	4thLevelBChildren	ninthLevel
1xml	SUBMARINE	GiveMeLibertyOrGiveMeDeath	1234567	7654321	ELEPHANT FISH BIRD	1ACB 2DEF UCLA	4child_1 4child_2 4child_3 4child_4	9LC_A1 9LC_A2 9LC_A3 9LC_A4 9LC_A5 9LC_A6 9LC_B1 9LC_B2 9LC_B3 9LC_C1 9LC_C2
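One thing the output hides: the script appends a space after every value, so each multi-value column ends with a trailing space before its tab. If that matters, a rough sketch of an alternative is to collect the values in arrays and join them with implode(). The values below are copied from the sample output, only three of the nine columns are shown, and the variable names are hypothetical.

```php
<?php
// Values copied from the sample output; in the real script they would be
// read from the SimpleXML object rather than typed in.
$animals = array('ELEPHANT', 'FISH', 'BIRD');
$minorTypes = array('1ACB', '2DEF', 'UCLA');

// Only three of the nine columns, to keep the sketch short
$header = array('xmlName', 'AnimalTypes', 'DiffTypes');
$row = array('1xml', implode(' ', $animals), implode(' ', $minorTypes));

// Tabs separate columns; spaces separate the values inside one column.
// implode() puts the separator only between items, so nothing trails.
$tsv = implode("\t", $header) . "\r\n" . implode("\t", $row) . "\r\n";

echo $tsv;
```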