How to parse complex XML to a long format data frame in R


I am trying to parse XML into an R data frame.

    xml.text <- '<?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <recordgroup>
      <period>60</period>
      <record>
        <datetime>01102015000000</datetime>
        <field>
          <id>equipos.0cr02-1.ae</id>
          <value>34.405000</value>
        </field>
        <field>
          <id>equipos.0cr02-1.api</id>
          <value>160.794000</value>
        </field>
      </record>
      <record>
        <datetime>01102015001500</datetime>
        <field>
          <id>equipos.0cr02-1.ae</id>
          <value>38.309000</value>
        </field>
        <field>
          <id>equipos.0cr02-1.api</id>
          <value>152.800000</value>
        </field>
      </record>
    </recordgroup>'

    library(XML)
    xml <- xmlParse(xml.text)
    # Try the first record only
    indata <- xmlToDataFrame(getNodeSet(xml, "//recordgroup/record")[1])

I started with just one record. The result is a table with two columns (datetime and field) and one row, and the text of the tags inside each field is joined together:

            datetime                                                      field
    1 01102015000000 equipos.0cr02-1.ae34.405000\nequipos.0cr02-1.api160.794000

Since the datetime applies to both field structures in a record, what I need is a long format table like this:

            datetime                  id      value
    1 01102015000000  equipos.0cr02-1.ae  34.405000
    2 01102015000000 equipos.0cr02-1.api 160.794000
    3 01102015001500  equipos.0cr02-1.ae  38.309000
    4 01102015001500 equipos.0cr02-1.api 152.800000
    ...

Your XML is a little messed up, but you can fix it:

    library(XML)
    xml <- xmlParse(xml.text)
    # Convert every record node to a data frame, then stack the results
    xmlout <- do.call(rbind, xpathApply(xml, '//recordgroup/record', xmlToDataFrame))

Which gives you:

                text                  id      value
    1 01102015000000                <NA>       <NA>
    2           <NA>  equipos.0cr02-1.ae  34.405000
    3           <NA> equipos.0cr02-1.api 160.794000
    4 01102015001500                <NA>       <NA>
    5           <NA>  equipos.0cr02-1.ae  38.309000
    6           <NA> equipos.0cr02-1.api 152.800000

You can clean this up using tidyr and dplyr:

    library(tidyr)
    library(dplyr)

    # Carry each datetime down to the field rows below it,
    # then drop the rows that only held the datetime
    xmlout %>% fill(text) %>%
               na.omit

                text                  id      value
    2 01102015000000  equipos.0cr02-1.ae  34.405000
    3 01102015000000 equipos.0cr02-1.api 160.794000
    5 01102015001500  equipos.0cr02-1.ae  38.309000
    6 01102015001500 equipos.0cr02-1.api 152.800000
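If you would rather skip the fill-and-drop step, a possible alternative is to loop over the field nodes and look up each one's parent record for the datetime. This is only a sketch, not the method above: it assumes every field has exactly one id and one value and every record has one datetime, and the name `long` is just illustrative. The XML is repeated in compact form so the snippet runs on its own.

```r
library(XML)

# Same structure as the XML in the question, written compactly
xml.text <- '<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<recordgroup>
  <period>60</period>
  <record>
    <datetime>01102015000000</datetime>
    <field><id>equipos.0cr02-1.ae</id><value>34.405000</value></field>
    <field><id>equipos.0cr02-1.api</id><value>160.794000</value></field>
  </record>
  <record>
    <datetime>01102015001500</datetime>
    <field><id>equipos.0cr02-1.ae</id><value>38.309000</value></field>
    <field><id>equipos.0cr02-1.api</id><value>152.800000</value></field>
  </record>
</recordgroup>'

doc <- xmlParse(xml.text, asText = TRUE)

# One node per <field>, so one output row per field
fields <- getNodeSet(doc, "//recordgroup/record/field")

long <- data.frame(
  # datetime lives on the parent <record>, so step up one level
  datetime = vapply(fields, function(f) xmlValue(xmlParent(f)[["datetime"]]), character(1)),
  id       = vapply(fields, function(f) xmlValue(f[["id"]]), character(1)),
  value    = as.numeric(vapply(fields, function(f) xmlValue(f[["value"]]), character(1))),
  stringsAsFactors = FALSE
)
```

Here datetime is kept as a character column so the leading zero is not lost, and value is converted to numeric; drop the `as.numeric()` call if you want the text exactly as it appears in the file.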
