Accueil / Blog / Métier / 2014 / Talend Tutorial - Advanced XML transformation – Part 2 - table rows to XML

Talend Tutorial - Advanced XML transformation – Part 2 - table rows to XML

Par Gaël Pegliasco publié 03/03/2014, édité le 21/11/2014
Contributeurs : Gaël Pegliasco
One difficult point when using Talend Data Integration is working with XML data. Why? Because Talend DI has a "tabular" approach to data based on records, similar to SQL. But XML represents data as a tree, which is ideal to store hierarchical data, lists of lists and such. Moving from a two-dimensional tabular data representation to a tree representation is not so easy. But fortunately, Talend DI provides great tools for this. One of these is the tAdvancedFileOutputXML that we propose to discover in this tutorial.

Tutorial presentation

We have seen in the previous article « Talend Tutorial - Advanced XML transformation – Part 1 - XML to table rows » how to split an XML file into multiple tabular files. We propose you to do the opposite job now : converting many tabular flows into a XML file.

So, in our previous article about Advanced XML transformation we have splitted a XML list of « customer » elements having each many sub-elements « address » into 2 lists, one containing the customers, and another containing the addresses of each customer in order to store these in a SQL Database, for example.

We are going to do the opposite job, merging customers records and customers addresses into one big XML file.

You can download data used in this article from this link.

Our records look like these :

Customers.csv
id;CustomerName;LabelState;RegTime;Fresh;Frozen
1;Griffith Paving and Sealcoatin;Connecticut;03-11-2006;67852.0;61521.4852
2;Bill's Dive Shop;zona;19-11-2004;88792.0;15434.1
Addresses.csv
id;CustomerAddress
1;talend apres 91
1;511 Maple Ave. Apt. 1B
1;1799 Rosemary Way
1;1859 Green Bay Rd.1
2;310 Walker Ave.
2;844 Spruce St.
2;965 Marion Place Apt. 65C
2;511 Hill

And we want this XML output:

<?xml version="1.0" encoding="UTF-8"?> 
<customers_list>

<customer id="1">
<CustomerName>Griffith Paving and Sealcoatin</CustomerName>
<LabelState>Connecticut</LabelState>
<RegTime>03-11-2006</RegTime>
<Fresh>67852.0</Fresh>
<Frozen>61521.4852</Frozen>
<CustomerAddresses>
<CustomerAddress>talend apres 91</CustomerAddress>
<CustomerAddress>511 Maple Ave. Apt. 1B</CustomerAddress>
<CustomerAddress>1799 Rosemary Way</CustomerAddress>
<CustomerAddress>1859 Green Bay Rd.1</CustomerAddress>
</CustomerAddresses>
</customer>
<customer id="2">
<CustomerName>Bill's Dive Shop</CustomerName>
<LabelState>zona</LabelState>
<RegTime>19-11-2004</RegTime>
<Fresh>88792.0</Fresh>
<Frozen>15434.1</Frozen>
<CustomerAddresses>
<CustomerAddress>310 Walker Ave.</CustomerAddress>
<CustomerAddress>844 Spruce St.</CustomerAddress>
<CustomerAddress>965 Marion Place Apt. 65C</CustomerAddress>
<CustomerAddress>511 Hill</CustomerAddress>
</CustomerAddresses>
</customer>
</customers_list>

Job creation

First, create a new job and drag/drop the following components :

  • 2 tFileInputDelimited, names customers and addresses
  • 1 tMap and connect the 2 files above to it, customers being the main flow
  • 1 tLogRow
  • 1 tAdvancedFileOutputXML

tabular_to_xml_1.png

You have to define the schema of each tFileInputDelimited components as described below :

tabular_to_xml_2.png

tabular_to_xml_3.png

Once this has been done, connect these to the tMap and create an inner join link matching all records occurences in main flow.
Then, add an output relation containing all customers attributes completed with the address data
as described in the screenshoot below.

tabular_to_xml_4.png


Name this output relation « customers_with_addresses ». Connect the tLogRow to it in order to check output data.
You should have :

1|Griffith Paving and Sealcoatin|Connecticut|03-11-2006|67852.0|61521.4852|talend apres 91
1|Griffith Paving and Sealcoatin|Connecticut|03-11-2006|67852.0|61521.4852|511 Maple Ave. Apt. 1B
1|Griffith Paving and Sealcoatin|Connecticut|03-11-2006|67852.0|61521.4852|1799 Rosemary Way
1|Griffith Paving and Sealcoatin|Connecticut|03-11-2006|67852.0|61521.4852|1859 Green Bay Rd.1
2|Bill's Dive Shop|zona|19-11-2004|88792.0|15434.1|310 Walker Ave.
2|Bill's Dive Shop|zona|19-11-2004|88792.0|15434.1|844 Spruce St.
2|Bill's Dive Shop|zona|19-11-2004|88792.0|15434.1|965 Marion Place Apt. 65C
2|Bill's Dive Shop|zona|19-11-2004|88792.0|15434.1|511 Hill

Now, connect the tAdvancedFileOutputXML component to the tLogRow.
In its component view, select the output file you want to create. Then double click the component to edit its schema.

tabular_to_xml_5.png

  • Rename the root element to « customers_list ».
  • Add a customer element and drag/drop all source columns except "id" and "CustomerAddress" under « customer element » as sub-element.
  • Drag/Drop « id » column as an attribute of "customer" element.
  • Then, create a « CustomerAddresses » element and drag/drop « CustomerAddress » column as a sub-element of it.


Now, the job is near to be finished. We have to describe on wich item data the component should loop. This means describing which item will be repeated. We iterate over customer's addresses, so we need to define "CustomerAddress" as the loop element.

But this won't be enought.

If we stop the configuration here, all addresses will be grouped into the « CustomerAddresses » element, below the first customer element. No more customer element will be created:

 

<customers_list>
  <customer id="1">
    <CustomerName>Griffith Paving and Sealcoatin</CustomerName>
    <LabelState>Connecticut</LabelState>
    <RegTime>03-11-2006</RegTime>
    <Fresh>67852.0</Fresh>
    <Frozen>61521.4852</Frozen>
    <CustomerAddresses>
      <CustomerAddress>talend apres 91</CustomerAddress>
      <CustomerAddress>511 Maple Ave. Apt. 1B</CustomerAddress>
      <CustomerAddress>1799 Rosemary Way</CustomerAddress>
      <CustomerAddress>1859 Green Bay Rd.1</CustomerAddress>
      <CustomerAddress>310 Walker Ave.</CustomerAddress>
      <CustomerAddress>844 Spruce St.</CustomerAddress>
      <CustomerAddress>965 Marion Place Apt. 65C</CustomerAddress>
      <CustomerAddress>511 Hill</CustomerAddress>
    </CustomerAddresses>
  </customer>
</customers_list>

To prevent from this behaviour we have to explain to the component that we want addresses to be grouped for each distinct customer.

For this, right click on the customer element in the output tree and set it as « group element ».
This will explain to the component that each time a customer element change, a new customer element need to be created and new addresses will be appended to it.

Now, save your work and run your job.
You will get an output XML file as the one described in the top of this article.

You could also do this task using a tXMLMap element in place of the tMap.

If you want to discover the tXMLMap or learn how to convert XML to tabular rows, please have a look at part one of this tutorial.


Ce tutoriel vous a plu ? Consultez notre formation d'initiation à Talend Data Integration.

ABONNEZ-VOUS À LA NEWSLETTER !
Voir aussi
Makina Corpus devient partenaire Gold de Talend Makina Corpus devient partenaire Gold de Talend 12/05/2009

Talend Tutoriel : comprendre les connexions iterate 20/12/2013

Le lien iterate est décrit assez succinctement dans le manuel utilisateur de Talend (User Guide). ...

Préparez votre agenda de formation pour la rentrée ! Préparez votre agenda de formation pour la rentrée ! 21/12/2015

Drupal8, PostgreSQL, Python scientifique... Découvrez nos nouvelles formations et les dernières ...

Geocoder avec Talend Open Studio 22/05/2014

Mettre en place un job de geocodage d'adresses (depuis un fichier XLS) dans l'ETL Talend Open ...

Makina Corpus participe à la conférence mondiale sur OpenStreetMap Makina Corpus participe à la conférence mondiale sur OpenStreetMap 09/09/2019

Du 21 au 23 septembre se tient à Heidelberg, en Allemagne, la douzième conférence annuelle sur ...