Tuesday, 18 December 2012

Generating Metadata for XML Files using Talend Open Studio

In the last post, I showed how to generate metadata for CSV/delimited files. Today, I am going to generate metadata for XML files. Many organizations use XML files to transfer data, because information encoded in XML is easy for people to read and understand and easy for computers to process.

Let us start by creating metadata for a single-schema XML file.

Here is the sample XML file for which we will create the metadata.

<?xml version="1.0" encoding="UTF-8"?>
    <Employee department="10">
        <name>Anuj Mittal</name>
    </Employee>
    <Employee department="11">
        <name>Anshika Arora</name>
    </Employee>
    <Employee department="10">
        <name>Raju Gupta</name>
    </Employee>
    <Employee department="13">
        <name>Anurag Sinha</name>
    </Employee>
    <Employee department="13">
        <name>David Zynga</name>
    </Employee>

1. Right-click File XML under the Metadata section in the Repository pane and click Create file xml.

2. In the pop-up window, enter the name of the metadata. You can also enter additional information in the Purpose and Description boxes. There are other configuration options for Version, Status, and Path, but these are not mandatory, so we will leave them for now. Click the Next button.

3. Select Input XML and click Next.

4. Locate the XML file for which metadata needs to be created. Select UTF-8 as the Encoding. You can also see the schema of the file in the Schema Viewer section.

5. Our XML file is a series of Employees. To extract data for each Employee, we need to set Employee as our loop element, so that the Studio loops over all of the Employees when the job runs. To configure this, we map the Employee element to the Xpath loop expression box: click the Employee element in the Source Schema pane and drag it to the Xpath loop expression box.

The Loop limit field determines how many times the job will loop over the selected element. By default, it is set to 50. You can change it to 0, which is the value used to configure no limit.

Now we can configure the fields we want to extract from the XML file. Drag @department, id, name, sex, and salary to the Fields to extract pane.

Click Next.

6. Enter the name of the schema, update the data type, nullability, and length of each column as required, and click Finish.
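
To make the idea behind steps 5 and 6 more concrete, here is a minimal Python sketch of what a loop expression plus a list of fields to extract amounts to. It is only an illustration and not what Talend generates. The <Employees> root element, the extract_rows function, and the LOOP_XPATH and FIELDS names are all assumptions made for this example, and only the department attribute and the name element from the sample above are used. It relies on Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath, so the loop path is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Assumption: the <Employees> root element is hypothetical. The sample in this
# post only shows <Employee> elements with a department attribute and a name.
SAMPLE_XML = """
<Employees>
    <Employee department="10"><name>Anuj Mittal</name></Employee>
    <Employee department="11"><name>Anshika Arora</name></Employee>
    <Employee department="10"><name>Raju Gupta</name></Employee>
    <Employee department="13"><name>Anurag Sinha</name></Employee>
    <Employee department="13"><name>David Zynga</name></Employee>
</Employees>
"""

# Step 5: the loop element, written as a path relative to the root element.
LOOP_XPATH = "./Employee"

# Steps 5 and 6: fields to extract, as (relative path, column name, data type).
# A path starting with "@" reads an attribute of the loop element.
FIELDS = [
    ("@department", "department", int),
    ("name", "name", str),
]


def extract_rows(xml_text, loop_limit=0):
    """Return one dict per loop element; loop_limit=0 means no limit."""
    root = ET.fromstring(xml_text)
    rows = []
    for index, element in enumerate(root.findall(LOOP_XPATH)):
        if loop_limit and index >= loop_limit:
            break
        row = {}
        for path, column, cast in FIELDS:
            if path.startswith("@"):
                raw = element.get(path[1:])
            else:
                raw = element.findtext(path)
            # A missing attribute or element becomes a null (None) value.
            row[column] = cast(raw) if raw is not None else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_XML, loop_limit=2):
        print(row)
```

Each dictionary returned by extract_rows corresponds to one row of the schema: the loop expression decides how many rows there are, and the field list decides which columns each row has and what type they are converted to.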

This is how we create metadata for XML files. However, we cannot generate a reusable metadata configuration for multi-schema XML files. In the next post, we will see how to create metadata for multi-schema XML files using the tFileInputMSXML component.

This article was written by Vikram Takkar and published on www.vikramtakkar.com. Please let me know if you see this article on any other website or blog.
