Monday, 17 December 2012

Generating Metadata for CSV (Delimited) using Talend Open Studio



Metadata is commonly defined as "data about data": it describes the data but isn't the data itself. In Talend Open Studio, metadata refers to reusable configurations that describe data, its attributes, or its containers. For example, we could define metadata in the Studio that describes an XML schema, a web service definition, or an FTP connection.

Once the metadata is defined, it can be used across multiple jobs, and it gives you a single place to update a configuration that many jobs share. For example, if the password to an FTP account changes and the connection is configured separately in 10 different jobs, the details have to be updated 10 times. If the configuration is stored as a single metadata entry instead, it only needs to be updated once.
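
To make the "update once" idea concrete, here is a minimal Python sketch. It is not Talend code: the FtpConnection class, the host name, and the job names are all hypothetical stand-ins for a shared metadata entry and the jobs that reference it.

```python
from dataclasses import dataclass


@dataclass
class FtpConnection:
    """Hypothetical shared connection definition (stands in for a metadata entry)."""
    host: str
    user: str
    password: str


# One shared definition, referenced by every job that needs it.
shared_ftp = FtpConnection(host="ftp.example.com", user="etl", password="old-secret")

# Ten hypothetical jobs that all point at the same definition.
jobs = {f"job_{i}": shared_ftp for i in range(1, 11)}

# Changing the password once is visible to all ten jobs.
shared_ftp.password = "new-secret"

passwords_seen = {job.password for job in jobs.values()}
print(passwords_seen)  # {'new-secret'}
```

If each job held its own copy of the connection details, the same change would have to be repeated in ten places.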

Let's work through an example of metadata configuration for a CSV file.
Here is the sample input CSV file.


empid;empname;salary;dept_id
101;Anuj Mittal;10000;10
102;Vinay Gupta;15000;11
103;Akshay Arora;25000;12
104;Sharukh Khan;125000;13
105;Katrina Kaif;55000;11
106;Amir Khan;120000;12
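
Before opening the Studio, it can help to check what the file really contains. The sketch below uses Python's standard csv module (not Talend) to read the sample with a semicolon delimiter; the SAMPLE variable simply holds the rows shown above.

```python
import csv
import io

# The sample file from above, held in a string for illustration.
SAMPLE = """empid;empname;salary;dept_id
101;Anuj Mittal;10000;10
102;Vinay Gupta;15000;11
103;Akshay Arora;25000;12
104;Sharukh Khan;125000;13
105;Katrina Kaif;55000;11
106;Amir Khan;120000;12
"""

# The first row is treated as the header; fields are separated by ";".
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
rows = list(reader)

print(reader.fieldnames)  # ['empid', 'empname', 'salary', 'dept_id']
print(len(rows))          # 6
print(rows[0]["empname"])  # Anuj Mittal
```

This confirms the two things the wizard will ask about later: one header row and a semicolon as the field separator.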


1. Open Talend Open Studio and right-click "File Delimited" under the "Metadata" section in the Repository pane. Click "Create file delimited".
 
 

2. In the pop-up window, enter the name of the metadata. You can also enter additional information in the Purpose and Description boxes. There are other configuration options for Version, Status, and Path, but these are not mandatory, so we will leave them for now. Click the Next button.

 

3. Locate the file for which metadata needs to be created and click Next. You can also see the contents of the file in the File Viewer section.
 

4. Set the Encoding to UTF-8 and provide the Field Separator and Row Separator characters. For the sample file above, the field separator is a semicolon. In the Rows To Skip section, you can also specify how many header and footer rows to skip.
  
 

5. Enter the name of the schema, update the data type, nullability, and length of each column as required, and click Finish.
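
The options chosen in step 4 (encoding, field separator, row separator, rows to skip) fully determine how raw bytes become records. The Python sketch below illustrates that idea only; DelimitedSettings and parse_delimited are hypothetical names, not part of Talend, and the naive split does not handle quoted fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DelimitedSettings:
    """Hypothetical container mirroring the options chosen in step 4."""
    encoding: str = "UTF-8"
    field_separator: str = ";"
    row_separator: str = "\n"
    header_rows: int = 1  # rows to skip at the top of the file
    footer_rows: int = 0  # rows to skip at the bottom of the file


def parse_delimited(raw: bytes, settings: DelimitedSettings) -> List[List[str]]:
    """Decode the bytes, split into rows, drop header/footer rows, split fields."""
    text = raw.decode(settings.encoding)
    lines = [line for line in text.split(settings.row_separator) if line]
    end = len(lines) - settings.footer_rows
    body = lines[settings.header_rows:end]
    return [line.split(settings.field_separator) for line in body]


raw = (
    "empid;empname;salary;dept_id\n"
    "101;Anuj Mittal;10000;10\n"
    "102;Vinay Gupta;15000;11\n"
).encode("UTF-8")

records = parse_delimited(raw, DelimitedSettings())
print(records)
# [['101', 'Anuj Mittal', '10000', '10'], ['102', 'Vinay Gupta', '15000', '11']]
```

Setting header_rows to 0 would turn the column names into a data row, which is why the header count matters in the wizard.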
 
 

You can now see the metadata in the Repository pane, under the Metadata section.
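
To show what the schema from step 5 buys you, here is a rough Python sketch of a schema with a data type, nullability, and length per column, applied to one row of the sample. The Column class, the apply_schema function, and the specific types and lengths are my assumptions for illustration; the Studio suggests its own values, which you then adjust.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Column:
    """Hypothetical column definition: name, type, nullability, max length."""
    name: str
    cast: Callable[[str], object]
    nullable: bool = True
    length: Optional[int] = None


# Assumed types and lengths for the sample file (not taken from the Studio).
SCHEMA = [
    Column("empid", int, nullable=False, length=3),
    Column("empname", str, nullable=False, length=20),
    Column("salary", int, nullable=True, length=6),
    Column("dept_id", int, nullable=True, length=2),
]


def apply_schema(fields: List[str], schema: List[Column]) -> Dict[str, object]:
    """Check nullability and length, then convert each field to its type."""
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    record: Dict[str, object] = {}
    for raw, col in zip(fields, schema):
        if raw == "":
            if not col.nullable:
                raise ValueError(f"{col.name} must not be empty")
            record[col.name] = None
            continue
        if col.length is not None and len(raw) > col.length:
            raise ValueError(f"{col.name} exceeds length {col.length}")
        record[col.name] = col.cast(raw)
    return record


record = apply_schema("104;Sharukh Khan;125000;13".split(";"), SCHEMA)
print(record)
# {'empid': 104, 'empname': 'Sharukh Khan', 'salary': 125000, 'dept_id': 13}
```

A row with an empty empid, or a salary longer than six characters, would be rejected here, which is the kind of check the schema definition makes possible.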
 


In the next post, I will show how to create metadata for single- and multi-schema XML files.

2 comments:

  1. Great article about metadata of a flat file :)

    With Talend Open Studio, you can generate metadata for a lot of formats (csv, xls, xml, ldif) and for multiple database sources.

    Please find more information about the metadata on https://help.talend.com/display/TalendOpenStudioforDataIntegrationUserGuide521EN/7.+Managing+Metadata (it's the official Talend Documentation)

  2. Thanks Oliver for the information.
