Monday, 7 January 2013

Database Lookup using tFlowtoItrerate component in Talend Open Studio

In continuation to my previous post here on tFlowtoIterate component, I am going to demonstrate another example of tFlowtoIterate component to perform the Dynamic Database lookup.

Today I am going to demonstrate a job where we read from file having list of department ids and corresponding to these department ids, we will lookup department name from departments table.

First, look at the input file which contains list of department ids and database table departments which contains department id and name. This file will be used as main input file and table will be used for looking up department names corresponding to the department ids in the file.

Input File

Departments Table

Select * from departments

Now our aim is to read the input file and perform the dynamic lookup from departments table to retrieve department name and display in Run log. To achieve this create and new job and perform following steps.

1. Drag tFileInputDelimited component to Job designer pane from Palette pane.

2. Open the Component properties of the tFileInputDelimited component and provide the filepath of our input file having department ids.

Our input file contains header as well. Hence, enter the header row number in the header textbox.

3. Now drag tDenormalize component from Palette pane to Job designer pane. Denormalize helps synthesize the input flow. It is basically use to group the records to single row. In our example we will receive the input as records and tDenormalize will transform following records to a single row: Refer example below:

Input Records from tFileInputDelimited

Output from tDenormalize


4. Right click tFileInputDelimited component, select Row > Main and connect it to tDenormalize.
5. Now, we need to configure the tDenormalize component. Open the component properties and click on  to add the column on which we want to denormalize. In the Column field enter “department_id” and comma,” as delimiter (we can use any character as delimiter but in our example we will pass these values to IN operator in the select query to perform dynamic lookup. Hence we are using comma as delimiter.

Also, if you want to merge the same value from the input, check the checkbox “Merge same value”. In our example we are receiving  department_id 14 twice and with this option checked, tDenormalize component will pass the department_id 14 only once.

6. Search for iterate in the Palette window and drop a tFlowToIterate component onto the Job Designer.

7. Create a Row > Main connector between the denormalize and the flow to iterate components

The iterate connection has the effect of invoking the database extract once for every row produced by the denormalize component. In our job, as we are turning a single column of data into a single row of data, we know that the database component will be run only once.

8. From the Metadata section in the Repository window, drop the table metadata of the table which is going to be used in the Dynamic lookup operation on to the Job Designer, selecting tMySQLInput from the resulting pop-up window. In our example, we are going to use departments table to lookup department name from the table.

Click here, for more information on “How to create database/table metadata in Talend”.

9. Right click tFlowtoIterate component and select Row > Iterate connector and connect it to tMySQLInput component. This will help us to pass the data to input database component.

The database component does not take a regular main input. You can validate this by right-clicking on the denormalize component, selecting Row > Main, and dropping the connector onto the database component.

As you can see, Talend will not allow this action to happen. The database component will, however, accept an iterate connector and we can use the tFlowToIterate component that turns a flow input into an iteration as a vehicle for passing the dynamic data through, as a variable.

10. Open the component properties of tMySQLInput component and click on edit query button and enter the following query:

FROM `departments` where `departments`.`departmentId` in ("+row2.department_id+")"

This will pass the value of row2.department_id from tFlowtoIterate to the query and help to retrieve only those values from database. As I mentioned earlier, tFlowtoIterate will recieve data from tDenormalize component i.e. single record having comma seperated department ids (“10,11,12,13,14,15,16”).

10. From the Palette window, search for log and drop a tLogRow component onto the designer.

11. Right-click on the database component, select Row > Main,and drop the connector onto the tLogRow component. The output of the database component will now display on the tLogRow component when the job runs.

12. This completes our Job. It time to run and check the output in Run console.

From the above screenshot of output, you can see that all departments corresponding to department ids has been populated. Also, notice that department id 17 is not populated in the output as this was not present in the departments table.

You may also like to read..

No comments:

Post a Comment