Friday 13 October 2017

Lookup transformation in SSIS

Look up Transformation is very useful transformation SSIS component it performs lookup operation by connecting input value with data-table or table dataset columns. It compares source data with existing table dataset and filters matching ones and un-matching ones.
It Performs lookup by joining data input column with a column in the reference dataset. In another word, we can say that this transformation takes a record from source and match this record with the reference data. In this case, there are two possibilities.
Ø  Matched records
Ø  Unmatched records
Lookup transformation has two outputs. Matched records and unmatched records. This performs Equijoins between the source and reference dataset.



Here I am using [AdventureWorks2012_CS] database.

Let’s see the example of Lookup
Here I am taking the Address table. In this table, address are a store. This is the master table. Every week we are getting data from a different vendor which contain the address of the customer. Our requirement is to load the data in the master table. Source data source has the information new customer as well as the existing customer. Our job is if the if the customer is existing then need to update the records in the master table and the new customer then needs to redirect to another table for review.


See the example
Open SSDT
  

Double click on the Dataflow task.
In data flow, I am taking OLEDB source. My source is a table which is in another database. See below.
  

This table has 999 records.


Configure this source.
Now I am taking lookup transformation.
  

Now I am configuring the lookup transformation.
Double click on the lookup transformation.
  



 First, we will discuss Cache Mode.
 There is three type of cache mode in Lookup transformation.
Ø  Full cache
Ø  Partial cache
Ø  No-cache

Full cache

This is the most commonly used approach in Lookup Transformation. If we selected this option then, entire lookup (or reference) table will be preloaded into the cache (Memory) and Lookup Transformation will perform a lookup from the Memory instead of Dataset. This works well when we have less number of rows in the lookup table. This is fast.

Partial cache

If we selected this option then, Lookup Transformation starts the transformation with an empty cache. When a new row comes from the data flow, the Lookup Transformation first checks for the matching values in its cache. If it is not found in the cache then it will check in the lookup table. And if no match is found, it queries the lookup table. If the match is found in the lookup table then, the value will be cached (stored in the memory) for the next time. If the lookup table is very big then we can use this approach. We can set the size of the partial cache.

No cache

 If we selected this option then, Lookup Transformation will not use cache to store the lookup table at any stage. When a new row comes from the data flow, the Lookup Transformation directly check in the lookup table for the matching values.

Connection Type

There is two type of connection manager in the look up transformation.
Ø  OLEDB connection manager
Ø  Cache connection manager

OLEDB connection manager

The lookup transformation support below database providers for the OLE DB connection manager:
Ø  SQL Server
Ø  Oracle
Ø  DB2

Cache connection manager

You are the thing that if our reference dataset set is not in SQL Server, Oracle or DB2 then how we can use lookup transformation because apart from these as we discuss able lookup oledb not support. No worries we can achieve this by using Cache connection manager.
An in-memory cache store the reference dataset and stores a hash table that indexes the data. The cache remains in memory until the execution of the package is completed. We can persist the cache to a cache file (.caw).
When we persist the cache to a file, the system loads the cache faster. This improves the performance of the Lookup transformation and the package. Remember, that when we use a cache file, we are working with data that is not as current as the data in the database.

Specify how to handle rows with no matching entries

Here we need to select the action if the records are not matched with reference dataset. Below are the actions.

Ignore Failure

If we selected this option then, when a new row is passed from the data, flow Lookup Transformation will continue its processing even though it doesn’t find any matching row in the lookup table.

Fail component

If we selected this option then, when a new row is passed from the data, flow Lookup Transformation will fail the package if it doesn’t find any matching row in the lookup table.

Redirect rows to error output

 If we selected this option then, Lookup Transformation will direct the rows that don’t find any matching row in the lookup table to standard error output.

Redirect rows to no match output

If we selected this option then, Lookup Transformation will direct the rows that don't find any matching row in the lookup table to No Match In real-time we mostly use this option.
We will see all details in next post.
Here I am selecting full cache and redirect to no match output.


Now select connection table.
  
Now configuring the connection of OLEDB connection for referring table.
Now select the columns tab.

Here we need to select the column. Based on the column which we won't look up. Here I lookup based on the AddressID. If the Address is matched then we consider that the record is matched. If addressed not in the reference dataset that treated as not match record.
When we are selecting partial catch then we need to select the Advance tab.
  
Here we can set cache size.
Error Output tab we can set error direction.
  

Now click ok.
Now I am taking two multicast transformations.
To see the output I am taking data viewer.
  
Now the package is ready to run.
Before running the package see the data in the database.

The number of records in the source table.
  
The number of matched records in source and reference dataset.
Now I am executing this package.


If we select Fail component (Specify how to handle rows with no matching entries)

If we select Fail component Specify how to handle rows with no matching entries. See below

Now I am executing the package.
Let’s see the output.

Package got failed because If we selected this option then when a new row is passed from the data, flow Lookup Transformation will fail the package if it doesn’t find any matching row in the lookup table.
We can see the error in progress tab

Below is the log.

If we select Ignore Failure (Specify how to handle rows with no matching entries)

If we select ignore failure Specify how to handle rows with no matching entries. See below

Let’s execute the package.
  
The package is not failed all records are redirected to matched records. If we selected this option then, when a new row is passed from the data, flow Lookup Transformation will continue its processing even though it doesn’t find any matching row in the lookup table.

If we select Redirect rows to no match output (Specify how to handle rows with no matching entries)

If we select Redirect rows to no match output, specify how to handle rows with no matching entries. See below

Now I am expecting the package.
   
The package is executed successfully and returns the matched record.
If we selected this option then, Lookup Transformation will direct the rows that don’t find any matching row in the lookup table to standard error output. If no records matched in the reference table then it will throw an error.
Let’s see the example.
I have an address_source table in Test Database.
See records in the database.
  
Matched records in the database.
  
Now I am doing the mapping with the new table.
In OLEDB source I have made the changes.
  
Now I configure the lookup transformation with
  
Now I am taking a multicast and connect error output.

Now I am executing the package.
Let’s see the result.


Hope this is helpful to understand the Lookup transformation. Please leave your feedback in the comment box.

In next post, I will share more details about cache mode.

5 comments:

If you have any doubt, please let me know.

Popular Posts