ETL DB Tech Solutions: 2014-10-12

Thursday, October 16, 2014

Collect performance data in informatica.

Collect performance data: -

It is check box option available in Session properties under Performance tab.

What is the importance of this option and in which situations we can use this.

Let’s say we have complex mapping with more than 3 active transformations (like Joiner, Router, Filter etc.,). ex:-

When mapping executed it is succeeded but no records are loaded into the Target. By checking the Source and Target statistics,Records are pulled from source but no records are loaded/ rejected. In this case how to identify which active transformation is filtering the data. To know this we need to see how the data is flowing between transformations.

Let’s check the option Collect performance data available under performance tab.

Run the session and go to workflow monitor and follow below steps. I will suggest do this before running the session, if the session takes less time to execute.

One can see a set of transformation used in the mapping like below.

Click on the + symbol to expand follow the below steps.

Let’s understand the graph with mapping.

As you can see in the above flow, from source to Filter 4 records has passed (can see @ Filter_inputrows = 4) and filter has passed all the 4 records (can see @ Filter_outputrows = 4).

Similarly 4 rows received by java transformation and it has returned 11 rows out of it (as java is splitting records). This way we can used this option to trace the data flow between transformations in the mapping.

Let’s see the below case.

We can see that 4 records pushed from source and after filter records are not coming out. This way we can go to specific transformation and find out the reason.

very useful information about Netezza

ERROR: to_timestamp(): bad value -26 for day of the month.

Solution:- Check the format defined.

Ex:- Select to_timestamp('2002-09-26^15:00:00','YYYY-MMDD^HH24:MI:SS')

Correction Select to_timestamp('2002-09-26^15:00:00','YYYY-MM-DD^HH24:MI:SS')

If you see the above function usage, the timestamp passing and defined don’t match (highlighted in different color)

ERROR:-

When using the to_timestamp conversion function, which takes an input and a template string, the number of spaces in both strings must match. The following example uses the character“^” to represent a space.

to_timestamp('2002-09-26^15:00:00','YYYY-MM-DD^HH24:MI:SS')

In the previous example, the conversion function works as expected. In the following example, there is an extra space in the template string:

To_timestamp('2002-09-26^15:00:00','YYYY-MM-DD^^HH24:MI:SS').

In this case, the extra space before the “HH” causes the function to ignore the “1” in the “15” string, thus returning an incorrect result.

Problem:- SQL operations on a float column can yield inconsistent results if not used properly.

Soulution:- The hashing of floating point data can result in certain SQL operations that return inconsistent results if not used correctly. Inconsistent results can occur if floating point columns are used as the distribution key, as a join column, or in group by, order by, and distinct select operations.

Do not use floating point columns for such operations.

Note: A floating point column is a column defined in the CREATE statement with a data type of FLOAT, DOUBLE, FLOAT4, or FLOAT8.

Information:- By default Neetzza system treats time values without timezones as GMT times. An example follows:

create table TIME1(TM timetz); (a time with timezone column)
insert into TIME1 values (‘1:00’); (a time value without a timezone)

select * from TIME1; (shows the value that has a timezone offset of (GMT)).

Information:-
Certain operations do not work on synonyms. For example, you cannot generate statistics on synonyms, nor can you truncate a synonym.

Performance :- For system efficiency, avoid the use of joins with external tables.
If one try to perform they may face the below error

Error: 1000000161: query does not support complex External Table scan

Information :- Update in Netezza with self join.

UPDATE product_dim B
SET    B.prod_nm = NAME
FROM   (SELECT product_id,
               NAME
        FROM   product_dim
               LEFT OUTER JOIN product_dis
                            ON product_id = productid)A
WHERE  A.product_id = B.product_id

Wednesday, October 15, 2014

Implementing java transformation with example

JAVA transformation:-

Explanation:-Java transformation in Informatica PowerCenter uses Java programming language to transform the data.It is not necessary to write the entire Java code or use an external Java development environment as the Java code can be entered as snippets in the PowerCenter Designer client.

The code in Java transformation can invoke Informatica's custom expressions, user-defined functions, unconnected transformations and mapping variables. Java methods, variables, third-party API's, built-in Java packages and static code can be invoked as well.
Java transformation can be re-usable and it can be defined as both active or passive Informatica object.

Scenario:-

Below is the requirement.

Source:-

NAME ID Requirement

RAVI 1 (no need to repeat as it ID is 1)

KUMAR 3 (repeat 3 times as it ID is 3)

John 4 (repeat 4 times as it ID is 4)

Required Out Put:-

Name ID

RAVI 1

KUMAR 3

John 4

Step1:- The final Mapping looks like below

When we create Java transformation it will ask us to specify (Active or Passive). For our current requirement lets select it as Active (as its changing the no. of input records coming from source).

Step2:- Create Ports in Java

In this section we need to create ports (for input and output) as per our requirement we can uncheck either of one port.

Step3:-
In this section we can Import third-party Java packages, built-in Java packages, or custom Java packages.(for this scenario we don’t need any).

Step4:- Write the code and compile it.

Let’s write the required code and compile it.

For this requirement we need to increment the value of ‘I’ till it is less than ‘PROD_ID’.

Step5:- Let’s run the workflow and check the result.

Target Output:-

Scenario 2:-

Source Data:-

ID NAME

1 NETEZZA, ORACLE

3 SQL Server, DB2, Teradata

Required Output:-

ID NAME

1 NETEZZA

1 ORACLE

3 SQL Server

3 DB2

3 Teradata

Follow the above steps in Scenario1 and replace the below code(in the code section) to achieve the above output.

String speech = PROD_NM;  
 String[] result = speech.split(",");
 for (int x=0; x<result.length; x++) {
PROD_NM = result[x];
PRD_ID = PRD_ID;
generateRow(); 
 }

Secenario 3:-

Source Data

ID NAME

1,2 NETEZZA,ORACLE

3,4,5 SQL Server, DB2, Teradata

Required Output:-

ID NAME

1 NETEZZA

2 ORACLE

3 SQL Server

4 DB2

5 Teradata

Follow the above steps in Scenario1 and replace the below code (in the code section) to achieve the above output.

String str_var=PROD_NM;
String str_var1=PRD_ID;
String[] arr;
String[] arr1;
String delimiter = ",";
arr = str_var.split(delimiter);
arr1 = str_var1.split(delimiter);
for (int i =0; i < arr.length; i++){
PROD_NM = arr[i];
PRD_ID = arr1[i];
generateRow();
}

Scenario:-

Source data

NETEZZA,ORACLE |1,2 |1|COBRAND

SQL Server, DB2, Teradata |3,4,5 |2|LENDING

NUMBER |8,9,10|3|LENDING

ANT,KANTI,RANTI |6 |4|LENDING

Expected Output:-

NETEZZA 1 1

ORACLE 2 1

DB2 4 2

SQL Server 3 2

Teradata 5 2

NUMBER 8 3

null 10 3

null 9 3

RANTI null 4

ANT 6 4

KANTI null 4

Solution:-

String str_var=PROD_NM;
String str_var1=PRD_ID;
String[] arr;
String[] arr1;
String delimiter = ",";
arr = str_var.split(delimiter);
arr1 = str_var1.split(delimiter);
int len = 0;
if(arr.length &lt;= arr1.length)
{
    len = arr.length;
}
else
{
    len = arr1.length;
}
for (int i = 0; i &lt; len; i++)
{
    PRD_ID  = arr1[i];
    PROD_NM = arr[i];
    ID = ID;
    generateRow();
}
if (arr.length &lt; arr1.length)
{
    for (int j = len; j &lt; arr1.length; j++)
    {
        PRD_ID  = arr1[j];
        PROD_NM = "null";
        ID = ID;
        generateRow();
    }
}
else
{
    for (int k = len; k &lt; arr.length; k++)
    {
        PROD_NM = arr[k];
        PRD_ID  = "null";
        ID  = ID;
        generateRow();
    }
    }                      {
    for (int k = len; k &lt; arr.length; k++)
    {
        PROD_NM = arr[k];
        PRD_ID = "null";
        generateRow();
    }
}

Tuesday, October 14, 2014

Informatica Scenario split source records and loading into 3 targets (based on some logic)

Informatica Scenario:- Let’s say I have 200 record in source table and I have 3 Target table A,B,C. I have to insert first 1 to 10 records in 'A' , then 11 to 20 in 'B' and 21 to 30 in C. Then again from 31 to 40 in A, 41 to 50 in B and 51 to 60 in C. So on up to last record.

Solution:- In the above requirement we need to Informatica and load it into 3 targets
(each group with 10 Records).The below is the mapping. Let’s see in steps.

STEP1:- First let’s see how to create sequence and reset it, after it reaches to 30 count (because of the above Requirement 10 records as a set for 3 targets).

As you can see this can be done in one Variable port. Let’s see how this works .


IIF(v_SEQ=30.

(this means the variable port reaches to 30),then set the variable port back to 1, else v_SEQ+1  increment the variable).
Lets parameter the value ‘30’ to $$SPLT_LIMT. And go to Mappings --> then Parameter and variable
click it and declare the variable/parameter there.



Why we have to use Mapping Parameters:-
what if tomorrow requirement changes to (load 1st records to target A, Second records to target B and finally
3 records to target C). In this case if we don't use mapping parameters we have to modify mapping.  
In this case Just change in Param File:-  Assign 3 to the variable $$SPLT_LIMT in the parameter file,

no need to change anything in the mapping.

Step3:-
Create 3 groups and divide the records as per the requirement in RTR (Router
Transformation).



As we did for Expression transformation we can parameterize the router Group conditions as follows.

o_SEQ >0 and o_SEQ <11 (change the hard coded values 0 and 11)
 TO  
o_SEQ> $$GRP1_LOW_LMIT AND  o_SEQ <$$GRP1_UP_LMIT

(declare this in Mapping parameter and variables and add it to your parameter file).

Similarly do it for rest of the groups as well.So let’s run our mapping and see the output.


Total SRC records 201 divided equally (10 records per set).
The Parameter file Looks something like this.

SAMP_MAPP.pram
[GLOBAL]


$$GRP1_LOW_LMIT=0
$$GRP1_UP_LMIT=11
$$GRP1_LOW_LMIT=10
$$GRP1_UP_LMIT=21
$$GRP1_LOW_LMIT=20
$$GRP1_UP_LMIT=31
$$SPLT_LIMT=30
Based on your requirement change the values in the parameter file.
This way we created a generic mapping. (the mapping needs to change only in case if you want to add one more target).

Working With JAVA Transformation:-

http://netezzamigration.blogspot.com/2014/10/implementing-java-transformation-with.html

Sunday, October 12, 2014

Repeat Records multiple times based on some column

Oracle Query to Repeat a Number n Times

below are the scenarios using oracle hierarchy query's (ex:-

START WITH and CONNECT BY, Level etc.,)

Scenario:-
TABLE TBl

VALUE	CNT
foo	1
bar	3
baz	2

Expected Output:-

VALUE	CNT	NDX
foo	1	1
baz	2	1
baz	2	2
bar	3	1
bar	3	2
bar	3	3

For Netezza:- use the below method

WITH tbl_populate(value, cnt, ndx)
     AS (SELECT value,
                cnt,
                1
         FROM   tbl
         UNION ALL
         SELECT t.value,
                t.cnt,
                tp.ndx + 1
         FROM   tbl t
                JOIN tbl_populate tp
                  ON tp.value = t.value
                     AND tp.ndx + 1 <= t.cnt)
SELECT *
FROM   tbl_populate
ORDER  BY cnt, ndx

Implementing using Oracle:-
SELECT value,
       cnt
FROM   tbl p,
       (SELECT ROWNUM repeat
        FROM   dual
        CONNECT BY LEVEL <= (SELECT Max(cnt)
                             FROM   tbl))r
WHERE  p.tbl >= r.repeat;

Scenario:- I have name like INDIA in a column. I want display 1st line A,

2nd line N, 3rd line D, 4th line I, 5th line A in oracle data 
base?

Solution:-

SELECT Substr('INDIA', LEVEL, 1) AS INDIA 
FROM   dual 
CONNECT BY LEVEL <= Length('INDIA');

Scenario:-

Source data

ID     NAME

101    X,Y,Z

102  XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG

Desired output:-

101 X

101 Y

101 Z

102 XXX

102 Y

102 ZZ

102 AAAAA

102 B

102 CCC

102 D                        till the end.....

Oracle Solution 1:-

SELECT   NAME , 
       regexp_substr(NAME, '[^,]+', 1, LEVEL) AS single_element  , 
       LEVEL                                 AS element_no 
FROM   ( 
SELECT ROWNUM AS id1  , 
       NAME
FROM   TABLE )  
       CONNECT BY instr(NAME, ',', 1, LEVEL-1) > 0 
AND    id1 = PRIOR id 
AND    PRIOR dbms_random.value IS NOT NULL;

Oracle Solution 2:-

SELECT NAME AS original_string  , 
       extractvalue( xt.column_value, 'element' ) AS single_element 
FROM   t , 
       TABLE( xmlsequence(  extract( xmltype(  '' 
              || replace( t.str, ',', '' ) 
              ||  ''  ),  '/coll/element'  )  )  ) xt;

Subscribe to: Posts ( Atom )