Wednesday 29 October 2014

Calling a SQL Stored Procedure from R


The following lines in R call a stored procedure in SQL Server:

library(RODBC)

# connect with the SQL Server ODBC driver (replace the placeholders with your own values)
conn <- odbcDriverConnect('driver={SQL Server};server=HostName;database=DatabaseName;uid=userName;pwd=Password')
query <- "exec dbo.R_getData"
res <- sqlQuery(conn, query)
odbcClose(conn)   # close the connection when done

Monday 30 June 2014

Enable Oozie Workflow for the New MapReduce API

If you are using the new MapReduce API and want to implement a DAG with an Oozie workflow, you may face the exception below:

java.lang.RuntimeException: Error in configuring object
 at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.jav
Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
 ... 9 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: class TestMapper not org.apache.hadoop.mapred.Mapper
 at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:899)
 
Caused by: java.lang.RuntimeException: class TestMapper not org.apache.hadoop.mapred.Mapper
 at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:893)
 ... 16 more

To solve the above issue, we need to add two properties to the <configuration> element of the map-reduce action in workflow.xml:

<property>
    <name>mapred.reducer.new-api</name>
    <value>true</value>
</property>
<property>
    <name>mapred.mapper.new-api</name>
    <value>true</value>
</property>


Now replace the workflow.xml in HDFS with the updated one, and the issue will be resolved.
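For example, you can push the updated file and rerun the job from the command line (a minimal sketch; the HDFS application path and the Oozie server URL are placeholders, not values from this post):

hadoop fs -rm /user/hadoop/apps/myWorkflow/workflow.xml
hadoop fs -put workflow.xml /user/hadoop/apps/myWorkflow/
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run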



Thursday 15 May 2014

Accessing HBase Data Using Hive Query Language (with Probable Exceptions)


1. Create a table in HBase

hbase(main):001:0> create 'hbaseTable','cf1'
0 row(s) in 1.4830 seconds

2. Insert data into the table

hbase(main):002:0> put 'hbaseTable','row1','cf1:name','giri'
0 row(s) in 0.0800 seconds

hbase(main):003:0> put 'hbaseTable','row2','cf1:name','Anamika'

0 row(s) in 0.0070 seconds

3. Scan the table data

hbase(main):004:0> scan 'hbaseTable'
ROW                   COLUMN+CELL                                               
 row1                 column=cf1:name, timestamp=1400133482419, value=giri      
 row2                 column=cf1:name, timestamp=1400133502249, value=Anamika   
2 row(s) in 0.0360 seconds

4. Now we need to add the jar files below to Hive:

guava-11.0.2.jar
hive-hbase-handler-0.10.0.24.jar
hbase-0.94.5.jar
zookeeper-3.4.5.23.jar

There are a number of ways to do this.

One way is to set HIVE_AUX_JARS_PATH (a colon-separated list of jar paths) before starting Hive:

export HIVE_AUX_JARS_PATH=/usr/lib/guava-11.0.2.jar:/usr/lib/hive-hbase-handler-0.10.0.24.jar:...... remaining jars

The other way is to add the jars directly in the Hive console:

hive> add jar /usr/lib/hbase/lib/guava-11.0.2.jar;
Added /usr/lib/hbase/lib/guava-11.0.2.jar to class path
Added resource: /usr/lib/hbase/lib/guava-11.0.2.jar

..... add the remaining jars the same way.

Now create the Hive table using the syntax below:

hive> CREATE TABLE hiveTable(key int, name string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name")
TBLPROPERTIES ("hbase.table.name" = "hbaseTable");

Now you can use HiveQL to query the HBase data.
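For example, the rows inserted above can be read back with an ordinary query:

hive> select * from hiveTable;

(Since the sample row keys 'row1' and 'row2' are not numeric, the int key column may come back as NULL; map :key to a string column if your row keys are not numbers.)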

Troubleshooting:

You may get the exception below:

java.lang.ClassNotFoundException: org.apache.hadoop.hbase.MasterNotRunningException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
… 21 more 

or you may get other ZooKeeper-related issues.

You can resolve the above issues by setting the two properties below at the Hive prompt:

set hbase.zookeeper.quorum=your zookeeper nodes;

set zookeeper.znode.parent=/hbase-unsecure;

(The parent znode varies by distribution; /hbase-unsecure is common on Hortonworks clusters, while plain /hbase is the usual default elsewhere.)
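For example, with a three-node ZooKeeper ensemble (the hostnames here are placeholders, not values from this post):

hive> set hbase.zookeeper.quorum=zk1.example.com,zk2.example.com,zk3.example.com;
hive> set zookeeper.znode.parent=/hbase-unsecure;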


Thursday 20 February 2014

Tracking All the Background Processes in a Shell Script

Let's assume we need to write a shell script in which some statements run in parallel, and a few more statements must run only after those finish.

For example, in the script below, I want cmd3 to execute only after cmd1 and cmd2 have finished:

cmd1 &
cmd2 &
cmd3

This scenario may look simple, but it is not.

When statements run with the & operator, they execute in the background, so the shell moves straight on to the next statements instead of waiting for them.

I solved this problem with the script below: an infinite loop tracks the background processes, and only once they are all done does it run the final statement.

cmd1 &
cmd2 &
while true
do
    # jobs lists this shell's background jobs; when none are Running, we are done
    if [ `jobs | grep Running | wc -l` -eq 0 ]; then
        cmd3
        break
    fi
    sleep 1   # avoid spinning the CPU while polling
done
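For reference, the shell's built-in wait command solves the same problem without polling; this is a minimal sketch, assuming a plain bash script with no other background jobs of its own:

cmd1 &
cmd2 &
wait    # blocks until all background jobs of this shell have finished
cmd3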


Wednesday 19 February 2014

Checking Oracle Connection Status Using a Shell Script

We can check the Oracle connection status in a number of ways using a shell script.

Here is one of those ways. Whenever we connect to an Oracle database successfully, sqlplus produces a success message containing "Connected to" as a substring, so I check for this string using the grep command:


if sqlplus schemaname/password@databasename < /dev/null | grep 'Connected to'; then

    echo "Database Connection is OK .......Starting Export Process ...."

else

    echo "Database Connection is not successful .."

    exit 1

fi

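A slightly more defensive variant (a sketch, not from the original post) runs sqlplus with -L so that a bad login fails immediately instead of re-prompting, and with -s to suppress the banner, then checks the output for an ORA- error:

if echo "exit" | sqlplus -L -s schemaname/password@databasename 2>&1 | grep -q 'ORA-'; then
    echo "Database Connection is not successful .."
    exit 1
fi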


Monday 17 February 2014

Setting up Passwordless SSH

SSH is often used to log in from one system to another without requiring passwords. This is useful when you run a cluster, so that the machines do not ask you for a password again and again.


Steps:

1. First log in on Host1 as user user1 and generate a pair of authentication keys.
    Do not enter a passphrase:

    user1@Host1:~> ssh-keygen -t rsa


2. Now use ssh to create a directory ~/.ssh as user user2 on Host2.
    (If the directory already exists, that is fine.)

    user1@Host1:~> ssh user2@Host2 mkdir -p .ssh

    user2@Host2's password:

3. Finally, append Host1's new public key to user2@Host2:.ssh/authorized_keys and
    enter Host2's password one last time:

    user1@Host1:~> cat ~/.ssh/id_rsa.pub | ssh user2@Host2 'cat >> .ssh/authorized_keys'

    user2@Host2's password:

    Now you can log in to Host2 as user2 from the Host1 machine:

    user1@Host1:~> ssh user2@Host2 hostname

    NOTE: If you face any issue while logging in, please make the below changes:
    •    Put the public key in .ssh/authorized_keys2
    •    Change the permissions of .ssh to 700
    •    Change the permissions of .ssh/authorized_keys2 to 640
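
For reference, many systems also ship an ssh-copy-id helper that performs steps 2 and 3 in a single command (a sketch; it assumes the key pair from step 1 already exists):

    user1@Host1:~> ssh-copy-id user2@Host2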