Showing posts from February, 2016

IMPORT RDBMS TABLE IN HDFS AS ORC FILE

Sqoop supports only a few file formats (text, sequence, Avro, etc.). If you want to store RDBMS data in HDFS as ORC (a highly compressed, fast file format that Facebook has reported using), you need to do it in two steps: first import the RDBMS data as a text file, then insert that data into an ORC-formatted table. (Note: this can also be done with Spark.) Here I explain how to do it using a Sqoop batch job.

I am using CDH 5.4.0 (Hadoop 2.6), Hive from CDH 5.4.0, and Apache Sqoop 1.4.2, and I assume you have all of them installed. You can also do this with Apache Hadoop and Hive, but that sometimes produces errors because of version dependencies. As far as I know, if you match the Hadoop and Hive versions exactly you should not get any errors; otherwise you may run into many, since the Apache foundation is continuously changing each tool. If you are not sure, it is best to go with CDH; you can download the tar files and install them separately:

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.4.0.tar.gz
http://archiv