Last Updated: August 22, 2016
·
16.08K
· vidyasagar

Sqoop - Data transfer tool from RDBMS to Hadoop Box

TO IMPORT & EXPORT DATA FROM RDBMS (MYSQL,ORACLE, etc) INTO HDFS / HIVE / HBASE

Pre-requisite

  1. Apache Hadoop
  2. Apache Sqoop (compatible with Hadoop version)
  3. Apache Hive (optional)
  4. Apache HBase (optional)
  5. Apache HCatalog (optional)
  6. JDBC/ODBC connector

For all RDBMS, Connection URL changes and remaining all command line arguments remains same. You need to download specific JDBC/ODBC connector JAR and copy it to $SQOOP_HOME/lib


MySQL


Download mysql-connector-java.jar and place in $SQOOP_HOME/lib folder

cp mysql-connector-java-5.1.18-bin.jar /usr/local/hadoop/sqoop-1.4.3-cdh4.4.0/lib/

Expecting you have data in MySQL tables.

Retrieving list of Databases available in MySQL from SQOOP

sqoop list-databases --connect jdbc:mysql://localhost:3306/  --username root -P

MySQL to HDFS Import

Have Primary key:

sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /path/to/directoryName

No Primary key:

sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /path/to/directoryName  -m 1

MySQL to Hive Import

Have Primary key:

sqoop-import  --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName  --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home

No Primary key:

sqoop-import  --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName  --hive-table tableName --create-hive-table --hive-import --hive-home  path/to/hive_home -m 1

MySQL to HBase Import

Have Import All columns:

sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName  --column-family hbase_table_col1 --hbase-create-table

HBase import few columns

sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table

To HBase with Primary key:

sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 –hbase-create-table

To Hbase with no primary key:

sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table

Export from HDFS to MySQL:

Same for all Hive/HBase/HDFS: Because Hive tables are nothing but directories in HDFS. So you're just exporting a directory to MySQL

sqoop export --connect jdbc:mysql://localhost:3306/test_db --table tableName  --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'

SQL Server


Connection URL:

sqoop import --connect 'jdbc:sqlserver://<IP(or)hostname>;username=dbuser;password=dbpasswd;database=<DB>' --table <table> --target-dir /path/to/hdfs/dir --split-by <KEY> -m 1

Download Connector from Microsoft website

http://www.microsoft.com/en-us/download/confirmation.aspx?id=11774

Place it in $SQOOP_HOME/lib


Oracle


Connection URL:

sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=myhost)(port=1521))(connect_data=(service_name=myservice)))" \
--username USER --table SCHEMA.TABLE_NAME --hive-import --hive-table SCHEMA.TABLE_NAME \
--num-mappers 1 --verbose -P \

IBM DB2


Download the DB2Driver and place it in $SQOOP_HOME/lib

sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/testdb --username db2user --db2pwd --table db2tbl --split-by tbl_primarykey --target-dir sqoopimports

sqoop export --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/myDB --username db2user --password db2pwd --table db2tbl --export-dir /sqoop/dataFile.csv

Different Connection Strings for Different RDBMS


Database version --direct support? connect string matches

HSQLDB 1.8.0+ No jdbc:hsqldb:*//

MySQL 5.0+ Yes jdbc:mysql://

Oracle 10.2.0+ No jdbc:oracle:*//

PostgreSQL 8.3+ Yes (import only) jdbc:postgresql://

3 Responses
Add your response

Hi ,
please any one can give me solution for this issue,
What is my problem is , I am using the VMWare CDH4 ubuntu 10.10 version. so i have to move the relational data to hadoop. For this first of all we have need to setup the mysql and hadoop and sqoop tool, So in cloudra vmware can have the setup for hadoop and sqoop but mysql server is not yet have. For that i am trying to setup and installing the mysql server into ubuntu10.10 version in vmware cloudera(CDH4) but the mysql server is not able to installing in my ubuntu system now what can i have to do.
I am using the sudo apt-get install mysql-server commend
it will be throws the 404 Not Found IP error

Please any one can help to me,

Thanks in advance .

over 1 year ago ·

Hello Satish,

The problem is with your VM, May be it may not have internet connection. So Please check whether VM is connecting to internet or not.

Thanks,
Vidya

over 1 year ago ·

Hi Vidhya, small clarification needed in the below queries.
1. i have a 4 tables in the RDBMS, but i would like to load the data in to single hbase table and each table ll load as column family in the hbase table.
2. As far as my knowledge we cannot load multiple column family in one sqoop job. what could be the alternate solution rather than load one by one sqoop job as my dataset is too huge.

over 1 year ago ·