Sqoop
Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.
Sqoop supports incremental loads of a single table or a free form SQL query as well as saved jobs which can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in Hive or HBase. Exports can be used to put data from Hadoop into a relational database. Sqoop got the name from "SQL-to-Hadoop".
Command Line Instructions
sqoop import \
--connect jdbc:mysql://localhost:3306/nseProd \
--username=qt \
--password=password \
--table=tradingDays \
--target-dir /mysql/nseProd \
--m 1
Sqoop uses the primary key to decide how many mappers to use, and for splitting the rows among mappers.
Instead of full tables Sqoop supports a query based importer:
--query 'select year, month, day from tradingDays where year=2016 and $CONDITIONS'
Can define jobs:
sqoop job \
--create myjob \
--import \
--connect jdbc:mysql://localhost:3306/nseProd \
--username=qt \
--password=password \
--table=tradingDays \
--target-dir /mysql/nseProd \
--m 1
Which can be incremental: --incremental lastmodified —check-column ts
Jobs can then be executed: sqoop job --exec myjob
No Comments