I use Sqoop to import data from MySQL into Hadoop in CSV form. It works well when I use the --table argument. However, when I use the --query argument, only the first column is imported; the other columns are missing.
Here is my command:
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--query ' select age, job from person where $CONDITIONS ' \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1
In the CSV file, only the age column appears. Does anyone know how to solve this? Thanks.
Read this from the Sqoop User Guide: when you use $CONDITIONS, you must specify the splitting column.
Sqoop can also import the result set of an arbitrary SQL query. Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument. When importing a free-form query, you must specify a destination directory with --target-dir.
If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop.
Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.
For example:
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/foo/joinresults
Alternately, the query can be executed once and imported serially, by specifying a single map task with -m 1:
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
-m 1 --target-dir /user/foo/joinresults
Try this:
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--columns "First_Column" \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1
Whenever you use the --query parameter, you need to specify the --split-by parameter with the column that should be used for slicing your data into multiple parallel tasks. The other required parameter is --target-dir, which specifies the directory on HDFS where your data should be stored.
Solution: add the --split-by argument to your Sqoop command and see if the error is resolved.
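Putting this together, the original command with a --split-by column added might look like the following. This is a sketch, not a tested fix: the choice of age as the split column is an assumption (any indexed, reasonably evenly distributed column from the query works), and the connection details are copied from the question.

```shell
# Sketch: the user's original command plus --split-by (assumed column: age).
# Leading/trailing spaces inside the --query quotes are removed to avoid
# shell-quoting surprises.
sqoop import \
  --connect jdbc:mysql://127.0.0.1:3306/sqoop \
  --username root \
  --password root \
  --query 'SELECT age, job FROM person WHERE $CONDITIONS' \
  --split-by age \
  --bindir /tmp/sqoop-hduser/compile \
  --fields-terminated-by ',' \
  --target-dir /Users/hduser/hadoop_data/onedaydata
```

Note that per the User Guide excerpt above, if you keep -m 1 the query runs as a single serial map task and --split-by can be omitted; --split-by is what enables parallel imports with more than one mapper.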