If you don’t select a tool, Sqoop does not know what to do. help is the name of one such tool; it can print out the list of available tools, like this:

% sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

As it explains, the help tool can also provide specific usage instructions on a particular tool when you provide that tool’s name as an argument:

% sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>      Specify JDBC connect string
   --driver <class-name>     Manually specify JDBC driver class to use
   --hadoop-home <dir>       Override $HADOOP_HOME
   --help                    Print usage instructions
   -P                        Read password from console
   --password <password>     Set authentication password
   --username <username>     Set authentication username
   --verbose                 Print more information while working
...
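The [GENERIC-ARGS] placeholder stands for Hadoop’s generic command-line options (-D, -conf, -fs, and so on), which must appear before any tool-specific arguments. As a minimal sketch (the queue name here is hypothetical):

% sqoop import -D mapreduce.job.queuename=etl \
>     --connect jdbc:mysql://localhost/hadoopguide --table widgets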
An alternate way of running a Sqoop tool is to use a tool-specific script. This script will be named sqoop-toolname (e.g., sqoop-help, sqoop-import, etc.). Running these scripts from the command line is identical to running sqoop help or sqoop import.
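For example, assuming the tool-specific scripts are on your path, the following two invocations behave identically:

% sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets
% sqoop-import --connect jdbc:mysql://localhost/hadoopguide --table widgets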
Sqoop Connectors

Sqoop has an extension framework that makes it possible to import data from, and export data to, any external storage system that has bulk data transfer capabilities. A Sqoop connector is a modular component that uses this framework to enable Sqoop imports and exports.

Sqoop ships with connectors for working with a range of popular databases, including MySQL, PostgreSQL, Oracle, SQL Server, DB2, and Netezza. There is also a generic JDBC connector for connecting to any database that supports Java’s JDBC protocol.
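When no specialized connector exists, the generic JDBC connector can be selected by naming the driver class explicitly with --driver. A hedged sketch (the host, database, and account are hypothetical, and the driver JAR must be on Sqoop’s classpath):

% sqoop import --driver com.ibm.db2.jcc.DB2Driver \
>     --connect jdbc:db2://db2host:50000/PRODDB \
>     --table WIDGETS --username dbuser -P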
Sqoop provides optimized MySQL, PostgreSQL, Oracle, and Netezza connectors that use database-specific APIs to perform bulk transfers more efficiently (this is discussed more in “Direct-Mode Imports” on page 411).

As well as the built-in Sqoop connectors, various third-party connectors are available for data stores, ranging from enterprise data warehouses (such as Teradata) to NoSQL stores (such as Couchbase). These connectors must be downloaded separately and can be added to an existing Sqoop installation by following the instructions that come with the connector.
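To give a flavor of what the optimized connectors do, direct mode is requested with the --direct flag; for MySQL it relies on the mysqldump tool being available on the cluster nodes. A minimal sketch, assuming that prerequisite is met:

% sqoop import --connect jdbc:mysql://localhost/hadoopguide \
>     --table widgets --direct -m 1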
A Sample Import

After you install Sqoop, you can use it to import data to Hadoop. For the examples in this chapter, we’ll use MySQL, which is easy to use and available for a large number of platforms.

To install and configure MySQL, follow the online documentation. Chapter 2 (“Installing and Upgrading MySQL”) in particular should help. Users of Debian-based Linux systems (e.g., Ubuntu) can type sudo apt-get install mysql-client mysql-server. Red Hat users can type sudo yum install mysql mysql-server.

Now that MySQL is installed, let’s log in and create a database (Example 15-1).

Example 15-1. Creating a new MySQL database schema

% mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 235
Server version: 5.6.21 MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE hadoopguide;
Query OK, 1 row affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON hadoopguide.* TO ''@'localhost';
Query OK, 0 rows affected (0.00 sec)

mysql> quit;
Bye

The password prompt shown in this example asks for your root user password. This is likely the same as the password for the root shell login. If you are running Ubuntu or another variant of Linux where root cannot log in directly, enter the password you picked at MySQL installation time. (If you didn’t set a password, then just press Return.)
In this session, we created a new database schema called hadoopguide, which we’ll use throughout this chapter. We then allowed any local user to view and modify the contents of the hadoopguide schema, and closed our session.¹

1. Of course, in a production deployment we’d need to be much more careful about access control, but this serves for demonstration purposes. The grant privilege shown in the example also assumes you’re running a pseudodistributed Hadoop instance. If you’re working with a distributed Hadoop cluster, you’d need to enable remote access by at least one user, whose account would be used to perform imports and exports via Sqoop.
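As that footnote suggests, on a distributed cluster you would instead grant remote access to a dedicated account. A hedged sketch (the account name and password are hypothetical):

mysql> GRANT ALL PRIVILEGES ON hadoopguide.* TO 'sqoopuser'@'%'
    -> IDENTIFIED BY 'sqooppass';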
Now let’s log back into the database (do this as yourself this time, not as root) and create a table to import into HDFS (Example 15-2).

Example 15-2. Populating the database

% mysql hadoopguide
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 257
Server version: 5.6.21 MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CREATE TABLE widgets(id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    -> widget_name VARCHAR(64) NOT NULL,
    -> price DECIMAL(10,2),
    -> design_date DATE,
    -> version INT,
    -> design_comment VARCHAR(100));
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO widgets VALUES (NULL, 'sprocket', 0.25, '2010-02-10',
    -> 1, 'Connects two gizmos');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO widgets VALUES (NULL, 'gizmo', 4.00, '2009-11-30', 4,
    -> NULL);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO widgets VALUES (NULL, 'gadget', 99.99, '1983-08-13',
    -> 13, 'Our flagship product');
Query OK, 1 row affected (0.00 sec)

mysql> quit;
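Before moving on, it can be reassuring to confirm what the table holds. Logging back in, a quick check (the output below is reconstructed from the rows inserted above):

mysql> SELECT * FROM widgets;
+----+-------------+-------+-------------+---------+----------------------+
| id | widget_name | price | design_date | version | design_comment       |
+----+-------------+-------+-------------+---------+----------------------+
|  1 | sprocket    |  0.25 | 2010-02-10  |       1 | Connects two gizmos  |
|  2 | gizmo       |  4.00 | 2009-11-30  |       4 | NULL                 |
|  3 | gadget      | 99.99 | 1983-08-13  |      13 | Our flagship product |
+----+-------------+-------+-------------+---------+----------------------+
3 rows in set (0.00 sec)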
In this listing, we created a new table called widgets. We’ll be using this fictional product database in further examples in this chapter. The widgets table contains several fields representing a variety of data types.

Before going any further, you need to download the JDBC driver JAR file for MySQL (Connector/J) and add it to Sqoop’s classpath, which is simply achieved by placing it in Sqoop’s lib directory.

Now let’s use Sqoop to import this table into HDFS:

% sqoop import --connect jdbc:mysql://localhost/hadoopguide \
> --table widgets -m 1
...
14/10/28 21:36:23 INFO tool.CodeGenTool: Beginning code generation
...
14/10/28 21:36:28 INFO mapreduce.Job: Running job: job_1413746845532_0008
14/10/28 21:36:35 INFO mapreduce.Job: Job job_1413746845532_0008 running in
uber mode : false
14/10/28 21:36:35 INFO mapreduce.Job:  map 0% reduce 0%
14/10/28 21:36:41 INFO mapreduce.Job:  map 100% reduce 0%
14/10/28 21:36:41 INFO mapreduce.Job: Job job_1413746845532_0008 completed
successfully
...
14/10/28 21:36:41 INFO mapreduce.ImportJobBase: Retrieved 3 records.
Sqoop’s import tool will run a MapReduce job that connects to the MySQL database and reads the table. By default, this will use four map tasks in parallel to speed up the import process. Each task will write its imported results to a different file, but all in a common directory. Because we knew that we had only three rows to import in this example, we specified that Sqoop should use a single map task (-m 1) so we get a single file in HDFS.

We can inspect this file’s contents like so:

% hadoop fs -cat widgets/part-m-00000
1,sprocket,0.25,2010-02-10,1,Connects two gizmos
2,gizmo,4.00,2009-11-30,4,null
3,gadget,99.99,1983-08-13,13,Our flagship product

The connect string (jdbc:mysql://localhost/hadoopguide) shown in the example will read from a database on the local machine. If a distributed Hadoop cluster is being used, localhost should not be specified in the connect string, because map tasks not running on the same machine as the database will fail to connect. Even if Sqoop is run from the same host as the database server, the full hostname should be specified.
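Following that advice on a cluster, the connect string would name the database host explicitly; a sketch (db.example.com is a placeholder for your database server’s hostname):

% sqoop import --connect jdbc:mysql://db.example.com/hadoopguide \
>     --table widgets -m 1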
By default, Sqoop will generate comma-delimited text files for our imported data. Delimiters can be specified explicitly, as well as field enclosing and escape characters, to allow the presence of delimiters in the field contents. The command-line arguments that specify delimiter characters, file formats, compression, and more fine-grained control of the import process are described in the Sqoop User Guide distributed with Sqoop,² as well as in the online help (sqoop help import, or man sqoop-import in CDH).

Text and Binary File Formats

Sqoop is capable of importing into a few different file formats. Text files (the default) offer a human-readable representation of data, platform independence, and the simplest structure.
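As an illustration of that kind of control, a hedged sketch using commonly documented options (the first command writes tab-separated text, the second Avro data files; the target directories are hypothetical):

% sqoop import --connect jdbc:mysql://localhost/hadoopguide \
>     --table widgets -m 1 --target-dir widgets_tsv \
>     --fields-terminated-by '\t' --escaped-by '\\' \
>     --optionally-enclosed-by '"'
% sqoop import --connect jdbc:mysql://localhost/hadoopguide \
>     --table widgets -m 1 --target-dir widgets_avro \
>     --as-avrodatafile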