Schema on read in Hive for a TSV-format file

I am new to Hadoop. I have data in TSV format with 50 columns, and I need to store it in Hive. How can I create a table and load the data into it on the fly, using schema on read, without manually writing a CREATE TABLE statement?

Hive requires you to run a CREATE TABLE statement, because the Hive metastore must be updated with a description of the data you are going to query later on, including where it is stored.
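Since the statement is required anyway, one way to avoid typing 50 column definitions by hand is to generate it from the file's header line. The following is a minimal sketch in plain Scala (no Spark needed), assuming the first line of the file holds tab-separated column names. The object name `TsvDdl`, the table name, and the location are hypothetical, and every column is declared STRING so that the types can be adjusted afterwards.

```scala
// Sketch: build a Hive CREATE EXTERNAL TABLE statement from a TSV header line.
// All names here (TsvDdl, table, location) are illustrative assumptions.
object TsvDdl {
  // The two characters backslash and t, which is how Hive DDL spells a tab.
  private val hiveTab = "\\t"

  // Reduce a header cell to lowercase letters, digits and underscores.
  def sanitize(name: String): String =
    name.trim.toLowerCase.replaceAll("[^a-z0-9_]", "_")

  // Every column is declared STRING; edit the types by hand where needed.
  def createTable(table: String, location: String, header: String): String = {
    val columns = header
      .split("\t")
      .map(c => s"  `${sanitize(c)}` STRING")
      .mkString(",\n")

    s"""CREATE EXTERNAL TABLE IF NOT EXISTS $table (
       |$columns
       |)
       |ROW FORMAT DELIMITED FIELDS TERMINATED BY '$hiveTab'
       |STORED AS TEXTFILE
       |LOCATION '$location'
       |TBLPROPERTIES ('skip.header.line.count'='1')""".stripMargin
  }
}
```

Printing `TsvDdl.createTable("mydb.mytable", "/data/mytable", headerLine)` gives a statement that can be reviewed and then run in the Hive shell. The `skip.header.line.count` property tells Hive to ignore the header row when the table is queried.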

Schema-on-read doesn't mean that you can query any file without knowing its metadata beforehand, such as the storage location and storage format.

SparkSQL or Apache Drill, on the other hand, will let you infer the schema from a file, but you must still define the column types for a TSV if you don't want every column to be a string (or coerced to an unexpected type). Both of these tools can interact with a Hive metastore for "decoupled" storage of schema information.
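As an illustration of defining the column types yourself in Spark, here is a minimal sketch that reads a TSV with an explicit schema instead of relying on inference. The two columns (`id`, `name`), the object name, and the file path are hypothetical placeholders; a real 50-column file would list all of its fields.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Sketch: read a TSV with declared column types rather than inferred ones.
object ExplicitSchemaTsv {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explicit-schema-tsv")
      .getOrCreate()

    // Hypothetical columns; replace with the real names and types.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true)
    ))

    val df = spark.read
      .option("delimiter", "\t")   // tab-separated input
      .option("header", "true")    // skip the header row
      .schema(schema)              // use the declared types, no inference pass
      .csv("/path/to/data.tsv")    // hypothetical path

    df.printSchema()
    spark.stop()
  }
}
```

Supplying the schema also avoids the extra pass over the data that inference needs, at the cost of having to maintain the column list.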

You can use Hue:

http://gethue.com/hadoop-tutorial-create-hive-tables-with-headers-and/

Or, with Spark, you can infer the schema of the CSV file and save the result as a Hive table.

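// The CSV reader also handles TSV once the delimiter is set to a tab.
val df = spark.read
  .option("delimiter", "\t")
  .option("header", "true")      // first line holds the column names
  .option("inferSchema", "true") // <-- HERE: let Spark guess the column types
  .csv("/home/cloudera/Book1.csv")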
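The snippet above only reads the file. A sketch of the remaining step, saving the DataFrame as a Hive table, might look like the following. It assumes a Spark build with Hive support and a reachable metastore; the object name and the table name `default.book1` are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch: infer the schema of a tab-separated file and save it as a Hive table.
object TsvToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tsv-to-hive")
      .enableHiveSupport() // register tables in the Hive metastore
      .getOrCreate()

    val df = spark.read
      .option("delimiter", "\t")
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/home/cloudera/Book1.csv")

    // Hypothetical table name; Overwrite replaces any existing table of that name.
    df.write.mode(SaveMode.Overwrite).saveAsTable("default.book1")

    spark.stop()
  }
}
```

Checking `df.printSchema()` before writing is worthwhile, since inferred types are only a guess, and header names containing spaces or special characters may need to be renamed first.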
