Hive query deleting source data blob
I am using Azure HDInsight (3.1.3.577).
Running the HiveQL statement
LOAD DATA INPATH '/myData/employee.txt' INTO TABLE employee;
loads the data correctly, but it also has the side effect of removing the source text file. This behavior is puzzling to me.
The documentation ( https://cwiki.apache.org/confluence/display/Hive/GettingStarted ) says the following: "loading data from HDFS will result in moving the file/directory. As a result, the operation is almost instantaneous."
My confusion is why this would be efficient, given that HDFS (the Azure blob store) has to be loaded afresh with the source data for each run.
Hive uses HDFS to store its table data under the default location "/user/hive/warehouse". LOAD DATA INPATH moves the source file into the table's directory there, which is why the file disappears from its original path.
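A rough sketch of that move, as one might check it from the Hive CLI. The column list for employee is an assumption borrowed from the example further down (the question does not show the table definition), and the warehouse path on a given cluster may differ from the default, so the sketch relies on whatever DESCRIBE FORMATTED reports:

```sql
-- Managed (internal) table: Hive owns the files under its warehouse directory.
-- Column names and types are assumed for illustration.
CREATE TABLE employee (Userid STRING, name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

-- Moves (does not copy) the file into the table's directory.
LOAD DATA INPATH '/myData/employee.txt' INTO TABLE employee;

-- The "Location" row shows the directory the file was moved into.
DESCRIBE FORMATTED employee;

-- Hive CLI passthrough to the filesystem shell:
-- employee.txt should no longer be listed under the source directory.
dfs -ls /myData/;
```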
When the data already exists in HDFS, you can create an external table and provide the HDFS path using the LOCATION keyword. This does not move the file/directory to the default location.
By doing this, Hive assumes that the data exists at the given path and that it does not own the data.
Even if you drop the table, the data at the specified path still exists.
Try this:
-- LOCATION must name a directory, not a single file; Hive reads every file in that directory.
create external table myTable (Userid string, name string)
row format delimited
fields terminated by '\t'
LOCATION '/myData/';
The location should be a directory in HDFS.
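To confirm the external-table behavior described above, something like the following could be run from the Hive CLI. This is only a sketch; it assumes the table myTable from the statement above was created successfully and that the data sits under /myData/:

```sql
-- "Table Type" should read EXTERNAL_TABLE, and "Location" should be the
-- path given in the CREATE statement rather than the warehouse directory.
DESCRIBE FORMATTED myTable;

-- Reads the files in place; nothing is moved.
SELECT * FROM myTable LIMIT 10;

-- For an external table this removes only the metastore entry.
DROP TABLE myTable;

-- Hive CLI passthrough to the filesystem shell:
-- the source file should still be listed after the drop.
dfs -ls /myData/;
```

With this setup the source data can be reused on each run without re-uploading it, since Hive never takes ownership of the files.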