
HIVE Query Deleting source data blob

I am using Azure HDInsight (3.1.3.577).

Running the HiveQL statement: LOAD DATA INPATH '/myData/employee.txt' INTO TABLE employee;

loads the data correctly but also has the side effect of removing the source text file. This behavior is puzzling to me.

The documentation ( https://cwiki.apache.org/confluence/display/Hive/GettingStarted ) states: "loading data from HDFS will result in moving the file/directory. As a result, the operation is almost instantaneous."

My confusion is why this would be efficient, given that HDFS (in my case, the Azure blob store) has to be reloaded with the source data before each run.
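For context, the move-versus-copy behavior depends on where the source file lives. A hedged sketch of the distinction (the /tmp path is illustrative, not from the original question):

```sql
-- With an HDFS path, LOAD DATA MOVES the file into the table's
-- warehouse directory. A move within the same filesystem is a cheap
-- rename, which is why the docs call it "almost instantaneous":
LOAD DATA INPATH '/myData/employee.txt' INTO TABLE employee;

-- With LOCAL, Hive COPIES the file from the client's local
-- filesystem instead, leaving the local source file in place:
LOAD DATA LOCAL INPATH '/tmp/employee.txt' INTO TABLE employee;
```

So the non-LOCAL form deliberately consumes the source file; if you need the file to survive, either copy it to a staging path before loading or use an external table as described below.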

Hive uses HDFS to store its table data under the default location "/user/hive/warehouse".

When the data already exists in HDFS, we create an external table and provide the HDFS path using the LOCATION keyword. This does not move the file/directory to the default location.

By doing this, Hive assumes the data exists at the given path and does not take ownership of it.

Even if you drop the table, the data at that path still exists.

Try this:

create external table myTable (Userid string, name string)
row format delimited
fields terminated by '\t'
LOCATION '/myData/';

Note that LOCATION must be an HDFS directory (here, a directory in the blob store), not a single file; Hive reads every file under it.
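To illustrate that Hive does not own external data, a minimal sketch (assuming the table above was created and /myData/ holds employee.txt):

```sql
-- The external table reads the files already under /myData/:
SELECT * FROM myTable LIMIT 10;

-- Dropping an external table removes only the table metadata;
-- the files under LOCATION ('/myData/') are left untouched,
-- so employee.txt is still there afterwards:
DROP TABLE myTable;
```

This is the key contrast with LOAD DATA into a managed table, where Hive takes ownership and moves (and on DROP, deletes) the underlying files.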
