
How to read hive data from HDFS

I have a Hive warehouse in HDFS at hdfs://localhost:8020/user/hive/warehouse.

I have a database mydb inside HDFS, at hdfs://localhost:8020/user/hive/warehouse/mydb.db.

How can I create a table and insert data into it using PySpark?

Please suggest.

Using the Hive context you will be able to create the table in Hive. Please see the code below to achieve that.

import findspark
findspark.init()
import pyspark
from pyspark import SparkContext
from pyspark.sql import HiveContext

# Spark context (needed before a HiveContext can be created)
sc = SparkContext()

# Hive context
sqlCtx = HiveContext(sc)

# Loading a csv file into a dataframe
spark_df = sqlCtx.read.format('com.databricks.spark.csv').options(header='true', inferschema='true').load("./data/documents_topics.csv")

# Registering a temp table
spark_df.registerTempTable("TABLE_Y")

# Creating a Hive table out of the existing temp table built from the dataframe
sqlCtx.sql("CREATE TABLE TABLE_X AS SELECT * FROM TABLE_Y")

# Creating a brand new table in Hive
sqlCtx.sql("CREATE TABLE SomeSchema.TABLE_X (customername string, id string, ts timestamp) STORED AS DESIREDFORMAT")

I hope the comments in the code make it clear; let me know if you run into issues.
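
Since the original question is about reading the Hive data back from HDFS, here is a short sketch of querying the table afterwards, again reusing sqlCtx; the table name is the illustrative one used above.

# Read the Hive table back as a dataframe and inspect a few rows
result_df = sqlCtx.sql("SELECT * FROM mydb.documents_topics LIMIT 10")
result_df.show()

# Equivalent shortcut using table()
sqlCtx.table("mydb.documents_topics").printSchema()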
