
PySpark - Write data frame into Hive table

I have an empty Hive table. I have 18 jobs that I am running, and each one could produce a data frame that I would need to add into the Hive table as a parquet file.

What I have is something like this:

df2.write.parquet(SOME_HDFS_DIR/my_table_dir)

But this doesn't seem quite right. Do I have to add some .parquet file name and keep appending to it each time? I have seen some syntax for this in Scala, but not in Python.

df.write.parquet will overwrite the parquet files at that location, but with the append mode option,

df.write.mode('append').parquet('path')

it will create a new parquet file at the path instead, so you can then read the accumulated data from the table.
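A minimal sketch of how each of the jobs could append its output, assuming a SparkSession with Hive support; the path, database, and table names below are hypothetical placeholders, not the asker's actual locations:

from pyspark.sql import SparkSession

# Assumes a Hive-enabled Spark session; names are placeholders.
spark = (
    SparkSession.builder
    .appName("append-to-hive-table")
    .enableHiveSupport()
    .getOrCreate()
)

df2 = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Append mode adds new parquet files under the table's directory
# instead of overwriting the files already written by other jobs.
df2.write.mode("append").parquet("hdfs:///some/hdfs/dir/my_table_dir")

# Alternatively, write through the Hive metastore so the table
# registers the new data directly:
# df2.write.mode("append").format("parquet").saveAsTable("my_db.my_table")

Writing with saveAsTable keeps the metastore in sync automatically, whereas writing raw parquet files into the table directory may require refreshing the table metadata before the new files are visible to queries.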
