
Union of two Spark dataframes

I tried to do a union between two Spark DataFrames in Python; one of them is sometimes empty, so I added an if test to return the full one. The following small example returns an error:

>>> from pyspark.sql.types import *
>>> fulldataframe = [StructField("FIELDNAME_1",StringType(), True),StructField("FIELDNAME_2", StringType(), True),StructField("FIELDNAME_3", StringType(), True)]
>>> schema = StructType([])
>>>
>>> dataframeempty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
>>> resultunion = sqlContext.createDataFrame(sc.emptyRDD(), schema)
>>> if (fulldataframe.isEmpty()):
...     resultunion = dataframeempty
... elif (dataframeempty.isEmpty()):
...     resultunion = fulldataframe
... else:
...     resultunion=fulldataframe.union(dataframeempty)
...


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'isEmpty'
>>>

Can someone tell me where the fault is?

Count can take a long time. In Scala:

dataframe.rdd.isEmpty()
