How can I convert an empty pandas dataframe to a PySpark dataframe?

I'd like a safe way to convert a pandas dataframe to a PySpark dataframe that can handle cases where the pandas dataframe is empty (let's say after some filter has been applied). For example, the following will fail:

Assumes you have a Spark session.

import pandas as pd
raw_data = []
cols = ['col_1', 'col_2', 'col_3']
types_dict = {
    'col_1': str,
    'col_2': float,
    'col_3': bool
}
pandas_df = pd.DataFrame(raw_data, columns=cols).astype(types_dict)
spark_df = spark.createDataFrame(pandas_df)

Resulting error: ValueError: can not infer schema from empty dataset

One option is to build a function that iterates through the pandas dtypes and constructs a PySpark dataframe schema, but that could get a little complicated with structs and whatnot. Is there a simpler solution?

How can I convert an empty pandas dataframe to a PySpark dataframe and maintain the column datatypes?

If I understand your problem correctly, try something with a try-except block.

import pandas as pd

def test(df):
    try:
        # Whatever operations you want to run on your df go here.
        pass
    except Exception:
        # Fall back to an empty dataframe with explicit column dtypes.
        df = pd.DataFrame({
            'col_1': pd.Series(dtype='str'),
            'col_2': pd.Series(dtype='float'),
            'col_3': pd.Series(dtype='bool'),
        })
    return df
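
For the schema-based route raised in the question, one possible sketch is to give Spark an explicit schema so it has nothing to infer from the empty dataframe. `spark.createDataFrame` accepts a DDL-style schema string, so a small helper can build one from the pandas dtypes. The helper name `ddl_schema_from_dtypes` and the mapping table below are assumptions made for illustration, not part of pandas or PySpark, and the table covers only a few simple dtypes (no structs or nested types).

```python
# Hypothetical mapping from pandas dtype names to Spark SQL DDL type names.
# It covers only a handful of simple dtypes; extend it for your own data.
PANDAS_TO_SPARK_DDL = {
    "object": "string",
    "string": "string",
    "str": "string",
    "float64": "double",
    "int64": "bigint",
    "bool": "boolean",
    "datetime64[ns]": "timestamp",
}


def ddl_schema_from_dtypes(dtypes):
    """Build a DDL schema string from (column name -> dtype) pairs.

    `dtypes` can be `pandas_df.dtypes` or any mapping with `.items()`.
    Assumes simple column names (no spaces or special characters).
    """
    fields = []
    for name, dtype in dtypes.items():
        key = str(dtype)
        if key not in PANDAS_TO_SPARK_DDL:
            raise TypeError(
                f"no Spark type mapped for dtype {key!r} (column {name!r})"
            )
        fields.append(f"{name} {PANDAS_TO_SPARK_DDL[key]}")
    return ", ".join(fields)


# Usage, assuming `pandas_df` and `spark` from the question exist:
#   schema = ddl_schema_from_dtypes(pandas_df.dtypes)
#   spark_df = spark.createDataFrame(pandas_df, schema=schema)
```

Raising on an unmapped dtype is a deliberate choice here: it surfaces columns the table does not cover rather than guessing a Spark type for them.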

