
How to get pandas dataframe using pyspark

I want to convert a "pyspark.sql.dataframe.DataFrame" to pandas. The last line raises "ConnectionRefusedError: [WinError 10061] Connection failed because the destination computer refused the connection". How can I fix it?

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession, Row
import pandas as pd
import numpy as np
import os
import sys

# spark setting
# local
conf = SparkConf().set("spark.driver.host", "127.0.0.1")
sc = SparkContext(conf=conf)

# session
spark = SparkSession.builder.master("local[1]").appName("test_name").getOrCreate()

# file
path = "./data/fhvhv_tripdata_2022-10.parquet"
# add this option when the file has a header
data = spark.read.option("header", True).parquet(path)

# Error occurs here
pd_df = data.toPandas()



First, ensure you're running pyspark 3.2 or higher, since that is the release in which koalas was merged into PySpark natively (as the pandas API on Spark).
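As a rough sketch of the version check and of the pandas API on Spark mentioned above: the helper names `supports_pandas_api` and `read_parquet_via_pandas_api` are made up for illustration, and the sketch assumes the version string starts with `major.minor`. It does not by itself address the connection error, since the data still has to travel from Spark to the Python driver.

```python
def supports_pandas_api(version: str) -> bool:
    """Return True when a PySpark version string is 3.2 or newer."""
    # Assumes the string starts with "major.minor", e.g. "3.2.1".
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (3, 2)


def read_parquet_via_pandas_api(path: str):
    """Read a Parquet file with pyspark.pandas and return a plain pandas DataFrame."""
    # Imported lazily so the version helper above works without Spark installed.
    import pyspark

    if not supports_pandas_api(pyspark.__version__):
        raise RuntimeError(
            f"pyspark {pyspark.__version__} predates the pandas API on Spark"
        )

    import pyspark.pandas as ps

    # to_pandas() collects the whole dataset onto the driver.
    return ps.read_parquet(path).to_pandas()
```

With a working Spark setup, `read_parquet_via_pandas_api("./data/fhvhv_tripdata_2022-10.parquet")` would be the equivalent of the `toPandas()` call in the question.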

Then, connection errors can have many causes, but they have nothing to do with pandas. Your code is correct. It's the network/configuration that is not. For example, on Windows, you'll need to configure an external binary called winutils.
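One common way to make winutils visible to Spark on Windows is to set HADOOP_HOME and put its bin folder on PATH before the session is created. The sketch below assumes winutils.exe has already been downloaded. The folder `C:\hadoop` and the helper name `configure_winutils` are hypothetical, and this may not be the only configuration issue behind the error.

```python
import ntpath
import os


def configure_winutils(hadoop_home, env=os.environ):
    """Point HADOOP_HOME at a folder whose bin subfolder holds winutils.exe.

    Windows-specific: uses Windows path rules and the ';' PATH separator.
    Returns the path where winutils.exe is expected to live.
    """
    bin_dir = ntpath.join(hadoop_home, "bin")
    env["HADOOP_HOME"] = hadoop_home

    # Prepend the bin folder so the Hadoop native helpers can be found.
    existing = env.get("PATH", "")
    env["PATH"] = bin_dir + ";" + existing if existing else bin_dir

    return ntpath.join(bin_dir, "winutils.exe")
```

Call it before building the SparkSession, for example `configure_winutils(r"C:\hadoop")`, because the JVM reads the environment when it starts.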

Note: You don't need a SparkContext here. You can pass options via the SparkSession builder.
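A minimal sketch of that note, reusing the values from the question (`spark.driver.host`, `local[1]`, `test_name`). The names `SPARK_OPTIONS` and `build_session` are illustrative only. The import sits inside the function purely so the snippet can be loaded on a machine without pyspark.

```python
# Options that the question set through SparkConf, now passed to the builder.
SPARK_OPTIONS = {
    "spark.driver.host": "127.0.0.1",
}


def build_session(app_name="test_name", master="local[1]", options=SPARK_OPTIONS):
    """Create (or reuse) a SparkSession without constructing a SparkContext by hand."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master(master).appName(app_name)
    for key, value in options.items():
        # Each config() call returns the builder, so the calls can be chained.
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With this in place, `spark = build_session()` followed by `spark.read.parquet(path).toPandas()` replaces the separate `SparkConf` and `SparkContext` lines.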

Otherwise, you're not using Hadoop, so don't use Spark at all. See "How to read a Parquet file into Pandas DataFrame?"
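For the Spark-free route, a minimal sketch with `pandas.read_parquet` could look like the following. It assumes a Parquet engine such as pyarrow or fastparquet is installed and that the file fits in memory. The helper name `load_trips` is made up for illustration.

```python
import pandas as pd


def load_trips(path, columns=None):
    """Read a Parquet file straight into pandas, with no Spark session involved."""
    # pandas delegates to an installed Parquet engine (pyarrow or fastparquet).
    # Passing a list of column names limits what is loaded into memory.
    return pd.read_parquet(path, columns=columns)
```

For the file in the question this would be `pd_df = load_trips("./data/fhvhv_tripdata_2022-10.parquet")`.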

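Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you reproduce this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.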

 