pandas to_csv function does not write to Blob storage when called from a Spark UDF

Question

I am using a Spark UDF to read data from a GET endpoint and write it as a CSV file to an Azure Blob location.

My GET endpoint takes two query parameters, param1 and param2. So initially I have a DataFrame, paramDF, with two columns, param1 and param2.

param1   param2
12        25
45        95

Schema:    paramDF:pyspark.sql.dataframe.DataFrame
           param1:string
           param2:string

Now I write a UDF that accepts the two parameters, register it, and then invoke it for each row in the DataFrame. The UDF is as below:

    import pandas
    import requests

    # DataUrl and TOKEN are defined elsewhere in the notebook.
    def executeRestApi(param1, param2):
        dlist = []
        try:
            url = DataUrl.format(token=TOKEN, oid=param1, wid=param2)
            print(url)
            response = requests.get(url)
            if response.status_code == 200:
                metrics = response.json()['data']['metrics']
                dic = {}
                dic['metric1'] = metrics['metric1']
                dic['metric2'] = metrics['metric2']
                dlist.append(dic)

            # mode='x' creates the file and fails if it already exists.
            pandas.DataFrame(dlist).to_csv("../../dbfs/mnt/raw/Important/MetricData/listofmetrics.csv", header=True, index=False, mode='x')
            return "Success"
        except Exception as e:
            # Any error, including a failed write, is reported only as "Failure".
            return "Failure"
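
One detail of the UDF above may be worth checking, offered as a sketch rather than a diagnosis: every row writes to the same file name with mode='x', which creates the file and raises FileExistsError if it already exists, and the broad except would turn that error into "Failure". The example below uses a temporary directory as a stand-in for the mounted Blob path; write_rows and metrics_path are hypothetical helpers that are not part of the original code.

```python
import os
import tempfile

import pandas as pd


def metrics_path(base_dir, param1, param2):
    """Hypothetical helper: one CSV path per (param1, param2) pair."""
    return os.path.join(base_dir, f"metrics_{param1}_{param2}.csv")


def write_rows(rows, path):
    """Write rows to path; mode='x' refuses to overwrite an existing file."""
    try:
        pd.DataFrame(rows).to_csv(path, header=True, index=False, mode="x")
        return "Success"
    except FileExistsError:
        return "Failure"


base_dir = tempfile.mkdtemp()  # stand-in for the mounted Blob directory
rows = [{"metric1": 1, "metric2": 2}]

# Same target for every row: only the first write can create the file.
fixed = os.path.join(base_dir, "listofmetrics.csv")
fixed_results = [write_rows(rows, fixed) for _ in range(2)]

# One target per parameter pair: each write creates its own file.
unique_results = [
    write_rows(rows, metrics_path(base_dir, p1, p2))
    for p1, p2 in [("12", "25"), ("45", "95")]
]
print(fixed_results, unique_results)
```

Catching only FileExistsError here, instead of a bare Exception, keeps other failures visible while debugging.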

Register the UDF:

udf_executeRestApi = udf(executeRestApi, StringType())

Finally, call the UDF this way:

paramDF.withColumn("result", udf_executeRestApi(col("param1"), col("param2")))

I don't see any errors while calling the UDF; in fact, the UDF returns the value "Success" correctly. The only problem is that the files are not written to Azure Blob storage, no matter what I try. UDFs are primarily meant for custom functionality (and return a value). However, in my case, I am trying to execute the GET API call and the write operation using the UDF (and that is my main intention here).

There is no issue with my pandas.DataFrame().to_csv() call: the same line, tried separately with a simple list, writes data to the Blob correctly.
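
A possible reason the same line can behave differently in the two settings, stated as an assumption rather than a confirmed cause: the target is a relative path, so it resolves against the working directory of whichever process runs it, and a notebook cell and a UDF worker are not guaranteed to share one. The sketch below only does path arithmetic; both working directories are illustrative values, and resolve_from is a hypothetical helper.

```python
import posixpath

RELATIVE_TARGET = "../../dbfs/mnt/raw/Important/MetricData/listofmetrics.csv"


def resolve_from(cwd, path):
    """Return the absolute location `path` refers to for a process running in `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, path))


# Illustrative working directories only; check os.getcwd() in each process.
driver_view = resolve_from("/databricks/driver", RELATIVE_TARGET)
worker_view = resolve_from("/local_disk0/tmp/worker", RELATIVE_TARGET)

print(driver_view)
print(worker_view)
```

If the two locations differ in a real cluster, writing to the absolute path under /dbfs/mnt/ removes the dependence on the working directory.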

What could be going wrong here?

Note: the environment is Spark on Databricks. The indentation in my actual code is not the problem.

Answer 1

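Try calling display on the resulting DataFrame. withColumn is a lazy transformation, so the UDF, and the write inside it, does not run until an action such as display is triggered.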

Answered 2023-01-26 00:38:38
