
Azure Blob storage error can't parse a date in spark

I am trying to use python to read a file stored in azure datalake gen2 into a spark dataframe.

The code is

from pyspark import SparkConf
from pyspark.sql import SparkSession


# create spark session
key = "some_key"
appName = "DataExtract"
master = "local[*]"
sparkConf = SparkConf() \
    .setAppName(appName) \
    .setMaster(master) \
    .set("fs.azure.account.key.myaccount.dfs.core.windows.net", key)

spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()

data_csv="abfs://test-file-system@myaccount.dfs.core.windows.net/data.csv"
data_out = "abfs://test-file-system@myaccount.dfs.core.windows.net/data_out.csv"

# read csv
df = self.spark_session.read.csv(data_csv)

# write csv
df.write.csv(data_out)

The file is read and written fine, but I get the following error

ERROR AzureBlobFileSystemStore: Failed to parse the date Thu, 09 Sep 2021 10:12:34 GMT

The date seems to be the file creation date.
How can I get the date parsed to avoid the error?

I tried to reproduce the same issue and found that it is these lines that are causing the error.

data_csv="abfs://test-file-system@myaccount.dfs.core.windows.net/data.csv"
data_out = "abfs://test-file-system@myaccount.dfs.core.windows.net/data_out.csv"

# read csv
df = self.spark_session.read.csv(data_csv)
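In other words, the paths use the `abfs://` scheme rather than the TLS-secured `abfss://` scheme. A minimal sketch of the same lines with that one change (keeping the question's placeholder account and container names) would look like:

```python
# Same URIs as in the question, but with the secure "abfss://" scheme.
# "test-file-system" and "myaccount" are the question's placeholder names.
data_csv = "abfss://test-file-system@myaccount.dfs.core.windows.net/data.csv"
data_out = "abfss://test-file-system@myaccount.dfs.core.windows.net/data_out.csv"

# Read and write with the active SparkSession. Note that the question's
# `self.spark_session` only works inside a class that holds that attribute;
# in a standalone script the session variable (e.g. `spark`) is used directly.
# df = spark.read.csv(data_csv)
# df.write.csv(data_out)
```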

Here is the code that worked for me when I replaced the lines above, changing `abfs` to `abfss`.

from pyspark import SparkConf
from pyspark.sql import SparkSession

# create spark session
key = "<Your Storage Account Key>"
appName = "<Synapse App Name>"
master = "local[*]"
sparkConf = SparkConf() \
.setAppName(appName) \
.setMaster(master) \
.set("fs.azure.account.key.<Storage Account Name>.dfs.core.windows.net", key)  

spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()

data_csv="abfss://<ContainerName>@<Storage Account Name>.dfs.core.windows.net/<Directory>"

# read csv
df1 = spark.read.option('header','true')\
.option('delimiter', ',')\
.csv(data_csv + '/sample1.csv')

df1.show()

# write csv
df2 = df1.write.csv(data_csv + '/<Give the name of blob you want to write to>.csv')

Otherwise, you can also try the code below, which works perfectly fine for me

from pyspark.sql import SparkSession
from pyspark.sql.types import *

account_name = "<StorageAccount Name>"
container_name = "<Storage Account Container Name>"
relative_path = "<Directory path>"
adls_path = 'abfss://%s@%s.dfs.core.windows.net/%s'%(container_name,account_name,relative_path)

dataframe1 = spark.read.option('header','true')\
.option('delimiter', ',')\
.csv(adls_path + '/sample1.csv')

dataframe1.show()

dataframe2 = dataframe1.write.csv(adls_path + '/<Give the name of blob you want to write to>.csv')

Reference: Synapse Spark – Reading CSV files from Azure Data Lake Storage Gen 2 with Synapse Spark using Python - SQL Stijn (sql-stijn.com)
