I can read from a local file in PySpark, but I can't write a DataFrame to a local file. The following write fails:
df.write.csv("sdf")
" 21/07/24 15:27:23 ERROR FileFormatWriter: Aborting job a9914f88-3ab9-480a-984f-33d0e598c0fc. java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/ String;I)Z at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method) at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:645) at org.apache.hadoop.fs .FileUtil.canRead(FileUtil.java:1230) at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1435) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:493) at org.apache.hadoop. fs.FileSystem.listStatus(FileSystem.java:1868) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(Checksum FileSystem.java:678) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop.mapreduce.lib.output .FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:332) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:402) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java :375) at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:182) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:220) at org .apache.spark.sql.execution.datasources.Insert IntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) at org.apache.spark.sql.execution.command.DataWritingCommandExec. sideEffectResult(commands.scala:106) at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1( SparkPlan.scala:180) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151 ) 在 org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) 在 org.apache.spark.sql.execution.SparkSlan.4execute8(SparkPlan.execute.scala:215) 552156588:176) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) at org .apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org. apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache. 
spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.883688262367 88.DataFrameWriter.runCommand(DataFrameWriter.scala:989) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:979) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)在 sun.reflect.NativeMethodAccessorImpl.invoke(未知來源) 在 sun.reflect.DelegatingMethodAccessorImpl.invoke(未知來源) 在 java.lang.reflect.Method.invoke(未知來源) 在 py4j.reflection.MethodInvoker.invoke(MethodInvoker.88213284869458 :244) 在 py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 在 py4j.Gateway.invoke(Gateway.java:282) 在 py4j.commands.AbstractCommand.invokeMethod(AbstractComm and.java:132) 在 py4j.commands.CallCommand.execute(CallCommand.java:79) 在 py4j.GatewayConnection.run(GatewayConnection.java:238) 在 java.Traceback(Unknown Sourceback.Thread)最后調用):文件“”,第 1 行,在文件“C:\spark\python\pyspark\sql\readwriter.py”中,第 1372 行,在 csv self._jwrite.csv(path) 文件“C:\spark\ python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py",第 1305 行,調用文件“C:\spark\python\pyspark\sql\utils.py”,第 111 行,deco 返回f(*a, **kw) 文件“C:\spark\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py”,第 328 行,在 get_return_value py4j.protocol.Py4JJavaError:一個錯誤調用 o40.csv 時發生。 :org.apache.spark.SparkException:作業中止。 at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) at org. apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) at org.apache. spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) at org.apache.spark. 
sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) 在 org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala: 151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) at org.apache.spark .sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) at org.apache.spark.sql.DataFrameWriter.$ anonfun$runCommand$1(DataFrameWriter.scala:989) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$ .withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withA ctive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) at org.apache.spark.sql.DataFrameWriter.save( DataFrameWriter.scala:293) at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:979) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(未知來源)位於 java.lang.reflect.Method.invoke(未知來源)位於 py4j.reflection.MethodInvoker.invoke(MethodInvoker.882132469458 88:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j. commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop. io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method) at org.apache.hadoop.io.nativeio.NativeIO$Windows. access(NativeIO.java:645) at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1230) at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1435) at org.apache.hadoop.fs.RawLocalFile System.listStatus(RawLocalFileSystem.java:493) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop.fs .ChecksumFileSystem.listStatus(ChecksumFileSystem.java:678) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910) at org.apache.hadoop. mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:332) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:402) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter. commitJob(FileOutputCommitter.java:375)在 org.88352839 602088.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:182) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:220)
Besides getting winutils.exe and setting HADOOP_HOME, check whether you have the hadoop.dll binary in your bin folder. If not, download it from the GitHub repo:
https://github.com/cdarlint/winutils/blob/master/hadoop-3.2.1/bin/hadoop.dll
It worked for me.
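If you want a reproducible version of that check, here is a minimal PySpark-side sketch. The C:\hadoop location, the app name, and the output path are assumptions for illustration; adjust them to your install. The key point is that HADOOP_HOME and PATH must be set before the first SparkSession is created, so the JVM that PySpark launches inherits them and can load the native library.

import os
from pathlib import Path

# Assumed layout for illustration: winutils.exe and hadoop.dll under C:\hadoop\bin.
hadoop_home = Path(r"C:\hadoop")

# Fail fast if either native binary is missing from the bin folder.
for name in ("winutils.exe", "hadoop.dll"):
    if not (hadoop_home / "bin" / name).exists():
        raise FileNotFoundError(f"{name} is missing from {hadoop_home / 'bin'}")

# Set these before creating the SparkSession, i.e. before the JVM starts.
os.environ["HADOOP_HOME"] = str(hadoop_home)
os.environ["PATH"] = str(hadoop_home / "bin") + os.pathsep + os.environ["PATH"]

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-write-check").getOrCreate()
spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"]) \
    .write.mode("overwrite").csv("sdf")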
Adding hadoop.dll to the HADOOP_HOME path solved the problem. Thanks!
For PySpark 3.3.1, Win10, Java 18: get winutils.exe and hadoop.dll into a bin folder (from here https://github.com/cdarlint/winutils or here https://github.com/steveloughran/winutils ; my version was 3.0.0, but any version higher than 3.0.0 should also work), and put that folder into C:\hadoop. Under System variables, create HADOOP_HOME and set it to C:\hadoop, then add %HADOOP_HOME%\bin to the path under System vars. If any problem remains while writing files such as json or parquet, try putting the hadoop.dll file into the System32 folder as well. That's it.
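For completeness, a small smoke test along these lines can confirm the setup took effect (run it in a fresh shell so the new System variables are picked up; the output paths and app name here are arbitrary):

import os
from pyspark.sql import SparkSession

# These should reflect the System variables configured above.
print("HADOOP_HOME =", os.environ.get("HADOOP_HOME"))
print("bin on PATH:", os.environ.get("HADOOP_HOME", "") + r"\bin" in os.environ.get("PATH", ""))

spark = SparkSession.builder.appName("write-smoke-test").getOrCreate()
df = spark.range(5).withColumnRenamed("id", "n")

# The UnsatisfiedLinkError surfaced at job commit time, so exercise
# a couple of the formats mentioned above, not just csv.
df.write.mode("overwrite").json(r"C:\tmp\smoke_json")
df.write.mode("overwrite").parquet(r"C:\tmp\smoke_parquet")
print("writes succeeded")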