Values of the columns are null and swapped in pyspark dataframe

Original post: 2022-02-16 10:41:20. Tags: python, pandas, dataframe, apache-spark, pyspark

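I am using pyspark==2.3.1. I have already preprocessed my data with pandas, and now I want to convert my preprocessing function from pandas to pyspark. But when I read the CSV file with pyspark, many values become null in columns that actually contain values. If I try to perform any operation on this dataframe, it swaps the values of one column with those of other columns. I also tried different versions of pyspark. Please let me know what I am doing wrong. Thanks.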

Result from pyspark:

The "property_type" column shows null values, but the actual dataframe has values there, not nulls.

CSV file:

However, pyspark handles a small dataset fine, i.e.:

2 Answers

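We faced a similar issue in our project. Things you need to check:

Check whether your data contains " [double quotes]. pyspark treats this as a special character when parsing, so unescaped quotes inside a field can make the rows look malformed.
Check whether your CSV data is multiline, that is, whether a single field spans several lines. We handled this situation with the configuration below: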

spark.read.options(header=True, inferSchema=True, escape='"').option("multiline",'true').csv(schema_file_location)
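Before changing Spark options, it can help to confirm that the file really contains the two patterns listed above. The following is a minimal sketch using only Python's standard `csv` module; the function name `scan_csv` and the sample data are hypothetical and not taken from the original post. It counts logical rows whose fields contain embedded newlines or double quotes, and rows whose width differs from the header.

```python
import csv
import io


def scan_csv(text):
    """Count rows with embedded newlines, embedded quotes, or a wrong field count."""
    # newline="" keeps line endings untouched so the csv module can handle them.
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    stats = {"rows": 0, "multiline_rows": 0, "quoted_rows": 0, "bad_width_rows": 0}
    for row in reader:
        stats["rows"] += 1
        if any("\n" in field or "\r" in field for field in row):
            stats["multiline_rows"] += 1
        if any('"' in field for field in row):
            stats["quoted_rows"] += 1
        if len(row) != len(header):
            stats["bad_width_rows"] += 1
    return stats


# Hypothetical sample: one field spans two lines, another contains escaped quotes.
sample = (
    'id,property_type,description\n'
    '1,house,"Two floors,\nlarge garden"\n'
    '2,flat,"Listed as ""cosy"" by the agent"\n'
    '3,studio,plain text\n'
)

stats = scan_csv(sample)
print(stats)

# A reader that splits on physical lines sees 4 data lines for 3 logical rows,
# which is how values can end up shifted into the wrong columns.
physical_data_lines = len(sample.splitlines()) - 1
print(stats["rows"], physical_data_lines)
```

If `multiline_rows` or `quoted_rows` is non-zero for the real file, the multiline and escape options shown above are likely relevant. For a file on disk, the same function could be fed the result of `open(path, newline="").read()`, assuming the file fits in memory.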

Are you restricted to the CSV file format? Try Parquet. Just save your DataFrame in pandas with .to_parquet() instead of .to_csv(). Spark works very well with this format.
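The hand-off described in this answer might look like the sketch below. The helper names `save_as_parquet` and `load_parquet` and the file path are hypothetical; the calls they wrap, pandas `DataFrame.to_parquet` and Spark's `spark.read.parquet`, are the ones the answer refers to. Note that `to_parquet` needs a Parquet engine such as pyarrow or fastparquet installed.

```python
def save_as_parquet(pdf, path):
    """Write a preprocessed pandas DataFrame to a Parquet file and return the path."""
    # index=False leaves the pandas index out of the file, so Spark sees
    # only the real data columns.
    pdf.to_parquet(path, index=False)
    return path


def load_parquet(spark, path):
    """Read a Parquet file into a Spark DataFrame using an existing SparkSession."""
    # Parquet stores the schema and typed values, so there is no quoting or
    # line splitting for the reader to get wrong.
    return spark.read.parquet(path)


# Example usage (assumes pandas, a Parquet engine and pyspark are installed;
# "listings.parquet" is a hypothetical path):
#
#   import pandas as pd
#   from pyspark.sql import SparkSession
#
#   pdf = pd.DataFrame({"id": [1, 2], "property_type": ["house", "flat"]})
#   spark = SparkSession.builder.getOrCreate()
#   sdf = load_parquet(spark, save_as_parquet(pdf, "listings.parquet"))
#   sdf.printSchema()
```

Because the column types travel with the file, this route also avoids relying on inferSchema.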

Related questions

Pyspark dataframe get columns that has mostly null values efficiently
How to drop all columns with null values in a PySpark DataFrame?
Selecting values from non-null columns in a PySpark DataFrame
Filling pyspark dataframe null values
PySpark null values imputed using median and mean being able to handle non numeric columns in pyspark dataframe
transform columns values to columns in pyspark dataframe
Remove null columns from a pyspark dataframe
Create a PySpark function that determines if two or more selected columns in a dataframe have null values
How to return rows with Null values in pyspark dataframe?
Change the datatype for two columns of a pyspark dataframe after receiving json from Kafka server but am getting null values

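Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this post, please cite this site's URL or the original source. For any questions, contact yoyou2525@163.com.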
