How do I count consecutive values with pyspark?

Original post 2020-05-13 15:08:07 5 1 python/ pyspark

I am trying to count consecutive values that appear in a column with Pyspark. My dataframe has the column "a", and I want to create the column "b".

+---+---+
|  a|  b|
+---+---+
|  0|  1|
|  0|  2|
|  0|  3|
|  0|  4|
|  0|  5|
|  1|  1|
|  1|  2|
|  1|  3|
|  1|  4|
|  1|  5|
|  1|  6|
|  2|  1|
|  2|  2|
|  2|  3|
|  2|  4|
|  2|  5|
|  2|  6|
|  3|  1|
|  3|  2|
|  3|  3|
+---+---+
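
To pin down what column "b" should contain, here is a plain-Python reference for the expected semantics (not Spark code). The function name `consecutive_counts` is made up for illustration; it simply numbers each element within its run of equal neighbouring values.

```python
from itertools import groupby


def consecutive_counts(values):
    """Return each element's 1-based position within its run of equal neighbours."""
    counts = []
    # groupby splits the sequence into runs of adjacent equal values.
    for _, run in groupby(values):
        run_length = len(list(run))
        counts.extend(range(1, run_length + 1))
    return counts


# Column "a" from the table above: five 0s, six 1s, six 2s, three 3s.
a = [0] * 5 + [1] * 6 + [2] * 6 + [3] * 3
b = consecutive_counts(a)
```

With this input, `b` reproduces the "b" column of the table, restarting at 1 whenever the value in "a" changes.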

I tried to create the column "b" with the lag function over some window, but without success.

w = Window\
  .partitionBy(df.some_id)\
  .orderBy(df.timestamp_column)

df.withColumn(
  "b",
  f.when(df.a == f.lag(df.a).over(w),
         f.sum(f.lit(1)).over(w)).otherwise(f.lit(0))
)

1 Answer

I was able to solve this with the following code:

df.withColumn("b",
  f.row_number().over(Window.partitionBy("a").orderBy("timestamp_column"))
)


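Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.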

Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM