
How to replicate the between_time function of Pandas in PySpark

I want to replicate the between_time function of Pandas in PySpark. Is this possible, given that in Spark the DataFrame is distributed and there is no datetime-based index?

import pandas as pd

i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
# start later than end wraps past midnight: keeps the 00:00:00 and 01:00:00 rows
ts.between_time('0:45', '0:15')

Is something similar possible in PySpark?

pandas.between_time - API

If you have a timestamp column, say ts, in a Spark DataFrame, then for the case above you can use:

import pyspark.sql.functions as F

# Keep rows whose time-of-day falls between 00:15 and 00:45,
# i.e. the equivalent of pandas between_time('0:15', '0:45')
df2 = df.filter(F.hour(F.col('ts')).between(0, 0) & F.minute(F.col('ts')).between(15, 45))
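
Note that in the pandas example the start time is later than the end time, so between_time wraps past midnight and keeps the times not between 0:15 and 0:45; the hour/minute filter above covers only the non-wrapped window. Below is a minimal sketch of a more general equivalent, assuming the timestamp column is named ts; the between_time helper here is hypothetical, not a PySpark built-in. It formats the time-of-day as a zero-padded 'HH:mm:ss' string, which compares in chronological order, and inverts the condition when the window wraps.

import pyspark.sql.functions as F

def between_time(df, start, end, col='ts'):
    # Hypothetical helper sketching pandas between_time in PySpark.
    # start/end are 'HH:mm:ss' strings; zero-padded time strings
    # compare in chronological order, so string comparison is safe.
    t = F.date_format(F.col(col), 'HH:mm:ss')
    if start <= end:
        # Normal window: keep times between start and end inclusive
        return df.filter((t >= start) & (t <= end))
    # Wrapped window (start later than end), as in the question:
    # keep times at or after start OR at or before end
    return df.filter((t >= start) | (t <= end))

# For the wrapped window from the question:
df2 = between_time(df, '00:45:00', '00:15:00')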
