
How to replicate the between_time function of Pandas in PySpark

I want to replicate the between_time function of Pandas in PySpark. Is this possible, given that in Spark the DataFrame is distributed and there is no datetime-based index?

import pandas as pd

i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
# start later than end wraps past midnight: keeps the 00:00:00 and 01:00:00 rows
ts.between_time('0:45', '0:15')

Is something similar possible in PySpark?

pandas.between_time - API

If you have a timestamp column, say ts, in a Spark DataFrame, then for the case above you can use:

import pyspark.sql.functions as F

# Keep rows whose time-of-day falls between 00:15 and 00:45,
# i.e. the equivalent of pandas between_time('0:15', '0:45')
df2 = df.filter(F.hour(F.col('ts')).between(0, 0) & F.minute(F.col('ts')).between(15, 45))
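
Note that in the pandas example the start time is later than the end time, so between_time wraps past midnight and keeps the times not between 0:15 and 0:45; the hour/minute filter above covers only the non-wrapped window. Below is a minimal sketch of a more general equivalent, assuming the timestamp column is named ts; the between_time helper here is hypothetical, not a PySpark built-in. It formats the time-of-day as a zero-padded 'HH:mm:ss' string, which compares in chronological order, and inverts the condition when the window wraps.

import pyspark.sql.functions as F

def between_time(df, start, end, col='ts'):
    # Hypothetical helper sketching pandas between_time in PySpark.
    # start/end are 'HH:mm:ss' strings; zero-padded time strings
    # compare in chronological order, so string comparison is safe.
    t = F.date_format(F.col(col), 'HH:mm:ss')
    if start <= end:
        # Normal window: keep times between start and end inclusive
        return df.filter((t >= start) & (t <= end))
    # Wrapped window (start later than end), as in the question:
    # keep times at or after start OR at or before end
    return df.filter((t >= start) | (t <= end))

# For the wrapped window from the question:
df2 = between_time(df, '00:45:00', '00:15:00')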
