
How to filter dataframe based on values in pyspark/python?

I have a dataframe like the one below. I want to read the dataframe, filter the records based on start time, and store them in different dataframes.

INPUT DF

name      start_time
AA        2022-11-16
AAA       2022-11-15
BBB       2022-11-14

For example, I need to store each record based on its start time, which means all records with a start time of the 16th should go to one dataframe, and so on.

OUTPUT DF

df1 = ["Store 2022-11-16 record"]
df2 = ["Store 2022-11-15 record"]
df3 = ["Store 2022-11-14 record"]

Well, technically a duplicate, but I don't know how to report that. I think this works:

import pandas as pd

df = pd.DataFrame({"name": ["AA", "AAA", "BBB"],
                   "start_time": ["2022-11-16", "2022-11-15", "2022-11-14"]})

# Split into one DataFrame per distinct start_time, keyed by that value
dfs = dict(tuple(df.groupby('start_time')))

dfs

You can select each DataFrame by its start time:

print(dfs['2022-11-14'])

    name    start_time
2   BBB 2022-11-14
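
The groupby approach above can also be written as a dict comprehension, which makes it a bit more explicit that each group is its own DataFrame. This is only a minimal sketch using the same sample data as the answer. The variable names are illustrative, and it covers pandas only, not PySpark.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "name": ["AA", "AAA", "BBB"],
    "start_time": ["2022-11-16", "2022-11-15", "2022-11-14"],
})

# groupby yields (key, sub-DataFrame) pairs; collect them into a dict
# keyed by start_time. Equivalent to dict(tuple(df.groupby("start_time"))).
dfs = {start_time: group for start_time, group in df.groupby("start_time")}

# The keys are the distinct start_time values (groupby sorts them by default).
print(list(dfs))

# Loop over every per-date DataFrame instead of naming df1, df2, df3 by hand.
for start_time, group in dfs.items():
    print(start_time, group["name"].tolist())
```

Keeping the frames in a dict rather than in separate df1, df2, df3 variables means the code does not need to know in advance how many distinct dates there are.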


 