
How to filter dataframe based on values in pyspark/python?

I have a dataframe like the one below. I want to read the dataframe and filter the records based on start time, storing them in different dataframes.

INPUT DF

name      start_time
AA        2022-11-16
AAA       2022-11-15
BBB       2022-11-14

For example, I need to store each record based on start time, meaning all records with a start time of 2022-11-16 should go to one dataframe, and so on.

OUTPUT DF

df1 = ["Store 2022-11-16 record"]
df2 = ["Store 2022-11-15 record"]
df3 = ["Store 2022-11-14 record"]

Well, technically a duplicate, but I don't know how to report that. I think this works:

import pandas as pd

df = pd.DataFrame({"name": ["AA", "AAA", "BBB"],
                   "start_time": ["2022-11-16", "2022-11-15", "2022-11-14"]})

# groupby yields (key, sub-dataframe) pairs; dict() turns them into
# one dataframe per distinct start_time value
dfs = dict(tuple(df.groupby('start_time')))

dfs

You can select each DataFrame by its start time:

print(dfs['2022-11-14'])

  name  start_time
2  BBB  2022-11-14
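
Since the question also mentions PySpark: a minimal sketch of the same split in PySpark, assuming an existing SparkSession and a Spark DataFrame named df with the same columns (collecting the distinct dates to the driver is fine here because there are only a few):

from pyspark.sql import functions as F

# Build one filtered DataFrame per distinct start_time value
dfs = {
    row["start_time"]: df.filter(F.col("start_time") == row["start_time"])
    for row in df.select("start_time").distinct().collect()
}

dfs["2022-11-14"].show()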
