I have a dataframe with multiple columns, and I need to split it into a collection of groups defined by a parameter (for example, a set of columns).
R has a split function that does this, as below:
dataframe A
> date c1 c2 c3 c4
> 2021 1 1 a ss
> 2022 1 1 b sa
> 2023 3 1 b sd
data_splitting = split(A, by=c('c1', 'c2'), keep.by=FALSE)
This results in the following R vector:
Vector
> 1.1
> 2021 a ss
> 2022 b sa
> 3.1
> 2023 b sd
I need similar functionality in Python.
Thanks Kostas
This can be achieved in pandas through groupby.
import pandas as pd
test_a = pd.DataFrame(dict(
date=(2021, 2022, 2023),
c1=(1,1,3),
c2=(1,1,1),
c3=("a", "b", "b"),
c4 =("ss", "sa", "sd")
))
split_a = test_a.groupby(["c1", "c2"])
Now split_a is a GroupBy object. Iterating over it yields (key, data frame) pairs, one per group, so you can recover the data frames above by looping over it:
for indx, split_data in split_a:
    print("Index:", indx)
    print(split_data)
    # if you need the values, just use split_data.values
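To get closer to the R output, where each group is stored under a name like 1.1 and the grouping columns are dropped (keep.by=FALSE), the groups can be collected into a dict. This is only a minimal sketch: the names by and groups, and the "1.1"-style string keys, are my own choices for illustration, not something pandas produces by itself.

```python
import pandas as pd

test_a = pd.DataFrame(dict(
    date=(2021, 2022, 2023),
    c1=(1, 1, 3),
    c2=(1, 1, 1),
    c3=("a", "b", "b"),
    c4=("ss", "sa", "sd"),
))

# Columns to split on (illustrative name, mirrors R's `by` argument).
by = ["c1", "c2"]

# Grouping by a list of columns yields tuple keys such as (1, 1).
# Join them with "." to imitate R's "1.1" labels, and drop the
# grouping columns to imitate keep.by=FALSE.
groups = {
    ".".join(str(k) for k in key): group.drop(columns=by)
    for key, group in test_a.groupby(by)
}

for name, frame in groups.items():
    print(name)
    print(frame)
```

If you would rather keep the grouping columns in each piece, leave out the drop(columns=by) call.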
Since you're applying a prediction to each group, this can be done with apply over the groupby. As a simple example, let's write a function that returns the number of rows in a dataframe:
def nrows(df):
    return df.shape[0]
Then running this with apply will run the "prediction function" over each group:
nrows_by_group = test_a.groupby(["c1", "c2"]).apply(nrows)
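As a rough check of what the per-group results look like, the sketch below computes the same row counts in two other ways: with the built-in size() method of the groupby, and with a plain dict comprehension over the groups. The dict version is an alternative I am suggesting for cases where the per-group function (for example a model's predict) returns something that does not fit neatly into a Series; the variable names counts and per_group are illustrative.

```python
import pandas as pd

test_a = pd.DataFrame(dict(
    date=(2021, 2022, 2023),
    c1=(1, 1, 3),
    c2=(1, 1, 1),
    c3=("a", "b", "b"),
    c4=("ss", "sa", "sd"),
))


def nrows(df):
    # Stand-in for a real per-group "prediction" function.
    return df.shape[0]


# Built-in shortcut for counting rows per group; the result is a
# Series indexed by the (c1, c2) pairs.
counts = test_a.groupby(["c1", "c2"]).size()

# Explicit loop: call the function on each group and keep the
# results in a dict keyed by the (c1, c2) tuple.
per_group = {
    key: nrows(group)
    for key, group in test_a.groupby(["c1", "c2"])
}

print(counts)
print(per_group)
```

Both give 2 rows for the group (1, 1) and 1 row for the group (3, 1).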