
Python function with functionality similar to R's split function

I have a dataframe with multiple columns, and I need to divide it into a set of groups defined by a parameter (one or more columns, etc.).

R has a split function that does this:

dataframe A

    date  c1  c2  c3  c4
    2021   1   1   a  ss
    2022   1   1   b  sa
    2023   3   1   b  sd

data_splitting = split(A, by=c('c1', 'c2'), keep.by=FALSE)

The result in R has one element per group, named after the values of c1 and c2:

    1.1
    2021  a  ss
    2022  b  sa

    3.1
    2023  b  sd

I need similar functionality in Python.

Thanks, Kostas

This can be achieved in pandas with groupby.

import pandas as pd

test_a = pd.DataFrame(dict(
    date=(2021, 2022, 2023),
    c1=(1, 1, 3),
    c2=(1, 1, 1),
    c3=("a", "b", "b"),
    c4=("ss", "sa", "sd"),
))


split_a = test_a.groupby(["c1", "c2"])

split_a is now a GroupBy object rather than a list of data frames. Iterating over it yields one (key, data frame) pair per group, where the key is the tuple of c1 and c2 values and the data frame holds the matching rows:

for indx, split_data in split_a:
    print("Index:", indx)
    print(split_data)
    #  if you need the values, just use split_data.values
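
If you want something closer to what R returns, with each piece named after its key and the grouping columns dropped (the keep.by=FALSE part), one option is to collect the groups into a dict. This is only a minimal sketch: the names by and split_dict are made up for illustration, and joining the key values with "." just mimics the names shown in the R output.

```python
import pandas as pd

test_a = pd.DataFrame(dict(
    date=(2021, 2022, 2023),
    c1=(1, 1, 3),
    c2=(1, 1, 1),
    c3=("a", "b", "b"),
    c4=("ss", "sa", "sd"),
))

# Hypothetical helper name: the columns to split by.
by = ["c1", "c2"]

# Grouping by a list of two columns yields a tuple key such as (1, 1).
# Join the key values with "." to imitate the R names, and drop the
# grouping columns from each piece (the keep.by=FALSE behaviour).
split_dict = {
    ".".join(str(k) for k in key): group.drop(columns=by)
    for key, group in test_a.groupby(by)
}

for name, piece in split_dict.items():
    print(name)
    print(piece)
```

Each piece keeps its original row index. Call reset_index(drop=True) on a piece if you would rather have it start at 0.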

Since you're applying a prediction to each group, this can be done with apply on the groupby. As a simple example, let's write a function that returns the number of rows in a dataframe:

def nrows(df):
    return df.shape[0]

Then running this with apply will run the "prediction function" over each group:

nrows_by_group = test_a.groupby(["c1", "c2"]).apply(nrows)
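
To see what apply hands back, here is a self-contained version of the same example. Selecting the non-grouping columns before apply is my own addition: as far as I know, recent pandas releases warn when the function is also given the grouping columns, and selecting the columns explicitly sidesteps that. The name value_cols is made up for illustration.

```python
import pandas as pd

test_a = pd.DataFrame(dict(
    date=(2021, 2022, 2023),
    c1=(1, 1, 3),
    c2=(1, 1, 1),
    c3=("a", "b", "b"),
    c4=("ss", "sa", "sd"),
))

def nrows(df):
    # Stand-in for the "prediction function": one value per group.
    return df.shape[0]

# Hypothetical name: the columns passed to the function for each group.
value_cols = ["date", "c3", "c4"]

# The result is a Series indexed by the (c1, c2) group keys.
nrows_by_group = test_a.groupby(["c1", "c2"])[value_cols].apply(nrows)
print(nrows_by_group)

# For plain row counts, the built-in size() gives the same numbers.
sizes = test_a.groupby(["c1", "c2"]).size()
print(sizes)
```

The result is indexed by the group keys, so nrows_by_group.loc[(1, 1)] looks up the value for the group with c1 = 1 and c2 = 1.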
