
Use groupby and merge to create new column in pandas

So I have a pandas dataframe that looks something like this.

    name  is_something
0   a     0
1   b     1
2   c     0
3   c     1
4   a     1
5   b     0
6   a     1
7   c     0
8   a     1

Is there a way to use groupby and merge to create a new column that gives the number of times a name appears with an is_something value of 1 in the whole dataframe? The updated dataframe would look like this:

    name  is_something no_of_times_is_something_is_1
0   a     0            3
1   b     1            1
2   c     0            1
3   c     1            1
4   a     1            3
5   b     0            1
6   a     1            3
7   c     0            1  
8   a     1            3

I know you can just loop through the dataframe to do this but I'm looking for a more efficient way because the dataset I'm working with is quite large. Thanks in advance!

If the is_something column contains only 0 and 1, summing it per name counts the 1s. Use GroupBy.transform with sum so the aggregated value is broadcast back to every row of the new column:

df['new'] = df.groupby('name')['is_something'].transform('sum')
print (df)
  name  is_something  new
0    a             0    3
1    b             1    1
2    c             0    1
3    c             1    1
4    a             1    3
5    b             0    1
6    a             1    3
7    c             0    1
8    a             1    3
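
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question (the construction code is my own; the original post only shows the printed frame) and applies the same transform call.

```python
import pandas as pd

# Sample data copied from the question; the constructor call itself is an assumption.
df = pd.DataFrame({
    "name": list("abccabaca"),
    "is_something": [0, 1, 0, 1, 1, 0, 1, 0, 1],
})

# transform('sum') computes the per-name sum and returns one value per
# original row, aligned on df's index, so it can be assigned directly.
df["new"] = df.groupby("name")["is_something"].transform("sum")
print(df)
```

Because the result keeps the original index, no merge step is needed to attach it to the frame.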

If the column can contain other values, first compare it with 1, convert the boolean result to integers, and then use transform with sum:

df['new'] = df['is_something'].eq(1).astype(int).groupby(df['name']).transform('sum')
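
To see why the comparison step matters, here is a small sketch on hypothetical data (not from the question) in which is_something also takes the value 2. A plain sum would add up the 2s as well, while comparing with 1 first counts only the matching rows. The names `df2`, `is_one` and `plain_sum` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: is_something is not restricted to 0 and 1 here.
df2 = pd.DataFrame({
    "name": ["a", "a", "b", "b", "c"],
    "is_something": [1, 2, 1, 1, 2],
})

# Boolean mask converted to 0/1 integers, then summed per name.
is_one = df2["is_something"].eq(1).astype(int)
df2["new"] = is_one.groupby(df2["name"]).transform("sum")

# For comparison: summing the raw values overcounts when 2s are present.
df2["plain_sum"] = df2.groupby("name")["is_something"].transform("sum")
print(df2)
```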

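Or filter the rows first and map the per-name totals back onto the name column. Note that a name with no is_something value of 1 is missing from the filtered result and comes out as NaN: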

df['New']=df.name.map(df.query('is_something ==1').groupby('name')['is_something'].sum())
df
  name  is_something  New
0    a             0    3
1    b             1    1
2    c             0    1
3    c             1    1
4    a             1    3
5    b             0    1
6    a             1    3
7    c             0    1
8    a             1    3
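
If some names might never have is_something equal to 1, the mapped column can be filled with 0 afterwards. A minimal sketch on hypothetical data (the frame `df3` and the use of `size()` instead of `sum()` are my own choices, not part of the answer above):

```python
import pandas as pd

# Hypothetical data: names 'b' and 'd' never have is_something == 1.
df3 = pd.DataFrame({
    "name": ["a", "a", "b", "d"],
    "is_something": [1, 1, 0, 0],
})

# Count the rows where is_something == 1 for each name.
counts = df3.loc[df3["is_something"].eq(1)].groupby("name").size()

# Names absent from counts map to NaN; replace those with 0 and restore ints.
df3["New"] = df3["name"].map(counts).fillna(0).astype(int)
print(df3)
```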

You could also pass a lambda to transform that counts the values equal to 1 within each group:

df['new'] = df.groupby('name')['is_something'].transform(lambda xs: xs.eq(1).sum())
print(df)

Output

  name  is_something  new
0    a             0    3
1    b             1    1
2    c             0    1
3    c             1    1
4    a             1    3
5    b             0    1
6    a             1    3
7    c             0    1
8    a             1    3
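
Since the question asks specifically about groupby and merge, here is a sketch of that route as well. It is not taken from any of the answers above: it aggregates to one row per name and then left-merges the counts back onto the original frame. The frame construction and the variable names `counts` and `merged` are assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the question; the constructor call is an assumption.
df = pd.DataFrame({
    "name": list("abccabaca"),
    "is_something": [0, 1, 0, 1, 1, 0, 1, 0, 1],
})

# One row per name holding the number of rows where is_something == 1.
counts = (
    df["is_something"].eq(1).astype(int)
    .groupby(df["name"])
    .sum()
    .rename("no_of_times_is_something_is_1")
    .reset_index()
)

# A left merge keeps every row of df, in its original order.
merged = df.merge(counts, on="name", how="left")
print(merged)
```

This gives the same numbers as the transform versions, at the cost of building an intermediate table, so transform is usually the shorter choice when only one new column is needed.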
