Using this data set (some cols and hundreds of rows omitted for brevity) . . .
Year Ceremony Award Winner Name
0 1927/1928 1 Best Actress 0.0 Louise Dresser
1 1927/1928 1 Best Actress 1.0 Janet Gaynor
2 1937 10 Best Actress 0.0 Janet Gaynor
3 1927/1928 1 Best Actress 0.0 Gloria Swanson
4 1929/1930 3 Best Actress 0.0 Gloria Swanson
5 1950 23 Best Actress 0.0 Gloria Swanson
I used the following command . . .
ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count()
To create the following series . . .
Name
Ali MacGraw 1
Amy Adams 1
Angela Bassett 1
Angelina Jolie 1
Anjelica Huston 1
Ann Harding 1
Ann-Margret 1
Anna Magnani 1
Anne Bancroft 4
Anne Baxter 1
Anne Hathaway 1
Annette Bening 3
Audrey Hepburn 4
I tried adding the series to the original dataframe like so . . .
ba_dob['New_Col'] = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count()
I got a column of NaN values.
I've read the other posts suggesting that there might be some faulty indexing at work, but I'm not sure how that would play out here. More specifically, why would pandas not be able to line up the indexes, given that the groupby and count come from the same table? Is there something else afoot?
I think you need size, not count, because count excludes NaNs:
Then map the Name column with the Series created by groupby:
m = ba_dob.Winner == 0.0
ba_dob['new'] = ba_dob['Name'].map(ba_dob[m].groupby('Name').Winner.size())
print (ba_dob)
Year Ceremony Award Winner Name new
0 1927/1928 1 Best Actress 0.0 Louise Dresser 1
1 1927/1928 1 Best Actress 1.0 Janet Gaynor 1
2 1937 10 Best Actress 0.0 Janet Gaynor 1
3 1927/1928 1 Best Actress 0.0 Gloria Swanson 3
4 1929/1930 3 Best Actress 0.0 Gloria Swanson 3
5 1950 23 Best Actress 0.0 Gloria Swanson 3
Another solution:
ba_dob['new'] = ba_dob['Name'].map(ba_dob.loc[m, 'Name'].value_counts())
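To make the size versus count distinction concrete, here is a minimal sketch on a made-up toy frame (the frame `toy` and its values are assumptions for illustration, not part of the question's data). It only shows how the two methods differ when the counted column contains NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one Winner value is missing (NaN).
toy = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Winner": [0.0, np.nan, 0.0],
})

grouped = toy.groupby("Name")["Winner"]

sizes = grouped.size()    # number of rows per group, NaN included
counts = grouped.count()  # number of non-NaN values per group

print(sizes.to_dict())
print(counts.to_dict())
```

Once the rows are filtered with Winner == 0.0, the Winner column can no longer hold NaN in the remaining rows, so in the question's case both methods should return the same numbers.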
You can join your result back onto the initial DataFrame:
New_col = df.loc[df.Winner == 0.0, :].groupby('Name').Winner.count().rename('New_col')
df = df.join(New_col, on='Name')
Output :
Award Ceremony Name Winner Year New_col
0 Best Actress 1927/1928 Louise Dresser 0.0 0 1
1 Best Actress 1927/1928 Janet Gaynor 1.0 1 1
2 Best Actress 1937 Janet Gaynor 0.0 2 1
3 Best Actress 1927/1928 Gloria Swanson 0.0 3 3
4 Best Actress 1929/1930 Gloria Swanson 0.0 4 3
5 Best Actress 1950 Gloria Swanson 0.0 5 3
You can also use map
mapper = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count()
ba_dob['New_Col'] = ba_dob['Name'].map(mapper)
You get
Year Ceremony Award Winner Name New_Col
0 1927/1928 1 BestActress 0.0 Louise Dresser 1
1 1927/1928 1 BestActress 1.0 Janet Gaynor 1
2 1937 10 BestActress 0.0 Janet Gaynor 1
3 1927/1928 1 BestActress 0.0 Gloria Swanson 3
4 1929/1930 3 BestActress 0.0 Gloria Swanson 3
5 1950 23 BestActress 0.0 Gloria Swanson 3
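One thing to watch with map: a name that has no rows with Winner == 0.0 is missing from the mapper, so it comes back as NaN. A small sketch of one way to handle that, assuming you want 0 for such names (the row "Only Winner" is a hypothetical entry added purely to trigger the case):

```python
import pandas as pd

# Reduced sample plus one hypothetical name that has no losing rows.
ba_dob = pd.DataFrame({
    "Name": ["Louise Dresser", "Janet Gaynor", "Janet Gaynor", "Only Winner"],
    "Winner": [0.0, 1.0, 0.0, 1.0],
})

mapper = ba_dob.loc[ba_dob.Winner == 0.0].groupby("Name").Winner.count()

# Names absent from mapper map to NaN; replace those with 0 and restore ints.
ba_dob["New_Col"] = ba_dob["Name"].map(mapper).fillna(0).astype(int)
print(ba_dob)
```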
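I think you need reset_index(), which removes the Name index and gives a flat frame with two columns, Name and the count (named New_Col here). Assigning that column directly would still align on the new 0..n-1 row labels rather than on Name, so merge it back on Name instead. Something like:
counts = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count().reset_index(name='New_Col')
ba_dob = ba_dob.merge(counts, on='Name', how='left')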
Your groupby does not cover the whole DataFrame, only the rows where Winner == 0, so of course you will get NaN for those rows.
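To see where the NaN values come from, it may help to compare the two indexes. The following is a minimal sketch that rebuilds the six sample rows from the question; the variable `losses` and the column names `direct` and `mapped` are made up for illustration:

```python
import pandas as pd

# The six sample rows shown in the question.
ba_dob = pd.DataFrame({
    "Year": ["1927/1928", "1927/1928", "1937", "1927/1928", "1929/1930", "1950"],
    "Ceremony": [1, 1, 10, 1, 3, 23],
    "Award": ["Best Actress"] * 6,
    "Winner": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "Name": ["Louise Dresser", "Janet Gaynor", "Janet Gaynor",
             "Gloria Swanson", "Gloria Swanson", "Gloria Swanson"],
})

losses = ba_dob.loc[ba_dob.Winner == 0.0].groupby("Name").Winner.count()

print(ba_dob.index.tolist())  # integer row labels of the frame
print(losses.index.tolist())  # actress names from the groupby

# Column assignment aligns on index labels. No integer label matches a
# name, so every row of "direct" ends up NaN.
ba_dob["direct"] = losses

# map looks each Name value up in the index of losses instead.
ba_dob["mapped"] = ba_dob["Name"].map(losses)
print(ba_dob[["Name", "direct", "mapped"]])
```

The groupby result is labelled by Name while the frame is labelled by row number, which is why the direct assignment finds nothing to line up even though both come from the same table.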