Adding series to Pandas dataframe yields column of NaN

Using this data set (some cols and hundreds of rows omitted for brevity) . . .

    Year    Ceremony    Award          Winner   Name    
0   1927/1928   1       Best Actress    0.0     Louise Dresser  
1   1927/1928   1       Best Actress    1.0     Janet Gaynor
2   1937        10      Best Actress    0.0     Janet Gaynor
3   1927/1928   1       Best Actress    0.0     Gloria Swanson  
4   1929/1930   3       Best Actress    0.0     Gloria Swanson
5   1950        23      Best Actress    0.0     Gloria Swanson  

I used the following command . . .

ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count()

To create the following series . . .

Name
Ali MacGraw                1
Amy Adams                  1
Angela Bassett             1
Angelina Jolie             1
Anjelica Huston            1
Ann Harding                1
Ann-Margret                1
Anna Magnani               1
Anne Bancroft              4
Anne Baxter                1
Anne Hathaway              1
Annette Bening             3
Audrey Hepburn             4

I tried adding the series to the original dataframe like so . . .

ba_dob['New_Col'] = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count()

I got a column of NaN values.

I've read other posts suggesting that faulty indexing might be at work, but I'm not sure how that would play out here. More specifically, why would Pandas be unable to line up the indexes when the groupby and count both come from the same table? Is there something else afoot?
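A minimal sketch of the likely cause, rebuilt from only the six sample rows shown above (the full table is assumed to have the same shape): the grouped Series is indexed by Name, while ba_dob keeps its default integer index, and column assignment aligns on index labels. The variable name losses is introduced here for illustration.

```python
import pandas as pd

# Six sample rows from the question; the full data set is assumed to look the same.
ba_dob = pd.DataFrame({
    "Year": ["1927/1928", "1927/1928", "1937", "1927/1928", "1929/1930", "1950"],
    "Ceremony": [1, 1, 10, 1, 3, 23],
    "Award": ["Best Actress"] * 6,
    "Winner": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "Name": ["Louise Dresser", "Janet Gaynor", "Janet Gaynor",
             "Gloria Swanson", "Gloria Swanson", "Gloria Swanson"],
})

# "losses" is a hypothetical name for the grouped count from the question.
losses = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby("Name").Winner.count()

print(ba_dob.index.tolist())  # integer row labels
print(losses.index.tolist())  # actress names

# Assignment aligns on index labels. No actress name equals an integer
# row label, so nothing matches and every row is filled with NaN.
ba_dob["New_Col"] = losses
print(ba_dob["New_Col"].isna().all())
```

So the two objects do come from the same table, but they no longer share an index, which is what the alignment step relies on.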

I think you need size, not count, because count excludes NaNs.
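A small sketch of that difference, using a made-up toy frame (not the question's data) in which one Winner value is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only to show how size and count treat NaN.
toy = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Winner": [0.0, np.nan, 1.0],
})

grouped = toy.groupby("Name").Winner
sizes = grouped.size()    # number of rows per group, NaN included
counts = grouped.count()  # number of non-NaN values per group

print(sizes)
print(counts)
```

In the question the rows are first filtered on Winner == 0.0, so Winner itself contains no NaN afterwards and both methods should agree there; size is simply the safer habit.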

Then map the Name column using the Series created by groupby:

m = ba_dob.Winner == 0.0
ba_dob['new'] = ba_dob['Name'].map(ba_dob[m].groupby('Name').Winner.size())
print (ba_dob)
        Year  Ceremony         Award  Winner            Name  new
0  1927/1928         1  Best Actress     0.0  Louise Dresser    1
1  1927/1928         1  Best Actress     1.0    Janet Gaynor    1
2       1937        10  Best Actress     0.0    Janet Gaynor    1
3  1927/1928         1  Best Actress     0.0  Gloria Swanson    3
4  1929/1930         3  Best Actress     0.0  Gloria Swanson    3
5       1950        23  Best Actress     0.0  Gloria Swanson    3

Another solution:

ba_dob['new'] = ba_dob['Name'].map(ba_dob.loc[m, 'Name'].value_counts())
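As a quick check, here is a sketch that runs both mapping variants on the sample rows from the question and compares them. The names via_size and via_value_counts are introduced here for illustration only.

```python
import pandas as pd

# Name and Winner columns of the six sample rows from the question.
ba_dob = pd.DataFrame({
    "Winner": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "Name": ["Louise Dresser", "Janet Gaynor", "Janet Gaynor",
             "Gloria Swanson", "Gloria Swanson", "Gloria Swanson"],
})

m = ba_dob.Winner == 0.0

# Variant 1: group the filtered rows and take the group sizes.
via_size = ba_dob["Name"].map(ba_dob[m].groupby("Name").Winner.size())

# Variant 2: count the names directly in the filtered rows.
via_value_counts = ba_dob["Name"].map(ba_dob.loc[m, "Name"].value_counts())

print(via_size.tolist())
print(via_value_counts.tolist())
```

Both lookups are keyed by name, so map can match them against the Name column row by row. A name that never appears among the filtered rows would map to NaN; appending .fillna(0) is one way to turn that into a zero if that is the desired meaning.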

You can join your result back onto the initial data frame:

New_col = df.loc[df.Winner == 0.0, :].groupby('Name').Winner.count().rename('New_col')
df = df.join(New_col, on='Name')

Output :

    Award           Ceremony    Name            Winner  Year New_col
0   Best Actress    1927/1928   Louise Dresser  0.0     0    1
1   Best Actress    1927/1928   Janet Gaynor    1.0     1    1
2   Best Actress    1937        Janet Gaynor    0.0     2    1
3   Best Actress    1927/1928   Gloria Swanson  0.0     3    3
4   Best Actress    1929/1930   Gloria Swanson  0.0     4    3
5   Best Actress    1950        Gloria Swanson  0.0     5    3
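A self-contained sketch of the same join, assuming df holds the Name and Winner columns. The last row ("Placeholder Actress") is made up to show what happens to a name with no Winner == 0.0 rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Louise Dresser", "Janet Gaynor", "Janet Gaynor",
             "Gloria Swanson", "Placeholder Actress"],  # last name is hypothetical
    "Winner": [0.0, 1.0, 0.0, 0.0, 1.0],
})

# The Series needs a name so that join knows what to call the new column.
new_col = (
    df.loc[df.Winner == 0.0, :]
      .groupby("Name")
      .Winner.count()
      .rename("New_col")
)

# on="Name" matches df's Name column against new_col's index (a left join),
# so every row of df is kept and unmatched names get NaN.
df = df.join(new_col, on="Name")
print(df)
```

Because unmatched rows are filled with NaN, the new column comes back as float whenever such rows exist.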

You can also use map

mapper = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count()
ba_dob['New_Col'] = ba_dob['Name'].map(mapper)

You get

    Year        Ceremony    Award       Winner  Name            New_Col
0   1927/1928   1           BestActress 0.0     Louise Dresser  1
1   1927/1928   1           BestActress 1.0     Janet Gaynor    1
2   1937        10          BestActress 0.0     Janet Gaynor    1
3   1927/1928   1           BestActress 0.0     Gloria Swanson  3
4   1929/1930   3           BestActress 0.0     Gloria Swanson  3
5   1950        23          BestActress 0.0     Gloria Swanson  3

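I think you need reset_index(), which flattens the grouped index into two ordinary columns: Name and the count. Assigning that count column directly would still align on the integer index rather than on Name, so merge it back on Name instead. Something like

 counts = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count().reset_index(name='New_Col')
 ba_dob = ba_dob.merge(counts, on='Name', how='left')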

Your groupby does not cover the whole DataFrame, only the rows where Winner == 0, so of course you will get NaN for those rows.

 