Adding series to Pandas dataframe yields column of NaN
Using this dataset (some columns and hundreds of rows omitted for brevity) . . .
Year Ceremony Award Winner Name
0 1927/1928 1 Best Actress 0.0 Louise Dresser
1 1927/1928 1 Best Actress 1.0 Janet Gaynor
2 1937 10 Best Actress 0.0 Janet Gaynor
3 1927/1928 1 Best Actress 0.0 Gloria Swanson
4 1929/1930 3 Best Actress 0.0 Gloria Swanson
5 1950 23 Best Actress 0.0 Gloria Swanson
I used the following command . . .
ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count()
to create the following series . . .
Name
Ali MacGraw 1
Amy Adams 1
Angela Bassett 1
Angelina Jolie 1
Anjelica Huston 1
Ann Harding 1
Ann-Margret 1
Anna Magnani 1
Anne Bancroft 4
Anne Baxter 1
Anne Hathaway 1
Annette Bening 3
Audrey Hepburn 4
I tried to add the series to the original dataframe, like so . . .
ba_dob['New_Col'] = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count()
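and I got a column of NaN values.
I have read other posts suggesting that some incorrect indexing may be at work, but I'm not sure how that would happen. More specifically, why can't Pandas line up the indexes, given that the groupby and count come from the same table? Is something else going on?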
I think you need size, not count, because count excludes NaN values. Then map the Name column with the Series created by groupby:
m = ba_dob.Winner == 0.0
ba_dob['new'] = ba_dob['Name'].map(ba_dob[m].groupby('Name').Winner.size())
print(ba_dob)
Year Ceremony Award Winner Name new
0 1927/1928 1 Best Actress 0.0 Louise Dresser 1
1 1927/1928 1 Best Actress 1.0 Janet Gaynor 1
2 1937 10 Best Actress 0.0 Janet Gaynor 1
3 1927/1928 1 Best Actress 0.0 Gloria Swanson 3
4 1929/1930 3 Best Actress 0.0 Gloria Swanson 3
5 1950 23 Best Actress 0.0 Gloria Swanson 3
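To illustrate the size versus count distinction mentioned above, here is a minimal sketch on a made-up frame (the names toy, sizes and counts are hypothetical and not part of the original data). It assumes one Winner value is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one Winner value is missing (NaN).
toy = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Winner": [0.0, np.nan, 0.0],
})

grouped = toy.groupby("Name").Winner
sizes = grouped.size()    # counts every row in the group, NaN included
counts = grouped.count()  # counts only the non-NaN values

print(sizes.to_dict())
print(counts.to_dict())
```

Note that once the rows are filtered with Winner == 0.0, no NaN is left in Winner, so size and count should give the same numbers for this particular question.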
Another solution:
ba_dob['new'] = ba_dob['Name'].map(ba_dob.loc[m, 'Name'].value_counts())
You can join the result onto the initial dataframe
New_col = df.loc[df.Winner == 0.0, :].groupby('Name').Winner.count().rename('New_col')
df = df.join(New_col, on='Name')
Output:
Award Ceremony Name Winner Year New_col
0 Best Actress 1927/1928 Louise Dresser 0.0 0 1
1 Best Actress 1927/1928 Janet Gaynor 1.0 1 1
2 Best Actress 1937 Janet Gaynor 0.0 2 1
3 Best Actress 1927/1928 Gloria Swanson 0.0 3 3
4 Best Actress 1929/1930 Gloria Swanson 0.0 4 3
5 Best Actress 1950 Gloria Swanson 0.0 5 3
You can also use map
mapper = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count()
ba_dob['New_Col'] = ba_dob['Name'].map(mapper)
You get
Year Ceremony Award Winner Name New_Col
0 1927/1928 1 BestActress 0.0 Louise Dresser 1
1 1927/1928 1 BestActress 1.0 Janet Gaynor 1
2 1937 10 BestActress 0.0 Janet Gaynor 1
3 1927/1928 1 BestActress 0.0 Gloria Swanson 3
4 1929/1930 3 BestActress 0.0 Gloria Swanson 3
5 1950 23 BestActress 0.0 Gloria Swanson 3
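One thing to watch with map: a name that has no row with Winner == 0.0 is absent from the mapper, so it comes back as NaN. A rough sketch on a made-up frame (ba_toy and the name "Only Winner" are hypothetical), with an optional fillna(0) if a zero count is what you want:

```python
import pandas as pd

# Hypothetical toy frame; "Only Winner" has no row with Winner == 0.0.
ba_toy = pd.DataFrame({
    "Name": ["Janet Gaynor", "Janet Gaynor", "Only Winner"],
    "Winner": [1.0, 0.0, 1.0],
})

mapper = ba_toy.loc[ba_toy.Winner == 0.0].groupby("Name").Winner.count()

# Names that are missing from the mapper's index are mapped to NaN.
mapped = ba_toy["Name"].map(mapper)

# Optional: treat "no losing nominations" as 0 and go back to integers.
ba_toy["New_Col"] = mapped.fillna(0).astype(int)
print(ba_toy)
```

Whether 0 or NaN is the right value for such names depends on what the column is meant to express, so the fillna step is a choice rather than a requirement.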
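You need to use reset_index(), which removes the hierarchy and creates two fields, Name and count. After that, bring the count field into the dataframe by merging on Name (assigning the column directly would align on the new integer index rather than on Name). Something like
counts = ba_dob.loc[ba_dob.Winner == 0.0, :].groupby('Name').Winner.count().reset_index(name='count')
ba_dob = ba_dob.merge(counts, on='Name', how='left')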
Your groupby doesn't cover the entire DataFrame, only the rows where Winner == 0, so of course you will get NaN for those rows.
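For what it's worth, the all-NaN column can be reproduced on a tiny made-up frame (toy and losses are hypothetical names). The grouped Series is indexed by Name, while the dataframe has an integer index, and column assignment aligns on index labels, so nothing matches:

```python
import pandas as pd

# Hypothetical toy frame with the default integer index 0..2.
toy = pd.DataFrame({
    "Name": ["Louise Dresser", "Janet Gaynor", "Janet Gaynor"],
    "Winner": [0.0, 1.0, 0.0],
})

# The result of groupby('Name') is indexed by the Name strings.
losses = toy.loc[toy.Winner == 0.0].groupby("Name").Winner.count()

# Direct assignment aligns on index labels: 0, 1, 2 versus names -> all NaN.
toy["direct"] = losses

# map looks each Name value up in the index of losses instead.
toy["mapped"] = toy["Name"].map(losses)
print(toy)
```

So the data coming from the same table is not enough; the two indexes have to share labels for the assignment to line up.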