Adding a new column to a pandas DataFrame results in NaN

Question

I have a pandas DataFrame data with the following transaction data:

           A         date
0      M000833  2016-08-01
1      M000833  2016-08-01
2      M000833  2016-08-02
3      M000833  2016-08-02 
4      M000511  2016-08-05

I want a new column with the number of visits per consumer (multiple visits on the same day should count as one).

So I tried this:

import pandas as pd
data['noofvisits'] = data.groupby(['A'])['date'].nunique()

When I run the statement on its own, without assigning the result to the DataFrame, I get a pandas Series with the desired output. However, the assignment above results in:

           A         date       noofvisits
0      M000833  2016-08-01         NaN         
1      M000833  2016-08-01         NaN
2      M000833  2016-08-02         NaN
3      M000833  2016-08-02         NaN
4      M000511  2016-08-05         NaN

The expected output is:

           A         date       noofvisits
0      M000833  2016-08-01         2         
1      M000833  2016-08-01         2
2      M000833  2016-08-02         2
3      M000833  2016-08-02         2
4      M000511  2016-08-05         1

What is wrong with this approach? Why does the column noofvisits contain NaN values rather than the counts?

Answer 1

Use transform to generate a Series with its index aligned to the original df:

In[32]:
df['noofvisits'] = df.groupby(['A'])['date'].transform('nunique')
df

Out[32]: 
             A        date  noofvisits
index                                 
0      M000833  2016-08-01           2
1      M000833  2016-08-01           2
2      M000833  2016-08-02           2
3      M000833  2016-08-02           2
4      M000511  2016-08-05           1

The problem with assigning directly is that you are grouping on column 'A', so 'A' becomes the index of the groupby aggregation. When you then assign the result to your df, pandas aligns on index labels, and because the indices don't agree, the column is filled with NaN.
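To make the index mismatch concrete, here is a minimal sketch that rebuilds the sample data from the question (the variable name counts is only illustrative) and compares the two indexes before assigning:

```python
import pandas as pd

# Sample data reconstructed from the question
data = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": ["2016-08-01", "2016-08-01", "2016-08-02",
             "2016-08-02", "2016-08-05"],
})

# The aggregated Series is indexed by the group keys from column 'A'
counts = data.groupby("A")["date"].nunique()
print(counts.index.tolist())  # customer IDs
print(data.index.tolist())    # default integer row labels

# Assignment aligns on index labels; no label matches, so every row is NaN
data["noofvisits"] = counts
print(data["noofvisits"].isna().all())
```

None of the customer IDs in the index of counts appears among the row labels of data, which is why the assignment produces no matches.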

Also, even if the index values did agree, the shape is different anyway:

In[33]:
df.groupby(['A'])['date'].nunique()

Out[33]: 
A
M000511    1
M000833    2
Name: date, dtype: int64
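If you do want to keep the per-group Series shown above, one possible alternative (not part of the original answer, just a sketch under the same sample data) is to look up each row's 'A' value in it with Series.map. The name visits_per_customer is made up for illustration:

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": ["2016-08-01", "2016-08-01", "2016-08-02",
             "2016-08-02", "2016-08-05"],
})

# One value per customer, indexed by the values of column 'A'
visits_per_customer = df.groupby("A")["date"].nunique()

# Look up each row's customer ID in that Series, giving one value per row
df["noofvisits"] = df["A"].map(visits_per_customer)
print(df)
```

This should give the same column as transform('nunique'); transform is usually the shorter option because it skips the intermediate Series.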

Answer 1
3, accepted, 2017-06-13 09:24:55
