How to add a list of unique values from a column to a dataframe?
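I have some code that creates a new dataframe which, for each year, counts the number of unique states with at least 1 non-null value and the total number of non-null values. The code works fine, but I would like to modify it to include a new row that lists the unique states.
Here is my data: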
year state var1 var2
0 2018 CA NaN 2
1 2018 TX 1 NaN
2 2018 FL NaN NaN
3 2018 AL 1 2
4 2018 AL NaN 1
6 2019 CA NaN NaN
7 2019 TX 1 1
8 2019 FL NaN NaN
9 2019 AL 2 1
10 2019 AK 2 NaN
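For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is a sketch that assumes the blanks are `NaN` and that the index simply skips 5, as shown in the printout.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question; the index skips 5 as in the printout.
df = pd.DataFrame(
    {
        "year": [2018] * 5 + [2019] * 5,
        "state": ["CA", "TX", "FL", "AL", "AL", "CA", "TX", "FL", "AL", "AK"],
        "var1": [np.nan, 1, np.nan, 1, np.nan, np.nan, 1, np.nan, 2, 2],
        "var2": [2, np.nan, np.nan, 2, 1, np.nan, 1, np.nan, 1, np.nan],
    },
    index=[0, 1, 2, 3, 4, 6, 7, 8, 9, 10],
)
```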
Here is my current output:
2018 2019
var1
Number of unique states with at least 1 non-null: 2 3
Number of respondents with non-null var: 2 3
Average: 1 1
var2
Number of unique states with at least 1 non-null: 2 2
Number of respondents with non-null var: 3 2
Average: 1.5 1
Here is the code I am using:
import numpy as np
import pandas as pd

# Non-null counts of each variable per (year, state)
c = df.groupby(['year', 'state']).count()
# Per year: number of states with a non-zero count, and the total non-null count
res = c.groupby('year').agg([np.count_nonzero, sum]).T
res.index = res.index.set_levels(['Number of unique states with at least 1 non-null:',
                                  'Number of respondents with non-null var:'], level=1)
z = res.swaplevel().T
res4 = pd.concat([z, pd.concat([z['Number of respondents with non-null var:'] /
                                z['Number of unique states with at least 1 non-null:']],
                               keys=['Average:'], axis=1)],
                 axis=1).T.swaplevel().sort_index()
This is what I would like the new output to look like:
2018 2019
var1
Number of unique states with at least 1 non-null: 2 3
Unique states with at least 1 non-null: [TX, AL] [TX, AL, AK]
Number of respondents with non-null var: 2 3
Average: 1 1
var2
Number of unique states with at least 1 non-null: 2 2
Unique states with at least 1 non-null: [AL, CA] [TX, AL]
Number of respondents with non-null var: 3 2
Average: 1.5 1
Basically, I want a row "Unique states with at least 1 non-null:" that lists the names of the states. How can I do this?
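I create a new function f that collects the states, and add mean to the aggregation as a placeholder label in the MultiIndex. Its values are then set by dividing the rows selected with DataFrame.xs, and finally rename sets the new labels in the second level of the MultiIndex: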
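import numpy as np
import pandas as pd

# Non-null counts of each variable per (year, state)
c = df.groupby(['year', 'state']).count()
def f(x):
    # States (second index level) whose non-null count is non-zero
    return x.index[x.ne(0)].droplevel(0).tolist()
# 'mean' is only a placeholder row here; it is overwritten two lines below
df = c.groupby(['year']).agg([np.count_nonzero, f, 'sum', 'mean']).T
# Average = total non-null count / number of states with a non-null value
df11 = df.xs('sum', level=1, drop_level=False).div(df.xs('count_nonzero', level=1), level=0)
df.loc[pd.IndexSlice[:, 'mean'], :] = df11.rename({'sum': 'mean'}).astype(np.float64).round(1)
d = {'count_nonzero': 'Number of unique states with at least 1 non-null:',
     'sum': 'Number of respondents with non-null var:',
     'f': 'Unique states with at least 1 non-null',
     'mean': 'Average:'}
df = df.rename(d)
print(df)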
year 2018 2019
var1 Number of unique states with at least 1 non-null: 2 3
Unique states with at least 1 non-null [AL, TX] [AK, AL, TX]
Number of respondents with non-null var: 2 3
Average: 1.0 1.0
var2 Number of unique states with at least 1 non-null: 2 2
Unique states with at least 1 non-null [AL, CA] [AL, TX]
Number of respondents with non-null var: 3 2
Average: 1.5 1.0
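The same table can also be sketched by reshaping to long format first. This is an alternative route, not the method used above, and it has only been reasoned through against the sample data. The names `long`, `summary`, and `result` are made up for this illustration, and the row order of `result` may differ from the printout above.

```python
import numpy as np
import pandas as pd

# Sample data from the question (blanks assumed to be NaN).
df = pd.DataFrame(
    {
        "year": [2018] * 5 + [2019] * 5,
        "state": ["CA", "TX", "FL", "AL", "AL", "CA", "TX", "FL", "AL", "AK"],
        "var1": [np.nan, 1, np.nan, 1, np.nan, np.nan, 1, np.nan, 2, 2],
        "var2": [2, np.nan, np.nan, 2, 1, np.nan, 1, np.nan, 1, np.nan],
    }
)

# Long format: one row per (year, state, var) observation, null values dropped.
long = df.melt(id_vars=["year", "state"], var_name="var").dropna(subset=["value"])

n_states = "Number of unique states with at least 1 non-null:"
states = "Unique states with at least 1 non-null:"
n_resp = "Number of respondents with non-null var:"

g = long.groupby(["var", "year"])
summary = pd.DataFrame(
    {
        n_states: g["state"].nunique(),
        states: g["state"].agg(lambda s: sorted(s.unique())),
        n_resp: g["value"].count(),
    }
)
summary["Average:"] = (summary[n_resp] / summary[n_states]).round(1)

# Move the statistic labels into the row index and the years into the columns.
result = summary.stack().unstack("year")
print(result)
```

Dropping the nulls before grouping means each group contains only states with a non-null value, so `nunique` and `unique` answer the question directly without the `x.ne(0)` filter.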