How to add a list of unique values from a column to a dataframe?

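So I have some code that creates a new dataframe which, grouped by year, counts the number of unique states with at least 1 non-null value and the total number of non-null values. The code works fine, but I'd like to modify it to include a new row that lists the unique states.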

Here is my data:

    year    state   var1    var2    
0   2018    CA       NaN     2    
1   2018    TX       1       NaN    
2   2018    FL       NaN     NaN  
3   2018    AL       1       2    
4   2018    AL       NaN     1   
6   2019    CA       NaN     NaN  
7   2019    TX       1       1    
8   2019    FL       NaN     NaN  
9   2019    AL       2       1    
10  2019    AK       2       NaN 
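
To make the example reproducible, the table above can be rebuilt as a dataframe roughly like this. It is a sketch that assumes the blank cells are NaN and keeps the original index, which skips 5:

```python
import numpy as np
import pandas as pd

# Values copied from the table above; the index has no row 5.
df = pd.DataFrame(
    {
        "year": [2018] * 5 + [2019] * 5,
        "state": ["CA", "TX", "FL", "AL", "AL", "CA", "TX", "FL", "AL", "AK"],
        "var1": [np.nan, 1, np.nan, 1, np.nan, np.nan, 1, np.nan, 2, 2],
        "var2": [2, np.nan, np.nan, 2, 1, np.nan, 1, np.nan, 1, np.nan],
    },
    index=[0, 1, 2, 3, 4, 6, 7, 8, 9, 10],
)
print(df)
```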

Here is my current output:

                                                          2018     2019
var1
      Number of unique states with at least 1 non-null:   2        3
      Number of respondents with non-null var:            2        3
      Average:                                            1        1
var2
      Number of unique states with at least 1 non-null:   2        2   
      Number of respondents with non-null var:            3        2
      Average:                                            1.5      1

Here is the code I'm using:

import numpy as np
import pandas as pd

c = df.groupby(['year', 'state']).count()
res = c.groupby('year').agg([np.count_nonzero, sum]).T
res.index = res.index.set_levels(['Number of unique states with at least 1 non-null:', 
                                  'Number of respondents with non-null var:'], level=1)

z = res.swaplevel().T
res4 = pd.concat([z, pd.concat([z['Number of respondents with non-null var:'] / 
                                z['Number of unique states with at least 1 non-null:']], 
                              keys=['Average:'], axis=1),], 
                axis=1).T.swaplevel().sort_index()
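
For context, here is a small sketch of what the intermediate frame c holds and why counting its non-zero entries gives the number of unique states. The dataframe is rebuilt from the table above, and the name n_states is only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "year": [2018] * 5 + [2019] * 5,
    "state": ["CA", "TX", "FL", "AL", "AL", "CA", "TX", "FL", "AL", "AK"],
    "var1": [np.nan, 1, np.nan, 1, np.nan, np.nan, 1, np.nan, 2, 2],
    "var2": [2, np.nan, np.nan, 2, 1, np.nan, 1, np.nan, 1, np.nan],
})

# count() ignores NaN, so each cell is the number of non-null
# values for that (year, state) pair.
c = df.groupby(["year", "state"]).count()

# A state has "at least 1 non-null" exactly when its count is > 0,
# so summing that boolean mask per year counts the unique states.
n_states = (c > 0).groupby(level="year").sum()
print(c)
print(n_states)
```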

This is what I want the new output to look like:

                                                          2018         2019
var1
      Number of unique states with at least 1 non-null:   2            3
      Unique states with at least 1 non-null:             [TX, AL]     [TX, AL, AK]
      Number of respondents with non-null var:            2            3
      Average:                                            1            1
var2
      Number of unique states with at least 1 non-null:   2            2   
      Unique states with at least 1 non-null:             [AL, CA]     [TX, AL]
      Number of respondents with non-null var:            3            2
      Average:                                            1.5          1

Basically, I want the output to include a row "Unique states with at least 1 non-null:" that lists the names of the states. How can I do this?

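I create a new function f for the states and also aggregate mean so that a label for it exists in the MultiIndex. Its values are then set from the division of the rows selected with DataFrame.xs, and finally rename assigns the new labels to the second level of the MultiIndex: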

import numpy as np
import pandas as pd

# Non-null counts per (year, state)
c = df.groupby(['year', 'state']).count()

def f(x):
    # States whose count is non-zero within the current year
    return x.index[x.ne(0)].droplevel(0).tolist()

df = c.groupby(['year']).agg([np.count_nonzero, f, 'sum', 'mean']).T
# Average = sum / count_nonzero; this overwrites the placeholder 'mean' rows
df11 = df.xs('sum', level=1, drop_level=False).div(df.xs('count_nonzero', level=1), level=0)
df.loc[pd.IndexSlice[:, 'mean'], :] = df11.rename({'sum': 'mean'}).astype(np.float64).round(1)

d = {'count_nonzero': 'Number of unique states with at least 1 non-null:',
     'sum': 'Number of respondents with non-null var:',
     'f': 'Unique states with at least 1 non-null',
     'mean': 'Average:'}
df = df.rename(d)
print(df)
year                                                        2018          2019
var1 Number of unique states with at least 1 non-null:         2             3
     Unique states with at least 1 non-null             [AL, TX]  [AK, AL, TX]
     Number of respondents with non-null var:                  2             3
     Average:                                                1.0           1.0
var2 Number of unique states with at least 1 non-null:         2             2
     Unique states with at least 1 non-null             [AL, CA]      [AL, TX]
     Number of respondents with non-null var:                  3             2
     Average:                                                1.5           1.0
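
If only the lists of states are needed, one possible alternative (not part of the answer above, just a sketch) is to reshape to long form with melt, drop the null values, and collect the states per variable and year. The names long and states are illustrative, and the dataframe is rebuilt from the table in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "year": [2018] * 5 + [2019] * 5,
    "state": ["CA", "TX", "FL", "AL", "AL", "CA", "TX", "FL", "AL", "AK"],
    "var1": [np.nan, 1, np.nan, 1, np.nan, np.nan, 1, np.nan, 2, 2],
    "var2": [2, np.nan, np.nan, 2, 1, np.nan, 1, np.nan, 1, np.nan],
})

# Long form: one row per (year, state, var), keeping only non-null values.
long = df.melt(id_vars=["year", "state"], var_name="var").dropna(subset=["value"])

# Sorted list of distinct states for every (var, year) pair.
states = long.groupby(["var", "year"])["state"].agg(lambda s: sorted(s.unique()))
print(states)
```

The result is a Series indexed by (var, year), so it would still need to be merged into the summary table if the combined layout is wanted.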
