How to collapse multiple columns into one in pandas
I have a pandas DataFrame populated with users and categories, but the values for those categories are spread across multiple columns.
| user | category | val1 | val2 | val3 |
| ------ | ------------------| -----| ---- | ---- |
| user 1 | c1 | 3 | NA | None |
| user 1 | c2 | NA | 4 | None |
| user 1 | c3 | NA | NA | 7 |
| user 2 | c1 | 5 | NA | None |
| user 2 | c2 | NA | 7 | None |
| user 2 | c3 | NA | NA | 2 |
I want to collapse the values into a single column:
| user | category | value|
| ------ | ------------------| -----|
| user 1 | c1 | 3 |
| user 1 | c2 | 4 |
| user 1 | c3 | 7 |
| user 2 | c1 | 5 |
| user 2 | c2 | 7 |
| user 2 | c3 | 2 |
Ultimately, I want to end up with a matrix like this:
np.array([[3, 4, 7], [5, 7, 2]])
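To make the question reproducible, the input table could be built roughly as follows. This is only a sketch: it assumes that "NA" stands for `np.nan` and that "None" is a Python `None` stored in an object column, which the post does not state explicitly.

```python
import numpy as np
import pandas as pd

# Assumed encoding of the table above: "NA" -> np.nan, "None" -> Python None.
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    # dtype=object keeps None as None instead of converting it to NaN.
    "val3": pd.Series([None, None, 7, None, None, 2], dtype=object),
})
print(df)
```

Each row carries exactly one non-missing value among `val1`, `val2` and `val3`, which is what makes collapsing them into a single column well defined.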
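You can use pd.DataFrame.bfill to back-fill values across the selected columns, then take the first column, which now holds each row's first non-missing value.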
val_cols = ['val1', 'val2', 'val3']
df['value'] = pd.to_numeric(df[val_cols].bfill(axis=1).iloc[:, 0], errors='coerce')
print(df)
user category val1 val2 val3 value
0 user 1 c1 3.0 NaN None 3.0
1 user 1 c2 NaN 4.0 None 4.0
2 user 1 c3 NaN NaN 7 7.0
3 user 2 c1 5.0 NaN None 5.0
4 user 2 c2 NaN 7.0 2 7.0
5 user 2 c3 NaN NaN 2 2.0
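The question also asks for the final matrix, which the snippet above stops short of. One possible way to get there is sketched below; it rebuilds the sample data with all-float value columns for simplicity (an assumption), and the `pivot` step is an addition that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with every missing entry modelled as NaN (assumption).
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

val_cols = ["val1", "val2", "val3"]
# Back-fill across columns so the first column holds each row's first non-missing value.
df["value"] = df[val_cols].bfill(axis=1).iloc[:, 0]

# One row per user, one column per category, then drop down to a plain integer array.
matrix = (
    df.pivot(index="user", columns="category", values="value")
      .to_numpy()
      .astype(int)
)
print(matrix)
```

`pivot` raises an error if a (user, category) pair occurs more than once, so this only fits data shaped like the example.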
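Set the index to ['user', 'category'], then look up the first non-null value in each row. Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this form needs an older pandas version.
d = df.set_index(['user', 'category'])
pd.Series(d.lookup(d.index, d.isna().idxmin(1)), d.index).reset_index(name='value')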
user category value
0 user 1 c1 3
1 user 1 c2 4
2 user 1 c3 7
3 user 2 c1 5
4 user 2 c2 7
5 user 2 c3 2
You can skip resetting the index and unstack instead to get the final result:
d = df.set_index(['user', 'category'])
pd.Series(d.lookup(d.index, d.isna().idxmin(1)), d.index).unstack()
category c1 c2 c3
user
user 1 3 4 7
user 2 5 7 2
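Since `DataFrame.lookup` is no longer available in pandas 2.0 and later, the same idea can be approximated with NumPy integer indexing. The sketch below is a stand-in rather than the original answer's code; the names `col_pos` and `result` are illustrative, and the sample data assumes all missing entries are NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with every missing entry modelled as NaN (assumption).
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

d = df.set_index(["user", "category"])
# Position of the first non-missing column in each row (argmax returns the first True).
col_pos = d.notna().to_numpy().argmax(axis=1)
# Pick one value per row; this replaces the removed DataFrame.lookup call.
values = d.to_numpy()[np.arange(len(d)), col_pos]
result = pd.Series(values, index=d.index, name="value")

print(result.reset_index())  # long form: user, category, value
print(result.unstack())      # wide form: one row per user, one column per category
```

A row with no non-missing value at all would silently yield NaN from its first column here, so it may be worth checking for such rows beforehand.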
You can simply fill the missing values with fillna(0) (df2 = df.fillna(0)) and then combine the columns with the | operator.
First, convert to int:
df2[['val1','val2','val3']] = df2[['val1','val2','val3']].astype(int)  # assign whole columns so the int dtype is kept
Then:
df2['val4'] = df2.val1.values | df2.val2.values | df2.val3.values
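Put together, the bitwise OR approach might look like the sketch below. It relies on an assumption worth stating loudly: OR-ing only returns the right number when at most one column per row is non-zero, as in the sample data. `np.bitwise_or.reduce` is used here in place of chaining `|` by hand, and the result column is named `value` rather than `val4`; both are illustrative choices.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with every missing entry modelled as NaN (assumption).
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

val_cols = ["val1", "val2", "val3"]
# Replace missing values with 0 and cast to int, since | is not defined for floats.
filled = df[val_cols].fillna(0).astype(int)
# x | 0 | 0 == x, so the OR across a row returns its single non-zero value.
# Caveat: with two non-zero values in a row the OR mixes their bits.
df["value"] = np.bitwise_or.reduce(filled.to_numpy(), axis=1)
print(df[["user", "category", "value"]])
```

A genuine value of 0 cannot be told apart from a missing one with this approach, which is one reason the back-fill variant may be the safer default.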