Python dataframe: merge columns according to the values in another column
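What I want to do is merge columns according to the values in another column. This is best illustrated with a simple example. I have a dataframe with 5 columns: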
| player_num | team_1.x | team_1.y | team_2.x | team_2.y |
|------------ |---------- |---------- |---------- |---------- |
| 1 | x_1 | y_1 | x_2 | y_2 |
| 4 | x_3 | y_3 | x_4 | y_4 |
| 8 | x_5 | y_5 | x_6 | y_6 |
I would like to get the following table:
| x | y |
|----- |----- |
| x_1 | y_1 |
| x_3 | y_3 |
| x_6 | y_6 |
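where the columns are filled with the values from team_1.x and team_1.y for rows whose player number is less than 5, and with the values from team_2.x and team_2.y for rows whose player number is greater than 5.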
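For anyone who wants to reproduce this, the two tables above could be built roughly as follows. This is only a sketch: the variable names `df` and `expected` are my own choice, and the values are copied straight from the tables.

```python
import pandas as pd

# Input data, copied from the first table in the question.
df = pd.DataFrame({
    "player_num": [1, 4, 8],
    "team_1.x": ["x_1", "x_3", "x_5"],
    "team_1.y": ["y_1", "y_3", "y_5"],
    "team_2.x": ["x_2", "x_4", "x_6"],
    "team_2.y": ["y_2", "y_4", "y_6"],
})

# Desired result, copied from the second table in the question.
expected = pd.DataFrame({
    "x": ["x_1", "x_3", "x_6"],
    "y": ["y_1", "y_3", "y_6"],
})

print(df)
print(expected)
```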
You can use NumPy's np.where for this:
import numpy as np
...
df['x'] = np.where(df['player_num'] < 5, df['team_1.x'], df['team_2.x'])
df['y'] = np.where(df['player_num'] < 5, df['team_1.y'], df['team_2.y'])
Edit: a version that handles any number of suffix columns (x, y, z, etc.):
# Extract the suffixes (x, y, z, etc.) from the team_* column names, dropping duplicates.
cols = list(dict.fromkeys(col.split('.')[1] for col in df.columns if col.startswith('team_')))
# Loop over the suffixes and create one merged column for each.
for col in cols:
    col1 = 'team_1.' + col
    col2 = 'team_2.' + col
    df[col] = np.where(df['player_num'] < 5, df[col1], df[col2])
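Putting the np.where approach together end to end, a self-contained sketch might look like this. The helper name `pick_team` and its `threshold` parameter are hypothetical (not part of pandas or NumPy), and the sample data is taken from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "player_num": [1, 4, 8],
    "team_1.x": ["x_1", "x_3", "x_5"],
    "team_1.y": ["y_1", "y_3", "y_5"],
    "team_2.x": ["x_2", "x_4", "x_6"],
    "team_2.y": ["y_2", "y_4", "y_6"],
})


def pick_team(frame, threshold=5):
    """Hypothetical helper: build one column per suffix, choosing team_1 or team_2 per row."""
    # True where the row should take its values from the team_1 columns.
    use_team_1 = frame["player_num"] < threshold
    # Unique suffixes in order of first appearance, e.g. ["x", "y"].
    suffixes = list(dict.fromkeys(
        col.split(".", 1)[1] for col in frame.columns if col.startswith("team_")
    ))
    out = pd.DataFrame(index=frame.index)
    for suffix in suffixes:
        out[suffix] = np.where(
            use_team_1, frame[f"team_1.{suffix}"], frame[f"team_2.{suffix}"]
        )
    return out


result = pick_team(df)
print(result)
```

Returning a new frame instead of adding columns to `df` leaves the input untouched, which is only a design preference here.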
You can split the dataframe by the condition and then concatenate the results:
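import pandas as pd

# Rows with player_num < 5 take the team_1 columns; rows with player_num > 5 take team_2.
# Note: a row with player_num exactly 5 matches neither filter and is dropped.
low = df.loc[df["player_num"].lt(5), ["team_1.x", "team_1.y"]].rename(columns={"team_1.x": "x", "team_1.y": "y"})
high = df.loc[df["player_num"].gt(5), ["team_2.x", "team_2.y"]].rename(columns={"team_2.x": "x", "team_2.y": "y"})
df_res = pd.concat([low, high])
print(df_res)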
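Two things to watch with the split-and-concat approach: a player numbered exactly 5 matches neither `.lt(5)` nor `.gt(5)`, and `pd.concat` stacks the two pieces one after the other rather than keeping the original row order. The sketch below uses made-up data (the rows for players 5 and the shuffled order are my own additions) and assumes player 5 should go to team_2, hence `.ge(5)`.

```python
import pandas as pd

# Hypothetical data: same layout as the question, rows out of order, plus a player 5.
df = pd.DataFrame({
    "player_num": [8, 1, 5, 4],
    "team_1.x": ["x_5", "x_1", "x_7", "x_3"],
    "team_1.y": ["y_5", "y_1", "y_7", "y_3"],
    "team_2.x": ["x_6", "x_2", "x_8", "x_4"],
    "team_2.y": ["y_6", "y_2", "y_8", "y_4"],
})


def strip_prefix(col):
    # "team_1.x" -> "x"
    return col.split(".", 1)[1]


low = df.loc[df["player_num"].lt(5), ["team_1.x", "team_1.y"]].rename(columns=strip_prefix)
# Assumption: .ge(5) instead of .gt(5), so that player 5 is kept and takes team_2 values.
high = df.loc[df["player_num"].ge(5), ["team_2.x", "team_2.y"]].rename(columns=strip_prefix)

# sort_index() restores the original row order after concatenation.
df_res = pd.concat([low, high]).sort_index()
print(df_res)
```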
This is not an elegant solution at all, but it should work...
# Walk over the rows one by one; index is the row label, num the player number.
for index, num in df['player_num'].items():
    if num >= 5:
        # Players numbered 5 or above take the team_2 values.
        df.loc[index, 'x'] = df.loc[index, 'team_2.x']
        df.loc[index, 'y'] = df.loc[index, 'team_2.y']
    else:
        # Everyone else takes the team_1 values.
        df.loc[index, 'x'] = df.loc[index, 'team_1.x']
        df.loc[index, 'y'] = df.loc[index, 'team_1.y']
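The same row-by-row choice can also be written without an explicit loop using pandas' own `Series.where`, which keeps a value where the condition is True and takes the replacement elsewhere. This is a different technique from the loop, shown only as a sketch on the question's sample data; the name `use_team_1` is my own.

```python
import pandas as pd

df = pd.DataFrame({
    "player_num": [1, 4, 8],
    "team_1.x": ["x_1", "x_3", "x_5"],
    "team_1.y": ["y_1", "y_3", "y_5"],
    "team_2.x": ["x_2", "x_4", "x_6"],
    "team_2.y": ["y_2", "y_4", "y_6"],
})

# True for rows that should keep their team_1 values (player number below 5).
use_team_1 = df["player_num"] < 5

# Keep team_1 where the mask is True, otherwise fall back to team_2.
df["x"] = df["team_1.x"].where(use_team_1, df["team_2.x"])
df["y"] = df["team_1.y"].where(use_team_1, df["team_2.y"])

print(df[["x", "y"]])
```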