Python: how to select columns with a condition?
I have a dataframe that looks like this:
df
id d1 d2 d3 a1 a2 a3
0 474 0.000243 0.000243 0.001395 bank bank atm
1 964 0.000239 0.000239 0.000899 bank bank bank
2 4823 0.000472 0.000472 0.000834 fuel fuel fuel
3 7225 0.002818 0.002818 0.023900 bank bank fuel
4 7747 0.001036 0.001036 0.001415 dentist dentist bank
For each row, I want to select the minimum of d1, d2 and d3, together with the corresponding a1, a2 or a3. The expected result is:
df
id d a
0 474 0.000243 bank
1 964 0.000239 bank
2 4823 0.000472 fuel
3 7225 0.002818 bank
4 7747 0.001036 dentist
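To try the answers below, the input frame can be rebuilt from the printed values. This is only a reproduction of the table in the question; the dtypes (integer id, float d columns, string a columns) are an assumption based on how the values print.

```python
import pandas as pd

# Sample data copied from the question's printed dataframe.
df = pd.DataFrame({
    'id': [474, 964, 4823, 7225, 7747],
    'd1': [0.000243, 0.000239, 0.000472, 0.002818, 0.001036],
    'd2': [0.000243, 0.000239, 0.000472, 0.002818, 0.001036],
    'd3': [0.001395, 0.000899, 0.000834, 0.023900, 0.001415],
    'a1': ['bank', 'bank', 'fuel', 'bank', 'dentist'],
    'a2': ['bank', 'bank', 'fuel', 'bank', 'dentist'],
    'a3': ['atm', 'bank', 'fuel', 'fuel', 'bank'],
})
print(df)
```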
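If you want to select the columns by lists, get the column name of each row's minimum with DataFrame.idxmin, map it to the matching a column, and then use DataFrame.lookup inside DataFrame.assign to build the new columns (DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this needs an older release):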
col1 = ['d1','d2','d3']
col2 = ['a1','a2','a3']
pos = df[col1].idxmin(axis=1).map(dict(zip(col1, col2)))
df = df[['id']].assign(d = df[col1].min(axis=1), a = df.lookup(df.index, pos))
print (df)
id d a
0 474 0.000243 bank
1 964 0.000239 bank
2 4823 0.000472 fuel
3 7225 0.002818 bank
4 7747 0.001036 dentist
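On pandas 2.0 and later, where DataFrame.lookup is no longer available, the same idea can be sketched with NumPy integer indexing. This is a substitute for the lookup call rather than the answer's original code; the names pos, labels and out are illustrative, and it assumes the d columns contain no NaN values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [474, 964, 4823, 7225, 7747],
    'd1': [0.000243, 0.000239, 0.000472, 0.002818, 0.001036],
    'd2': [0.000243, 0.000239, 0.000472, 0.002818, 0.001036],
    'd3': [0.001395, 0.000899, 0.000834, 0.023900, 0.001415],
    'a1': ['bank', 'bank', 'fuel', 'bank', 'dentist'],
    'a2': ['bank', 'bank', 'fuel', 'bank', 'dentist'],
    'a3': ['atm', 'bank', 'fuel', 'fuel', 'bank'],
})

col1 = ['d1', 'd2', 'd3']
col2 = ['a1', 'a2', 'a3']

# Position (0, 1 or 2) of the smallest d value in each row.
# Ties resolve to the first column, the same way idxmin does.
pos = df[col1].to_numpy().argmin(axis=1)

# Take the label at that position from the a columns, row by row.
labels = df[col2].to_numpy()[np.arange(len(df)), pos]

out = df[['id']].assign(d=df[col1].min(axis=1), a=labels)
print(out)
```

Because col1 and col2 are aligned by position, no column-name mapping is needed here; the positional index from argmin is applied directly to the a columns.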
You can use pd.wide_to_long here to get a long-format dataframe, specifying ['d', 'a'] as the stubnames. Then group by id and index the frame with the idxmin of d:
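# raw string for the regex; drop() takes columns= (the positional axis was removed in pandas 2.0)
df = (pd.wide_to_long(df, stubnames=['d', 'a'], suffix=r'\d+', i='id', j='j')
        .reset_index().drop(columns='j'))
# keep, for each id, the row holding the smallest d
df = df.loc[df.groupby('id').d.idxmin().values]
print(df)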
id d a
0 474 0.000243 bank
1 964 0.000239 bank
2 4823 0.000472 fuel
3 7225 0.002818 bank
4 7747 0.001036 dentist
After applying pd.wide_to_long as above, the dataframe is:
pd.wide_to_long(df, stubnames=['d', 'a'], suffix=r'\d+', i='id', j='j')
d a
id j
474 1 0.000243 bank
964 1 0.000239 bank
4823 1 0.000472 fuel
7225 1 0.002818 bank
7747 1 0.001036 dentist
474 2 0.000243 bank
964 2 0.000239 bank
4823 2 0.000472 fuel
7225 2 0.002818 bank
7747 2 0.001036 dentist
474 3 0.001395 atm
964 3 0.000899 bank
4823 3 0.000834 fuel
7225 3 0.023900 fuel
7747 3 0.001415 bank
So we only need to group by id and find the index of the minimum value.
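In the sample data d1 and d2 are equal in every row, so it is worth knowing how ties are handled: idxmin returns the label of the first occurrence of the minimum. A small sketch on a hand-built long frame (two ids taken from the question; long_df and first_min are illustrative names):

```python
import pandas as pd

# Long-format rows for two ids, in the shape wide_to_long + reset_index gives.
long_df = pd.DataFrame({
    'id': [474, 474, 474, 7747, 7747, 7747],
    'j': [1, 2, 3, 1, 2, 3],
    'd': [0.000243, 0.000243, 0.001395, 0.001036, 0.001036, 0.001415],
    'a': ['bank', 'bank', 'atm', 'dentist', 'dentist', 'bank'],
})

# Row label of the smallest d per id; on ties the first row wins.
first_min = long_df.groupby('id')['d'].idxmin()
print(long_df.loc[first_min])
```

For both ids the row with j equal to 1 is selected, even though the row with j equal to 2 holds the same distance.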
@yatu's solution was the inspiration here: wherever I see wide_to_long, I check whether stack on a MultiIndex would also fit :)
# set id as the index
df = df.set_index('id')
# split each column name on its trailing number; expand=True turns the
# columns into a MultiIndex, and the last level (empty strings) is dropped
df.columns = df.columns.str.split(r"(\d+)", expand=True).droplevel(-1)
# move the number level into the rows: one row per (id, number) pair
stacked = df.stack()
# index labels of the row with the smallest d for each id
ind = stacked.groupby('id')['d'].idxmin()
# keep those rows and drop the number level
stacked.loc[ind].droplevel(-1)
a d
id
474 bank 0.000243
964 bank 0.000239
4823 fuel 0.000472
7225 bank 0.002818
7747 dentist 0.001036