
Python: how to select columns with condition?

I have a dataframe as shown below:

df
    id          d1         d2          d3         a1    a2       a3
0   474     0.000243    0.000243    0.001395    bank    bank    atm
1   964     0.000239    0.000239    0.000899    bank    bank    bank
2   4823    0.000472    0.000472    0.000834    fuel    fuel    fuel
3   7225    0.002818    0.002818    0.023900    bank    bank    fuel
4   7747    0.001036    0.001036    0.001415    dentist dentist bank
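
For anyone who wants to run the answers, the frame above can be rebuilt roughly as follows. This is only a sketch typed in from the printed table, so the dtypes (integer `id`, float `d*`, string `a*`) are assumptions.

```python
import pandas as pd

# Sample data copied from the printed table in the question.
df = pd.DataFrame({
    "id": [474, 964, 4823, 7225, 7747],
    "d1": [0.000243, 0.000239, 0.000472, 0.002818, 0.001036],
    "d2": [0.000243, 0.000239, 0.000472, 0.002818, 0.001036],
    "d3": [0.001395, 0.000899, 0.000834, 0.023900, 0.001415],
    "a1": ["bank", "bank", "fuel", "bank", "dentist"],
    "a2": ["bank", "bank", "fuel", "bank", "dentist"],
    "a3": ["atm", "bank", "fuel", "fuel", "bank"],
})
print(df)
```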

I want to select the minimum of d1, d2, d3 together with the corresponding value from a1, a2, a3.

df
    id      d          a
0  474  0.000243     bank
1  964  0.000239     bank
2 4823  0.000472     fuel
3 7225  0.002818     bank
4 7747  0.001036     dentist

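To select the columns from lists, get the name of the column holding each row's minimum with DataFrame.idxmin, map it to the matching a column, and then use DataFrame.lookup inside DataFrame.assign to build the new columns (DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this code needs an older pandas):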

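col1 = ['d1','d2','d3']
col2 = ['a1','a2','a3']

# name of the d column with the row minimum, mapped to the paired a column
pos = df[col1].idxmin(axis=1).map(dict(zip(col1, col2)))

# keep id, add the row minimum and the value looked up from the paired a column
df = df[['id']].assign(d = df[col1].min(axis=1), a = df.lookup(df.index, pos))
print(df)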
     id         d        a
0   474  0.000243     bank
1   964  0.000239     bank
2  4823  0.000472     fuel
3  7225  0.002818     bank
4  7747  0.001036  dentist
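
On pandas versions without DataFrame.lookup, the same idea can be expressed with plain NumPy indexing. The following is a minimal sketch, not the answerer's code: it swaps `lookup` for `argmin` plus integer indexing, uses three rows from the question as sample data, and the names `rows` and `out` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Three rows taken from the question's table.
df = pd.DataFrame({
    "id": [474, 7225, 7747],
    "d1": [0.000243, 0.002818, 0.001036],
    "d2": [0.000243, 0.002818, 0.001036],
    "d3": [0.001395, 0.023900, 0.001415],
    "a1": ["bank", "bank", "dentist"],
    "a2": ["bank", "bank", "dentist"],
    "a3": ["atm", "fuel", "bank"],
})
col1 = ["d1", "d2", "d3"]
col2 = ["a1", "a2", "a3"]

# Column position (0, 1 or 2) of the smallest d value in each row;
# on ties argmin returns the first position.
pos = df[col1].to_numpy().argmin(axis=1)
rows = np.arange(len(df))

# Pick the d value and the a value at the same position, row by row.
out = df[["id"]].assign(
    d=df[col1].to_numpy()[rows, pos],
    a=df[col2].to_numpy()[rows, pos],
)
print(out)
```

This relies on `col1` and `col2` listing the paired columns in the same order, just as the `dict(zip(col1, col2))` mapping does.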

You can use pd.wide_to_long here to reshape the dataframe into long format, specifying ['d','a'] as the stub names. Then group by id and take the idxmin of d to index the rows:

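# raw string for the regex suffix; drop the helper column j by name
df = (pd.wide_to_long(df, stubnames=['d','a'], suffix=r'\d+', i='id', j='j')
        .reset_index().drop(columns='j'))
# keep, for each id, the row where d is smallest
df = df.loc[df.groupby('id').d.idxmin().values]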

print(df)

     id         d        a
0   474  0.000243     bank
1   964  0.000239     bank
2  4823  0.000472     fuel
3  7225  0.002818     bank
4  7747  0.001036  dentist

Applying pd.wide_to_long as above to the original dataframe gives:

pd.wide_to_long(df, stubnames=['d','a'], suffix=r'\d+', i='id', j='j')

              d        a
id   j                   
474  1  0.000243     bank
964  1  0.000239     bank
4823 1  0.000472     fuel
7225 1  0.002818     bank
7747 1  0.001036  dentist
474  2  0.000243     bank
964  2  0.000239     bank
4823 2  0.000472     fuel
7225 2  0.002818     bank
7747 2  0.001036  dentist
474  3  0.001395      atm
964  3  0.000899     bank
4823 3  0.000834     fuel
7225 3  0.023900     fuel
7747 3  0.001415     bank

We only need to group by id and find the index of the minimum value.
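
Since d1 and d2 are equal in every row of the sample, it may help to see how idxmin resolves ties: it returns the label of the first occurrence of the minimum. A small sketch using only the long-form rows for id 474 (the names `long_df`, `idx` and `picked` are illustrative):

```python
import pandas as pd

# Long-form rows for id 474: j=1 and j=2 hold the same d value.
long_df = pd.DataFrame({
    "id": [474, 474, 474],
    "j": [1, 2, 3],
    "d": [0.000243, 0.000243, 0.001395],
    "a": ["bank", "bank", "atm"],
})

# Index label of the smallest d per id; on ties the first label wins.
idx = long_df.groupby("id")["d"].idxmin()
picked = long_df.loc[idx]
print(picked)
```

The j=1 row is selected, which is why the result keeps the a1 value when d1 and d2 tie.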

@yatu's solution was the motivation here - wherever I see a wide-to-long reshape, I test whether a stack on a MultiIndex would fit :):

#set id as index:
df = df.set_index('id')

#split columns based on the numbers, and expand=True
#this converts the columns into a MultiIndex
#drop the last level, as it is empty text
df.columns = df.columns.str.split(r"(\d+)",expand=True).droplevel(-1)

#get indices for a min on groupby
#(select d first, so idxmin is not attempted on the string column a):
ind = df.stack().groupby('id').d.idxmin()

#get minimum rows : 
df.stack().loc[ind].droplevel(-1)


         a          d
id      
474     bank    0.000243
964     bank    0.000239
4823    fuel    0.000472
7225    bank    0.002818
7747    dentist 0.001036
