Selecting columns based on the value given in another column using pandas in Python

Question

I have a data frame like this:

a   b   c   d......

1   1
3   3   3   5
4   1   1   4   6
1   0

I want to select a number of columns based on the value given in column "a". In this case, for the first row it would select only column b. How can I achieve something like:

df.iloc[:,column b:number of columns corresponding to value in column a]

My expected output would be:

a   b   c   d   e
1   1   0   0   1     # 'e' contains the value in column b because column a = 1
3   3   3   5   335   # 'e' contains the values of columns b, c, d because column a = 3
4   1   1   4   1
1   0           NaN

Answer 1

Define a little function for this:

def select(df, r):
    # Column a (position 0) says how many columns to take, starting at column b (position 1)
    return df.iloc[r, 1:1 + int(df.iat[r, 0])]

The function uses iat to read column a for that row, and iloc to select columns from the same row.

Call it like this:

select(df, 0)

b    1.0
Name: 0, dtype: float64

And,

select(df, 1)

b    3.0
c    3.0
d    5.0
Name: 1, dtype: float64
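For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame first. The frame itself is an assumption: the blank cells in the question are taken to be NaN, and the columns are named a through e.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; blank cells are taken to be NaN.
df = pd.DataFrame({
    "a": [1, 3, 4, 1],
    "b": [1, 3, 1, 0],
    "c": [np.nan, 3, 1, np.nan],
    "d": [np.nan, 5, 4, np.nan],
    "e": [np.nan, np.nan, 6, np.nan],
})

def select(df, r):
    # Column a (position 0) says how many columns to take, starting at column b (position 1).
    return df.iloc[r, 1:1 + int(df.iat[r, 0])]

first = select(df, 0)
second = select(df, 1)
print(first)
print(second)
```

Note that `select` returns one row's slice at a time, so it suits lookups rather than building a whole new column.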

Based on your edit, consider this:

df

   a  b  c  d  e
0  1  1  0  0  0
1  3  3  3  5  0
2  4  1  1  4  6
3  1  0  0  0  0

Use where / mask (with numpy broadcasting) + agg here:

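import numpy as np

# to_numpy()[:, None] turns column a into a column vector so the comparison broadcasts per row
df['e'] = df.iloc[:, 1:]\
            .astype(str)\
            .where(np.arange(df.shape[1] - 1) < df.a.to_numpy()[:, None], '')\
            .agg(''.join, axis=1)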

df

   a  b  c  d     e
0  1  1  0  0     1
1  3  3  3  5   335
2  4  1  1  4  1146
3  1  0  0  0     0

If nothing matches, those entries in e will hold an empty string. Just use replace to turn them into NaN:

df['e'] = df['e'].replace('', np.nan)
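To make the broadcasting step easier to follow, here is a small sketch that builds the boolean mask on its own before applying it. The frame is the one shown in this answer; the names `positions`, `counts`, `mask`, `kept` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 3, 4, 1],
    "b": [1, 3, 1, 0],
    "c": [0, 3, 1, 0],
    "d": [0, 5, 4, 0],
    "e": [0, 0, 6, 0],
})

# Positions 0..3 of the columns b, c, d, e; shape (4,)
positions = np.arange(df.shape[1] - 1)
# Counts from column a as a column vector; shape (4, 1)
counts = df["a"].to_numpy()[:, None]
# Broadcasting compares every position with every row's count; shape (4, 4)
mask = positions < counts
print(mask)

# Keep the cells where the mask is True, blank out the rest, then join each row.
kept = df.iloc[:, 1:].astype(str).where(mask, "")
result = kept.agg("".join, axis=1)
print(result.tolist())
```

Each row of `mask` is True for the first `a` columns after a, which is why the join picks up exactly those values.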

Answer 2

A numpy slicing approach

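import numpy as np

# Assumed: v is the frame's underlying array (the snippet as posted did not define it)
v = df.to_numpy()
a = v[:, 0]
b = v[:, 1:]
n, m = b.shape
b = b.ravel()
b = np.where(b == 0, '', b.astype(str))
r = np.arange(n) * m
f = lambda t: b[t[0]:t[1]]

df.assign(g=list(map(''.join, map(f, zip(r, r + a)))))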

   a  b  c  d  e     g
0  1  1  0  0  0     1
1  3  3  3  5  0   335
2  4  1  1  4  6  1146
3  1  0  0  0  0
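The slicing relies on row i of the flattened array starting at offset i * m. Here is a rough sketch of just that offset arithmetic, using the same values as the frame above; the names `flat`, `starts`, `stops` and `pieces` are illustrative and not part of the answer's code.

```python
import numpy as np

# Same values as the frame above; column 0 is a, the rest are b..e.
v = np.array([
    [1, 1, 0, 0, 0],
    [3, 3, 3, 5, 0],
    [4, 1, 1, 4, 6],
    [1, 0, 0, 0, 0],
])
a = v[:, 0]
n, m = v[:, 1:].shape
flat = v[:, 1:].ravel()

# Row i starts at offset i * m in the flattened array and contributes a[i] values.
starts = np.arange(n) * m
stops = starts + a
pieces = [flat[s:e].tolist() for s, e in zip(starts, stops)]
print(pieces)
```

The answer's version additionally converts the values to strings and blanks out zeros before joining, which is why its last row comes out empty.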

Answer 3

Edit: a one-line solution with slicing.

df["f"] = df.astype(str).apply(lambda r: "".join(r.iloc[1:int(r["a"]) + 1]), axis=1)

# df["f"] = df["f"].astype(int)  if you need `f` to be integer

df    
    a   b   c   d   e   f
0   1   1   X   X   X   1
1   3   3   3   5   X   335
2   4   1   1   4   6   1146
3   1   0   X   X   X   0

Dataset used:

df = pd.DataFrame({'a': {0: 1, 1: 3, 2: 4, 3: 1},
                   'b': {0: 1, 1: 3, 2: 1, 3: 0},
                   'c': {0: 'X', 1: '3', 2: '1', 3: 'X'},
                   'd': {0: 'X', 1: '5', 2: '4', 3: 'X'},
                   'e': {0: 'X', 1: 'X', 2: '6', 3: 'X'}})

Suggestions for improvement would be appreciated!

3 answers

Answer 1
3 2018-01-15 05:43:54

Answer 2
2 Accepted 2018-01-15 10:14:24

Answer 3
1 2018-01-15 06:11:09
