Column slicing in pandas

Question

Here is the dummy DataFrame:

import pandas as pd

# Three rows, matching the table printed below
d = {'col_1': [1, 2, 1], 'col_n_1': [3, 4, 4], 'col_2': [2, 1, 1], 'col_n_2': [6, 3, 5]}
df = pd.DataFrame(data=d)


   col_1    col_2   col_n_1   col_n_2
0      1        2         3         6
1      2        1         4         3
2      1        1         4         5

I am looking for a nice way to extract the values from col_n_1 where col_1 == 1 and from col_n_2 where col_2 == 1 into a new column that would look like:

new_col
      3
      3
    4,5

Answer 1

Use where to select values by a boolean mask, then join the remaining values in each row:

L = ['col_1','col_2']
L1 = ['col_n_1','col_n_2']
df['new'] = (df[L1].astype(str).where(df[L].eq(1).values, axis=1)
                  .apply(lambda x: ','.join(x.dropna()), axis=1))
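To see what the where step produces before the join, here is a minimal sketch that rebuilds the three-row frame from the question and keeps the intermediate results in separate variables. The names mask, masked and new_col are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Three-row frame as printed in the question
df = pd.DataFrame({
    'col_1': [1, 2, 1],
    'col_n_1': [3, 4, 4],
    'col_2': [2, 1, 1],
    'col_n_2': [6, 3, 5],
})
L = ['col_1', 'col_2']
L1 = ['col_n_1', 'col_n_2']

# Boolean array: True where the flag column equals 1
mask = df[L].eq(1).to_numpy()

# Keep the string value where the flag is True, NaN elsewhere
masked = df[L1].astype(str).where(mask)

# Drop the NaN entries in each row and join what is left
new_col = masked.apply(lambda row: ','.join(row.dropna()), axis=1)

print(masked)
print(new_col.tolist())  # ['3', '3', '4,5']
```

Converting the mask to a plain array matters here: the flag columns and the value columns have different names, so a DataFrame mask would be aligned by column label and would not match.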

Solution if there are only 2 columns:

L = ['col_1','col_2']
L1 = ['col_n_1','col_n_2']
df1 = df[L1].astype(str).where(df[L].eq(1).values, axis=1)
df['new'] = (df1['col_n_1'].fillna('') + ',' + df1['col_n_2'].fillna('')).str.strip(',')
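The final strip is needed because filling a missing side with an empty string leaves a dangling comma. A small sketch of just that step, using two hand-written Series (left and right are hypothetical stand-ins for the two masked columns):

```python
import pandas as pd

# Stand-ins for the masked columns after fillna('')
left = pd.Series(['3', '', '4'])
right = pd.Series(['', '3', '5'])

joined = left + ',' + right
cleaned = joined.str.strip(',')  # removes leading and trailing commas only

print(joined.tolist())   # ['3,', ',3', '4,5']
print(cleaned.tolist())  # ['3', '3', '4,5']
```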

Or append a separator to each value, concatenate along the rows with sum, and finally strip the trailing separator:

df['new'] = (df[L1].astype(str).where(df[L].eq(1).values)
                  .add(',')
                  .fillna('')
                  .sum(axis=1)
                  .str.strip(','))

print(df)
   col_1  col_2  col_n_1  col_n_2  new
0      1      2        3        6    3
1      2      1        4        3    3
2      1      1        4        5  4,5

Answer 2

Borrowing the column-name lists (L and L1) from Jez:

df[L].eq(1).rename(columns=dict(zip(L,L1))).mul((df[L1].astype(str)+',')).sum(1).str[:-1]
Out[126]: 
0      3
1      3
2    4,5
dtype: object
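The one-liner relies on the fact that multiplying a string by a boolean either keeps it (True behaves as 1) or yields an empty string (False behaves as 0). A plain-Python sketch of the same idea, with the flags and values written out by hand for the three rows in the question:

```python
# Flags: does col_1 / col_2 equal 1 in each row?
flags = [[True, False], [False, True], [True, True]]
# Values from col_n_1 / col_n_2, each with a trailing comma appended
values = [['3,', '6,'], ['4,', '3,'], ['4,', '5,']]

result = []
for flag_row, value_row in zip(flags, values):
    # True * '3,' == '3,' and False * '3,' == ''
    kept = ''.join(f * v for f, v in zip(flag_row, value_row))
    result.append(kept[:-1])  # drop the trailing comma

print(result)  # ['3', '3', '4,5']
```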

Answer 3

This can be accomplished with the apply() method and a lambda function. apply() with the axis parameter set to 1 calls the given function on each row of the DataFrame. So the only trouble is writing that function. I think the best solution is to build a list containing the row's col_n_1, its col_n_2, both, or neither, and then join the list with commas. Like this:

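# Each conditional expression is parenthesized so that the two lists are concatenated;
# without the parentheses, the second condition is only evaluated when col_1 != 1.
df['new'] = df.apply(lambda row: ','.join(([str(row.col_n_1)] if row.col_1 == 1 else []) + ([str(row.col_n_2)] if row.col_2 == 1 else [])), axis=1)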
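The same row-wise idea may be easier to read and extend as a named function. This is only a sketch: PAIRS and pick_values are hypothetical names introduced here, and the frame is the three-row one from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [1, 2, 1],
    'col_n_1': [3, 4, 4],
    'col_2': [2, 1, 1],
    'col_n_2': [6, 3, 5],
})

# (flag column, value column) pairs; add more pairs as needed
PAIRS = [('col_1', 'col_n_1'), ('col_2', 'col_n_2')]

def pick_values(row):
    """Join the values whose flag column equals 1 in this row."""
    parts = [str(row[value]) for flag, value in PAIRS if row[flag] == 1]
    return ','.join(parts)

df['new'] = df.apply(pick_values, axis=1)
print(df['new'].tolist())  # ['3', '3', '4,5']
```

A row-wise apply is usually slower than the mask-based approaches on large frames, so this form is mainly a readability trade-off.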


3 answers

Answer 1: score 3, accepted, 2018-07-26 15:43:53

Answer 2: score 2, 2018-07-26 15:51:57

Answer 3: score 0, 2018-07-26 15:52:35
