Append column values to a new cell in the same row of a Pandas dataframe

Question

I have a CSV file with the columns name, sub_a, sub_b, sub_c, sub_d, segment and gender. I would like to create a new column, classes, that lists all the classes (the sub columns) each student takes, separated by commas.

What would be the easiest way to accomplish this?

The resulting dataframe should look like this:

+------+-------+-------+-------+-------+---------+--------+---------------------+
| name | sub_a | sub_b | sub_c | sub_d | segment | gender | classes             |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| john | 1     | 1     | 0     | 1     | 1       | 0      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mike | 1     | 0     | 1     | 1     | 0       | 0      | sub_a, sub_c, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mary | 1     | 1     | 0     | 1     | 1       | 1      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| fred | 1     | 0     | 1     | 0     | 0       | 0      | sub_a, sub_c        |
+------+-------+-------+-------+-------+---------+--------+---------------------+
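To try the answers on the question's data, here is a minimal sketch that loads the table above into a DataFrame. It assumes the CSV has a header row and comma separators; the inline string is a stand-in for the real file, whose name the question does not give.

```python
import io

import pandas as pd

# Stand-in for the CSV file described in the question (assumed layout);
# with a real file you would pass its path to pd.read_csv instead.
csv_text = """name,sub_a,sub_b,sub_c,sub_d,segment,gender
john,1,1,0,1,1,0
mike,1,0,1,1,0,0
mary,1,1,0,1,1,1
fred,1,0,1,0,0,0
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
```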

Answer 1

Let us try dot:

s = df.filter(like='sub')    # keep only the sub_* columns
df['classes'] = s.astype(bool).dot(s.columns + ', ').str[:-2]    # join names where the value is 1, strip the trailing ', '
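A self-contained sketch of the dot approach on the question's sample rows. It works because multiplying a Python bool by a string gives the string (True) or an empty string (False), so the row-by-column dot product concatenates the names of the columns that are set.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'name': ['john', 'mike', 'mary', 'fred'],
    'sub_a': [1, 1, 1, 1],
    'sub_b': [1, 0, 1, 0],
    'sub_c': [0, 1, 0, 1],
    'sub_d': [1, 1, 1, 0],
    'segment': [1, 0, 1, 0],
    'gender': [0, 0, 1, 0],
})

s = df.filter(like='sub')  # sub_a .. sub_d
# True * 'sub_a, ' == 'sub_a, ' and False * 'sub_c, ' == '',
# so each row's dot product is the concatenation of its selected names.
joined = s.astype(bool).dot(s.columns + ', ')
df['classes'] = joined.str[:-2]  # drop the trailing ', '
print(df[['name', 'classes']])
```

Note that filter(like='sub') keeps every column whose name contains "sub"; if the real file has other such columns, df.filter(regex='^sub_') is a stricter selection.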

Answer 2

You can use apply with axis=1.

For example, if your dataframe looks like this:

df
   A_a  A_b  B_b  B_c
0    1    0    0    1
1    0    1    0    1
2    1    0    1    0

you can do:

df['classes'] = df.apply(lambda x: ', '.join(df.columns[x==1]), axis = 1)
df
   A_a  A_b  B_b  B_c   classes
0    1    0    0    1  A_a, B_c
1    0    1    0    1  A_b, B_c
2    1    0    1    0  A_a, B_b

To apply it to specific columns only, you can filter them first using loc:

# for your sample data
df_ = df.loc[:, 'sub_a':'sub_d']    # or df.loc[:, ['sub_a', 'sub_b', 'sub_c', 'sub_d']]
df['classes'] = df_.apply(lambda x: ', '.join(df_.columns[x == 1]), axis=1)
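A runnable sketch of the apply approach on the question's data. With axis=1 each row arrives as a Series indexed by column name, so x == 1 is a boolean mask that selects the matching column names.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'name': ['john', 'mike', 'mary', 'fred'],
    'sub_a': [1, 1, 1, 1],
    'sub_b': [1, 0, 1, 0],
    'sub_c': [0, 1, 0, 1],
    'sub_d': [1, 1, 1, 0],
    'segment': [1, 0, 1, 0],
    'gender': [0, 0, 1, 0],
})

# Label slice: every column from sub_a to sub_d, inclusive.
df_ = df.loc[:, 'sub_a':'sub_d']
# For each row, keep the column names whose value is 1 and join them.
df['classes'] = df_.apply(lambda x: ', '.join(df_.columns[x == 1]), axis=1)
print(df[['name', 'classes']])
```

The label slice 'sub_a':'sub_d' relies on the four columns being adjacent in the file; the list form in the comment above does not.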

Answer 3

You do indeed want to iterate through the rows. However, you cannot add the classes to the DataFrame directly, since all columns of a DataFrame need to be the same length. So the trick is to build the column first and add it afterwards:

subjects = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
classes_per_student = []  # the empty column

for _, student in df.iterrows():
    # first create a list of the classes taken by this student
    classes = [subj for subj in subjects if student[subj]]
    # create a single string
    classes = ', '.join(classes)
    # append to the column under construction
    classes_per_student.append(classes)

# and finally add the column to the DataFrame
df['classes'] = classes_per_student
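The same build-the-column-first idea can also use DataFrame.itertuples instead of iterrows; this swap is not part of the answer, but the pandas documentation describes itertuples as generally faster than iterrows. A sketch, assuming the sub column names are valid Python identifiers (they become attributes of each row tuple):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'name': ['john', 'mike', 'mary', 'fred'],
    'sub_a': [1, 1, 1, 1],
    'sub_b': [1, 0, 1, 0],
    'sub_c': [0, 1, 0, 1],
    'sub_d': [1, 1, 1, 0],
    'segment': [1, 0, 1, 0],
    'gender': [0, 0, 1, 0],
})

subjects = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
classes_per_student = []  # the column under construction

# itertuples yields one namedtuple per row; column names become attributes.
for row in df.itertuples(index=False):
    taken = [subj for subj in subjects if getattr(row, subj)]
    classes_per_student.append(', '.join(taken))

# One entry per row, so the list can be assigned as a whole column.
df['classes'] = classes_per_student
print(df[['name', 'classes']])
```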

Answer 4

You can use apply on the sub columns only, with a lambda function that joins the names of the sub columns whose value equals 1:

sub_cols = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
df['classes'] = df[sub_cols].apply(lambda x: ', '.join(df[sub_cols].columns[x == 1]), axis=1)
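A runnable sketch of this answer that selects the sub columns once rather than rebuilding df[sub_cols] on every row. The extra student "anna", who takes no classes, is hypothetical and only added to show that such a row gets an empty string.

```python
import pandas as pd

# Question's sample data plus a hypothetical student with no classes.
df = pd.DataFrame({
    'name': ['john', 'mike', 'mary', 'fred', 'anna'],
    'sub_a': [1, 1, 1, 1, 0],
    'sub_b': [1, 0, 1, 0, 0],
    'sub_c': [0, 1, 0, 1, 0],
    'sub_d': [1, 1, 1, 0, 0],
    'segment': [1, 0, 1, 0, 0],
    'gender': [0, 0, 1, 0, 1],
})

sub_cols = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
sub = df[sub_cols]  # select the sub columns once, outside the lambda
df['classes'] = sub.apply(lambda x: ', '.join(sub.columns[x == 1]), axis=1)
print(df[['name', 'classes']])
```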


4 answers

Answer 1: score 2, 2020-05-18 17:43:45

Answer 2: score 1, 2020-05-18 17:30:17

Answer 3: score 0, 2020-05-18 17:37:45

Answer 4: score 0, 2020-05-18 18:50:18
