Appending column values into new cell in the same row in Pandas dataframe

I have a CSV file with the columns name, sub_a, sub_b, sub_c, sub_d, segment and gender. I would like to create a new column, classes, that lists all the classes (the sub columns) each student takes, separated by commas.

What would be the easiest way to accomplish this?

The resulting dataframe should look like this:

+------+-------+-------+-------+-------+---------+--------+---------------------+
| name | sub_a | sub_b | sub_c | sub_d | segment | gender | classes             |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| john | 1     | 1     | 0     | 1     | 1       | 0      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mike | 1     | 0     | 1     | 1     | 0       | 0      | sub_a, sub_c, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mary | 1     | 1     | 0     | 1     | 1       | 1      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| fred | 1     | 0     | 1     | 0     | 0       | 0      | sub_a, sub_c        |
+------+-------+-------+-------+-------+---------+--------+---------------------+

Let us try dot:

s=df.filter(like='sub')
df['classes']=s.astype(bool).dot(s.columns+',').str[:-1]
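
Why this works: multiplying a boolean by a string gives either the string or an empty string, so the dot product of each boolean row with the column labels concatenates the labels of the columns that are set. Below is a minimal sketch on the question's sample data. The DataFrame is rebuilt by hand here (an assumption, since the original CSV is not shown), and ', ' is used as the separator so the result matches the desired output.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's table (assumption).
df = pd.DataFrame({
    'name': ['john', 'mike', 'mary', 'fred'],
    'sub_a': [1, 1, 1, 1],
    'sub_b': [1, 0, 1, 0],
    'sub_c': [0, 1, 0, 1],
    'sub_d': [1, 1, 1, 0],
    'segment': [1, 0, 1, 0],
    'gender': [0, 0, 1, 0],
})

s = df.filter(like='sub')      # keep only the sub_* columns
labels = s.columns + ', '      # 'sub_a, ', 'sub_b, ', ...

# True * 'sub_a, ' == 'sub_a, ' and False * 'sub_a, ' == '',
# so each row's dot product concatenates the labels of its 1-columns.
df['classes'] = s.astype(bool).dot(labels).str[:-2]  # drop the trailing ', '

print(df[['name', 'classes']])
```

Note that filter(like='sub') matches any column whose name contains 'sub', so it only fits when no unrelated column name contains that substring.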

You can use apply with axis=1.

For example, if your dataframe looks like this:

df
   A_a  A_b  B_b  B_c
0    1    0    0    1
1    0    1    0    1
2    1    0    1    0

you can do:

df['classes'] = df.apply(lambda x: ', '.join(df.columns[x==1]), axis = 1)
df
   A_a  A_b  B_b  B_c   classes
0    1    0    0    1  A_a, B_c
1    0    1    0    1  A_b, B_c
2    1    0    1    0  A_a, B_b

To apply this to specific columns only, you can filter first using loc:

#for your sample data
df_ = df.loc[:, 'sub_a':'sub_d']             # or df.loc[:, ['sub_a', 'sub_b', 'sub_c', 'sub_d']]
df['classes'] = df_.apply(lambda x: ', '.join(df_.columns[x == 1]), axis=1)

You do indeed want to iterate through the rows. However, you cannot add the classes to the DataFrame one row at a time, because all columns of a DataFrame must have the same length. So the trick is to build the whole column first and add it afterwards:

subjects = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
classes_per_student = []  # the empty column

for _, student in df.iterrows():
    # first create a list of the classes taken by this student
    classes = [subj for subj in subjects if student[subj]]
    # create a single string
    classes = ', '.join(classes)  
    # append to the column under construction
    classes_per_student.append(classes)

# and finally add the column to the DataFrame
df['classes'] = classes_per_student
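
For reference, here is a self-contained sketch of the same loop that can be run as is. The sample DataFrame is typed in by hand from the question's table (an assumption, since the original CSV is not available), and the column names follow the question's sub_a to sub_d.

```python
import pandas as pd

# Sample data typed in from the question's table (assumption).
df = pd.DataFrame({
    'name': ['john', 'mike', 'mary', 'fred'],
    'sub_a': [1, 1, 1, 1],
    'sub_b': [1, 0, 1, 0],
    'sub_c': [0, 1, 0, 1],
    'sub_d': [1, 1, 1, 0],
    'segment': [1, 0, 1, 0],
    'gender': [0, 0, 1, 0],
})

subjects = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
classes_per_student = []  # one string per row, in row order

for _, student in df.iterrows():
    # names of the subject columns that are set to 1 for this student
    taken = [subj for subj in subjects if student[subj] == 1]
    classes_per_student.append(', '.join(taken))

# the list has exactly one entry per row, so it can be assigned as a column
df['classes'] = classes_per_student
print(df)
```

An explicit loop like this is easy to read and debug, though on large frames it is usually slower than the apply or dot variants.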

You can call apply on just the sub columns, with a lambda function that joins the names of the sub columns whose value equals 1:

sub_cols = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
df['classes'] = df[sub_cols].apply(lambda x: ', '.join(df[sub_cols].columns[x == 1]), axis=1)
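
A quick way to check this against the question's expected output is sketched below. The DataFrame is a hand-built stand-in for the CSV (an assumption), and the mask is converted with to_numpy() only to make the boolean indexing of the column labels explicit.

```python
import pandas as pd

# Hand-built stand-in for the CSV described in the question (assumption).
df = pd.DataFrame({
    'name': ['john', 'mike', 'mary', 'fred'],
    'sub_a': [1, 1, 1, 1],
    'sub_b': [1, 0, 1, 0],
    'sub_c': [0, 1, 0, 1],
    'sub_d': [1, 1, 1, 0],
    'segment': [1, 0, 1, 0],
    'gender': [0, 0, 1, 0],
})

sub_cols = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
sub = df[sub_cols]

# For each row, pick the column labels where the value is 1 and join them.
df['classes'] = sub.apply(
    lambda row: ', '.join(sub.columns[(row == 1).to_numpy()]),
    axis=1,
)

# segment and gender are untouched because they are not in sub_cols.
print(df[['name', 'classes']])
```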
