Appending column values into new cell in the same row in Pandas dataframe
I have a csv file that has the columns name, sub_a, sub_b, sub_c, sub_d, segment and gender. I would like to create a new column classes containing, for each student, all the classes (sub-columns) that the student takes, separated by commas.
What would be the easiest way to accomplish this?
The result dataframe should look like this:
+------+-------+-------+-------+-------+---------+--------+---------------------+
| name | sub_a | sub_b | sub_c | sub_d | segment | gender | classes |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| john | 1 | 1 | 0 | 1 | 1 | 0 | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mike | 1 | 0 | 1 | 1 | 0 | 0 | sub_a, sub_c, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mary | 1 | 1 | 0 | 1 | 1 | 1 | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| fred | 1 | 0 | 1 | 0 | 0 | 0 | sub_a, sub_c |
+------+-------+-------+-------+-------+---------+--------+---------------------+
Let us try dot:
s = df.filter(like='sub')
df['classes'] = s.astype(bool).dot(s.columns + ', ').str[:-2]
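Why a dot product ends up joining column names may not be obvious. The sketch below spells out, in plain Python rather than pandas, what happens for a single row: multiplying a string by a boolean keeps or drops it, and "summing" strings concatenates them. The names flags and labels are made up for illustration, and ', ' is assumed as the separator.

```python
# One row of the 0/1 flags (john's row in the question) and the column labels,
# each label already carrying the ", " separator.
flags = [True, True, False, True]
labels = ["sub_a, ", "sub_b, ", "sub_c, ", "sub_d, "]

# bool * str repeats the string once (True) or zero times (False),
# and joining the pieces concatenates them, which is what the
# row-by-column "dot product" amounts to here.
joined = "".join(flag * label for flag, label in zip(flags, labels))

# Drop the trailing separator.
classes = joined[: -len(", ")]
print(classes)  # sub_a, sub_b, sub_d
```

The slice at the end plays the same role as the .str[...] call in the pandas version: it trims the separator left over after the last label.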
You can use apply with axis=1.
For example, if your dataframe looks like this:
df
A_a A_b B_b B_c
0 1 0 0 1
1 0 1 0 1
2 1 0 1 0
you can do:
df['classes'] = df.apply(lambda x: ', '.join(df.columns[x==1]), axis = 1)
df
A_a A_b B_b B_c classes
0 1 0 0 1 A_a, B_c
1 0 1 0 1 A_b, B_c
2 1 0 1 0 A_a, B_b
To apply this to specific columns only, you can filter first using loc:
# for your sample data
df_ = df.loc[:, 'sub_a':'sub_d']  # or df.loc[:, ['sub_a', 'sub_b', 'sub_c', 'sub_d']]
df['classes'] = df_.apply(lambda x: ', '.join(df_.columns[x == 1]), axis=1)
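For completeness, here is a self-contained sketch of this approach run end to end. The DataFrame is typed in by hand from the expected-output table in the question (the real data would come from the csv file), so treat the construction step as an assumption.

```python
import pandas as pd

# Sample data copied from the table in the question (assumed, not read from the csv).
df = pd.DataFrame({
    "name": ["john", "mike", "mary", "fred"],
    "sub_a": [1, 1, 1, 1],
    "sub_b": [1, 0, 1, 0],
    "sub_c": [0, 1, 0, 1],
    "sub_d": [1, 1, 1, 0],
    "segment": [1, 0, 1, 0],
    "gender": [0, 0, 1, 0],
})

# Label-based slice: keeps every column from sub_a through sub_d inclusive.
df_ = df.loc[:, "sub_a":"sub_d"]

# For each row, select the column names whose value is 1 and join them.
df["classes"] = df_.apply(lambda x: ", ".join(df_.columns[x == 1]), axis=1)

print(df[["name", "classes"]])
```

Slicing by label relies on the sub-columns being adjacent in the frame; if they are not, passing an explicit list of column names to loc is the safer choice.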
You indeed want to iterate through the rows. However, you cannot add the classes to the DataFrame directly, as all columns of the DataFrame need to be equally long. So the trick is to first generate the whole column and then add it afterwards:
subjects = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
classes_per_student = []  # the column under construction, initially empty
for _, student in df.iterrows():
    # first create a list of the classes taken by this student
    classes = [subj for subj in subjects if student[subj]]
    # create a single string
    classes = ', '.join(classes)
    # append to the column under construction
    classes_per_student.append(classes)
# and finally add the column to the DataFrame
df['classes'] = classes_per_student
You can use apply on just the sub-columns, with a lambda function that joins the names of the sub-columns whose value equals 1:
sub_cols = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
df['classes'] = df[sub_cols].apply(lambda x: ', '.join(df[sub_cols].columns[x == 1]), axis=1)