Splitting a column in a data frame over several columns
I'm loading a CSV file that has two columns: date and tags. The tags column contains a list of tags, like so:
date,tags
2021-09-08,"#foo, #bar"
2021-09-10,"#bar"
2021-09-15,"#bar, #baz"
2021-09-22,"#bar"
Loading it with pandas results in a data frame where all tags are put into one column, like so:
date tags
0 2021-09-08 #foo, #bar
1 2021-09-10 #bar
2 2021-09-15 #bar, #baz
3 2021-09-22 #bar
So, how do I create from this a data frame where each tag is separated into its own column?
date foo bar baz
0 2021-09-08 True True False
1 2021-09-10 False True False
2 2021-09-15 False True True
3 2021-09-22 False True False
Use Series.str.get_dummies, convert the resulting 0/1 values to boolean, and add the result to the date column with DataFrame.join:
df = df[['date']].join(df['tags'].str.get_dummies(', ').astype(bool))
print(df)
date #bar #baz #foo
0 2021-09-08 True False True
1 2021-09-10 True False False
2 2021-09-15 True True False
3 2021-09-22 True False False
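For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same approach. It assumes the sample data from the question is held in a string (the names csv_text, dummies and result are made up for illustration) and read through io.StringIO instead of a real CSV path.

```python
import io

import pandas as pd

# Sample data from the question, kept in a string instead of a file
# (csv_text is a hypothetical name used only for this sketch).
csv_text = '''date,tags
2021-09-08,"#foo, #bar"
2021-09-10,"#bar"
2021-09-15,"#bar, #baz"
2021-09-22,"#bar"
'''

df = pd.read_csv(io.StringIO(csv_text))

# One 0/1 indicator column per distinct tag; ', ' is the separator
# used inside the tags column.
dummies = df['tags'].str.get_dummies(', ')

# Convert 0/1 to False/True and attach the indicators to the date column.
result = df[['date']].join(dummies.astype(bool))
print(result)
```

Note that get_dummies returns the indicator columns in sorted order, which is why #bar comes before #foo in the output above.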
If you need to remove the #, add rename with a custom function (run this on the freshly loaded df, since the tags column is still needed):
f = lambda x: x.lstrip('#')
df = df[['date']].join(df['tags'].str.get_dummies(', ').astype(bool).rename(columns=f))
print(df)
date bar baz foo
0 2021-09-08 True False True
1 2021-09-10 True False False
2 2021-09-15 True True False
3 2021-09-22 True False False
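The columns here come out sorted (bar, baz, foo), while the desired output in the question lists them as foo, bar, baz. If that order matters, one possible variation is to reorder the columns by first appearance of each tag. The sketch below is only an illustration under that assumption; the names csv_text, order, flags and result are hypothetical and not part of the original answer.

```python
import io

import pandas as pd

# Sample data from the question, kept in a string instead of a file.
csv_text = '''date,tags
2021-09-08,"#foo, #bar"
2021-09-10,"#bar"
2021-09-15,"#bar, #baz"
2021-09-22,"#bar"
'''

df = pd.read_csv(io.StringIO(csv_text))

# Tags in order of first appearance, with the leading '#' removed:
# split each cell into a list, explode to one tag per row, then de-duplicate.
order = df['tags'].str.split(', ').explode().str.lstrip('#').unique().tolist()

# Same indicator columns as in the answer, renamed without '#'.
flags = (
    df['tags']
    .str.get_dummies(', ')
    .astype(bool)
    .rename(columns=lambda name: name.lstrip('#'))
)

# Select the indicator columns in first-appearance order before joining.
result = df[['date']].join(flags[order])
print(result)
```

Selecting flags[order] only changes the column order; the boolean values are the same as in the answer's output.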