
Splitting a column in a data frame over several columns

I'm loading a CSV file that has two columns: date and tags. The tags column contains a list of tags, like so:

date,tags
2021-09-08,"#foo, #bar"
2021-09-10,"#bar"
2021-09-15,"#bar, #baz"
2021-09-22,"#bar"

Loading it with pandas results in a data frame where all the tags are put into one column, like so:

        date            tags
0 2021-09-08      #foo, #bar
1 2021-09-10            #bar
2 2021-09-15      #bar, #baz
3 2021-09-22            #bar

So how do I create, from this data frame, a new data frame where each tag is separated into its own column:

        date    foo   bar    baz
0 2021-09-08  True   True  False
1 2021-09-10  False  True  False
2 2021-09-15  False  True   True
3 2021-09-22  False  True  False

Use Series.str.get_dummies to build one 0/1 indicator column per tag, convert the 0/1 values to booleans with astype(bool), and attach the result to the date column with DataFrame.join:

df = df[['date']].join(df['tags'].str.get_dummies(', ').astype(bool))
print(df)
         date  #bar   #baz   #foo
0  2021-09-08  True  False   True
1  2021-09-10  True  False  False
2  2021-09-15  True   True  False
3  2021-09-22  True  False  False
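To make the intermediate step visible, here is a minimal sketch of what Series.str.get_dummies returns before the boolean conversion. The Series is built by hand from the question's sample data, and the names tags, indicators, and flags are illustrative only, not part of the original answer.

```python
import pandas as pd

# Sample tags from the question, built by hand instead of read from the CSV.
tags = pd.Series(['#foo, #bar', '#bar', '#bar, #baz', '#bar'], name='tags')

# One integer (0/1) indicator column per distinct tag. The separator here is
# the two-character string ', ' and the column labels come back sorted.
indicators = tags.str.get_dummies(', ')
print(indicators)

# astype(bool) turns the 0/1 integers into False/True.
flags = indicators.astype(bool)
print(flags.dtypes)
```

Because the column labels are sorted, the result comes out as #bar, #baz, #foo rather than in the order the tags first appear in the file.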

If you need to remove the #, start again from the original df (the assignment above has already replaced the tags column) and add rename with a custom function:

f = lambda x: x.lstrip('#')
df = df[['date']].join(df['tags'].str.get_dummies(', ').astype(bool).rename(columns=f))
print(df)
         date   bar    baz    foo
0  2021-09-08  True  False   True
1  2021-09-10  True  False  False
2  2021-09-15  True   True  False
3  2021-09-22  True  False  False
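For reference, the whole approach can be put together as a self-contained sketch. It reads the question's sample data from an in-memory string rather than a file, and it optionally reorders the tag columns by first appearance so they match the foo, bar, baz layout asked for in the question. The helper name split_tags and the constant CSV_TEXT are hypothetical, and the sketch assumes every row has at least one tag (no missing values).

```python
import io

import pandas as pd

# The question's sample file, kept in memory so the example is self-contained.
CSV_TEXT = '''date,tags
2021-09-08,"#foo, #bar"
2021-09-10,"#bar"
2021-09-15,"#bar, #baz"
2021-09-22,"#bar"
'''


def split_tags(df, sep=', '):
    """Return the date column plus one boolean column per tag."""
    # One boolean column per tag, with the leading '#' stripped from the labels.
    tags = df['tags'].str.get_dummies(sep).astype(bool)
    tags = tags.rename(columns=lambda name: name.lstrip('#'))

    # get_dummies sorts the labels, so recover the first-appearance order.
    first_seen = (
        df['tags'].str.split(sep).explode().str.lstrip('#')
        .drop_duplicates().tolist()
    )
    return df[['date']].join(tags[first_seen])


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = split_tags(df)
print(result)
```

Keeping the original df untouched and assigning to a new name (result) means the function can be called again without hitting a missing tags column.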

