How to extract substrings into a new column

Question

I have a pandas DataFrame with a column named entity. When I print the column via:

df.entity

The output looks like this (I have 267 rows; these are just the first two):

[(East, NNP), (India, CTR), (Company, ORG)]
[(Pasteur, ZZP)]

How can I get a new column where the output looks like this:

East, India, Company
Pasteur

Answer 1

Option 1
zip and iterators

df.assign(entity=[', '.join(next(zip(*r))) for r in df.entity])

                 entity
0  East, India, Company
1               Pasteur
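
To make the zip trick in Option 1 easier to follow, here is a minimal plain-Python sketch of what happens to a single row. The helper name first_items is hypothetical, and the () default passed to next is my addition so that an empty row yields an empty string instead of raising StopIteration.

```python
def first_items(row, sep=', '):
    """Join the first element of every tuple in `row` (hypothetical helper)."""
    # zip(*row) transposes the list of pairs into (words, tags);
    # next(..., ()) takes the words and falls back to () for an empty row.
    return sep.join(next(zip(*row), ()))


row = [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')]

# Unpacking the transposed pairs shows both "columns" of the row.
words, tags = zip(*row)
print(words)                   # ('East', 'India', 'Company')
print(first_items(row))        # East, India, Company
print(repr(first_items([])))   # ''
```

The same helper could be used in the list comprehension in place of the inline expression if some rows might be empty.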

Option 2
A comprehension version of @Zero's answer. Should be quicker.

df.assign(entity=[', '.join([x[0] for x in r]) for r in df.entity])

                 entity
0  East, India, Company
1               Pasteur

Setup

import pandas as pd

df = pd.DataFrame(dict(
    entity=[
        [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
        [('Pasteur', 'ZZP')]
    ]))

Answer 2

Use apply

In [4697]: df.entity.apply(lambda x: ', '.join(t[0] for t in x))
Out[4697]:
0    East, India, Company
1                 Pasteur
Name: entity, dtype: object

Details

                                        entity
0  [(East, NNP), (India, CTR), (Company, ORG)]
1                             [(Pasteur, ZZP)]

Answer 3

Here is another solution

df['New']=df.entity.apply(pd.Series).stack().apply(pd.Series).groupby(level=0)[0].agg(lambda x: ','.join(set(x)))
df
Out[74]: 
                                        entity                 New
0  [(East, NNP), (India, CTR), (Company, ORG)]  India,Company,East
1                             [(Pasteur, ZZP)]             Pasteur
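
Note that set is unordered, which is why the first row above reads India,Company,East rather than East,India,Company. If the original order matters, one possible alternative (not part of the answer as posted) is to deduplicate with dict.fromkeys, which keeps first-seen order on Python 3.7+. The helper name join_unique_in_order is hypothetical.

```python
def join_unique_in_order(words, sep=','):
    """Join words with duplicates removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return sep.join(dict.fromkeys(words))


words = ['East', 'India', 'Company', 'East']
print(join_unique_in_order(words))  # East,India,Company
```

Under that assumption, the lambda passed to agg could call this helper instead of ','.join(set(x)).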

Data Input

df=pd.DataFrame({'entity':[[('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],[('Pasteur', 'ZZP')] ]})

3 answers

Answer 1
3, accepted, 2017-09-28 21:24:53

Answer 2
2, 2017-09-28 21:15:36

Answer 3
2, 2017-09-28 21:24:07