
How to extract a new substring as a column

I have a pandas DataFrame with a column named entity. When I print the column via:

df.entity

The output looks like this (I have 267 rows; these are just the first two):

[(East, NNP), (India, CTR), (Company, ORG)]
[(Pasteur, ZZP)] 

How can I get a new column whose values look like this:

East, India, Company
Pasteur

Option 1
zip and iterators

df.assign(entity=[', '.join(next(zip(*r))) for r in df.entity])

                 entity
0  East, India, Company
1               Pasteur
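For readers unfamiliar with the idiom: zip(*r) unpacks the list of (word, tag) pairs and transposes it into one tuple of words and one tuple of tags, and next() keeps only the first of those. A minimal sketch of that step on a single row. The first_items helper is a hypothetical name, and the () default passed to next() is an addition not in the answer: without it, an empty list in a row would raise StopIteration.

```python
# One row of the entity column: a list of (word, tag) pairs
row = [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')]

# zip(*row) transposes the pairs into a tuple of words and a tuple of tags
words, tags = zip(*row)
print(words)  # ('East', 'India', 'Company')
print(tags)   # ('NNP', 'CTR', 'ORG')

def first_items(pairs):
    # Hypothetical helper: next() takes only the first transposed tuple (the words);
    # the () default avoids StopIteration when a row is an empty list
    return ', '.join(next(zip(*pairs), ()))

print(first_items(row))  # East, India, Company
print(repr(first_items([])))  # ''
```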

Option 2
A comprehension version of @Zero's answer. It should be quicker than the apply-based version.

df.assign(entity=[', '.join([x[0] for x in r]) for r in df.entity])

                 entity
0  East, India, Company
1               Pasteur
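Whether the comprehension really is quicker can be checked with timeit. A rough sketch, assuming the two sample rows are simply repeated to an arbitrary 10,000 rows. Timings depend on the machine and the pandas version, so no numbers are shown here.

```python
import timeit

import pandas as pd

# The two sample rows from the question, repeated to an arbitrary size for timing
base = [
    [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
    [('Pasteur', 'ZZP')],
]
df = pd.DataFrame({'entity': base * 5000})

def with_comprehension():
    # Option 2: a plain list comprehension over the column
    return [', '.join([x[0] for x in r]) for r in df.entity]

def with_apply():
    # The apply-based answer
    return df.entity.apply(lambda x: ', '.join(t[0] for t in x))

# Both approaches should produce the same strings
assert with_comprehension() == with_apply().tolist()

for fn in (with_comprehension, with_apply):
    seconds = timeit.timeit(fn, number=20)
    print(f'{fn.__name__}: {seconds:.3f}s for 20 runs')
```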

Setup

import pandas as pd

df = pd.DataFrame(dict(
    entity=[
        [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
        [('Pasteur', 'ZZP')]
    ]))
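Both options above call assign with the name entity, so the returned frame has the joined strings in place of the original lists. Because the question asks for a new column, a variant that keeps the original is to use a different name. In this sketch, names is a hypothetical column name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(
    entity=[
        [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
        [('Pasteur', 'ZZP')]
    ]))

# assign returns a new DataFrame; a new column name leaves entity intact
out = df.assign(names=[', '.join([x[0] for x in r]) for r in df.entity])
print(out)

# Alternatively, add the column to df itself
df['names'] = [', '.join(word for word, _tag in r) for r in df.entity]
```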

Use apply:

In [4697]: df.entity.apply(lambda x: ', '.join(t[0] for t in x))
Out[4697]:
0    East, India, Company
1                 Pasteur
Name: entity, dtype: object

Details

                                        entity
0  [(East, NNP), (India, CTR), (Company, ORG)]
1                             [(Pasteur, ZZP)]

Here is another solution:

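df['New'] = (df.entity.apply(pd.Series)         # one column per (word, tag) pair
               .stack()                         # long format: one row per pair
               .dropna()                        # drop NaN padding from rows with fewer pairs
               .apply(pd.Series)                # split each pair into columns 0 (word) and 1 (tag)
               .groupby(level=0)[0]             # group the words by original row
               .agg(lambda x: ','.join(x)))     # join each row's words in their original order
df
Out[74]: 
                                        entity                 New
0  [(East, NNP), (India, CTR), (Company, ORG)]  East,India,Company
1                             [(Pasteur, ZZP)]             Pasteur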
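A shorter take on the same reshape-then-group idea, assuming pandas 0.25 or later (where Series.explode was added). This is an alternative sketch, not the answer's own code. It explodes each list into one row per pair, takes the word with .str[0], and joins the words back per original row. groupby keeps rows in their original order within each group, so the words stay in sequence.

```python
import pandas as pd

df = pd.DataFrame({'entity': [[('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
                              [('Pasteur', 'ZZP')]]})

df['New'] = (
    df.entity
      .explode()          # one row per (word, tag) pair; the index repeats per original row
      .str[0]             # take the word from each tuple
      .groupby(level=0)   # regroup by the original row label
      .agg(', '.join)     # join the words back together
)
print(df)
```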

Data Input

import pandas as pd

df = pd.DataFrame({'entity': [[('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')], [('Pasteur', 'ZZP')]]})
