How to extract a substring into a new column
I have a pandas DataFrame with a column named entity. When I print the column via:
df.entity
The output looks like this (I have 267 rows; these are just the first two):
[(East, NNP), (India, CTR), (Company, ORG)]
[(Pasteur, ZZP)]
How can I get a new column where the output looks like this:
East, India, Company
Pasteur
Option 1
zip and iterators
df.assign(entity=[', '.join(next(zip(*r))) for r in df.entity])
entity
0 East, India, Company
1 Pasteur
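To make the zip trick in Option 1 easier to follow, here is a minimal plain-Python sketch of what happens for each row. The helper name first_items and the empty third row are hypothetical and not part of the original answer. The default passed to next is an added safeguard: without it, an empty row would raise StopIteration.

```python
def first_items(row):
    """Join the first element of every tuple in `row` with ', '."""
    # zip(*row) transposes [(a, x), (b, y)] into (a, b), (x, y);
    # next(...) takes the first of those groups, i.e. the names.
    # The default () covers an empty row, where zip(*row) yields nothing.
    return ', '.join(next(zip(*row), ()))


rows = [
    [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
    [('Pasteur', 'ZZP')],
    [],  # hypothetical empty row, not in the original data
]

result = [first_items(r) for r in rows]
print(result)
```

The same function could be used in place of the inline expression, for example df.assign(entity=[first_items(r) for r in df.entity]).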
Option 2
A comprehension version of @Zero's answer. It should be quicker.
df.assign(entity=[', '.join([x[0] for x in r]) for r in df.entity])
entity
0 East, India, Company
1 Pasteur
Setup
import pandas as pd

df = pd.DataFrame(dict(
entity=[
[('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
[('Pasteur', 'ZZP')]
]))
Use apply
In [4697]: df.entity.apply(lambda x: ', '.join(t[0] for t in x))
Out[4697]:
0 East, India, Company
1 Pasteur
Name: entity, dtype: object
Details
entity
0 [(East, NNP), (India, CTR), (Company, ORG)]
1 [(Pasteur, ZZP)]
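Since the question asks for a new column, the apply call can be assigned back to the frame. This is a small sketch under the assumption that the original entity column should be kept; the column name entity_names is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'entity': [
        [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
        [('Pasteur', 'ZZP')],
    ]
})

# Keep the original tuples and store the joined names in a separate column.
# 'entity_names' is a hypothetical column name.
df['entity_names'] = df['entity'].apply(
    lambda pairs: ', '.join(name for name, _tag in pairs)
)
print(df['entity_names'].tolist())
```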
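Here is another solution. Note that set(x) removes duplicate names and does not preserve order, so the names can come out in a different order than in the original list: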
df['New']=df.entity.apply(pd.Series).stack().apply(pd.Series).groupby(level=0)[0].agg(lambda x: ','.join(set(x)))
df
Out[74]:
entity New
0 [(East, NNP), (India, CTR), (Company, ORG)] India,Company,East
1 [(Pasteur, ZZP)] Pasteur
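If the original order matters, one possible variant of this stack-and-group idea uses Series.explode (available in pandas 0.25 and later) and joins without set. This is only a sketch, not the answerer's method, and it assumes every row holds a non-empty list of tuples.

```python
import pandas as pd

df = pd.DataFrame({
    'entity': [
        [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
        [('Pasteur', 'ZZP')],
    ]
})

# One row per tuple; the original row label is repeated in the index.
exploded = df['entity'].explode()

# .str[0] picks the first element of each tuple (the name).
names = exploded.str[0]

# Group by the original row label and join in the original order.
df['New'] = names.groupby(level=0).agg(', '.join)
print(df['New'].tolist())
```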
Data Input
df=pd.DataFrame({'entity':[[('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],[('Pasteur', 'ZZP')] ]})