
Populate new column based on conditions in another column

I'm playing about with Python and pandas.

I have created a DataFrame with a column (axis 1) called 'County', and I need to create a column called 'Region' and populate it like this (at least I think so):

If County column == 'Suffolk' or 'Norfolk' or 'Essex' then in Region column insert 'East Anglia'

If County column == 'Kent' or 'East Sussex' or 'West Sussex' then in Region Column insert 'South East'

If County column == 'Dorset' or 'Devon' or 'Cornwall' then in Region Column insert 'South West'

and so on...

So far I have this:

myDataFrame['Region'] = np.where(myDataFrame['County'] == 'Suffolk', 'East Anglia', '')

But I suspect this won't work for any other counties.

As I'm sure is obvious, I am a beginner. I have tried googling and reading but could only find out about numpy where, which got me this far.

You'll definitely need df.isin and loc-based indexing:

import numpy as np

# Start with an empty Region column; rows that match no list below stay NaN
df['Region'] = np.nan
df.loc[df.County.isin(['Suffolk','Norfolk', 'Essex']), 'Region'] = 'East Anglia'
df.loc[df.County.isin(['Kent', 'East Sussex', 'West Sussex']), 'Region'] = 'South East'
df.loc[df.County.isin(['Dorset', 'Devon', 'Cornwall']), 'Region'] = 'South West'
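For anyone who wants to try the isin and loc approach end to end, here is a minimal runnable sketch. The sample DataFrame is made up for illustration ('Surrey' is included only to show what happens to a county that no rule covers), and creating the Region column with object dtype is my own assumption, made so that string labels can be written into it without a dtype mismatch on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'Surrey' is deliberately not covered by any rule.
df = pd.DataFrame({"County": ["Suffolk", "Kent", "Devon", "Essex", "Surrey"]})

# Assumption: an object-dtype column of NaN, so strings can be assigned later.
df["Region"] = pd.Series(np.nan, index=df.index, dtype="object")

# Each isin() call builds a boolean mask; loc writes the label where it is True.
df.loc[df["County"].isin(["Suffolk", "Norfolk", "Essex"]), "Region"] = "East Anglia"
df.loc[df["County"].isin(["Kent", "East Sussex", "West Sussex"]), "Region"] = "South East"
df.loc[df["County"].isin(["Dorset", "Devon", "Cornwall"]), "Region"] = "South West"

print(df)
```

The row for 'Surrey' keeps its missing value, since none of the three masks select it.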

You could also create a mapping of sorts and use df.map or df.replace:

mapping = { 'Suffolk' : 'East Anglia', 'Norfolk': 'East Anglia', ... 'Kent'  :'South East', ..., ... }
df['Region'] = df.County.map(mapping) 

I would prefer a map here because it would convert non-matches to NaN, which would be ideal.
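To make the difference between map and replace concrete, here is a small sketch. The regions dictionary and the sample DataFrame are hypothetical and cover only the counties named in the question; inverting a region-to-counties table is just one convenient way to build the mapping, not something the approach requires.

```python
import pandas as pd

# Hypothetical region -> counties table, limited to the counties in the question.
regions = {
    "East Anglia": ["Suffolk", "Norfolk", "Essex"],
    "South East": ["Kent", "East Sussex", "West Sussex"],
    "South West": ["Dorset", "Devon", "Cornwall"],
}

# Invert it into the county -> region lookup that map() expects.
mapping = {
    county: region
    for region, counties in regions.items()
    for county in counties
}

# 'Surrey' is not in the mapping, so it shows how non-matches are handled.
df = pd.DataFrame({"County": ["Norfolk", "West Sussex", "Cornwall", "Surrey"]})

mapped = df["County"].map(mapping)        # non-matches become NaN
replaced = df["County"].replace(mapping)  # non-matches keep the county name

df["Region"] = mapped
print(df)
print(replaced)
```

With replace, an unmatched county such as 'Surrey' would end up in the Region column as if it were a region name, which is why map is usually the safer choice for this task.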


 