
Pandas dataframe add a field based on multiple if statements

I'm quite new to Python and pandas, so this might be an obvious question.

I have a dataframe with ages listed in it. I want to create a new field with an age banding. I can use a lambda to capture a single if/else statement, but I want to use multiple ifs, e.g. if age < 18 then 'under 18' elif age < 40 then 'under 40' else '>40'.

I don't think I can do this using a lambda, but I'm not sure how to do it a different way. I have this code so far:

import pandas as pd
import numpy as np

d = {'Age' : pd.Series([36., 42., 6., 66., 38.]) }

df = pd.DataFrame(d)

df['Age_Group'] = df['Age'].map(lambda x: '<18' if x < 18 else '>18')

print(df)

The pandas DataFrame provides a nice querying ability through boolean indexing.

What you are trying to do can be done simply with:

# Note: df[col][mask] = value is chained indexing; recent pandas versions
# may warn or leave df unchanged (see the .loc version further down).
# Set a default value
df['Age_Group'] = '<40'
# Set Age_Group for all rows where Age is greater than 40
df['Age_Group'][df['Age'] > 40] = '>40'
# Set Age_Group for all rows where Age is greater than 18 and less than 40
df['Age_Group'][(df['Age'] > 18) & (df['Age'] < 40)] = '>18'
# Set Age_Group for all rows where Age is less than 18
df['Age_Group'][df['Age'] < 18] = '<18'

The querying here is a powerful tool of the DataFrame and will allow you to manipulate it as you need.

For more complex conditionals, you can specify multiple conditions by wrapping each condition in parentheses and joining them with a boolean operator (e.g. '&' or '|').

You can see this at work in the second conditional statement, the one that sets '>18'.
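To make the combined conditions concrete, here is a small sketch using the same ages as the question. The variable names between and outside are only illustrative; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})

# Each comparison yields a boolean Series. The parentheses are required
# because & and | bind more tightly than the comparison operators.
between = (df['Age'] > 18) & (df['Age'] < 40)   # both conditions hold
outside = (df['Age'] < 18) | (df['Age'] > 40)   # either condition holds

print(between.tolist())
print(outside.tolist())

# A boolean Series can be used directly to select the matching rows.
print(df[between])
```

The same masks can be passed to .loc when assigning, which is what the later edit in this answer does.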

Edit:

You can read more about indexing of DataFrame and conditionals:

http://pandas.pydata.org/pandas-docs/dev/indexing.html#index-objects

Edit:

To see how it works:

>>> d = {'Age' : pd.Series([36., 42., 6., 66., 38.]) }
>>> df = pd.DataFrame(d)
>>> df
   Age
0   36
1   42
2    6
3   66
4   38
>>> df['Age_Group'] = '<40'
>>> df['Age_Group'][df['Age'] > 40] = '>40'
>>> df['Age_Group'][(df['Age'] > 18) & (df['Age'] < 40)] = '>18'
>>> df['Age_Group'][df['Age'] < 18] = '<18'
>>> df
   Age Age_Group
0   36       >18
1   42       >40
2    6       <18
3   66       >40
4   38       >18

Edit:

To see how to do this without the chained indexing (using EdChum's approach):

>>> df['Age_Group'] = '<40'
>>> df.loc[df['Age'] > 40, 'Age_Group'] = '>40'
>>> df.loc[(df['Age'] > 18) & (df['Age'] < 40), 'Age_Group'] = '>18'
>>> df.loc[df['Age'] < 18, 'Age_Group'] = '<18'
>>> df
   Age Age_Group
0   36       >18
1   42       >40
2    6       <18
3   66       >40
4   38       >18
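Every comparison in the steps above is strict, so an age of exactly 18 or 40 keeps the default label. The following is a minimal sketch of one way to cover those boundaries with .loc. The band edges and the labels '18-39' and '>=40' are my own assumptions, not part of the original answer, and the two extra ages (18 and 40) are added only to exercise the boundaries.

```python
import pandas as pd

# Ages from the question plus two boundary values (assumed for illustration).
df = pd.DataFrame({'Age': [36., 42., 6., 66., 38., 18., 40.]})

# Start from the top band, then overwrite progressively lower bands.
# Because each later step is a subset of the previous one, no range
# check with & is needed and no age is left uncovered.
df['Age_Group'] = '>=40'
df.loc[df['Age'] < 40, 'Age_Group'] = '18-39'
df.loc[df['Age'] < 18, 'Age_Group'] = '<18'

print(df)
```

The order of the assignments matters here: the narrowest condition has to come last so that it overwrites the wider ones.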

You can also use a nested np.where():

df['Age_Group'] = np.where(df.Age < 18, 'under 18',
                           np.where(df.Age < 40, 'under 40', '>40'))
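If the nesting gets hard to read with more bands, np.select is a flatter alternative. This is a different technique from the nested np.where() above, offered as a sketch; the names conditions and labels are illustrative. The conditions are checked in order and the first match wins, which mirrors an if / elif / else chain.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})

# Conditions are evaluated in order; each row takes the label of the
# first condition that is True, or the default if none match.
conditions = [df['Age'] < 18, df['Age'] < 40]
labels = ['under 18', 'under 40']

df['Age_Group'] = np.select(conditions, labels, default='>40')

print(df)
```

Adding another band then only means adding one entry to each list, rather than another level of nesting.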
