Pandas creating a new variable based on two existing variables

Original post 2018-06-14 06:43:03 4 2 python/pandas

I have the following code, which I think is highly inefficient. Is there a better way to do this type of common recoding in pandas?

df['F'] = 0
df.loc[(df['B'] >= 3) & (df['C'] >= 4.35), 'F'] = 1
df.loc[(df['B'] >= 3) & (df['C'] < 4.35), 'F'] = 2
df.loc[(df['B'] < 3) & (df['C'] >= 4.35), 'F'] = 3
df.loc[(df['B'] < 3) & (df['C'] < 4.35), 'F'] = 4

2 answers

Use numpy.select and store the boolean masks in variables, so that each comparison is evaluated only once, for better performance:

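import numpy as np

# Compute each comparison once and reuse it
m1 = df['B'] >= 3
m2 = df['C'] >= 4.35
m3 = df['C'] < 4.35
m4 = df['B'] < 3

# Rows that match none of the conditions (e.g. NaN in B or C) get the default 0
df['F'] = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)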
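To try the numpy.select approach end to end, here is a minimal sketch on a small made-up frame. The sample values are assumptions chosen only to hit each of the four cases plus one row with a missing B; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per case, plus a row with NaN in B
df = pd.DataFrame({
    "B": [3, 5, 1, 2, np.nan],
    "C": [4.35, 1.0, 9.9, 0.5, 2.0],
})

# Evaluate each comparison once and reuse the masks
m1 = df["B"] >= 3
m2 = df["C"] >= 4.35
m3 = df["C"] < 4.35
m4 = df["B"] < 3

# Conditions are checked in order; rows matching none fall back to default
df["F"] = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)

print(df)
```

The last row ends up with F equal to 0, because every comparison against NaN is False and so none of the four conditions matches.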

In your specific case, you can make use of the fact that booleans are actually integers (False == 0, True == 1) and use simple arithmetic:

df['F'] = 1 + (df['C'] < 4.35) + 2 * (df['B'] < 3)

Note that this does not treat NaNs in your B and C columns specially: any comparison with NaN evaluates to False, so rows with a NaN are assigned as if the value were at or above your limit.
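The NaN behaviour is easy to check on a small made-up frame. The sketch below applies the arithmetic recoding and then, as one possible remedy, resets rows with a missing B or C to 0. Treating missing rows as code 0 is an assumption here (it mirrors the default=0 of the numpy.select answer), and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per case, plus a row with NaN in both columns
df = pd.DataFrame({
    "B": [3, 5, 1, 2, np.nan],
    "C": [4.35, 1.0, 9.9, 0.5, np.nan],
})

# Boolean masks act as 0/1 in arithmetic; NaN comparisons are False,
# so the NaN row silently gets code 1
codes = 1 + (df["C"] < 4.35) + 2 * (df["B"] < 3)

# Assumption: rows with a missing B or C should get code 0 instead
complete = df[["B", "C"]].notna().all(axis=1)
df["F"] = codes.where(complete, 0)

print(codes.tolist())
print(df["F"].tolist())
```

Without the `where` step the last row is coded 1, the same as a row with B >= 3 and C >= 4.35.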

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
