
Assigning value to new column ['E'] based on column ['A'] value using dataframes

In the example below, I am trying to generate a column 'E' that is assigned either 1 or 2 depending on a condition on column 'A'.

I've tried various options, but they throw a slicing error. Shouldn't something like this assign a value to the new column 'E'?

df2= df.loc[df['A'] == 'foo']['E'] = 1

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print('Filter the content')
df2= df.loc[df['A'] == 'foo']
print(df2)

#      A      B  C   D   E 
# 0  foo    one  0   0   1
# 2  foo    two  2   4   1
# 4  foo    two  4   8   1
# 6  foo    one  6  12   1
# 7  foo  three  7  14   1

df3= df.loc[df['A'] == 'bar']
print(df3)

#      A      B  C   D   E
# 1  bar    one  1   2   2
# 3  bar  three  3   6   2
# 5  bar    two  5  10   2

# Combine df2 and df3 back into df and print df
print(df)
#      A      B  C   D   E
# 0  foo    one  0   0   1
# 1  bar    one  1   2   2 
# 2  foo    two  2   4   1
# 3  bar  three  3   6   2
# 4  foo    two  4   8   1
# 5  bar    two  5  10   2
# 6  foo    one  6  12   1
# 7  foo  three  7  14   1

Is it as simple as this?

df['E'] = np.where(df['A'] == 'foo', 1, 2)

This does what I think you're trying to do. Create a column E in your dataframe that is 1 if A==foo, and 2 if A!=foo.

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
df['E']=np.ones([df.shape[0],])*2
df.loc[df.A=='foo','E']=1
df.E=df.E.astype(int)
print(df)

Note: Your suggested solution, df2= df.loc[df['A'] == 'foo']['E'] = 1, uses chained indexing rather than taking advantage of loc. The first lookup, df.loc[df['A'] == 'foo'], returns a copy, so the assignment to 'E' never reaches df. To select rows by the condition and the column E in a single step, use df.loc[df['A']=='foo','E'] instead.
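To illustrate the difference, here is a minimal sketch on a cut-down version of the frame from the question (only column A is kept, and the names mask, chained_created_column and loc_created_column are made up for this example). The warning filter is there only because pandas may warn about the chained assignment.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split()})
mask = df['A'] == 'foo'

# Chained indexing: df.loc[mask] builds a temporary frame, and the new
# column is added to that temporary, not to df.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas may warn about chained assignment
    df.loc[mask]['E'] = 1
chained_created_column = 'E' in df.columns

# Single .loc call: rows and column are selected together, so df is updated.
df.loc[mask, 'E'] = 1
loc_created_column = 'E' in df.columns

print(chained_created_column, loc_created_column)
```

After the single .loc call, the rows where A is 'bar' hold NaN in E until they are assigned as well.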

Note II: If you have more than two values to distinguish, you could also use .replace() and pass in a dictionary, in this case mapping foo to 1, bar to 2, and so on.
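As a rough sketch of the dictionary idea, the example below uses Series.map instead of .replace(). That is a swap on my part: map builds the new column directly from a lookup table. The codes dictionary is a hypothetical example, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split()})

# Hypothetical lookup table; extend it with one entry per value in column A.
codes = {'foo': 1, 'bar': 2}

# Values of A that are missing from the dictionary would become NaN.
df['E'] = df['A'].map(codes)
print(df)
```

Because unmapped values turn into NaN, the dictionary should cover every value that can occur in A.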

For brevity (fewest characters):

df.assign(E=df.A.ne('foo')+1)

     A      B  C   D  E
0  foo    one  0   0  1
1  bar    one  1   2  2
2  foo    two  2   4  1
3  bar  three  3   6  2
4  foo    two  4   8  1
5  bar    two  5  10  2
6  foo    one  6  12  1
7  foo  three  7  14  1

For brevity (shortest run time):

df.assign(E=(df.A.values != 'foo') + 1)

     A      B  C   D  E
0  foo    one  0   0  1
1  bar    one  1   2  2
2  foo    two  2   4  1
3  bar  three  3   6  2
4  foo    two  4   8  1
5  bar    two  5  10  2
6  foo    one  6  12  1
7  foo  three  7  14  1
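Both one-liners rely on boolean arithmetic: the comparison yields True/False, and adding 1 turns that into 2/1. A small sketch that spells out the steps, again on a cut-down frame with only column A (the intermediate names is_not_foo, e_values and result are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split()})

is_not_foo = df['A'].ne('foo')   # True for 'bar', False for 'foo'
e_values = is_not_foo + 1        # False + 1 -> 1, True + 1 -> 2

# assign returns a new DataFrame; df itself has no column E until reassigned.
result = df.assign(E=e_values)
print(result)
```

Note that assign does not modify df in place, so the result has to be kept, for example with df = df.assign(...).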
