I have a pandas dataframe dfcolour:
       A  B       C  D       E
0    red  0    redy  1     red
1   blue  1   bluey  2   bluey
2  green  0  greeny  0  greenz
I want to compare the value in column E with columns A and C of the same row. If E matches A, I want to increment B; if E matches C, I want to increment D. If it matches neither, I want to create two new columns: F, holding the unmatched string, and G, an integer counter that starts at 0 and is incremented.
The new dfcolour will look like:
       A  B       C  D       E       F  G
0    red  1    redy  1     red          0
1   blue  1   bluey  3   bluey          0
2  green  0  greeny  0  greenz  greenz  1
Is it possible to do this without iterating?
Code to create dataframe:
import pandas as pd

# DataFrame.set_value was removed in pandas 1.0, so build the frame directly.
dfcolour = pd.DataFrame({
    'A': ['red', 'blue', 'green'],
    'B': [0, 1, 0],
    'C': ['redy', 'bluey', 'greeny'],
    'D': [1, 2, 0],
    'E': ['red', 'bluey', 'greenz'],
})
You can build boolean masks for those conditions and use numpy.where to construct the new columns (df below is the question's dfcolour):
import numpy as np  # pd.np was removed from pandas; import numpy directly

AE = df.A == df.E
CE = df.C == df.E
df['B'] += AE  # if A == E, add one to B
df['D'] += CE  # if C == E, add one to D
df['F'] = np.where(~(AE | CE), df.E, '')  # otherwise copy E into F
df['G'] = np.where(~(AE | CE), 1, 0)  # otherwise set G to 1
df
#        A  B       C  D       E       F  G
# 0    red  1    redy  1     red          0
# 1   blue  1   bluey  3   bluey          0
# 2  green  0  greeny  0  greenz  greenz  1
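For anyone who wants to run the mask-based approach end to end, here is a minimal self-contained sketch. It assumes the sample data from the question; the names match_a, match_c and no_match are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["red", "blue", "green"],
    "B": [0, 1, 0],
    "C": ["redy", "bluey", "greeny"],
    "D": [1, 2, 0],
    "E": ["red", "bluey", "greenz"],
})

# Row-wise boolean masks (illustrative names).
match_a = df["A"] == df["E"]
match_c = df["C"] == df["E"]
no_match = ~(match_a | match_c)

# Adding a mask cast to int increments only the rows where it is True.
df["B"] += match_a.astype(int)
df["D"] += match_c.astype(int)

# Rows that match neither column get their E value copied into F and a 1 in G.
df["F"] = np.where(no_match, df["E"], "")
df["G"] = no_match.astype(int)

print(df)
```

Casting the masks with astype(int) is optional, since adding a boolean Series to an integer column also works, but it makes the intent explicit.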
Sorry, my first idea was a kind of iteration: you can apply a function to each row of the DataFrame and return one or more columns. This is usually how I do it. It still iterates under the hood, but it tends to be cleaner than using iterrows.
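import pandas as pd

def special_function(row):
    # Start from the current counters and update whichever one applies.
    b = row['B']
    d = row['D']
    f = None
    g = 0
    if row['E'] == row['A']:
        b = b + 1
    elif row['E'] == row['C']:
        d = d + 1
    else:
        # No match: record the unmatched string and flag the row.
        f = row['E']
        g = 1
    return pd.Series({'B': b, 'D': d, 'F': f, 'G': g})
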
dfcolour[['B', 'D', 'F', 'G']] = dfcolour.apply(special_function, axis=1)
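A runnable sketch of the row-wise approach follows, again assuming the sample data from the question. The result is assigned back one column at a time, which is an assumption made here to avoid relying on how a particular pandas version handles multi-column assignment to columns that do not exist yet; the name updated is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
dfcolour = pd.DataFrame({
    "A": ["red", "blue", "green"],
    "B": [0, 1, 0],
    "C": ["redy", "bluey", "greeny"],
    "D": [1, 2, 0],
    "E": ["red", "bluey", "greenz"],
})

def special_function(row):
    b, d, f, g = row["B"], row["D"], None, 0
    if row["E"] == row["A"]:
        b += 1
    elif row["E"] == row["C"]:
        d += 1
    else:
        f, g = row["E"], 1
    return pd.Series({"B": b, "D": d, "F": f, "G": g})

# apply(axis=1) calls the function once per row and stacks the returned
# Series into a DataFrame with columns B, D, F, G.
updated = dfcolour.apply(special_function, axis=1)

# Write the results back column by column; cast the counters back to int.
for col in ["B", "D", "G"]:
    dfcolour[col] = updated[col].astype(int)
dfcolour["F"] = updated["F"]

print(dfcolour)
```

Unlike the numpy.where version, rows that matched A or C get a missing value in F rather than an empty string.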