I have a DataFrame with two columns, 'a' and 'b', where 'b' is the difference between the current value of 'a' and the previous value of 'a':
import pandas as pd

df = pd.DataFrame({'a': [10, 60, 30, 80, 10]})
df['b'] = df['a'] - df['a'].shift(1)
    a     b
0  10   NaN
1  60  50.0
2  30 -30.0
3  80  50.0
4  10 -70.0
I want to create a new column 'c' whose values are lists. Where 'b' is negative, the list should hold the previous value of 'a' followed by the current value of 'a' (for example, [60, 30]). Otherwise, it should hold only the current value of 'a'.
The resulting output should look like this:
    a     b         c
0  10   NaN      [10]
1  60  50.0      [60]
2  30 -30.0  [60, 30]
3  80  50.0      [80]
4  10 -70.0  [80, 10]
Use a list comprehension to build the lists, producing [s, a] where b < 0 and [a] otherwise. Iterate over a NumPy array that includes a shifted helper column s, created with Series.shift and added with DataFrame.assign:
arr = df.assign(s = df['a'].shift(fill_value=0))[['a','b','s']].to_numpy()
df['c'] = [[s,a] if b < 0 else [a] for a,b,s in arr]
print (df)
    a     b             c
0  10   NaN        [10.0]
1  60  50.0        [60.0]
2  30 -30.0  [60.0, 30.0]
3  80  50.0        [80.0]
4  10 -70.0  [80.0, 10.0]
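The values above come out as floats because to_numpy() converts the integer column 'a' and the float column 'b' to a single float array. A minimal sketch of one way to keep the integers, assuming it is acceptable to zip the columns directly instead of building one array (the name prev is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 60, 30, 80, 10]})
df['b'] = df['a'] - df['a'].shift(1)

# Shift with fill_value=0 so the helper stays an integer Series (no NaN).
prev = df['a'].shift(fill_value=0)

# Zip the columns separately so 'a' is never upcast to float alongside 'b'.
# NaN < 0 is False, so the first row falls through to the [a] branch.
df['c'] = [[p, a] if b < 0 else [a]
           for a, b, p in zip(df['a'], df['b'], prev)]

print(df['c'].tolist())  # [[10], [60], [60, 30], [80], [80, 10]]
```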
Alternatively, use Series.mask with a Series of one-element lists created by a list comprehension:
s = pd.Series([[x] for x in df['a']], index=df.index)
#alternative
s = df['a'].apply(lambda x: [x])
df['c'] = s.mask(df['b'].lt(0), s.shift() + s)
print (df)
    a     b         c
0  10   NaN      [10]
1  60  50.0      [60]
2  30 -30.0  [60, 30]
3  80  50.0      [80]
4  10 -70.0  [80, 10]
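For readers unfamiliar with Series.mask: it keeps the original value where the condition is False and takes the replacement where it is True. A small sketch with plain numbers instead of lists (the names cond and out are only illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3])
cond = pd.Series([False, True, False])

# Where cond is True, take the value from the second argument;
# elsewhere, keep the value already in s.
out = s.mask(cond, s * 10)

print(out.tolist())  # [1, 20, 3]
```

In the answer above, the condition is df['b'].lt(0) and the replacement is the concatenation of the previous row's list with the current row's list.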
Use Series.to_numpy and add a dimension with a new axis (None), then convert the result to one-element lists. After that, use boolean indexing with Series.lt to assign the new values:
df['c'] = df['a'].to_numpy()[:, None].tolist()
df.loc[df['b'].lt(0), 'c'] = df['c'].shift() + df['c']
Result:
    a     b         c
0  10   NaN      [10]
1  60  50.0      [60]
2  30 -30.0  [60, 30]
3  80  50.0      [80]
4  10 -70.0  [80, 10]
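The first step of this answer relies on indexing with None to add an axis. A short sketch of that step in isolation, using a small illustrative array:

```python
import numpy as np

a = np.array([10, 60, 30])

# Indexing with None (an alias for np.newaxis) turns shape (3,) into (3, 1),
# so each value sits in its own row.
col = a[:, None]
print(col.shape)     # (3, 1)

# tolist() on the 2-D array then yields one single-element list per row.
print(col.tolist())  # [[10], [60], [30]]
```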
Load the data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [10, 60, 30, 80, 10]})
df['b'] = df['a'] - df['a'].shift(1)
Create a temporary NumPy matrix whose rows hold the previous and the current value of 'a':
npa = np.array([df['a'].shift(1), df['a']]).transpose()
Write the matrix to a new df column 'c':
df['c'] = list(npa)
Overwrite 'c' with a one-element list holding the value of 'a' wherever 'b' is not negative (zero, positive, or NaN):
df.loc[df['b'].ge(0) | df['b'].isna(), 'c'] = pd.Series([[x] for x in df['a']])
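Comparisons against NaN evaluate to False, which is why the mask in the last step needs an explicit null check for the first row. A small sketch of that behaviour on an illustrative Series b:

```python
import numpy as np
import pandas as pd

b = pd.Series([np.nan, 50.0, -30.0])

# NaN > 0 is False, so the first row is missed without a null check.
print((b > 0).tolist())               # [False, True, False]

# Adding isna() brings the NaN row back into the mask.
print(((b > 0) | b.isna()).tolist())  # [True, True, False]

# Negating "b < 0" selects the same rows here, and also covers b == 0.
print((~b.lt(0)).tolist())            # [True, True, False]
```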