
Create a new column based on the values of another column in a dataframe

Let's say that I have a 6-column DataFrame like this:

                  close     high     low    open     volume     change
ts                                                             
2017-08-24 13:00:00  921.28  930.840  915.50  928.66  1270306.0     -7.38
2017-08-25 13:00:00  915.89  925.555  915.50  923.49  1053376.0     -7.6
2017-08-28 13:00:00  913.81  919.245  911.87  916.00  1086484.0     -2.19
2017-08-29 13:00:00  921.29  923.330  905.00  905.10  1185564.0     16.19
2017-08-30 13:00:00  929.57  930.819  919.65  920.05  1301225.0     9.52
2017-08-31 13:00:00  939.33  941.980  931.76  931.76  1560033.0     7.51

How do I add a column that shows, for each row, 1 if change > 0.0 and 0 otherwise?
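The options below assume a DataFrame named df with a change column. Here is a minimal sketch that rebuilds the sample from the table so the snippets can be run as-is. The values were typed in by hand from the question, and parsing the ts strings into a DatetimeIndex is an assumption about how the original index was built.

```python
import pandas as pd

# Rebuild the question's sample frame (values copied from the table).
df = pd.DataFrame(
    {
        "close":  [921.28, 915.89, 913.81, 921.29, 929.57, 939.33],
        "high":   [930.840, 925.555, 919.245, 923.330, 930.819, 941.980],
        "low":    [915.50, 915.50, 911.87, 905.00, 919.65, 931.76],
        "open":   [928.66, 923.49, 916.00, 905.10, 920.05, 931.76],
        "volume": [1270306.0, 1053376.0, 1086484.0, 1185564.0, 1301225.0, 1560033.0],
        "change": [-7.38, -7.6, -2.19, 16.19, 9.52, 7.51],
    },
    # Timestamps parsed from strings; the index is named "ts" as in the question.
    index=pd.DatetimeIndex(
        ["2017-08-24 13:00:00", "2017-08-25 13:00:00", "2017-08-28 13:00:00",
         "2017-08-29 13:00:00", "2017-08-30 13:00:00", "2017-08-31 13:00:00"],
        name="ts",
    ),
)

print(df.shape)  # (6, 6)
```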

Option 1

Using a boolean comparison cast to int:

df['newCol'] = (df.change > 0).astype(int)
df['newCol'] 

ts
2017-08-24 13:00:00    0
2017-08-25 13:00:00    0
2017-08-28 13:00:00    0
2017-08-29 13:00:00    1
2017-08-30 13:00:00    1
2017-08-31 13:00:00    1
Name: newCol, dtype: int64
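Option 1 works in two steps: the comparison produces a boolean Series, and astype(int) maps True to 1 and False to 0. One side effect worth knowing is that a missing value (NaN) compares as False, so it silently becomes 0. The short series below is made up for illustration, and the where call is just one possible way to keep missing values missing.

```python
import numpy as np
import pandas as pd

# Illustrative values only: negative, positive, zero, and missing.
change = pd.Series([-7.38, 16.19, 0.0, np.nan], name="change")

mask = change > 0          # boolean Series; NaN compares as False
as_int = mask.astype(int)  # True -> 1, False -> 0

# If a missing change should stay missing instead of becoming 0:
keep_missing = as_int.where(change.notna())  # float64, NaN where change is NaN

print(mask.tolist())    # [False, True, False, False]
print(as_int.tolist())  # [0, 1, 0, 0]
```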

Option 2

Using np.where:

import numpy as np

df['newCol'] = np.where(df.change > 0.0, 1, 0)
df['newCol']

ts
2017-08-24 13:00:00    0
2017-08-25 13:00:00    0
2017-08-28 13:00:00    0
2017-08-29 13:00:00    1
2017-08-30 13:00:00    1
2017-08-31 13:00:00    1
Name: newCol, dtype: int64
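np.where(condition, x, y) picks x where the condition is True and y elsewhere, and returns a plain NumPy array rather than a Series. When that array is assigned to df['newCol'], it is placed by position, so it must have one entry per row. The two branch values do not have to be 0 and 1. The "up"/"down" labels below are only an illustration.

```python
import numpy as np
import pandas as pd

change = pd.Series([-7.38, -7.6, -2.19, 16.19, 9.52, 7.51], name="change")

# Element-wise choice between two values based on the condition.
flags = np.where(change > 0.0, 1, 0)           # NumPy array, not a Series
labels = np.where(change > 0.0, "up", "down")  # any pair of values works

print(type(flags).__name__)  # ndarray
print(flags.tolist())        # [0, 0, 0, 1, 1, 1]
print(labels.tolist())       # ['down', 'down', 'down', 'up', 'up', 'up']
```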

Option 3

Using Series.gt:

df['newCol'] = df.change.gt(0).astype(int)  
df['newCol']  

ts
2017-08-24 13:00:00    0
2017-08-25 13:00:00    0
2017-08-28 13:00:00    0
2017-08-29 13:00:00    1
2017-08-30 13:00:00    1
2017-08-31 13:00:00    1
Name: newCol, dtype: int64
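Series.gt(0) computes the same thing as > 0. Its main advantage is that, being a method, it fits into a method chain. A minimal sketch with DataFrame.assign follows. The frame holds only the change column (an assumption to keep it short), and assign returns a new DataFrame instead of modifying df in place.

```python
import pandas as pd

df = pd.DataFrame({"change": [-7.38, -7.6, -2.19, 16.19, 9.52, 7.51]})

# gt(0) is the method form of `> 0`, so it slots into a chain.
out = df.assign(newCol=lambda d: d["change"].gt(0).astype(int))

print(out["newCol"].tolist())  # [0, 0, 0, 1, 1, 1]
print("newCol" in df.columns)  # False: assign did not modify df
```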

Performance

Small (the 6-row df above)

%timeit (df.change > 0).astype(int)
1000 loops, best of 3: 276 µs per loop

%timeit np.where(df.change > 0.0, 1, 0)
10000 loops, best of 3: 209 µs per loop

%timeit df.change.gt(0).astype(int) 
1000 loops, best of 3: 351 µs per loop

Large (60,000 rows)

df_test = pd.concat([df] * 10000, axis=0)  # Setup: 10,000 copies of the 6 rows

%timeit (df_test.change > 0).astype(int)
1000 loops, best of 3: 377 µs per loop

%timeit np.where(df_test.change > 0.0, 1, 0)
1000 loops, best of 3: 328 µs per loop

%timeit  df_test.change.gt(0).astype(int) 
1000 loops, best of 3: 425 µs per loop
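The %timeit lines are IPython magics. Outside IPython, the standard timeit module gives a rough equivalent. In the sketch below, the random data is a synthetic stand-in for df_test (an assumption, not the question's prices), sized to match its 60,000 rows. Absolute numbers will differ by machine and by pandas/NumPy version, so only the relative ordering is worth comparing with the figures above.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for df_test: 60,000 random "change" values (assumption).
rng = np.random.default_rng(0)
df_test = pd.DataFrame({"change": rng.normal(size=60_000)})

candidates = {
    "(s > 0).astype(int)": lambda: (df_test.change > 0).astype(int),
    "np.where(s > 0.0, 1, 0)": lambda: np.where(df_test.change > 0.0, 1, 0),
    "s.gt(0).astype(int)": lambda: df_test.change.gt(0).astype(int),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 100 calls each, reported as time per call.
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    results[name] = best
    print(f"{name:28s} {best * 1e6:8.1f} µs per call")
```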

And, for comparison, an element-wise apply with a Python lambda:

%timeit df_test.change.apply(lambda x: 1 if x > 0 else 0)
10 loops, best of 3: 24.5 ms per loop
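The apply version calls the Python lambda once per element, while the comparison runs over the whole array at once, which is why it is much slower in the timing above. It does not compute anything different, though. This quick check on synthetic data (an assumption, not the question's frame) confirms that both produce the same 0/1 values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
s = pd.Series(rng.normal(size=1_000), name="change")

vectorized = (s > 0).astype(int)                    # one comparison over the array
elementwise = s.apply(lambda x: 1 if x > 0 else 0)  # one Python call per element

# Compare values only; dtypes can differ by platform (int32 vs int64).
same = vectorized.tolist() == elementwise.tolist()
print(same)  # True
```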
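Alternatively, define a helper function first and then apply it row by row:

def value_return(change_variable):
    # 1 for a positive change, 0 otherwise
    if change_variable > 0:
        m = 1
    else:
        m = 0
    return m

df['new_column'] = df.apply(lambda row: value_return(row['change']), axis=1)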
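With axis=1, DataFrame.apply builds a Series for every row, even though the helper only reads change. Applying the helper to that single column passes each scalar directly and gives the same result. It is still a Python-level loop, so the vectorized options above remain faster. The sketch below uses a one-column frame for brevity (an assumption) and a condensed version of value_return.

```python
import pandas as pd

df = pd.DataFrame({"change": [-7.38, -7.6, -2.19, 16.19, 9.52, 7.51]})

def value_return(change_variable):
    # 1 for a positive change, 0 otherwise
    return 1 if change_variable > 0 else 0

# Row-wise: builds a Series per row just to read one field.
by_row = df.apply(lambda row: value_return(row["change"]), axis=1)

# Column-wise: passes each value of the one column straight to the helper.
by_value = df["change"].apply(value_return)

print(by_row.tolist())    # [0, 0, 0, 1, 1, 1]
print(by_value.tolist())  # [0, 0, 0, 1, 1, 1]
```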
