
How to create a dummy variable column in Pandas?

I have a column of time-series data that looks like this:

TimeStamp               Data
2002-01-01 00:00:00     0.00120 
2002-01-01 08:00:00     0.00070 
2002-01-01 12:00:00     0.00000 
2002-01-01 16:00:00    -0.00440 
...
2003-01-01 12:00:00     0.00220 
2003-01-01 16:00:00    -0.00440 

The column contains positive values, negative values, and zeros (0.00000). I would like to add a dummy column in which positive numbers are represented by 1, negative numbers by 0, and 0.00000 by 2. I can do this with a loop, but that doesn't seem like a smart approach when using Pandas.

Could anyone tell me the proper way of doing this in Pandas? Thank you!

You could do something like this:

# initialise a column named sign (a scalar is broadcast to every row)
df["sign"] = 0

# overwrite the value for each case using boolean masks
df.loc[df["Data"] < 0, "sign"] = 0
df.loc[df["Data"] > 0, "sign"] = 1
df.loc[df["Data"] == 0, "sign"] = 2
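For anyone who wants to try the boolean-mask approach end to end, here is a minimal runnable sketch. It assumes the frame is called `df` and rebuilds only the four sample rows shown in the question; the rest of the series is not reproduced.

```python
import pandas as pd

# Sample rows copied from the question; the name `df` is an assumption.
df = pd.DataFrame(
    {
        "TimeStamp": pd.to_datetime(
            [
                "2002-01-01 00:00:00",
                "2002-01-01 08:00:00",
                "2002-01-01 12:00:00",
                "2002-01-01 16:00:00",
            ]
        ),
        "Data": [0.00120, 0.00070, 0.00000, -0.00440],
    }
)

df["sign"] = 0                       # negatives keep the initial value 0
df.loc[df["Data"] > 0, "sign"] = 1   # positives
df.loc[df["Data"] == 0, "sign"] = 2  # exact zeros

print(df["sign"].tolist())  # [1, 1, 2, 0]
```

Note that a missing value (NaN) in Data matches none of the masks, so it would keep the initial 0 and look like a negative number. If the real data can contain NaN, it may be safer to initialise the column with a distinct placeholder.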

There's also np.sign, which returns 1, 0 and -1 for positive, zero and negative values respectively, if that encoding works for you:

import numpy as np
df['sign'] = np.sign(df['Data'])
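If the 1/0/2 coding from the question is required, the output of np.sign can be translated with a lookup. The sketch below is one possible way to do that, not part of the original answer; the names `data` and `codes` are made up for illustration, and the values are the sample rows from the question.

```python
import numpy as np
import pandas as pd

# Sample values from the question; `data` and `codes` are illustrative names.
data = pd.Series([0.00120, 0.00070, 0.00000, -0.00440], name="Data")

# np.sign returns -1.0, 0.0 or 1.0 for float input; map those to the codes
# requested in the question (positive -> 1, negative -> 0, zero -> 2).
codes = np.sign(data).map({1.0: 1, -1.0: 0, 0.0: 2})

print(codes.tolist())  # [1, 1, 2, 0]
```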

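You can use np.select, which takes a list of conditions, a list of matching values, and a default for rows where no condition holds:

df['dummy'] = np.select([df.Data < 0, df.Data > 0], [0, 1], default=2)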
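A self-contained version of the np.select approach might look like the following sketch. The frame name `df` and the helper names `conditions` and `choices` are assumptions, and only the sample values from the question are used.

```python
import numpy as np
import pandas as pd

# Sample values from the question; `df` is an assumed name.
df = pd.DataFrame({"Data": [0.00120, 0.00070, 0.00000, -0.00440]})

conditions = [df["Data"] < 0, df["Data"] > 0]  # evaluated in order
choices = [0, 1]                               # value used for each condition

# Rows matching no condition (here: exact zeros) receive the default.
df["dummy"] = np.select(conditions, choices, default=2)

print(df["dummy"].tolist())  # [1, 1, 2, 0]
```

Because the zero case is handled by the default, a NaN in Data would also receive 2 here, since both comparisons are False for NaN.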

I believe this should work:

df.loc[df['Data'] > 0, 'Dummy Column'] = 1
df.loc[df['Data'] < 0, 'Dummy Column'] = 0
df.loc[df['Data'] == 0, 'Dummy Column'] = 2


 