
How can I create a Pandas column based on another column with a date?

My csv has:

Date,Open,High,Low,Close,Adj Close,Volume,dOpen,dHigh,dLow,dClose,dVolume
1/29/93,43.96875,43.96875,43.75,43.9375,26.45393,1003200,0,0,0,0,0
2/1/93,43.96875,44.25,43.96875,44.25,26.642057,480500,0,0.006396588,0.005,0.007112376,0.007111495
2/2/93,44.21875,44.375,44.125,44.34375,26.698507,201300,0.005685856,0.002824859,0.00355366,0.002118644,0.00211883
2/3/93,44.40625,44.84375,44.375,44.8125,26.980742,529400,0.004240283,0.01056338,0.005665722,0.010570825,0.01057119
2/4/93,44.96875,45.09375,44.46875,45,27.093624,531500,0.012667136,0.005574913,0.002112676,0.0041841,0.004183799

I am doing:

    spy_data = pd.read_csv('data/SPY_daily.csv')
    spy_data['day_of_week'] = spy_data.apply(
        lambda row: datetime.strptime(row['Date'],  "%m/%d/%Y"))
    print(spy_data)

But I get an error

KeyError: ('Date', 'occurred at index Date')

What am I doing incorrectly?

You need to pass axis=1 to apply() so that the function is applied to each row. With the default axis=0 the function receives each column instead, so row['Date'] looks for a row labeled 'Date' and raises the KeyError.

axis : {0 or 'index', 1 or 'columns'}, default 0

Axis along which the function is applied:

  • 0 or 'index': apply function to each column.

  • 1 or 'columns': apply function to each row.

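The following should do the trick. Note the format uses %y rather than %Y, because the years in your file have two digits (1/29/93):

    from datetime import datetime
    import pandas as pd

    spy_data = pd.read_csv('data/SPY_daily.csv')
    # axis=1 passes each row to the lambda, so row['Date'] is a single date string
    spy_data['day_of_week'] = spy_data.apply(
        lambda row: datetime.strptime(row['Date'], "%m/%d/%y"), axis=1)
    print(spy_data)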
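
To see what the axis argument changes, here is a minimal sketch. It assumes a small hypothetical two-row DataFrame in place of the real CSV and simply records which labels the function receives in each mode:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the CSV from the question.
df = pd.DataFrame({"Date": ["1/29/93", "2/1/93"], "Close": [43.9375, 44.25]})

# axis=0 (default): the function gets each COLUMN as a Series whose labels
# are the row labels (0, 1), so there is no 'Date' label to look up.
labels_axis0 = df.apply(lambda col: ", ".join(map(str, col.index)))

# axis=1: the function gets each ROW as a Series labeled by column name.
labels_axis1 = df.apply(lambda row: ", ".join(row.index), axis=1)

# Reproduce the error from the question with the default axis.
raised = False
try:
    df.apply(lambda row: row["Date"])
except KeyError:
    raised = True

print(labels_axis0)
print(labels_axis1)
print("KeyError raised with default axis:", raised)
```

With the default axis each call sees the labels 0 and 1, while with axis=1 each call sees Date and Close, which is why row['Date'] only works in the second case.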

Instead of using apply, it is better to convert the column to a proper datetime and use the dt accessor methods.

For example:

    from io import StringIO
    import pandas as pd

    # your_data is the CSV text as a string
    spy_data = pd.read_csv(StringIO(your_data), sep=',')
    spy_data['Date'] = pd.to_datetime(spy_data['Date'])

    day_of_week = spy_data['Date'].dt.strftime("%m/%d/%Y")

    print(day_of_week)

    0    01/29/1993
    1    02/01/1993
    2    02/02/1993
    3    02/03/1993
    4    02/04/1993
    Name: Date, dtype: object
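
The snippet above formats the date itself. If the goal is the actual day of the week, the same dt accessor also provides day_name() and dayofweek. A minimal sketch, assuming a hypothetical inline sample trimmed from the question's data and an explicit two-digit-year format:

```python
from io import StringIO
import pandas as pd

# Hypothetical sample: first rows of the question's CSV, two columns only.
sample = """Date,Close
1/29/93,43.9375
2/1/93,44.25
2/2/93,44.34375
"""

spy_data = pd.read_csv(StringIO(sample))
# %y matches the two-digit years used in the file.
spy_data["Date"] = pd.to_datetime(spy_data["Date"], format="%m/%d/%y")

# Weekday name and weekday number (Monday=0 ... Sunday=6).
spy_data["day_of_week"] = spy_data["Date"].dt.day_name()
spy_data["day_number"] = spy_data["Date"].dt.dayofweek

print(spy_data)
```

Both calls are vectorized, so no apply or lambda is needed.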

If the dates are set as the index, call the method on the index directly:

    spy_data.index.strftime("%m/%d/%Y")

Output:

    Index(['01/29/1993', '02/01/1993', '02/02/1993', '02/03/1993', '02/04/1993'], dtype='object')
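
For the index case, one possible way to get there is to load the Date column as the index and convert it afterwards. This is a sketch under the same assumptions as before (hypothetical inline sample, two-digit years):

```python
from io import StringIO
import pandas as pd

# Hypothetical sample: first rows of the question's CSV, two columns only.
sample = """Date,Close
1/29/93,43.9375
2/1/93,44.25
2/2/93,44.34375
"""

# Use the Date column as the index, then convert it to a DatetimeIndex.
spy_data = pd.read_csv(StringIO(sample), index_col="Date")
spy_data.index = pd.to_datetime(spy_data.index, format="%m/%d/%y")

# A DatetimeIndex exposes the datetime methods directly, without .dt
spy_data["day_of_week"] = spy_data.index.day_name()

print(spy_data)
```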

Another possible cause is that pandas.read_csv used the first column as the index of your DataFrame, in which case 'Date' is not among the columns. This can happen when the data rows contain more fields than the header row. To avoid it, explicitly tell pandas not to do this:

spy_data = pd.read_csv('data/SPY_daily.csv', index_col=False)
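
A quick way to check whether this applies to a given file is to inspect the columns and the index after loading. A minimal sketch, assuming a hypothetical inline sample (trimmed from the question's data, with the same number of fields in the header and in each row) in place of the real file:

```python
from io import StringIO
import pandas as pd

# Hypothetical sample with matching header and row lengths.
sample = "Date,Open,Close\n1/29/93,43.96875,43.9375\n2/1/93,43.96875,44.25\n"

default_read = pd.read_csv(StringIO(sample))
forced_read = pd.read_csv(StringIO(sample), index_col=False)

# If 'Date' is listed here, it was read as a regular column.
print(list(default_read.columns))
# A RangeIndex means no column was consumed as the index.
print(type(default_read.index).__name__)
print("Date" in forced_read.columns)
```

If 'Date' already appears in the columns with the default call, the index is not the cause, and the axis argument of apply is the more likely culprit.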
