Create NaN column in pandas DataFrame

I saw the following example, which illustrates how to create a NaN column in a DataFrame.

import pandas as pd
import numpy as np
import math
import copy
import datetime as dt

"""
Accepts a list of symbols along with start and end date
Returns the Event Matrix which is a pandas Datamatrix
Event matrix has the following structure :
    |IBM |GOOG|XOM |MSFT| GS | JP |
(d1)|nan |nan | 1  |nan |nan | 1  |
(d2)|nan | 1  |nan |nan |nan |nan |
(d3)| 1  |nan | 1  |nan | 1  |nan |
(d4)|nan |  1 |nan | 1  |nan |nan |
...................................
...................................
Also, d1 = start date
nan = no information about any event.
1 = status bit (positively confirms the event occurrence)
"""

def find_events(ls_symbols, d_data):
    ''' Finding the event dataframe '''
    df_close = d_data['actual_close']
    ts_market = df_close['SPY']

    print("Finding Events")

    # Creating an empty dataframe
    df_events = copy.deepcopy(df_close) # type <class 'pandas.core.frame.DataFrame'>
    df_events = df_events * np.nan # << why does it work here?

I tried to reproduce the method as follows:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
frame = frame * np.nan # TypeError: can't multiply sequence by non-int of type 'float'

Q> Why doesn't it work here?

Because your frame has the column state, which contains strings, and multiplying a string by NaN (a float) raises the error. If you really want to set the states to NaN, use frame['state'] = np.nan .
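A minimal sketch of both points, rebuilding the frame from the question. The flag name multiplication_failed is only illustrative, and np.nan is used because the lowercase spelling works on every NumPy version:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                      'year': [2000, 2001, 2002, 2001, 2002],
                      'pop': [1.5, 1.7, 3.6, 2.4, 2.9]})

# Multiplying the whole frame fails because of the string column.
try:
    frame * np.nan
    multiplication_failed = False
except TypeError:
    multiplication_failed = True

# Assigning NaN to the string column directly works.
frame['state'] = np.nan
```

After this runs, state holds only NaN while year and pop are untouched.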

Note that df_close ( df_close = d_data['actual_close'] ) holds only numeric data: going by the comment in the example it is a DataFrame of closing prices, and so is df_events. Multiplying numbers by NaN is valid, which is why it works there. Your dataframe has three columns, of which state holds strings, which pandas stores as Python objects. And you can't multiply a string/object by a float.
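To see the difference, here is a small sketch. The df_close below is a hypothetical stand-in with made-up prices, not the data from the original example; it only assumes that every column is numeric:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_close: every column is numeric.
df_close = pd.DataFrame({'SPY': [100.0, 101.5], 'IBM': [80.0, 79.0]})
df_events = df_close * np.nan  # works: number * NaN is NaN

# A frame with a string column, as in the question.
frame = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'year': [2000, 2001]})
state_is_numeric = pd.api.types.is_numeric_dtype(frame['state'])

# Restricting the operation to the numeric columns avoids the TypeError.
numeric_only = frame.select_dtypes(include='number') * np.nan
```

select_dtypes is just one way to skip the string column; it returns a new frame and leaves frame itself unchanged.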

Anyway the multiplication is totally unnecessary:

  • all that multiplying df_events by NaN does is assign NaN to every cell, in an unnecessarily obfuscated way.
  • It would be far clearer to assign NaN directly, e.g. frame['state'] = np.nan . (Use the lowercase np.nan : the np.NaN and np.NAN aliases were removed in NumPy 2.0, and pd.np was removed in pandas 2.0.)
  • if you want to assign NaN to multiple columns, do: df[['year','pop']] = np.nan
  • There's no real multiplication going on. Just don't write code like that. Don't abuse the operators...
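As a sketch of the direct approach, the following builds an all-NaN frame with the same index and columns as an existing one, then blanks two columns of the question's frame. The df_close here is again a hypothetical stand-in with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_close (made-up prices).
df_close = pd.DataFrame({'SPY': [100.0, 101.5, 99.8],
                         'IBM': [80.0, 79.0, 81.2]},
                        index=pd.date_range('2020-01-01', periods=3))

# An all-NaN frame with the same labels: no deepcopy, no multiplication.
df_events = pd.DataFrame(np.nan, index=df_close.index, columns=df_close.columns)

frame = pd.DataFrame({'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                      'year': [2000, 2001, 2002, 2001, 2002],
                      'pop': [1.5, 1.7, 3.6, 2.4, 2.9]})

# Assign NaN to several columns at once; 'state' is left alone.
frame[['year', 'pop']] = np.nan
```

Building the frame from a scalar states the intent (an empty event matrix with the same labels) directly.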
