Create NaN column in pandas DataFrame
I came across the following example, which shows how to create a NaN column in a DataFrame.
import pandas as pd
import numpy as np
import math
import copy
import datetime as dt
"""
Accepts a list of symbols along with start and end date
Returns the Event Matrix which is a pandas Datamatrix
Event matrix has the following structure :
    |IBM |GOOG|XOM |MSFT| GS | JP |
(d1)|nan |nan | 1  |nan |nan | 1  |
(d2)|nan | 1  |nan |nan |nan |nan |
(d3)| 1  |nan | 1  |nan | 1  |nan |
(d4)|nan | 1  |nan | 1  |nan |nan |
...................................
...................................
Also, d1 = start date
nan = no information about any event.
1 = status bit(positively confirms the event occurrence)
"""
def find_events(ls_symbols, d_data):
    ''' Finding the event dataframe '''
    df_close = d_data['actual_close']
    ts_market = df_close['SPY']
    print("Finding Events")
    # Creating an empty dataframe
    df_events = copy.deepcopy(df_close)  # type <class 'pandas.core.frame.DataFrame'>
    df_events = df_events * np.nan  # << why it works here
I tried to replicate the approach as follows:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
frame = frame * np.nan  # TypeError: can't multiply sequence by non-int of type 'float'
Q> Why doesn't it work here?
Because your frame has a column, state, that contains strings, and multiplying a string by NaN raises an error. If you really want to set state to NaN, use frame['state'] = np.nan.
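A minimal sketch of both behaviours, using the frame from the question. The variable name raised is only for illustration, and the state column is pinned to object dtype on the assumption that this matches how pandas stored it in the question.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})
# Assumption: keep 'state' as plain Python objects, as in the question.
frame['state'] = frame['state'].astype(object)

# Multiplying the whole frame fails because of the string column.
raised = False
try:
    frame * np.nan
except TypeError:
    raised = True

# Assigning NaN to the column directly works regardless of its dtype.
frame['state'] = np.nan
```

After this runs, raised is True and every value in frame['state'] is NaN, while year and pop are untouched.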
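Note that df_close holds only numeric data (df_close = d_data['actual_close'], indexed by symbol such as 'SPY'), and therefore so does df_events, which is why multiplying it by NaN works. Your frame, by contrast, has three columns, of which state is a string column that pandas stores as Python objects, and you cannot multiply a string/object by a float.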
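To illustrate the dtype point, here is a small sketch that restricts the multiplication to the numeric columns of the question's frame. The names numeric_cols and nan_part are made up for this example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# Columns with a numeric dtype; the string column 'state' is excluded.
numeric_cols = frame.select_dtypes(include='number').columns

# Multiplying only the numeric part by NaN works and yields all-NaN floats.
nan_part = frame[numeric_cols] * np.nan
```

Here numeric_cols contains year and pop, and nan_part is a 5x2 frame in which every value is NaN.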
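In any case, the multiplication is entirely unnecessary. What
df_close = df_close * np.nan
does is assign NaN to every value in a needlessly obfuscated way; a direct assignment of = np.nan would be clearer. Spell it np.nan: the np.NaN and np.NAN aliases were removed in NumPy 2.0, and the pd.np alias was deprecated in pandas 1.0 and removed in 2.0. Or, to set only specific columns:
df[['year','pop']] = np.nan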
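As a sketch of the direct approach, the following builds an all-NaN frame with the same shape and labels without any multiplication, and then blanks only the numeric columns of the original. The name empty is hypothetical, and building the frame from a scalar is one option among several rather than the method used in the original example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# An all-NaN frame with the same index and columns; no arithmetic involved,
# so string columns are not a problem.
empty = pd.DataFrame(np.nan, index=frame.index, columns=frame.columns)

# Set only the chosen columns of the original frame to NaN.
frame[['year', 'pop']] = np.nan
```

Because nothing is multiplied, this works no matter which dtypes the original columns have.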