
Pandas Panel from Dict of Dataframes Returns NaNs

I have a group of DataFrames that I'm trying to turn into a Panel. Here is my code:

# OPEN THE FILES INTO DATAFRAMES
filenames = ['Yahoo_2016-01-17.csv', 'Yahoo_2016-01-18.csv',
    'Yahoo_2016-01-19.csv','Yahoo_2016-01-23.csv','Yahoo_2016-01-27.csv',     
    'Yahoo_2016-02-05.csv', 'Yahoo_2016-02-06.csv', 'Yahoo_2016-02-09.csv',     
    'Yahoo_2016-02-11.csv', 'Yahoo_2016-02-13.csv', 'Yahoo_2016-02-15.csv', 
    'Yahoo_2016-02-16.csv', 'Yahoo_2016-02-29.csv']

dates = np.array(['2016-01-17', '2016-01-18', '2016-01-19', '2016-01-23', 
    '2016-01-27', '2016-02-05', '2016-02-06','2016-02-09', 
    '2016-02-11', '2016-02-13', '2016-02-15', '2016-02-16',
    '2016-02-29']).astype('datetime64[D]')

filepath = '/Users/RickS/Documents/Investing/Stock_files/GENERAL/'

dfs = [pd.read_csv(filepath+f) for f in filenames]

# Panel not working...
panel = pd.Panel(dict([(date, df) for date in dates for df in dfs]))
panel.swapaxes('major','minor')

However, when I try to read the panel, all the values in each dataframe have turned into NaNs:

[Image: the data is all NaNs]

When I look at the dataframes individually they all look fine. Here is one of the csv files that gets imported into df: example_csv_file

One thing to note that may (or may not) be important is that the dtypes within each dataframe are not all the same:

In [24]: dfs[1].dtypes
Out[24]: 
Name                          object
Symbol                        object
Previous_Close               float64
Average_Daily_Volume           int64
Change_&_Percent_Change       object
Earnings/Share               float64
EPS_Estimate_Current_Year    float64
EPS_Estimate_Next_Quarter    float64
EPS_Estimate_Next_Year       float64
52-week_Low                  float64
52-week_High                 float64
EBITDA                        object
200-day_Moving_Average       float64
P/E_Ratio                    float64
PEG_Ratio                    float64
Short_Ratio                  float64
1_yr_Target_Price            float64
52-week_Range                 object
Date                          object
dtype: object

What am I doing wrong?

The panel is filled with NaNs because your dates NumPy array is currently stored as datetime64 values. Apparently, the pandas Panel object does not work well with datetime64 values as the underlying dictionary keys.

Simply remove the astype call, or use a plain list or tuple, so that the dates are kept as string keys. Since each key identifies a distinct day, the keys are still unique, which is all the panel needs.

# Option 1: keep the NumPy array but drop the astype call
dates = np.array(['2016-01-17', '2016-01-18', '2016-01-19', '2016-01-23', 
                  '2016-01-27', '2016-02-05', '2016-02-06','2016-02-09', 
                  '2016-02-11', '2016-02-13', '2016-02-15', '2016-02-16',
                  '2016-02-29'])

# Option 2: use a plain Python list of strings
dates = ['2016-01-17', '2016-01-18', '2016-01-19', '2016-01-23', 
         '2016-01-27', '2016-02-05', '2016-02-06','2016-02-09', 
         '2016-02-11', '2016-02-13', '2016-02-15', '2016-02-16',
         '2016-02-29']
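
Since each date already appears in its file name, the string keys could also be derived instead of typed a second time. This is only a minimal sketch, assuming every file follows the `Yahoo_YYYY-MM-DD.csv` pattern shown in the question; the helper name `date_key` is hypothetical and not part of the original code, and the file list is shortened for illustration:

```python
import os

def date_key(filename):
    """Return the 'YYYY-MM-DD' part of a name like 'Yahoo_2016-01-17.csv'.

    Hypothetical helper: assumes one underscore separates the prefix
    from the date and that the name ends with a file extension.
    """
    stem, _ext = os.path.splitext(os.path.basename(filename))
    _prefix, date_part = stem.split('_', 1)
    return date_part

# Shortened list for illustration; the real list has 13 entries.
filenames = ['Yahoo_2016-01-17.csv', 'Yahoo_2016-01-18.csv', 'Yahoo_2016-02-29.csv']

# One string key per file, in the same order as the files.
dates = [date_key(f) for f in filenames]
print(dates)
```

Deriving the keys this way keeps the dates and the files in the same order by construction, which matters once the two are paired up.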

However, this brings me back to a problem I noticed earlier. As written, the list comprehension inside dict() produces a panel that contains only the last data frame, repeated 13 times. The reason is that the comprehension below returns every combination of the dates array and the dfs list, so its length is the product of the two collections: 13 x 13 = 169 pairs (i.e., a cross join or Cartesian product). Run it on its own to see the output:

[(date, df) for date in dates for df in dfs]

Once you apply dict() to the list above, each of the 13 unique dates keeps only the last value assigned to it. Later pairs overwrite earlier pairs with the same key, so every date ends up holding the last df in dfs.
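
The overwrite can be reproduced without any CSV files. The following is a small sketch in which short strings stand in for the real DataFrames (the names `df_0`, `df_1`, `df_2` are placeholders, not real data):

```python
# Placeholder values: strings stand in for the DataFrames so the
# behaviour is visible without reading any files.
dates = ['2016-01-17', '2016-01-18', '2016-01-19']
dfs = ['df_0', 'df_1', 'df_2']

# Two for-clauses in one comprehension build every (date, df) combination.
pairs = [(date, df) for date in dates for df in dfs]
print(len(pairs))  # 9, i.e. 3 x 3

# dict() keeps only the last value seen for each repeated key,
# so every date ends up mapped to the final element of dfs.
collapsed = dict(pairs)
print(collapsed)
```

With three dates and three frames the comprehension yields nine pairs, and the resulting dict has three keys that all point to `'df_2'`.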

Consider using zip() to iterate over both collections in parallel, so that each date is paired with its own file:

dfDict = {}
for f,d in zip(filenames, dates):    
    dfDict[d] = pd.read_csv(filepath+f)    

panel = pd.Panel(dfDict)

Or the shorter:

dfs = [pd.read_csv(filepath+f) for f in filenames]
panel = pd.Panel(dict(zip(dates, dfs)))
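
Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so the code above only runs on older versions. On newer versions, one possible substitute is a single DataFrame with a MultiIndex built by `pd.concat`. This is a rough sketch rather than a drop-in replacement: two tiny made-up frames stand in for the CSV files, and their values are illustrative, not real quotes.

```python
import pandas as pd

# Made-up stand-ins for two of the daily CSV files (not real data).
dfs = [
    pd.DataFrame({'Symbol': ['AAA', 'BBB'], 'Previous_Close': [10.0, 20.0]}),
    pd.DataFrame({'Symbol': ['AAA', 'BBB'], 'Previous_Close': [11.0, 19.5]}),
]
dates = ['2016-01-17', '2016-01-18']

# Passing a dict to pd.concat uses its keys as the outer index level;
# zip() pairs each date with its own frame, as in the answer above.
stacked = pd.concat(dict(zip(dates, dfs)), names=['Date'])

# Select one day's rows by the outer-level key.
print(stacked.loc['2016-01-18'])
```

The stacked frame keeps each column's own dtype, which may suit mixed object and float columns like the ones listed in the question.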
