
python KeyError: 'date-time'

I have a DataFrame pd1, read with pandas:

pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',sep=':',
                  header=None, names  = ['date-time','domain','requests-qty','response-bytes'],
                   parse_dates=[1], converters={'date-time': to_datetime}, index_col = 'date-time')

with this index:

>> pd1.index:  

 DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                ...
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00'],
               dtype='datetime64[ns]', name='date-time', length=6084158, freq=None)

But when I try to set the index to that column, I get the error below. I initially wanted a multi-column index, and that is when the error first appeared. I then tried building another DataFrame from it with other columns as the index, pd_new_index = pd1.set_index(['requests-qty','domain']), which worked, and then setting the index back to the 'date-time' column in a new frame, pd_new_2 = pd_new_index.set_index(['date-time']), which gave the same error. 'date-time' does not look like a special keyword, and that column is the index now. Why the error?

KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
----> 1 pd_new_2 = pd_new_index.set_index(['date-time'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4176                 names.append(None)
   4177             else:
-> 4178                 level = frame[col]._values
   4179                 names.append(col)
   4180                 if drop:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

The reason is that date-time is already the index (here a DatetimeIndex), so it cannot be selected by name like a column.
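
To see the difference between an index name and a column label, here is a minimal sketch. The small DataFrame is made-up sample data that only borrows the column names from the question, and the exact KeyError message may vary between pandas versions.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame(
    {
        "date-time": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"]),
        "domain": ["d1", "d2", "d3"],
        "requests-qty": [0, 0, 1],
    }
).set_index("date-time")

# 'date-time' is now the name of the index, not a column label.
print(df.index.name)
print("date-time" in df.columns)

# set_index looks the key up among the columns, so this raises KeyError.
try:
    df.set_index(["date-time"])
except KeyError as err:
    print("KeyError:", err)

# reset_index() moves the index back into an ordinary column.
restored = df.reset_index()
print(list(restored.columns))
```

After reset_index() the column can be passed to set_index again, alone or together with other columns.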

The cause is the index_col parameter:

pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',
                  sep=':',
                  header=None, 
                  names  = ['date-time','domain','requests-qty','response-bytes'],
                  parse_dates=[1], 
                  converters={'date-time': to_datetime}, 
                  index_col = 'date-time')

For a MultiIndex, pass a list of column names to index_col, remove converters, and specify the column name in the parse_dates parameter:

import pandas as pd
from io import StringIO

temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp), 
                 sep=':',
                 header=None, 
                 names  = ['date-time','domain','requests-qty','response-bytes'],
                 parse_dates=['date-time'], 
                 index_col = ['date-time','domain'])

print (df)
                   requests-qty  response-bytes
date-time  domain                              
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])

EDIT1: Solution with the append parameter in set_index:

import pandas as pd
from io import StringIO


temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp), 
                 sep=':',
                 header=None, 
                 names  = ['date-time','domain','requests-qty','response-bytes'],
                 parse_dates=['date-time'], 
                 index_col = 'date-time')

print (df)
           domain  requests-qty  response-bytes
date-time                                      
2016-01-01     d1             0               0
2016-01-02     d2             0               1
2016-01-03     d3             1               0

print (df.index)
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'], 
              dtype='datetime64[ns]', name='date-time', freq=None)

df1 = df.set_index(['domain'], append = True)
print (df1)
                   requests-qty  response-bytes
date-time  domain                              
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df1.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])
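
As a complement to append=True, the sketch below shows what happens without it, and one way to rebuild the index with the levels in a different order. The sample frame is hypothetical and only reuses the question's column names. It is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical sample frame whose index is already named 'date-time'.
df = pd.DataFrame(
    {"domain": ["d1", "d2", "d3"], "requests-qty": [0, 0, 1]},
    index=pd.DatetimeIndex(
        ["2016-01-01", "2016-01-02", "2016-01-03"], name="date-time"
    ),
)

# Without append=True, set_index replaces the existing index,
# so 'date-time' is neither a column nor an index level afterwards.
replaced = df.set_index(["domain"])
print(list(replaced.index.names))

# Moving the index back to a column first lets you pick any level order.
rebuilt = df.reset_index().set_index(["domain", "date-time"])
print(list(rebuilt.index.names))
```

This also explains the second error in the question: after set_index(['requests-qty','domain']) without append=True, the original date-time index is no longer present in the new frame.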
