Python 3: KeyError 0 when using dataframe's dictionary

Using the code below, I am trying to insert n DataFrames into an MSSQL table.

for file in os.listdir():
   print('# Inserting ' + file + ' . . .')
   df = pd.read_csv(file)
   df = df.fillna('NULL')
   if(len(df)>1):
       dfs = partDF(df , lim)
       for k in dfs.keys():
           print('\t' + str(int(k.split('t')[1])+1) + ' / ' + str(len(dfs.keys()))+ '\t')
           aux = dfs[k]
           insert2SQL(aux, table)
           del(aux)
       print(' OK :)')
   del(df, dfs)

The partDF() function splits the dataframe into smaller ones so that none of them exceeds 1000 rows. These dataframes are returned in a dictionary whose keys are named t0, t1, t2 ... tn.
Note that, to be safe, I take the key names directly from the dict.keys() method.
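
Since partDF() itself is not shown, the following is only a hypothetical sketch of a function with the behaviour described above; the name part_df and its body are assumptions, not the author's code. It may help to see that each piece produced this way keeps the row labels of the original dataframe.

```python
import pandas as pd

def part_df(df, lim):
    """Hypothetical stand-in for partDF(): split df into consecutive
    pieces of at most lim rows, keyed 't0', 't1', ..."""
    chunks = {}
    for n, start in enumerate(range(0, len(df), lim)):
        # iloc slices by position; the piece keeps the original index labels
        chunks['t' + str(n)] = df.iloc[start:start + lim]
    return chunks

df = pd.DataFrame({'a': range(2500)})
dfs = part_df(df, 1000)

print(list(dfs.keys()))                        # names of the pieces
print([len(chunk) for chunk in dfs.values()])  # rows per piece
print(dfs['t1'].index[0])                      # first row label of the second piece
```

With a split like this, only the first piece has a row labelled 0; the second piece starts at label 1000.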

The above code raises KeyError: 0 after it inserts the first dataframe inside the loop.

    KeyError                                  Traceback (most recent call last)
<ipython-input-4-0e1d02aa1939> in <module>()
      8                         print('\t' + str(int(k.split('t')[1])+1) + ' / ' + str(len(dfs.keys()))+ '\t')
      9                         aux = dfs[k]
---> 10                         insert2SQL(aux, table)
     11                         del(aux)
     12                 print(' OK :)')

<ipython-input-2-fd6c30d5a003> in insert2SQL(tablilla, sqlTab)
     27         vals = list()
     28         for field in tablilla.columns:
---> 29                 if(type(tablilla[field][0]) == str):
     30                         vals.append(True)
     31                 else:

c:\python36\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    621         key = com._apply_if_callable(key, self)
    622         try:
--> 623             result = self.index.get_value(self, key)
    624 
    625             if not is_scalar(result):

c:\python36\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   2558         try:
   2559             return self._engine.get_value(s, k,
-> 2560                                           tz=getattr(series.dtype, 'tz', None))
   2561         except KeyError as e1:
   2562             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

However, when I execute the code below, where I only print the head of each dataframe, no such error occurs:

for file in os.listdir():
   df = pd.read_csv(file)
   df = df.fillna('NULL')
   if(len(df)>1):
       dfs = partDF(df , lim)
       for k in dfs.keys():
           aux = dfs[k]
           print('\t\t\t\tOriginal length : ' + str(len(aux)))
           print(aux.head(10))
           #insert2SQL(aux, table)
           del(aux)
   del(df, dfs)

I don't understand what is happening and would be glad if someone could help me.

PS. I didn't post the partDF() code because I think the two code snippets make it clear that it is not the reason for the error.

PS2. insert2SQL code:

def insert2SQL(tablilla, sqlTab):
    # connection data
    # servName = 'server'
    # userName = 'me'
    # psswd = 'pass'
    # cnxn = pyodbc.connect(driver='{SQL Server}', server=servName, UID=userName, PWD=psswd)
    # cursor = cnxn.cursor()

    rows = len(tablilla)
    fields = tablilla.columns
    vals = list()
    for field in tablilla.columns:
        if(type(tablilla[field][0]) == str):
            vals.append(True)
        else:
            vals.append(False)

    textFields = dict(zip(fields, vals))
    q = "INSERT INTO " + sqlTab + " VALUES ("

    for r in range(rows):
        for field in fields:
            if(str(tablilla.loc[r, field]) != 'NULL'):
                quot = "'" if textFields[field] else ""
            else:
                quot = ""
            q += quot + str(tablilla.loc[r, field]) + quot + ", "
        q = q[0:len(q)-2] + '),('  
    q = q[0:len(q)-2]

    print(q)
    print('\n')
    # cursor.execute(q)
    # cursor.commit()
    # cnxn.close()

EDITED:

Following your remarks, I went back over your comments and found the cause of my error: I always look up the first row of each dataframe with the label 0, forgetting that after the big dataframe is split, pandas keeps the original row index in each piece. The solution was simply to apply df.reset_index() to each dataframe I send to my insert2SQL() function.
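
The cause described above can be reproduced in a few lines. This is a minimal sketch with made-up data (the column names name and value are assumptions). It also passes drop=True, which the post does not mention: without it, reset_index() moves the old index into a new column called index, and that column would then end up in the generated INSERT statement.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c', 'd'], 'value': [1, 2, 3, 4]})
second = df.iloc[2:]   # a later piece of the split: row labels are 2 and 3

# tablilla[field][0] looks up the *label* 0, which this piece does not have
try:
    second['name'][0]
    raised = False
except KeyError:
    raised = True
print(raised)

# Renumber the rows from 0 and discard the old labels
fixed = second.reset_index(drop=True)
print(fixed['name'][0])
print(list(fixed.columns))

# Without drop=True the old labels come back as an extra 'index' column
print(list(second.reset_index().columns))
```

After the reset, both tablilla[field][0] and tablilla.loc[r, field] for r in range(len(tablilla)) refer to existing labels again.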

Thanks a lot!

PS. Is there a way to upvote your comments, since they were useful to me? How do I close this question?

Is table initialized somewhere? You need to initialize it.

UPDATE: In the insert function, check whether the following actually exist every time the loop iterates:
- tablilla[field][0]
- textFields[field]
- tablilla.loc[r, field]

I suggest you comment out each for loop one by one and see which part causes the error.
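
One way to run the checks suggested above is to test whether the label 0 is present in the index, and to switch to positional access where a position is what is meant. The sketch below uses made-up data and hypothetical variable names; it is an alternative to resetting the index, not the code from the question.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'], 'value': [1.5, 2.5, 3.5]})
chunk = df.iloc[1:]   # row labels are 1 and 2, so there is no label 0

# Does the lookup tablilla[field][0] have anything to find?
has_label_zero = 0 in chunk.index
print(has_label_zero)

# .iloc[0] means "first row by position", whatever the labels are
text_fields = {col: isinstance(chunk[col].iloc[0], str) for col in chunk.columns}
print(text_fields)

# Iterating rows by position avoids .loc[r, field] with r in range(rows)
rows = [tuple(row) for row in chunk.itertuples(index=False)]
print(rows)
```

Using .iloc and itertuples() keeps the function independent of how the caller's dataframe happens to be indexed.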
