
Unable to open more than one CSV file in the same Python program

I have two CSV files, and I need to compare the last column of both files and perform operations on it. I am using pandas to open both files, but when I open the second CSV file and try to access any column, I get an error.

import pandas as pd1
import pandas as pd

# comma delimited is the default
df = pd.read_csv("results.csv", header = 0)

spamColumnValues=df['isSpam'].values

df1=pd1.read_csv("compare.csv",header=0)

spamCompareValues=df1['isSpam'].values

This raises the following error:

  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 1964, in __getitem__
    return self._getitem_column(key)

  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)

  File "/Library/Python/2.7/site-packages/pandas/core/generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)

  File "/Library/Python/2.7/site-packages/pandas/core/internals.py", line 3590, in get
    loc = self.items.get_loc(item)

  File "/Library/Python/2.7/site-packages/pandas/core/indexes/base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)

  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)

  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)

  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)

KeyError: 'isSpam'

Can anyone point out my mistake, or is it not possible to do this with pandas?

Both CSV files can be found at:

https://drive.google.com/file/d/0B3XlF206d5UrUENtZlcwd0pVLW8/view?usp=sharing

https://drive.google.com/file/d/0B3XlF206d5UrbGdJRFM5TURmejQ/view?usp=sharing

The issue is that there is no column named "isSpam" in compare.csv. The file appears to have no header row, so with header=0 pandas takes the first observation as the column names. Pass header=None to pd.read_csv() instead:

df1 = pd.read_csv("compare.csv", header=None)

Since the columns appear to be the same in both files, you can then reuse the column names from the first DataFrame:

df1.columns = df.columns
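
To see the whole fix in one place, here is a minimal sketch. It does not use the actual files: the two in-memory CSV strings and the column names "id" and "text" are made up for illustration, and the only thing assumed about the real data is that results.csv has a header row and compare.csv does not.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real files (contents are invented):
# results.csv has a header row, compare.csv does not.
results_csv = io.StringIO("id,text,isSpam\n1,hello,0\n2,buy now,1\n")
compare_csv = io.StringIO("1,hello,0\n2,buy now,0\n")

df = pd.read_csv(results_csv, header=0)

# header=None tells pandas that the first row is data, not column names,
# so the columns are labelled 0, 1, 2 until we rename them.
df1 = pd.read_csv(compare_csv, header=None)
df1.columns = df.columns  # reuse the names from the file that has a header

# Compare the last column of both files row by row.
matches = df["isSpam"].values == df1["isSpam"].values
print(matches)
```

If you are unsure whether a file has a header row, printing df1.columns right after read_csv shows what pandas used as column names; values that look like data instead of names suggest the header is missing.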
