Reading and using a CSV file in Python 3 with pandas

I have a CSV file:

 Firstname   Lastname     City     Province
'Guy',       'Ouell',   'Brossard','QC'
'Michelle',  'Balonne','Stittsville','ON'
'Ben',       'Sluzing','Toronto','ON'
'Theodora', 'Panapoulos','Saint-Constant','QC'
'Kathleen', 'Mercier','St Johns','NL'
...

I open it and check the first rows, and everything looks fine:

 import pandas as pd
 df = pd.read_csv('a.csv')
 df.head(n=5)

When I try to use the columns, I run into two different problems:

Problem 1: I can only access the first column. When I try to use any other column, I get an error:

for mis_column, mis_row in missing_df.iterrows():
    print(mis_row['Firstname'])

This prints all of the first names, but when I try to get all of the cities, for example, I see:

TypeError                                 Traceback (most recent call last)
E:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   2482             try:
-> 2483                 return libts.get_value_box(s, key)
   2484             except IndexError:

pandas/_libs/tslib.pyx in pandas._libs.tslib.get_value_box 
(pandas\_libs\tslib.c:18843)()

pandas/_libs/tslib.pyx in pandas._libs.tslib.get_value_box 
(pandas\_libs\tslib.c:18477)()

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-36-55ba81245685> in <module>()
       1 
       2 for mis_column, mis_row in missing_df.iterrows():
 ----> 3     print(mis_row['City'])
       4 
       5 

  E:\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
      599         key = com._apply_if_callable(key, self)
      600         try:
  --> 601             result = self.index.get_value(self, key)
      602 
      603             if not is_scalar(result):

  E:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in 
  get_value(self, series, key)
     2489                     raise InvalidIndexError(key)
     2490                 else:
  -> 2491                     raise e1
     2492             except Exception:  # pragma: no cover
     2493                 raise e1

  E:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
     2475         try:
     2476             return self._engine.get_value(s, k,
  -> 2477        tz=getattr(series.dtype, 'tz', None))
     2478         except KeyError as e1:
     2479             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

  pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

  pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

  pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

  pandas\_libs\hashtable_class_helper.pxi in 
  pandas._libs.hashtable.PyObjectHashTable.get_item()

  pandas\_libs\hashtable_class_helper.pxi in 
  pandas._libs.hashtable.PyObjectHashTable.get_item()

  KeyError: 'City'

Problem 2:

 for mis_column, mis_row in df.iterrows():
     if mis_row['Firstname'] == 'Guy': 
            print('A')

This does not print A.

Thanks in advance

The header row of your CSV needs to be comma-separated as well, like this:

 Firstname,   Lastname,     City,     Province
'Guy',       'Ouell',   'Brossard','QC'
'Michelle',  'Balonne','Stittsville','ON'
'Ben',       'Sluzing','Toronto','ON'
'Theodora', 'Panapoulos','Saint-Constant','QC'
'Kathleen', 'Mercier','St John's','NL'

Because your CSV has whitespace after the delimiters, you can tell read_csv to skip it when loading the DataFrame:

df = pd.read_csv('<your_input>.csv', skipinitialspace=True)
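
As a rough illustration of what skipinitialspace changes, the sketch below parses an in-memory copy of the sample data and compares the resulting column names. The io.StringIO buffer, the SAMPLE constant, and the variable names are assumptions made for this example and stand in for the real file.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the CSV file, with a comma-separated header.
SAMPLE = (
    " Firstname,   Lastname,     City,     Province\n"
    "'Guy',       'Ouell',   'Brossard','QC'\n"
    "'Michelle',  'Balonne','Stittsville','ON'\n"
)

# Default parsing keeps the spaces around each field, so they end up in the names.
raw = pd.read_csv(io.StringIO(SAMPLE))
raw_columns = list(raw.columns)

# skipinitialspace=True drops the whitespace that follows a delimiter.
clean = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)
clean_columns = list(clean.columns)

print(raw_columns)
print(clean_columns)
```

With the padded names, a lookup such as row['City'] raises a KeyError because the actual key still carries the leading spaces.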

If you also want to strip the single quotes from the values, set quotechar:

df = pd.read_csv('<your_input>.csv', skipinitialspace=True, quotechar="'")

>>> df
  Firstname    Lastname            City Province
0       Guy       Ouell        Brossard       QC
1  Michelle     Balonne     Stittsville       ON
2       Ben     Sluzing         Toronto       ON
3  Theodora  Panapoulos  Saint-Constant       QC
4  Kathleen     Mercier       St Johns'       NL
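
The quotes are also a likely reason the comparison in Problem 2 never matches: with the default quotechar, the single quotes stay part of each value. The following is a minimal sketch under that assumption, again using a hypothetical in-memory sample rather than the real file.

```python
import io
import pandas as pd

# Hypothetical sample; the real data lives in the CSV file on disk.
SAMPLE = (
    "Firstname,Lastname,City,Province\n"
    "'Guy','Ouell','Brossard','QC'\n"
    "'Ben','Sluzing','Toronto','ON'\n"
)

# With the default quotechar ('"'), the single quotes are part of the value.
with_quotes = pd.read_csv(io.StringIO(SAMPLE))
first_default = with_quotes.loc[0, 'Firstname']

# With quotechar="'", the parser treats them as quoting and removes them.
without_quotes = pd.read_csv(io.StringIO(SAMPLE), quotechar="'")
first_stripped = without_quotes.loc[0, 'Firstname']

print(repr(first_default), first_default == 'Guy')
print(repr(first_stripped), first_stripped == 'Guy')
```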


>>> import pandas as pd
>>> df = pd.read_csv('test2.csv', skipinitialspace=True, quotechar="'")
>>> df
  Firstname    Lastname            City Province
0       Guy       Ouell        Brossard       QC
1  Michelle     Balonne     Stittsville       ON
2       Ben     Sluzing         Toronto       ON
3  Theodora  Panapoulos  Saint-Constant       QC
4  Kathleen     Mercier       St Johns'       NL
>>> for mis_column, mis_row in df.iterrows():
...      if mis_row['Firstname'] == 'Guy':
...             print('A')
...
A
>>>
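
Once the columns parse cleanly, the same lookups can also be written without iterrows. This is only a sketch of one alternative, not the approach used above; the sample data and variable names are assumptions for the example.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the first rows of the file.
SAMPLE = (
    " Firstname,   Lastname,     City,     Province\n"
    "'Guy',       'Ouell',   'Brossard','QC'\n"
    "'Michelle',  'Balonne','Stittsville','ON'\n"
    "'Ben',       'Sluzing','Toronto','ON'\n"
)

df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True, quotechar="'")

# Selecting a column by name returns every value in it.
cities = df['City'].tolist()

# A boolean mask keeps only the rows whose Firstname equals 'Guy'.
guy_rows = df[df['Firstname'] == 'Guy']

print(cities)
print(len(guy_rows))
```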
