Why is str.replace (on the index) giving KeyError?

Question

I am trying to use the code below to strip parenthesized text from country names, where Country is the index of a DataFrame:

energy['Country'] = energy['Country'].str.replace(r"\s+\(.*\)","")

I have tried variations here and there but whatever I do I get the following error:

KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

followed by this:

KeyError: 'Country'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-45-740ea96e825f> in <module>()
     23 
     24 #energy['Country'] = energy['Country'].str.replace("A","B")
---> 25 energy['Country'] = energy['Country'].str.replace(r"\s+\(.*\)","")
     26 
     27 #energy['Country'] = energy['Country']

and then it keeps going.

Can someone please explain the error and what I need to correct?

Thanks.

Answer 1

If 'Country' is in your index, you can't access it with the df['Country'] syntax; that only works for columns. However, you have other options.
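To make the distinction concrete, here is a minimal sketch that reproduces the KeyError on a small, made-up frame (the data and the variable names in_columns, index_name and raised are illustrative assumptions, not taken from the question):

```python
import pandas as pd

# Hypothetical data; 'Country' is moved into the index, as in the question.
df = pd.DataFrame({"Country": ["abb", "abc"], "b": [1, 2]}).set_index("Country")

# 'Country' is no longer a column; it is only the name of the index.
in_columns = "Country" in df.columns
index_name = df.index.name

# Column-style lookup therefore fails with a KeyError.
try:
    df["Country"]
    raised = False
except KeyError:
    raised = True
```

Checking df.columns and df.index.name like this is a quick way to confirm where a label actually lives before indexing with it.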

I've used the following test DataFrame to keep things simple.

import pandas as pd
df = pd.DataFrame([('abb', 1, 2), ('abc', 2, 4), ('abd', 3, 7), ('abe', 4, 8), ('abg', 5, 6), ('abh', 6, 3)], columns=['Country', 'b', 'c'])

If 'Country' is the index (and it is a single-level index), you can perform the substitution as follows. Note that this will not work on a MultiIndex.

df = df.set_index('Country')
df.index = df.index.str.replace(r"a","")

Alternatively, you can use .reset_index to move everything out of the index and back into the columns. You can then do the indexing as you have it.

df = df.set_index(['Country', 'b'])  # Move 2 columns into the index.
df = df.reset_index()  # Country & b are now back out of the index, as normal columns.
df['Country'] = df['Country'].str.replace(r"a","")  # Normal indexing works.

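In both cases the 'a' is removed from every Country value. The reset_index version prints as follows (in the first version, Country stays in the index instead of appearing as a column):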

  Country  b  c
0      bb  1  2
1      bc  2  4
2      bd  3  7
3      be  4  8
4      bg  5  6
5      bh  6  3
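Applied to the pattern from the question, the index-based version might look like the sketch below. The energy frame here is a hypothetical stand-in (its column name and country values are assumptions). regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default:

```python
import pandas as pd

# Hypothetical stand-in for the asker's 'energy' frame, with Country as the index.
energy = pd.DataFrame(
    {"Energy Supply": [1.0, 2.0]},
    index=pd.Index(["Bolivia (Plurinational State of)", "Chad"], name="Country"),
)

# Drop any whitespace followed by a parenthesized suffix from the index labels.
energy.index = energy.index.str.replace(r"\s+\(.*\)", "", regex=True)

names = list(energy.index)
```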

solution1
1 2019-02-17 22:34:11
