
pandas df loop through column resulting in KeyError 1

I have a loop with an if statement that throws KeyError: 1 when I try to reference a location in a DataFrame that was imported with pandas. I only get this error on Windows; the loop runs in os, and the same command also works outside of a loop. What am I doing wrong? I am running through a column, and if any of the string values equals a specific string, I want it to tell me. I also tried .loc[i], and that didn't work in the loop either.

    i=0
    for R in df:
        i=i+1
        if df['Data status'][i] == 'In progress':
            print ('temp')
        else:
            print ('not')

Your code has the following flaws:

  1. Your loop for R in df: iterates over column names, not rows. So if your DataFrame has, e.g., 3 columns, the loop body runs just 3 times.

  2. df['Data status'] is a Series, the column with this name. It has the same index as the whole DataFrame. By default the index consists of consecutive integers starting from 0, but I don't know whether the index in your DataFrame looks like this. It can also contain other values, such as dates or strings; the question gives no details on this.

  3. Apparently your code failed on the first iteration of the loop, when i was 1 (after the increment from 0) and df['Data status'][i] tried to look up the element of the Data status column with index label 1. I assume that the index of your DataFrame does not contain the label 1, hence the KeyError.
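The points above can be reproduced on a small made-up frame. This is only a sketch: the column values, the extra Owner column, and the index labels 10, 20, 30 are assumptions for illustration, since the question does not show the real data.

```python
import pandas as pd

# Hypothetical data: an integer index that does not contain the label 1.
df = pd.DataFrame(
    {"Data status": ["In progress", "Done", "In progress"],
     "Owner": ["a", "b", "c"]},
    index=[10, 20, 30],
)

# Iterating over a DataFrame yields its column names, not its rows.
columns_seen = [name for name in df]
print(columns_seen)

# Looking up a label that is missing from the index raises KeyError.
try:
    df["Data status"][1]
    raised = False
except KeyError:
    raised = True
print(raised)
```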

To process your DataFrame, row by row, try the following code:

for ind, row in df.iterrows():
    if row['Data status'] == 'In progress':
        print(ind, 'temp')
    else:
        print(ind, 'not')

In the code above:

  • ind is the index label of the current row,
  • row is the current row itself.
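To make the meaning of ind concrete, here is a minimal sketch on a made-up frame (the index labels 10 and 20 are assumptions). It suggests that ind follows the index labels rather than counting from 0.

```python
import pandas as pd

# Hypothetical frame with a non-default index.
df = pd.DataFrame({"Data status": ["In progress", "Done"]}, index=[10, 20])

labels = []
for ind, row in df.iterrows():
    # ind is the index label of the row; row is a Series holding its values.
    labels.append((ind, row["Data status"] == "In progress"))
print(labels)
```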

In general, the for loop is unnecessary here. You can use vectorized operations instead:

If you want to create a new column with the mentioned values:

import numpy as np
df['new_column'] = np.where(df['Data status'].eq("In progress"), "temp", "not")
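As a worked example of the vectorized approach, the sketch below runs np.where on a made-up three-row frame (the data is an assumption) and also uses the same boolean condition to select the matching rows. The name in_progress is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"Data status": ["In progress", "Done", "In progress"]})

# Element-wise choice: "temp" where the condition holds, "not" elsewhere.
df["new_column"] = np.where(df["Data status"].eq("In progress"), "temp", "not")

# The same boolean condition can filter the rows directly.
in_progress = df[df["Data status"].eq("In progress")]
print(df)
print(len(in_progress))
```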

If you want to do some side processing for each value, which would justify the for loop, do:

import numpy as np
arr = np.where(df['Data status'].eq("In progress"), "temp", "not")

for el in arr:
    print(el)
    if el == "temp":
        Do_something()       # placeholder for your own function
    else:
        Do_something_else()  # placeholder for your own function

In case I totally misread your intentions and you just want to make your code work for some larger purpose, do the following (it's also a much more Pythonic approach):

# i is row index, and R is the single row
for i,R in df.iterrows():
    if R['Data status'] == 'In progress':
        print ('temp')
    else:
        print ('not')
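If you would rather keep a counter-based loop close to the original, one possible variant is position-based access with .iloc, which does not depend on the index labels. This is a sketch on made-up data (the index labels 5 and 6 and the results list are assumptions), not a replacement for iterrows.

```python
import pandas as pd

# Hypothetical frame whose index does not start at 0.
df = pd.DataFrame({"Data status": ["In progress", "Done"]}, index=[5, 6])

results = []
for i in range(len(df)):
    # .iloc selects by 0-based position, regardless of the index labels.
    if df["Data status"].iloc[i] == "In progress":
        results.append("temp")
    else:
        results.append("not")
print(results)
```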


 