
How to modify a Pandas dataframe while iterating over it

So I have a dataframe that I am iterating over, and about halfway through the df I want to modify a column name but continue my iteration. I have code like this:

for index, row in df.iterrows():
    # do something with row

    if certain_condition:  # placeholder for the real test
        df.rename(columns={'old_name': 'new_name'}, inplace=True)

After I do the rename, the column name is changed in the 'df' variable for subsequent iterations, but 'row' still carries the old column name. How can I fix this? I know I have run into similar situations in pandas before. Maybe the iterator doesn't get updated even though the dataframe itself is modified?

Changing the source of something you're iterating over is not a good practice.
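To see the behavior from the question, here is a small reproduction. The frame and the trigger condition (`index == 0`) are made up for illustration. The stale labels seem to come from `iterrows()` building each row against the column labels it saw when the loop started, but treat that as an observation about current pandas, not a documented guarantee.

```python
import pandas as pd

# Toy frame; the column names mirror the ones used in the question.
df = pd.DataFrame({"old_name": [1, 2, 3], "other": [4, 5, 6]})

labels_seen = []
for index, row in df.iterrows():
    # Record which labels this row carries.
    labels_seen.append(list(row.index))

    if index == 0:  # illustrative trigger, not the asker's real condition
        df.rename(columns={"old_name": "new_name"}, inplace=True)

print(list(df.columns))  # the frame itself has been renamed
print(labels_seen)       # the rows yielded by the running iterator
```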

You could set a flag if the condition is met, and then after the iteration, make any necessary changes to the dataframe.
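A minimal sketch of the flag approach, assuming the rename can wait until the loop has finished. The sample data, the `needs_rename` variable, and the condition `row["old_name"] == 2` are placeholders, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({"old_name": [1, 2, 3, 4], "other": ["a", "b", "c", "d"]})

needs_rename = False  # set inside the loop, acted on afterwards
seen = []

for index, row in df.iterrows():
    # df is untouched during the loop, so every row uses the same labels.
    seen.append(row["old_name"])

    if row["old_name"] == 2:  # stand-in for the real condition
        needs_rename = True

# Apply the structural change once, after iteration is done.
if needs_rename:
    df = df.rename(columns={"old_name": "new_name"})
```

Because the frame is only changed after the loop, `row` and `df` never disagree about column names while you are iterating.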

Edited to add: I have a large dataset that needs "line by line" parsing, but that instruction was given to me by a non-programmer. Here's what I did: I added a boolean condition column to the dataframe, split the dataframe into two separate dataframes based on that condition, stored one for later integration, and moved on with the other. At the end I used pd.concat to put everything back together. Be aware that if you change a column name in only one of the pieces, pd.concat will create extra columns in the result.
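The split-and-concat idea might look roughly like this. The data, the mask, and the variable names (`set_aside`, `working`) are invented for the example. It also shows the extra-column pitfall: `pd.concat` aligns on column labels, so a column renamed in only one piece ends up as two columns padded with NaN.

```python
import pandas as pd

df = pd.DataFrame({"old_name": [1, 2, 3, 4], "value": [10, 20, 30, 40]})

mask = df["old_name"] > 2  # stand-in for the real boolean condition

set_aside = df[mask]  # stored for later integration
working = df[~mask].rename(columns={"old_name": "new_name"})

# Renamed in only one piece: concat keeps both labels and fills gaps with NaN.
mismatched = pd.concat([working, set_aside])

# Renaming the stored piece the same way keeps the columns aligned.
aligned = pd.concat(
    [working, set_aside.rename(columns={"old_name": "new_name"})]
).sort_index()
```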


 