
Removing rows from a pandas dataframe while iterating through it

I have the following Python script. It iterates through a CSV file that has row after row of loyalty cards, and in many cases there is more than one entry per card. I currently loop through each row, then use loc to find all other instances of the card in the current row, so I can combine them together to post to an API. What I'd like to do, however, is remove all the rows I've just merged once that post is done, so that the iteration doesn't hit them again.

That's the part I'm stuck on. Any ideas? Essentially, I want to remove all the rows in card_list from csv before I go into the next iteration. That way, even though there might be 5 rows with the same card number, I only process that card once. I tried using

csv = csv[csv.card != row.card]

at the end of the loop, thinking it might regenerate the dataframe without any rows whose card matches the one just processed, but it didn't work.
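The reason this has no effect is worth spelling out: the for loop holds on to the iterator that itertuples() produced from the original DataFrame object, so rebinding the name csv to a filtered copy inside the loop changes nothing the loop sees. A minimal sketch (toy data, not the question's CSV):

```python
import pandas as pd

# Toy frame with a duplicated card number.
csv = pd.DataFrame({'card': [1, 1, 2], 'voucher': [10, 11, 20]})

seen = []
for row in csv.itertuples():
    seen.append(row.card)
    # Rebind `csv` to a filtered copy, as the question attempts.
    # The running iterator still points at the original object.
    csv = csv[csv.card != row.card]

print(seen)  # every original row is still visited: [1, 1, 2]
```

So the filter itself works (csv really does shrink), but the loop was created from the old object and keeps going over all of its rows.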

import urllib3
import json
import pandas as pd 
import os
import time 
import pyfiglet
from datetime import datetime
import array as arr

    for row in csv.itertuples():
        dt = datetime.now()
        vouchers = []
        if minutePassed(time.gmtime(lastrun)[4]):
            print('Getting new token...')
            token = get_new_token()
            lastrun = time.time()
        print('processing ' + str(int(row.card)))
        card_list = csv.loc[csv['card'] == int(row.card)]
        print('found ' + str(len(card_list)) + ' vouchers against this card')

        for row in card_list.itertuples():
            print('appending card ' + str(int(row.card)) + ' voucher ' + str(row.voucher))
            vouchers.append(row.voucher)
        print('vouchers, ', vouchers)

        encoded_data = json.dumps({
            "store_id":row.store,
            "transaction_id":"11111",
            "card_number":int(row.card),
            "voucher_instance_ids":vouchers
        })
        print(encoded_data)
        number += 1

        r = http.request('POST', lcs_base_path + 'customer/auth/redeem-commit',body=encoded_data,headers={'x-api-key': api_key, 'Authorization': 'Bearer ' + token})
        response_data = json.loads(r.data)

        if (r.status == 200):
            print (str(dt) + ' ' + str(number) + ' done. processing card:' + str(int(row.card)) + ' voucher:' + str(row.voucher) + ' store:' + str(row.store) + ' status: ' + response_data['response_message'] + ' request:' + response_data['lcs_request_id'])
        else:
            print (str(dt) + ' ' + str(number) +  'done. failed to commit ' + str(int(row.card)) + ' voucher:' + str(row.voucher) + ' store:' + str(row.store) + ' status: ' + response_data['message'])
            new_row = {'card':row.card, 'voucher':row.voucher, 'store':row.store, 'error':response_data['message']}
            failed_csv = failed_csv.append(new_row, ignore_index=True)
            failed_csv.to_csv(failed_csv_file, index=False)
            csv = csv[csv.card != row.card]
    print ('script completed')
    print (str(len(failed_csv)) + ' failed vouchers will be saved to failed_commits.csv')
    print("--- %s seconds ---" % (time.time() - start_time))
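As an aside, the failed_csv.append(new_row, ignore_index=True) line relies on DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0. A sketch of the same row-appending step using pd.concat instead (column names taken from the question; the row values here are made up for illustration):

```python
import pandas as pd

# Accumulator for failed commits, with the question's columns.
failed_csv = pd.DataFrame(columns=['card', 'voucher', 'store', 'error'])

# One failed commit, shaped like `new_row` in the question.
new_row = {'card': 123, 'voucher': 'v9', 'store': 'A', 'error': 'timeout'}

# pd.concat replaces the removed DataFrame.append: wrap the dict in a
# one-row DataFrame and concatenate, reindexing from zero.
failed_csv = pd.concat([failed_csv, pd.DataFrame([new_row])],
                       ignore_index=True)
```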

The first rule of thumb is: never mutate what you are iterating over. Also, I think itertuples is the wrong tool here. Let's use groupby instead:

for card, card_list in csv.groupby('card'):
    # `card_list` now contains all the rows for this specific card,
    # exactly like `card_list` in your code
    print('processing', card)
    print('found', len(card_list), 'vouchers against this card')

    # `itertuples` is overkill here -- no inner loop needed:
    # for row in card_list.itertuples():

    encoded_data = json.dumps({
        "store_id": card_list['store'].iloc[0],             # same as `row.store`
        "transaction_id": "11111",
        "card_number": int(card),
        "voucher_instance_ids": list(card_list['voucher'])  # same as `vouchers`
    })

    # ... rest of your code
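A quick sanity check of the groupby approach on toy data (column names from the question; the card numbers, voucher IDs, and store codes below are invented for illustration). Each card yields exactly one payload, no matter how many rows it occupied, so no row is ever processed twice:

```python
import json
import pandas as pd

# Toy data shaped like the question's CSV: duplicate card numbers,
# one voucher per row. String vouchers keep json.dumps happy.
csv = pd.DataFrame({
    'card':    [111, 111, 222],
    'voucher': ['v1', 'v2', 'v3'],
    'store':   ['A', 'A', 'B'],
})

payloads = []
for card, card_list in csv.groupby('card'):
    payloads.append(json.dumps({
        "store_id": card_list['store'].iloc[0],
        "transaction_id": "11111",
        "card_number": int(card),  # int() unwraps the numpy group key
        "voucher_instance_ids": list(card_list['voucher']),
    }))

print(len(payloads))  # 2 payloads for 3 rows: one per distinct card
```

Note the int(card) conversion: groupby keys on a numeric column come back as numpy scalars, which the standard json module refuses to serialize.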

Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site or the original source. For any questions, contact yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM