
How do I best iterate through rows on a DataFrame based on unique values in one of the columns?

I have a price list with roughly 60K rows, containing around 5.5K products with different service durations. Simplified, it looks like this:

dpl_Description w/o months  dpl_Order Duration
X                           36
X                            9
Y                           23
F                           26
F                            7
F                           18
X                            6
X                            4
X                           15
Z                           35
Z                            6
Z                            5
C                            3
X                           34
Y                           12
Y                            5

(on that topic: is there a better way to post tables?)

I want to go through this list and remove every row whose duration is not 12, 24 or 36 months, but only for products that also exist as a 12-month item (if a particular product is not available as a 12-month item, all of its rows should remain).

This is my current code for achieving this:

for pwl in pd.unique(result["dpl_Description w/o months"]):
    # Products without a 12-month item are left untouched
    if result[(result["dpl_Description w/o months"] == pwl) & (result["dpl_Order Duration"] == 12)].empty:
        pass
    else:
        # Check each duration of the non-charity rows of this product
        for i in result[(result["dpl_Description w/o months"] == pwl) & (result["Charity"] != "Yes")]["dpl_Order Duration"]:
            if i not in [12, 24, 36]:
                result.drop(result[(result["dpl_Description w/o months"] == pwl) & (result["dpl_Order Duration"] == i)].index, inplace=True)

The code runs and accomplishes what I want, but it is horribly slow. Since I was planning to wrap it in a function and use the same approach for a variety of other operations on the data set, I wanted to get some feedback first.

What would a better approach to this problem be, resulting in a more time efficient computation?

EDIT: I have tried the following in the hope of speeding up the code, since it avoids most of the looping over individual durations. It still runs extremely slowly, however:

for pwl in pd.unique(result["dpl_Description w/o months"]):
    if result[(result["dpl_Description w/o months"] == pwl) & (result["dpl_Order Duration"] == 12)].empty:
        pass
    else:
        result.drop(result[~(result["dpl_Order Duration"].isin([12,24,36])) & (result["Charity"] != "Yes") & (result["dpl_Description w/o months"] == pwl)].index, inplace=True)

EDIT 2

Based on the provided example dataset the output I am expecting would be:

X 36
X 9
F 26
F 7
F 18
X 6
X 4
X 15
Z 35
Z 6
Z 5
C 3
X 34
Y 12

As stated, I only wish to delete rows whose duration is not 12, 24 or 36 months if the same product is also in the price list as a 12-month item. In this case that only applies to the product "Y".

Without an expected output, I took a guess

df = df[df['dpl_Order Duration'].isin([12, 24, 36])]

   dpl_Description w/o months  dpl_Order Duration
0                           X                  36
14                          Y                  12
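
With the rule from the question spelled out (only drop non-12/24/36 rows for products that also have a 12-month row), the guess above can be made conditional without looping over products. The following is only a sketch on the sample data from the question, not a tested solution for the full 60K-row price list; the names `product`, `duration`, `has_12`, `standard` and `filtered` are made up for illustration. It builds one boolean mask per condition and combines them, using `groupby(...).transform("any")` to broadcast "this product has a 12-month item" back onto every row of that product.

```python
import pandas as pd

# Sample data copied from the question (same row order, default 0..15 index).
df = pd.DataFrame({
    "dpl_Description w/o months": list("XXYFFFXXXZZZCXYY"),
    "dpl_Order Duration": [36, 9, 23, 26, 7, 18, 6, 4, 15, 35, 6, 5, 3, 34, 12, 5],
})

product = df["dpl_Description w/o months"]
duration = df["dpl_Order Duration"]

# True on every row of a product that has at least one 12-month row.
has_12 = duration.eq(12).groupby(product).transform("any")

# True on rows whose duration is one of the standard terms.
standard = duration.isin([12, 24, 36])

# Drop a row only if its product has a 12-month item
# and the row itself has a non-standard duration.
filtered = df[~(has_12 & ~standard)]
print(filtered)
```

On the sample data this removes only the Y 23 and Y 5 rows; every X, F, Z and C row is kept because those products have no 12-month item. The loops in the question also exempt rows where the Charity column is "Yes". That column is not part of the sample table, but assuming it exists in the real data, the same exemption could be expressed by adding `& df["Charity"].ne("Yes")` to the drop condition inside the parentheses.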


 