How do I best iterate through rows on a DataFrame based on unique values in one of the columns?

I have a price list with roughly 60K lines, containing around 5.5K products with different service durations. Simplified, it looks like this:

dpl_Description w/o months  dpl_Order Duration
X                           36
X                            9
Y                           23
F                           26
F                            7
F                           18
X                            6
X                            4
X                           15
Z                           35
Z                            6
Z                            5
C                            3
X                           34
Y                           12
Y                            5

(On that topic: is there a better way to post tables?)
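On posting tables: one common convention is to include a short snippet that rebuilds the sample as a DataFrame, so others can copy and run it. A minimal sketch for the sample above (the variable name `df` is an assumption; the code in this question calls the frame `result`):

```python
import pandas as pd

# Rebuild the 16-row sample from the table above, in the same row order.
df = pd.DataFrame({
    "dpl_Description w/o months": list("XXYFFFXXXZZZCXYY"),
    "dpl_Order Duration": [36, 9, 23, 26, 7, 18, 6, 4, 15, 35, 6, 5, 3, 34, 12, 5],
})

print(df)
```

For an existing frame, something like `df.head(20).to_dict("list")` prints a dictionary that can be pasted into such a snippet.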

I want to go through this list and remove every row whose duration is not 12, 24 or 36 months, but only for products that also exist as a 12-month item. If a product is not available as a 12-month item, all of its rows should remain.

This is my current code for achieving this:

# For each product, check whether a 12-month item exists before dropping anything.
for pwl in pd.unique(result["dpl_Description w/o months"]):
    if result[(result["dpl_Description w/o months"] == pwl) & (result["dpl_Order Duration"] == 12)].empty:
        pass
    else:
        # Walk the durations of this product's non-charity rows.
        for i in result[(result["dpl_Description w/o months"] == pwl) & (result["Charity"] != "Yes")]["dpl_Order Duration"]:
            if i in [12, 24, 36]:
                pass
            else:
                result.drop(result[(result["dpl_Description w/o months"] == pwl) & (result["dpl_Order Duration"] == i)].index, inplace=True)

The code runs and accomplishes what I want, but it is horribly slow. Since I was planning to wrap it in a function and use the same approach for a variety of other operations on the data set, I wanted to get some feedback.

What would be a better approach to this problem, one that results in a more time-efficient computation?

EDIT: I have tried the following in the hope of speeding up the code, since it avoids looping over the individual durations. However, it still runs extremely slowly:

for pwl in pd.unique(result["dpl_Description w/o months"]):
    if result[(result["dpl_Description w/o months"] == pwl) & (result["dpl_Order Duration"] == 12)].empty:
        pass
    else:
        result.drop(result[~(result["dpl_Order Duration"].isin([12, 24, 36])) & (result["Charity"] != "Yes") & (result["dpl_Description w/o months"] == pwl)].index, inplace=True)

EDIT 2

Based on the provided example dataset, the output I am expecting would be:

X 36
X 9
F 26
F 7
F 18
X 6
X 4
X 15
Z 35
Z 6
Z 5
C 3
Y 12

As stated, I only wish to delete rows with a duration other than 12, 24 or 36 if the same product is also in the price list as a 12-month item. In this case that would only apply to the product "Y".
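The claim that only "Y" qualifies can be checked directly on the sample. A small sketch, assuming the sample is loaded into a frame named `df` (a name chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "dpl_Description w/o months": list("XXYFFFXXXZZZCXYY"),
    "dpl_Order Duration": [36, 9, 23, 26, 7, 18, 6, 4, 15, 35, 6, 5, 3, 34, 12, 5],
})

# Products that appear at least once with a 12-month duration.
products_with_12 = df.loc[
    df["dpl_Order Duration"] == 12, "dpl_Description w/o months"
].unique()

print(list(products_with_12))  # ['Y']
```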

Without an expected output, I took a guess:

df = df[df['dpl_Order Duration'].isin([12, 24, 36])]

   dpl_Description w/o months  dpl_Order Duration
0                           X                  36
14                          Y                  12
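With the expected output now in the question, the conditional rule can probably be expressed without any Python-level loop. The following is a sketch under assumptions, not a tested solution for the full 60K-row list: it uses `groupby(...).transform("any")` to flag every row whose product has a 12-month item, then keeps a row if its product has no such item or if its duration is 12, 24 or 36. The names `DESC`, `DUR`, `has_12`, `keep` and `filtered` are illustrative only.

```python
import pandas as pd

DESC = "dpl_Description w/o months"
DUR = "dpl_Order Duration"

df = pd.DataFrame({
    DESC: list("XXYFFFXXXZZZCXYY"),
    DUR: [36, 9, 23, 26, 7, 18, 6, 4, 15, 35, 6, 5, 3, 34, 12, 5],
})

# True on every row whose product has at least one 12-month row.
has_12 = df[DUR].eq(12).groupby(df[DESC]).transform("any")

# Keep a row if its product has no 12-month item,
# or if its own duration is one of the allowed values.
keep = ~has_12 | df[DUR].isin([12, 24, 36])

# Assumption: the question's code also exempts rows where a "Charity" column
# equals "Yes". The sample has no such column; if the real data does, one
# option is: keep = keep | df["Charity"].eq("Yes")

filtered = df[keep]
print(filtered)
```

On the sample this drops only the two "Y" rows with durations 23 and 5. It keeps the "X" row with duration 34, which the rule as worded retains even though the expected-output listing in the question does not show it. Because the mask is computed once over whole columns rather than once per product, it should scale much better than repeated filtering and `drop` calls inside a loop.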
