
Check for a repeating sequence in a pandas dataframe

I have a pandas DataFrame with a column that holds a repeating sequence of values, roughly like the following:

      Cell
0      x_a
1      x_b
2      x_c
3      x_a
4      x_b
5      x_c
6      x_a
7      x_b
8      x_b
9      x_c
10     x_c
11     x_b
12     x_a

I need to check the entire column to verify that the sequence "x_a, x_b, x_c" repeats in exactly this order, i.e. "x_a" is followed by "x_b", which is followed by "x_c".

Wherever this order is broken, e.g. at indices 7 and 8, where "x_b" appears twice in a row, or at 10, 11 and 12, where the order is wrong, I need to be able to find out which value is out of place.

Any pointers on how to do it?

I've been scratching my head over df.loc all this while, to no avail, and I'm fairly certain df.loc is not the right tool for this.

Thanks in advance, guys.

I wrote this solution with a pre-defined rule for the order:

import pandas as pd

# Create a dummy DataFrame with sample values
dummy_frame = pd.DataFrame(
    {"dummy": ["x_a", "x_b", "x_c", "x_a", "x_b", "x_c", "x_a", "x_a", "x_b"]}
)

# Pre-define the order to check for in the DataFrame
correct_order = ["x_a", "x_b", "x_c"]
step = len(correct_order)

# Walk the column in chunks the length of the order (triplets in this case)
for i in range(0, len(dummy_frame), step):
    chunk = dummy_frame["dummy"].iloc[i:i + step].tolist()

    # Check whether the chunk matches the expected order
    if chunk != correct_order:
        # Report each incorrect value in the chunk (the last chunk may be short)
        for j, value in enumerate(chunk):
            if value != correct_order[j]:
                print("Value at index:", i + j, "is incorrect.")
                print("Current Value:", value, "Correct Value is:", correct_order[j])

Sample output:

Value at index: 7 is incorrect.
Current Value: x_a Correct Value is: x_b
Value at index: 8 is incorrect.
Current Value: x_b Correct Value is: x_c
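
One caveat: the loop compares fixed-size chunks, so it assumes the sequence stays aligned to multiples of three. In the question's data the extra "x_b" at index 8 shifts everything after it, so later chunks would be reported as wrong even where the local order is fine. A possible alternative, sketched here as an assumption rather than as part of the original answer, is to compare each value with the expected successor of the value before it, using `Series.shift` and `Series.map`. The names `successor`, `expected` and `bad_indices` are made up for illustration.

```python
import pandas as pd

# Column from the question
df = pd.DataFrame({"Cell": ["x_a", "x_b", "x_c", "x_a", "x_b", "x_c", "x_a",
                            "x_b", "x_b", "x_c", "x_c", "x_b", "x_a"]})

correct_order = ["x_a", "x_b", "x_c"]

# Map each value to the value that should come next, wrapping around at the end
successor = dict(zip(correct_order, correct_order[1:] + correct_order[:1]))

# Expected value for each row, predicted from the previous row's actual value
expected = df["Cell"].shift().map(successor)

# A row breaks the order when it differs from what the previous row predicts.
# The first row has no predecessor, so it is never flagged here.
violations = df[expected.notna() & (df["Cell"] != expected)]
bad_indices = violations.index.tolist()
print(bad_indices)
```

On the question's column this flags indices 8, 10, 11 and 12. Each row is judged only against its immediate predecessor, so a single duplicate does not cascade into the rows that follow it.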

Hope this helps :)


 