How to check whether two lists are circularly identical in Python

For instance, I have lists:

a[0] = [1, 1, 1, 0, 0]
a[1] = [1, 1, 0, 0, 1]
a[2] = [0, 1, 1, 1, 0]
# and so on

They seem to be different, but if the start and the end are assumed to be connected, then they are circularly identical.

The problem is that each list I have has a length of 55 and contains only three ones and 52 zeros. Without the circular condition, there are 26,235 (55 choose 3) lists. However, once the circular condition is applied, a huge number of these lists are circularly identical.

Currently I check circular identity as follows:

import numpy

def is_dup(a, b):
    for i in range(len(a)):
        if a == list(numpy.roll(b, i)): # shift b circularly by i
            return True
    return False

This function requires 55 cyclic shift operations in the worst case, and there are 26,235 lists to be compared with each other. In short, I need 55 * 26,235 * (26,235 - 1) / 2 = 18,926,847,225 computations. That is nearly 20 billion!

Is there a good way to do it with fewer computations? Or is there a data type that supports circular comparison?
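The counts quoted in the question can be re-derived in a few lines. This is only a sanity check of the arithmetic; the variable names are illustrative and it assumes Python 3.8+ for math.comb.

```python
from math import comb

n_lists = comb(55, 3)                    # ways to place three 1s in 55 slots
n_pairs = n_lists * (n_lists - 1) // 2   # unordered pairs of distinct lists
n_shift_compares = 55 * n_pairs          # worst case: 55 shifts tried per pair

print(n_lists)           # 26235
print(n_shift_compares)  # 18926847225
```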

First off, this can be done in O(n) in terms of the length of the list. Notice that if you duplicate your list ([1, 2, 3] becomes [1, 2, 3, 1, 2, 3]), the new list is guaranteed to contain every possible cyclic rotation of the original.

So all you need to do is check whether the list you are searching for is inside the doubled version of your starting list. In Python you can achieve this in the following way (assuming that the lengths are the same).

list1 = [1, 1, 1, 0, 0]
list2 = [1, 1, 0, 0, 1]
print(' '.join(map(str, list2)) in ' '.join(map(str, list1 * 2)))

Some explanation of my one-liner: list * 2 concatenates a list with itself, map(str, [1, 2]) converts all the numbers to strings, and ' '.join() converts the array ['1', '2', '111'] into the string '1 2 111'.

As some people pointed out in the comments, the one-liner can give false positives, so to cover all the possible edge cases:

def isCircular(arr1, arr2):
    if len(arr1) != len(arr2):
        return False

    str1 = ' '.join(map(str, arr1))
    str2 = ' '.join(map(str, arr2))
    if len(str1) != len(str2):
        return False

    return str1 in str2 + ' ' + str2
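To make the false-positive case concrete, here is a small comparison of the one-liner and the stricter function. The names naive_contains and is_circular are chosen just for this sketch; is_circular restates the checks of isCircular so that the block runs on its own under Python 3.

```python
def naive_contains(list1, list2):
    # The original one-liner: a plain substring test on space-joined strings.
    return ' '.join(map(str, list2)) in ' '.join(map(str, list1 * 2))

def is_circular(arr1, arr2):
    # Same checks as isCircular: equal list lengths, equal string lengths,
    # then a substring test against the doubled string.
    if len(arr1) != len(arr2):
        return False
    str1 = ' '.join(map(str, arr1))
    str2 = ' '.join(map(str, arr2))
    if len(str1) != len(str2):
        return False
    return str1 in str2 + ' ' + str2

# '1 2' occurs inside '11 2 11 2', so the one-liner is fooled.
print(naive_contains([11, 2], [1, 2]))                 # True (false positive)
print(is_circular([11, 2], [1, 2]))                    # False
print(is_circular([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))   # True
```

The extra string-length check is what rules out a match that starts in the middle of a multi-digit element.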

PS1: when speaking about time complexity, it is worth noting that O(n) is achieved only if the substring search runs in O(n) time. This is not always the case and depends on the implementation in your language (although it can be done in linear time, with KMP for example).

PS2: for people who are wary of string operations and therefore think that this answer is not good: what matters is complexity and speed. This algorithm potentially runs in O(n) time and O(n) space, which makes it much better than anything in the O(n^2) domain. To see this for yourself, you can run a small benchmark (it creates a random list, pops the first element and appends it to the end, thus creating a cyclic list; you are free to do your own manipulations).

from random import random
bigList = [int(1000 * random()) for i in range(10**6)]
bigList2 = bigList[:]
bigList2.append(bigList2.pop(0))

# then test how long it takes to come up with an answer
from datetime import datetime
startTime = datetime.now()
print(isCircular(bigList, bigList2))
print(datetime.now() - startTime)    # feel free to use timeit, but it will give similar results

0.3 seconds on my machine. Not really long. Now try to compare this with O(n^2) solutions. While they are still comparing, you can travel from the US to Australia (most probably by cruise ship).

I'm not knowledgeable enough in Python to answer this in your requested language, but in C/C++, given the parameters of your question, I'd convert the zeros and ones to bits and push them onto the least significant bits of a uint64_t. This allows you to compare all 55 bits in one fell swoop: 1 clock.

Wickedly fast, and the whole thing will fit in on-chip caches (209,880 bytes). Hardware support for shifting all 55 list members right simultaneously is available only in a CPU's registers. The same goes for comparing all 55 members simultaneously. This allows for a 1-for-1 mapping of the problem to a software solution (and, using the SIMD/SSE 256-bit registers, up to 256 members if needed). As a result, the code is immediately obvious to the reader.

You might be able to implement this in Python; I just don't know it well enough to know if that's possible or what the performance might be.

After sleeping on it, a few things became obvious, and all for the better.

1.) It's so easy to spin the circularly linked list using bits that Dali's very clever trick isn't necessary. Inside a 64-bit register, standard bit shifting accomplishes the rotation very simply and, in an attempt to make this all more Python friendly, I use arithmetic instead of bit ops.

2.) Bit shifting can be accomplished easily by dividing by 2.

3.) Checking the end of the list for 0 or 1 can be done easily with modulo 2.

4.) "Moving" a 0 to the head of the list from the tail can be done by dividing by 2. This is because if the zero were actually moved, it would make the 55th bit false, which it already is by doing absolutely nothing.

5.) "Moving" a 1 to the head of the list from the tail can be done by dividing by 2 and adding 18,014,398,509,481,984, which is the value created by marking the 55th bit true and all the rest false.

6.) If a comparison of the anchor and the composed uint64_t is TRUE after any given rotation, break and return TRUE.

I would convert the entire array of lists into an array of uint64_ts right up front to avoid having to do the conversion repeatedly.
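Steps 1 to 6 carry over to Python almost word for word, since Python integers handle 55-bit values without trouble. The following is only a sketch of the idea, not a port of the benchmark; to_int and the constant names are made up for this illustration.

```python
LIST_LENGTH = 55
HEAD_BIT = 1 << (LIST_LENGTH - 1)   # 18,014,398,509,481,984: only the 55th bit set

def to_int(bits):
    """Pack a list of 0/1 values into one integer, first element as the highest bit."""
    value = 0
    for b in bits:
        value = value * 2 + b
    return value

def is_circular_identical(anchor, test_ring):
    """Hold anchor fixed and rotate test_ring right by one bit per try."""
    for _ in range(LIST_LENGTH):
        if anchor == test_ring:       # step 6: compares all 55 members at once
            return True
        tail = test_ring % 2          # step 3: inspect the tail with modulo 2
        test_ring //= 2               # step 2: shift right by dividing by 2
        if tail:
            test_ring += HEAD_BIT     # step 5: a 1 at the tail wraps to the head
        # step 4: a 0 at the tail needs no further work
    return False

a = [1, 1, 1] + [0] * 52
b = [0] * 10 + [1, 1, 1] + [0] * 42
c = [1, 0, 1, 1] + [0] * 51
print(is_circular_identical(to_int(a), to_int(b)))  # True
print(is_circular_identical(to_int(a), to_int(c)))  # False
```

Converting every list with to_int once, up front, matches the advice above about avoiding repeated conversions.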

After spending a few hours trying to optimize the code and studying the assembly language, I was able to shave 20% off the runtime. I should add that the O/S and MSVC compiler got updated mid-day yesterday as well. For whatever reason(s), the quality of the code the C compiler produced improved dramatically after the update (11/15/2014). Run-time is now ~70 clocks (17 nanoseconds) to compose and compare an anchor ring with all 55 turns of a test ring, and the NxN comparison of all rings against all others is done in 12.5 seconds.

This code is so tight that all but 4 registers are sitting around doing nothing 99% of the time. The assembly language matches the C code almost line for line. Very easy to read and understand. A great assembly project if someone were teaching themselves that.

Hardware is a Haswell i7, MSVC 64-bit, full optimizations.

#include "stdafx.h"
#include <string>
#include <memory>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

const uint8_t  LIST_LENGTH = 55;    // uint8_t supports full width of SIMD and AVX2
// max left shifts is 32, so must use right shifts to create head_bit
const uint64_t head_bit = (0x8000000000000000 >> (64 - LIST_LENGTH)); 
const uint64_t CPU_FREQ = 3840000000;   // turbo-mode clock freq of my i7 chip

const uint64_t LOOP_KNT = 688275225; // 26235^2 // 1000000000;

// ----------------------------------------------------------------------------
__inline uint8_t is_circular_identical(const uint64_t anchor_ring, uint64_t test_ring)
{
    // By trial and error, try to synch 2 circular lists by holding one constant
    //   and turning the other 0 to LIST_LENGTH positions. Return compare count.

    // Return the number of tries which aligned the circularly identical rings, 
    //  where any non-zero value is treated as a bool TRUE. Return a zero/FALSE,
    //  if all tries failed to find a sequence match. 
    // If anchor_ring and test_ring are equal to start with, return one.

    for (uint8_t i = LIST_LENGTH; i;  i--)
    {
        // This function could be made bool, returning TRUE or FALSE, but
        // as a debugging tool, knowing the try_knt that got a match is nice.
        if (anchor_ring == test_ring) {  // test all 55 list members simultaneously
            return (LIST_LENGTH +1) - i;
        }

        if (test_ring % 2) {    //  ring's tail is 1 ?
            test_ring /= 2;     //  right-shift 1 bit
            // if the ring tail was 1, set head to 1 to simulate wrapping
            test_ring += head_bit;      
        }   else    {           // ring's tail must be 0
            test_ring /= 2;     // right-shift 1 bit
            // if the ring tail was 0, doing nothing leaves head a 0
        }
    }
    // if we got here, they can't be circularly identical
    return 0;
}
// ----------------------------------------------------------------------------
    int main(void)  {
        time_t start = clock();
        uint64_t anchor, test_ring, i,  milliseconds;
        uint8_t try_knt;

        anchor = 31525197391593472; // bits 55,54,53 set true, all others false
        // Anchor right-shifted LIST_LENGTH/2 represents the average search turns
        test_ring = anchor >> (1 + (LIST_LENGTH / 2)); //  117440512; 

        printf("\n\nRunning benchmarks for %llu loops.", LOOP_KNT);
        start = clock();
        for (i = LOOP_KNT; i; i--)  {
            try_knt = is_circular_identical(anchor, test_ring);
            // The shifting of test_ring below is a test fixture to prevent the 
            //  optimizer from optimizing the loop away and returning instantly
            if (i % 2) {
                test_ring /= 2;
            }   else  {
                test_ring *= 2;
            }
        }
        milliseconds = (uint64_t)(clock() - start);
        printf("\nET for is_circular_identical was %f milliseconds."
                "\n\tLast try_knt was %u for test_ring list %llu", 
                        (double)milliseconds, try_knt, test_ring);

        printf("\nConsuming %7.1f clocks per list.\n",
                (double)((milliseconds * (CPU_FREQ / 1000)) / (uint64_t)LOOP_KNT));

        getchar();
        return 0;
}


Reading between the lines, it sounds as though you're trying to enumerate one representative of each circular equivalence class of strings with 3 ones and 52 zeros. Let's switch from a dense representation to a sparse one (a set of three numbers in range(55)). In this representation, the circular shift of s by k is given by the comprehension set((i + k) % 55 for i in s). The lexicographic minimum representative in a class always contains the position 0. Given a set of the form {0, i, j} with 0 < i < j, the other candidates for minimum in the class are {0, j - i, 55 - i} and {0, 55 - j, 55 + i - j}. Hence, we need (i, j) <= min((j - i, 55 - i), (55 - j, 55 + i - j)) for the original to be minimum. Here's some enumeration code.

def makereps():
    reps = []
    for i in range(1, 55 - 1):
        for j in range(i + 1, 55):
            if (i, j) <= min((j - i, 55 - i), (55 - j, 55 + i - j)):
                reps.append('1' + '0' * (i - 1) + '1' + '0' * (j - i - 1) + '1' + '0' * (55 - j - 1))
    return reps
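As a cross-check of the sparse representation, the classes can also be counted by brute force: shift every 3-element position set through all 55 offsets and keep the smallest sorted tuple. This is a slow but simple sketch; shift and canonical are names invented here, and the expected count of 477 follows from 26,235 / 55, since no list with three ones in 55 slots maps onto itself under a non-trivial rotation.

```python
from itertools import combinations

N = 55

def shift(s, k):
    """Circular shift of a sparse set of positions by k."""
    return set((i + k) % N for i in s)

def canonical(s):
    """Lexicographically smallest sorted position tuple over all N shifts."""
    return min(tuple(sorted(shift(s, k))) for k in range(N))

# One canonical tuple per circular equivalence class.
classes = {canonical(s) for s in combinations(range(N), 3)}
print(len(classes))  # 477
```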

Repeat the first array, then use the Z algorithm (O(n) time) to find the second array inside the first.

(Note: you don't have to physically copy the first array. You can just wrap around during matching.)

The nice thing about the Z algorithm is that it's very simple compared to KMP, BM, etc.
However, if you're feeling ambitious, you could do string matching in linear time and constant space; strstr, for example, does this. Implementing it would be more painful, though.
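A possible Python rendering of the Z-algorithm approach is sketched below. For simplicity it does copy the first array rather than wrapping around during matching, and it uses a unique sentinel object as the separator; z_array and is_rotation are names picked for this example.

```python
def z_array(seq):
    """z[i] = length of the longest common prefix of seq and seq[i:]."""
    n = len(seq)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0                      # current Z-box is seq[left:right]
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])   # reuse earlier work
        while i + z[i] < n and seq[z[i]] == seq[i + z[i]]:
            z[i] += 1                             # extend by direct comparison
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z

def is_rotation(a, b):
    """True if b occurs inside a + a, i.e. b is a cyclic rotation of a."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    sentinel = object()                   # equal to nothing in either list
    combined = list(b) + [sentinel] + list(a) + list(a)
    z = z_array(combined)
    m = len(b)
    return any(z[i] >= m for i in range(m + 1, len(combined)))

print(is_rotation([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))  # True
print(is_rotation([1, 0, 0, 1], [1, 0, 1, 0]))        # False
```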

Following up on Salvador Dali's very smart solution, the best way to handle it is to make sure that all elements are of the same length, and that both lists are of the same length.

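def is_circular_equal(lst1, lst2):
    if len(lst1) != len(lst2):
        return False
    # real lists (not lazy map objects) so they can be traversed more than once
    lst1, lst2 = [str(el) for el in lst1], [str(el) for el in lst2]
    len_longest_element = max(map(len, lst1))
    template = "{{:{}}}".format(len_longest_element)
    # double the element list before joining so the seam also gets a separator
    circ_lst = " ".join([template.format(el) for el in lst1 * 2])
    return " ".join([template.format(el) for el in lst2]) in circ_lst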

No clue if this is faster or slower than AshwiniChaudhary's recommended regex solution in Salvador Dali's answer, which reads:

import re

def is_circular_equal(lst1, lst2):
    if len(lst1) != len(lst2):
        return False
    # join the doubled list so the seam between the two copies gets a space too
    return bool(re.search(r"\b{}\b".format(' '.join(map(str, lst2))),
                          ' '.join(map(str, lst1 * 2))))

Given that you need to do so many comparisons, might it be worth your while to take an initial pass through your lists and convert them into some sort of canonical form that can be easily compared?

Are you trying to get a set of circularly-unique lists? If so, you can throw them into a set after converting them to tuples.

def normalise(lst):
    # Pick the 'maximum' out of all cyclic options
    return max([lst[i:]+lst[:i] for i in range(len(lst))])

a_normalised = map(normalise,a)
a_tuples = map(tuple,a_normalised)
a_unique = set(a_tuples)

Apologies to David Eisenstat for not spotting his very similar answer.

You can roll one list like this:

list1, list2 = [0,1,1,1,0,0,1,0], [1,0,0,1,0,0,1,1]

str_list1 = "".join(map(str, list1))
str_list2 = "".join(map(str, list2))

def rotate(string_to_rotate):
    # build every rotation by repeatedly moving the first character to the end
    result = [string_to_rotate]
    for i in range(1, len(string_to_rotate)):
        result.append(result[-1][1:] + result[-1][0])
    return result

for x in rotate(str_list1):
    if x == str_list2:
        print("lists are rotationally identical")
        break

First, convert each of your list elements (in a copy if necessary) to its rotated version that is lexically greatest.

Then sort the resulting list of lists (retaining an index into the original list position) and unify the sorted list, marking all the duplicates in the original list as needed.
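One way this normalize-sort-unify recipe might look in Python is sketched below. The helper names greatest_rotation and mark_duplicates are hypothetical, and the quadratic max-over-rotations is used only because it is short; any faster canonicalization could be swapped in.

```python
def greatest_rotation(lst):
    """Return the lexically greatest rotation of lst."""
    return max(lst[i:] + lst[:i] for i in range(len(lst)))

def mark_duplicates(lists):
    """Flag every list that is a circular duplicate of one earlier in the input."""
    # Sort (canonical form, original index) pairs; equal forms become neighbours,
    # and within a group the smallest original index comes first.
    keyed = sorted((greatest_rotation(lst), idx) for idx, lst in enumerate(lists))
    is_dup = [False] * len(lists)
    for (prev_key, _), (key, idx) in zip(keyed, keyed[1:]):
        if key == prev_key:
            is_dup[idx] = True
    return is_dup

lists = [[1, 1, 1, 0, 0], [1, 1, 0, 0, 1], [0, 1, 1, 1, 0], [1, 0, 1, 0, 1]]
print(mark_duplicates(lists))  # [False, True, True, False]
```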

Piggybacking on @SalvadorDali's observation about looking for matches of a in any len(a)-sized slice of b+b, here is a solution using just list operations.

def rollmatch(a,b):
    bb=b*2
    return any(not any(ax^bbx for ax,bbx in zip(a,bb[i:])) for i in range(len(a)))

l1 = [1,0,0,1]
l2 = [1,1,0,0]
l3 = [1,0,1,0]

rollmatch(l1,l2)  # True
rollmatch(l1,l3)  # False

2nd approach: [deleted]

Not a complete, free-standing answer, but on the topic of optimizing by reducing comparisons, I too was thinking of normalized representations.

Namely, if your input alphabet is {0, 1}, you could reduce the number of allowed permutations significantly. Rotate the first list to a (pseudo-)normalized form (given the distribution in your question, I would pick one where one of the 1 bits is on the extreme left, and one of the 0 bits is on the extreme right). Now, before each comparison, successively rotate the other list through the possible positions with the same alignment pattern.

For example, if you have a total of four 1 bits, there can be at most 4 permutations with this alignment, and if you have clusters of adjacent 1 bits, each additional bit in such a cluster reduces the number of positions.

List 1   1 1 1 0 1 0

List 2   1 0 1 1 1 0  1st permutation
         1 1 1 0 1 0  2nd permutation, final permutation, match, done

This generalizes to larger alphabets and different alignment patterns; the main challenge is to find a good normalization with only a few possible representations. Ideally, it would be a proper normalization, with a single unique representation, but given the problem, I don't think that's possible.
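The alignment idea from the example (leading 1, trailing 0) could be sketched like this in Python. aligned_rotations and matches_aligned are illustrative names, and the sketch assumes the first list has already been rotated into the aligned form.

```python
def aligned_rotations(bits):
    """Yield only the rotations of bits that start with 1 and end with 0."""
    for i in range(len(bits)):
        # Rotation i starts at bits[i] and ends at bits[i - 1].
        if bits[i] == 1 and bits[i - 1] == 0:
            yield bits[i:] + bits[:i]

def matches_aligned(first, second):
    """first must already be in aligned form (leading 1, trailing 0)."""
    return any(rot == first for rot in aligned_rotations(second))

list1 = [1, 1, 1, 0, 1, 0]
list2 = [1, 0, 1, 1, 1, 0]
print(list(aligned_rotations(list2)))  # [[1, 0, 1, 1, 1, 0], [1, 1, 1, 0, 1, 0]]
print(matches_aligned(list1, list2))   # True
```

Only two of the six rotations of the second list are ever compared, which is the saving the answer describes.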

Building further on RocketRoy's answer: convert all your lists up front to unsigned 64-bit numbers. For each list, rotate those 55 bits around to find the smallest numerical value.

You are now left with a single unsigned 64-bit value for each list that you can compare directly with the value of the other lists. The function is_circular_identical() is no longer required.

(In essence, you create an identity value for your lists that is not affected by the rotation of the list elements.) This would work even if you had an arbitrary number of ones in your lists.
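In Python, the identity value might be computed as follows. This is a sketch with made-up helper names (pack, identity_value); it uses bit operations for the rotation and keeps the minimum over all 55 rotations.

```python
LIST_LENGTH = 55

def pack(bits):
    """Pack a list of 0/1 values into an integer, first element as the highest bit."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def identity_value(value):
    """Smallest integer among all LIST_LENGTH rotations of a packed list."""
    best = value
    for _ in range(LIST_LENGTH - 1):
        # rotate right by one bit: the lowest bit wraps around to the top
        value = (value >> 1) | ((value & 1) << (LIST_LENGTH - 1))
        best = min(best, value)
    return best

a = [1, 1, 1] + [0] * 52
b = [0] * 10 + [1, 1, 1] + [0] * 42
print(identity_value(pack(a)) == identity_value(pack(b)))  # True
print(identity_value(pack(a)))                             # 7
```

With these values in hand, deduplicating all lists reduces to putting the identity values into a set.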

This is the same idea as Salvador Dali's, but it doesn't need the string conversion. Behind it is the same KMP recovery idea, which avoids inspecting impossible shifts. Then just call KMPModified(list1, list2+list2).

    public class KmpModified
    {
        public int[] CalculatePhi(int[] pattern)
        {
            var phi = new int[pattern.Length + 1];
            phi[0] = -1;
            phi[1] = 0;

            int pos = 1, cnd = 0;
            while (pos < pattern.Length)
                if (pattern[pos] == pattern[cnd])
                {
                    cnd++;
                    phi[pos + 1] = cnd;
                    pos++;
                }
                else if (cnd > 0)
                    cnd = phi[cnd];
                else
                {
                    phi[pos + 1] = 0;
                    pos++;
                }

            return phi;
        }

        public IEnumerable<int> Search(int[] pattern, int[] list)
        {
            var phi = CalculatePhi(pattern);

            int m = 0, i = 0;
            while (m < list.Length)
                if (pattern[i] == list[m])
                {
                    i++;
                    if (i == pattern.Length)
                    {
                        yield return m - i + 1;
                        i = phi[i];
                    }
                    m++;
                }
                else if (i > 0)
                {
                    i = phi[i];
                }
                else
                {
                    i = 0;
                    m++;
                }
        }

        [Fact]
        public void BasicTest()
        {
            var pattern = new[] { 1, 1, 10 };
            var list = new[] {2, 4, 1, 1, 1, 10, 1, 5, 1, 1, 10, 9};
            var matches = Search(pattern, list).ToList();

            Assert.Equal(new[] {3, 8}, matches);
        }

        [Fact]
        public void SolveProblem()
        {
            var random = new Random();
            var list = new int[10];
            for (var k = 0; k < list.Length; k++)
                list[k]= random.Next();

            var rotation = new int[list.Length];
            for (var k = 1; k < list.Length; k++)
                rotation[k - 1] = list[k];
            rotation[rotation.Length - 1] = list[0];

            Assert.True(Search(list, rotation.Concat(rotation).ToArray()).Any());
        }
    }

Hope this helps!

Simplifying The Problem

  • The problem consists of a list of ordered items
  • The domain of values is binary (0, 1)
  • We can reduce the problem by mapping consecutive 1s into a count
  • and consecutive 0s into a negative count

Example

A = [ 1, 1, 1, 0, 0, 1, 1, 0 ]
B = [ 1, 1, 0, 1, 1, 1, 0, 0 ]
~
A = [ +3, -2, +2, -1 ]
B = [ +2, -1, +3, -2 ]
  • This process requires that the first item and the last item be different
  • This will reduce the number of comparisons overall
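The mapping in the example can be sketched in Python with itertools.groupby. map_list mirrors the MAP_LIST name used in this answer, while is_duplicate here is a simplified stand-in that just tries every rotation of the (much shorter) encoded lists; both are illustrations under those assumptions rather than the answer's exact procedure.

```python
from itertools import groupby

def map_list(bits):
    """Encode runs of 1s as positive counts and runs of 0s as negative counts."""
    n = len(bits)
    # Start at a run boundary so that the first and last items differ.
    start = next((i for i in range(n) if bits[i] != bits[i - 1]), 0)
    rotated = bits[start:] + bits[:start]
    runs = []
    for value, run in groupby(rotated):
        count = sum(1 for _ in run)
        runs.append(count if value == 1 else -count)
    return runs

def is_duplicate(l1, l2):
    """Compare the run-length encodings under rotation."""
    if len(l1) != len(l2):
        return False
    a, b = map_list(l1), map_list(l2)
    if len(a) != len(b):
        return False
    return not a or any(a[i:] + a[:i] == b for i in range(len(a)))

A = [1, 1, 1, 0, 0, 1, 1, 0]
B = [1, 1, 0, 1, 1, 1, 0, 0]
print(map_list(A))         # [3, -2, 2, -1]
print(map_list(B))         # [2, -1, 3, -2]
print(is_duplicate(A, B))  # True
```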

Checking Process

  • If we assume that they're duplicates, then we know what we are looking for
  • Basically, the first item from the first list must exist somewhere in the other list
  • It must be followed by whatever follows it in the first list, in the same order
  • The items before it should be the last items from the first list
  • Since it's circular, the order is the same

The Grip

  • The question here is where to start, technically known as lookup and look-ahead
  • We will just check where the first element of the first list exists in the second list
  • The probability of a frequently repeated element is lower given that we mapped the lists into histograms

Pseudo-Code

FUNCTION IS_DUPLICATE (LIST L1, LIST L2) : BOOLEAN

    LIST A = MAP_LIST(L1)
    LIST B = MAP_LIST(L2)

    LIST ALPHA = LOOKUP_INDEX(B, A[0])

    IF A.SIZE != B.SIZE
       OR COUNT_CHAR(A, 0) != COUNT_CHAR(B, ALPHA[0]) THEN

        RETURN FALSE

    END IF

    FOR EACH INDEX IN ALPHA

        IF ALPHA_NGRAM(A, B, INDEX, 1) THEN

            IF IS_DUPLICATE(A, B, INDEX) THEN

                RETURN TRUE

            END IF

        END IF

    END FOR

    RETURN FALSE

END FUNCTION

FUNCTION IS_DUPLICATE (LIST L1, LIST L2, INTEGER INDEX) : BOOLEAN

    INTEGER I = 0

    WHILE I < L1.SIZE DO

        IF L1[I] != L2[(INDEX+I)%L2.SIZE] THEN

            RETURN FALSE

        END IF

        I = I + 1

    END WHILE

    RETURN TRUE

END FUNCTION

Functions

  • MAP_LIST(LIST A):LIST maps consecutive elements to counts in a new list

  • LOOKUP_INDEX(LIST A, INTEGER E):LIST returns the list of indices where the element E exists in the list A

  • COUNT_CHAR(LIST A, INTEGER E):INTEGER counts how many times an element E occurs in the list A

  • ALPHA_NGRAM(LIST A, LIST B, INTEGER I, INTEGER N):BOOLEAN checks if B[I] is equivalent to A[0] as an n-gram in both directions

Finally

If the list size is going to be pretty huge, or if the element we start checking the cycle from occurs very frequently, then we can do the following:

  • Look for the least-frequent item in the first list to start with

  • Increase the n-gram N parameter to lower the probability of going through the linear check

An efficient, quick-to-compute "canonical form" for the lists in question can be derived as follows:

  • Count the number of zeroes between the ones (treating the list as circular, so the gap across the wrap-around is counted like any other), to get three numbers.
  • Rotate the three numbers so that the biggest number is first.
  • The first number (a) must be between 18 and 52 (inclusive). Re-encode it as between 0 and 34.
  • The second number (b) must be between 0 and 26, but it doesn't matter much.
  • Drop the third number, since it's just 52 - (a + b) and adds no information.

The canonical form is the integer b * 35 + a, which is between 0 and 936 (inclusive), which is fairly compact (there are 477 circularly-unique lists in total).
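A rough Python version of this recipe is given below. One detail is an assumption added here: when the biggest gap occurs more than once, the sketch picks the lexically greatest rotation of the gap triple so that the result stays unique. canonical_code is a name chosen for the example.

```python
N = 55

def canonical_code(bits):
    """Map a 55-element list with exactly three 1s to the integer b * 35 + a."""
    p0, p1, p2 = [i for i, bit in enumerate(bits) if bit == 1]
    # Zero counts between consecutive ones, including the gap that wraps around.
    gaps = [p1 - p0 - 1, p2 - p1 - 1, p0 + N - p2 - 1]
    # Biggest gap first; ties resolved by taking the greatest rotation (assumption).
    first, second, _ = max(gaps[i:] + gaps[:i] for i in range(3))
    a = first - 18      # the biggest gap is at least 18, so a lands in 0..34
    b = second
    return b * 35 + a

x = [1, 1, 1] + [0] * 52
y = [0] * 10 + [1, 1, 1] + [0] * 42
print(canonical_code(x), canonical_code(y))  # 34 34
```

Two lists are then circularly identical exactly when their codes are equal, so a set or dictionary keyed on the code does all the grouping.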

I wrote a straightforward solution which compares both lists and just increases (and wraps around) the index of the compared value for each iteration.

I don't know Python well, so I wrote it in Java, but it's really simple, so it should be easy to adapt to any other language.

With this approach you can also compare lists of other types.

public class Main {

    public static void main(String[] args){
        int[] a = {0,1,1,1,0};
        int[] b = {1,1,0,0,1};

        System.out.println(isCircularIdentical(a, b));
    }

    public static boolean isCircularIdentical(int[] a, int[]b){
        if(a.length != b.length){
            return false;
        }

        //The outer loop is for the increase of the index of the second list
        outer:
        for(int i = 0; i < a.length; i++){
            //Loop trough the list and compare each value to the according value of the second list
            for(int k = 0; k < a.length; k++){
                // I use modulo length to wrap around the index
                if(a[k] != b[(k + i) % a.length]){
                    //If the values do not match I continue and shift the index one further
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}

As others have mentioned, once you find the normalized rotation of each list, you can compare them.

Here's some working code that does this. The basic method is to find a normalized rotation for each list and compare:

  • Calculate a normalized rotation index on each list.
  • Loop over both lists with their offsets, comparing each item and returning if they mismatch.

Note that this method doesn't depend on numbers; you can pass in lists of strings (any values which can be compared).

Instead of doing a list-in-list search, we know we want the list to start with the minimum value, so we can loop over the minimum values, searching until we find which one has the lowest successive values, and storing this for further comparisons until we have the best.

There are many opportunities to exit early when calculating the index; details of some optimizations follow.

  • Skip searching for the best minimum value when there's only one.
  • Skip searching minimum values when the previous value is also a minimum value (it will never be a better match).
  • Skip searching when all values are the same.
  • Fail early when lists have different minimum values.
  • Use regular comparison when offsets match.
  • Adjust offsets to avoid wrapping the index values on one of the lists during comparison.

Note that in Python a list-in-list search may well be faster; however, I was interested in finding an efficient algorithm which could be used in other languages too. Also, there is some advantage to avoiding the creation of new lists.

def normalize_rotation_index(ls, v_min_other=None):
    """ Return the index or -1 (when the minimum is above `v_min_other`) """

    if len(ls) <= 1:
        return 0

    def compare_rotations(i_a, i_b):
        """ Return True when i_a is smaller.
            Note: unless there are large duplicate sections of identical values,
            this loop will exit early on.
        """
        for offset in range(1, len(ls)):
            v_a = ls[(i_a + offset) % len(ls)]
            v_b = ls[(i_b + offset) % len(ls)]
            if v_a < v_b:
                return True
            elif v_a > v_b:
                return False
        return False

    v_min = ls[0]
    i_best_first = 0
    i_best_last = 0
    i_best_total = 1
    for i in range(1, len(ls)):
        v = ls[i]
        if v_min > v:
            v_min = v
            i_best_first = i
            i_best_last = i
            i_best_total = 1
        elif v_min == v:
            i_best_last = i
            i_best_total += 1

    # all values match
    if i_best_total == len(ls):
        return 0

    # exit early if we're not matching another lists minimum
    if v_min_other is not None:
        if v_min != v_min_other:
            return -1
    # simple case, only one minimum
    if i_best_first == i_best_last:
        return i_best_first

    # otherwise find the minimum with the lowest values compared to all others.
    # start looking after the first we've found
    i_best = i_best_first
    for i in range(i_best_first + 1, i_best_last + 1):
        if (ls[i] == v_min) and (ls[i - 1] != v_min):
            if compare_rotations(i, i_best):
                i_best = i

    return i_best


def compare_circular_lists(ls_a, ls_b):
    # sanity checks
    if len(ls_a) != len(ls_b):
        return False
    if len(ls_a) <= 1:
        return (ls_a == ls_b)

    index_a = normalize_rotation_index(ls_a)
    index_b = normalize_rotation_index(ls_b, ls_a[index_a])

    if index_b == -1:
        return False

    if index_a == index_b:
        return (ls_a == ls_b)

    # cancel out 'index_a'
    index_b = (index_b - index_a)
    if index_b < 0:
        index_b += len(ls_a)
    index_a = 0  # ignore it

    # compare rotated lists
    for i in range(len(ls_a)):
        if ls_a[i] != ls_b[(index_b + i) % len(ls_b)]:
            return False
    return True


assert(compare_circular_lists([0, 9, -1, 2, -1], [-1, 2, -1, 0, 9]) == True)
assert(compare_circular_lists([2, 9, -1, 0, -1], [-1, 2, -1, 0, 9]) == False)
assert(compare_circular_lists(["Hello", "Circular", "World"], ["World", "Hello", "Circular"]) == True)
assert(compare_circular_lists(["Hello", "Circular", "World"], ["Circular", "Hello", "World"]) == False)

See: this snippet for some more tests/examples.

You can check whether a list A is equal to a cyclic shift of list B in expected O(N) time pretty easily.

I would use a polynomial hash function to compute the hash of list A and of every cyclic shift of list B. Where a shift of list B has the same hash as list A, I'd compare the actual elements to see if they are equal.

The reason this is fast is that with polynomial hash functions (which are extremely common!), you can calculate the hash of each cyclic shift from the previous one in constant time, so you can calculate the hashes of all the cyclic shifts in O(N) time.

It works like this:

Let's say B has N elements; then the hash of B using prime P is:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb = Hb*P + B[i];
}

This is an optimized way to evaluate a polynomial in P, and is equivalent to:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb += B[i] * P^(N-1-i);  //^ is exponentiation, not XOR
}

Notice how every B[i] is multiplied by P^(N-1-i). If we shift B to the left by 1, then every B[i] will be multiplied by an extra P, except the first one. Since multiplication distributes over addition, we can multiply all the components at once just by multiplying the whole hash, and then fix up the factor for the first element.

The hash of the left shift of B is just

Hb1 = Hb*P + B[0]*(1-(P^N))

The second left shift:

Hb2 = Hb1*P + B[1]*(1-(P^N))

and so on...

NOTE: all math above is performed modulo some machine word size, and you only have to calculate P^N once.
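Put together, the rolling update might look like this in Python. The modulus 2**64 stands in for the machine word size and P = 1000003 is an arbitrary prime picked for this sketch; both are assumptions, as are the function names. The elements are assumed to be integers.

```python
MOD = 1 << 64     # stands in for "machine word size" arithmetic (assumption)
P = 1000003       # arbitrary prime base chosen for this sketch

def poly_hash(seq):
    """Hb = sum of B[i] * P^(N-1-i), evaluated with Horner's rule."""
    h = 0
    for x in seq:
        h = (h * P + x) % MOD
    return h

def is_cyclic_shift(a, b):
    """True if a equals some cyclic shift of b (expected O(N) time)."""
    n = len(a)
    if n != len(b):
        return False
    if n == 0:
        return True
    target = poly_hash(a)
    fix = (1 - pow(P, n, MOD)) % MOD      # (1 - P^N), calculated once
    h = poly_hash(b)
    for s in range(n):
        # h is now the hash of b shifted left by s positions
        if h == target and all(a[i] == b[(s + i) % n] for i in range(n)):
            return True
        h = (h * P + b[s] * fix) % MOD    # Hb(s+1) = Hb(s)*P + B[s]*(1 - P^N)
    return False

print(is_cyclic_shift([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))  # True
print(is_cyclic_shift([1, 0, 0, 1], [1, 0, 1, 0]))        # False
```

The element-by-element comparison only runs when the hashes agree, so a hash collision costs time but never produces a wrong answer.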

To stick with the most Pythonic way of doing it, use sets!

from sets import Set
a = Set ([1, 1, 1, 0, 0])
b = Set ([0, 1, 1, 1, 0]) 
c = Set ([1, 0, 0, 1, 1])
a==b
True
a==b==c
True
