Most efficient way to determine the type of sequence in a Python list of 0s and 1s?

Say we have Python lists of arbitrary length containing a random sequence of 0s and 1s. What's a good way to classify each sequence as one of the following types, looking at up to 3 places, and to return the matching string as the "sequence type"?

  • 0 (only has 0) [0,0,0]

  • 1 (only has 1) [1,1]

  • 01 (starts with 0 then encounters a 1 x number of digits after) [0,0,0,0,1]

  • 10 (inverse of 01) [1,0,0,0,0]

  • 010 (starts with 0, encounters a 1 x number of digits after the 0, then a 0 x number of digits after the 1) [0,0,0,1,0,1,1,0]

  • 101 (inverse of above) [1,1,1,1,1,0,0,1,0,1,0,1]

I can think of the simple cases and then a naive approach where you have nested loops and keep a counter, but is there a more elegant way to do this?

def sequence_type(sequence):
    if 0 not in sequence:
        return '1'
    elif 1 not in sequence:
        return '0'
    else:
        if sequence[0] == 0:
            pass  # loop through for sequence type 0xx
        elif sequence[0] == 1:
            pass  # loop through for sequence type 1xx

Edit: We don't care about what is at the end of the sequence when checking for type, but instead what the sequence is when looking at the first 3 "unique" digits.

For example: [0,0,1,0,1,0] is type 010 because:

  • 0 is the first digit which is our "Starting off point"
  • then we go right, see that it is another 0 so it's not unique, skip over and move right again
  • then encounter a 1 and we log this one since it's unique, move right again
  • see that the digit is 0 and unique (we now counted 3 digits), so the pattern is 010.
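The walk-through above can be sketched as a single pass that logs a digit only when it differs from the last logged one. This is a minimal sketch, not a tuned solution; the name `first_unique_digits` and the `limit` parameter are made up for illustration.

```python
def first_unique_digits(sequence, limit=3):
    """Collapse runs of equal digits and keep at most `limit` of them."""
    seen = []
    for digit in sequence:
        # Log a digit only when it differs from the last logged one.
        if not seen or digit != seen[-1]:
            seen.append(digit)
            if len(seen) == limit:
                break  # enough digits logged; the rest is ignored
    return "".join(str(d) for d in seen)


print(first_unique_digits([0, 0, 1, 0, 1, 0]))  # 010
```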

You can determine the sequence type by scanning the bits. The first digit of the type always equals the first bit. If the inverse bit occurs anywhere after it, the type has two or three digits, the second being the inverse of the first. If the initial bit occurs again after the first inverse bit, the type has three digits (alternating again):

def seqType(seq):
    bit = seq[0]                 # seq type starts with first bit
    try: p = seq.index(1-bit)    # search position of inverse bit
    except ValueError:           # list.index raises ValueError when absent
        return str(bit)          # not found --> single digit type
    if bit not in seq[p+1:]:     # search for initial bit after inverse
        return f"{bit}{1-bit}"   # not found --> two digit type
    return f"{bit}{1-bit}{bit}"  # --> 3 digit type

    

output:

tests = ([0,0,0],[1,1],[0,0,0,0,1],[1,0,0,0,0],
         [0,0,0,1,0,1,1,0],[1,1,1,1,1,0,0,1,0,1,0,1])    
for seq in tests:
    print(seq,seqType(seq))
    
[0, 0, 0] 0
[1, 1] 1
[0, 0, 0, 0, 1] 01
[1, 0, 0, 0, 0] 10
[0, 0, 0, 1, 0, 1, 1, 0] 010
[1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1] 101 

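If you'd like a more advanced approach, the type can be computed using the zip function to compact consecutive identical bits. Each bit is paired with its successor (a sentinel 2 is appended so the last bit always differs from its "successor"), and a bit is kept only when the next one is different, which keeps exactly one bit per run. The first 3 bits of the compacted sequence are the sequence type: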

def seqType(seq):
    return "".join(str(a) for a,b in zip(seq,seq[1:]+[2]) if a!=b)[:3]
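The same compaction can also be written with `itertools.groupby` from the standard library, which yields one key per run of equal consecutive values. This is an alternative sketch rather than the answer's own method, and `seq_type_groupby` is a made-up name.

```python
from itertools import groupby


def seq_type_groupby(seq):
    # groupby yields one (key, run) pair per run of equal consecutive values,
    # so the keys are the compacted sequence.
    compacted = [str(key) for key, _run in groupby(seq)]
    return "".join(compacted[:3])


print(seq_type_groupby([0, 0, 0, 1, 0, 1, 1, 0]))  # 010
```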

Or, if you like recursive solutions:

def seqType(seq):
    if len(seq) == 1:  return str(seq[0])
    if seq[0]==seq[1]: return seqType(seq[1:])
    return str(seq[0]) + seqType(seq[1:])[:2]

If I understand you correctly, you're parsing a regular language, so you can implement it as a finite-state automaton.

Here's an example in C:

/* Each state is a function that returns the sequence type as a string. */
static const char *A(const char *s);  /* seen only 0s */
static const char *B(const char *s);  /* seen only 1s */
static const char *C(const char *s);  /* seen 0s, then 1s */
static const char *D(const char *s);  /* seen 1s, then 0s */

const char *start(const char *s) {
  if (*s) return (*s++ == '0') ? A(s) : B(s);
  return "";  /* empty string doesn't match any type */
}

static const char *A(const char *s) {
  if (*s) return (*s++ == '0') ? A(s) : C(s);
  return "0";
}

static const char *B(const char *s) {
  if (*s) return (*s++ == '0') ? D(s) : B(s);
  return "1";
}

static const char *C(const char *s) {
  if (*s) return (*s++ == '0') ? "010" : C(s);  /* more 1s stay in C */
  return "01";
}

static const char *D(const char *s) {
  if (*s) return (*s++ == '0') ? D(s) : "101";  /* more 0s stay in D */
  return "10";
}

This may not be the most concise way to encode a finite-state automaton, but something along these lines should work—and finite automata are certainly elegant structures.
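Since the question is about Python, here is a rough table-driven sketch of the same automaton idea. The names `TRANSITIONS`, `FINAL` and `seq_type_fsm` are made up for illustration, and each state is simply labelled with the pattern seen so far.

```python
# Transition table: state -> {input bit -> next state}.
# A state's name is the pattern seen so far; "" is the start state.
TRANSITIONS = {
    "":   {0: "0",   1: "1"},
    "0":  {0: "0",   1: "01"},
    "1":  {0: "10",  1: "1"},
    "01": {0: "010", 1: "01"},
    "10": {0: "10",  1: "101"},
}
FINAL = {"010", "101"}  # three digits reached; later input is ignored


def seq_type_fsm(seq):
    state = ""
    for bit in seq:
        state = TRANSITIONS[state][bit]
        if state in FINAL:
            break
    return state


print(seq_type_fsm([1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1]))  # 101
```

The automaton stops as soon as it reaches a three-digit state, so it never reads more of the list than it needs.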

You could also use a regex.
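One possible way to do that, assuming the list only contains the integers 0 and 1: join the digits into a string, squeeze every run of a repeated digit down to a single character with a backreference, and keep the first three characters. `seq_type_regex` is a made-up name.

```python
import re


def seq_type_regex(seq):
    text = "".join(map(str, seq))              # [0, 0, 1] -> "001"
    # ([01])\1+ matches a digit followed by one or more copies of itself;
    # replacing the match with \1 leaves a single digit per run.
    squeezed = re.sub(r"([01])\1+", r"\1", text)
    return squeezed[:3]


print(seq_type_regex([0, 0, 0, 1, 0, 1, 1, 0]))  # 010
```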
