
Need efficient algorithm in combinatorics

I am trying to find the best (realistic) algorithm for solving a cryptography challenge, in which:

  • the given cipher text C is about 6000 characters long, with characters drawn from the set S = {A, B, C, ..., Y, a, b, c, ..., y}, so |S| = 50
  • the encryption scheme never produces two identical adjacent characters in C
  • 25 of the letters in S are called Nulls; which ones they are is unknown
  • these Nulls must be removed from C to obtain the actual cipher text C', which can then be attacked
  • the sequence of Nulls in C is named N, and |N| is close to |C|/2 = 3000
  • so: |N| + |C'| = |C|

My aim is to identify the 25 Nulls so that these two conditions are satisfied:

  • there must not be two identical adjacent characters in C'
  • there must not be two identical adjacent Nulls in N
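
To make the two conditions concrete, here is a minimal sketch of a checker for one candidate set of Nulls. The function names (`split_by_nulls`, `has_adjacent_repeat`, `is_valid_null_set`) are made up for illustration, and the check that exactly 25 letters were chosen is deliberately left out.

```python
def split_by_nulls(cipher, nulls):
    """Split the cipher text into (C', N), keeping the original order."""
    nulls = set(nulls)
    c_prime = "".join(ch for ch in cipher if ch not in nulls)
    n_seq = "".join(ch for ch in cipher if ch in nulls)
    return c_prime, n_seq


def has_adjacent_repeat(text):
    """True if some character is immediately followed by itself."""
    return any(a == b for a, b in zip(text, text[1:]))


def is_valid_null_set(cipher, nulls):
    """Check both conditions: no adjacent repeat in C' and none in N."""
    c_prime, n_seq = split_by_nulls(cipher, nulls)
    return not has_adjacent_repeat(c_prime) and not has_adjacent_repeat(n_seq)
```

For instance, with the toy text "AbAb" and the candidate Null set {b}, C' would be "AA", so the candidate is rejected.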

Brute force is clearly not a realistic approach: there are 50!/(25! 25!) = 126410606437752 ways to choose 25 Nulls from S.
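
The count can be double-checked with Python's standard library (`math.comb` needs Python 3.8 or later); the variable name `total` is only illustrative.

```python
import math

# Number of ways to choose 25 Nulls out of the 50 letters in S.
total = math.comb(50, 25)
print(total)  # 126410606437752
```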

I have tried recursively exploring the tree of candidate Null sets, cutting branches as early and as often as possible. For example, when adding a letter of S to the set of Nulls, if the sequence "x n1 n2 x" appears in C, where x is not yet a Null and n1, n2 are Nulls, then x must be a Null too (otherwise removing the Nulls would leave "xx" in C'). However, this is not enough to bring the run time below a few centuries...

Can you think of a cleverer algorithm for identifying these 25 Nulls?

Note: there might be more than one set of Nulls satisfying the two conditions.

Let's try something like this:

  • Create a list of sets, each containing one character from S. Each set is treated as a candidate group of Nulls.
  • While you have more than two sets:
    • For each set:
      • search the cipher text for the pattern X[<set-chars>]+X, where X is a character outside the set
      • if it is found, union the set with the set that contains X.
    • If no sets were united, start recursing with two sets united.

You can speed things up by keeping a separate copy of the cipher text for each set, with that set's characters removed. The search then becomes easier: you are looking for XX, which has constant length. Every time you union two sets, you need to remove the characters of both sets from the cipher text kept for the united set.
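
The merging step described above can be sketched with a small union-find structure. This is only an illustration under some assumptions: the function name `merge_forced_groups` is invented, the input is assumed to contain no two identical adjacent characters, and the recursive guessing step (for when no more merges are forced) is not implemented. For simplicity the text is re-filtered on every pass instead of being cached per set.

```python
def merge_forced_groups(cipher):
    """Merge characters that are forced to be on the same side (N or C')."""
    alphabet = sorted(set(cipher))
    parent = {ch: ch for ch in alphabet}  # union-find forest

    def find(ch):
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]  # path halving
            ch = parent[ch]
        return ch

    changed = True
    while changed:
        changed = False
        groups = {}
        for ch in alphabet:
            groups.setdefault(find(ch), set()).add(ch)
        for root, members in groups.items():
            # Delete this group's characters. Any "XX" left behind means the
            # original text contained X <members>+ X, so X must join the group.
            rest = [ch for ch in cipher if ch not in members]
            for a, b in zip(rest, rest[1:]):
                if a == b and find(a) != find(root):
                    parent[find(a)] = find(root)
                    changed = True

    result = {}
    for ch in alphabet:
        result.setdefault(find(ch), set()).add(ch)
    return sorted(sorted(group) for group in result.values())


print(merge_forced_groups("ABAaba"))  # [['A', 'B'], ['a', 'b']]
```

In the toy text "ABAaba", deleting B leaves "AA", so A and B are forced together, and deleting b leaves "aa", so a and b are forced together. If more than two groups come back, the remaining choices would have to be explored by guessing a union and recursing, as in the last bullet.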

The time this will take depends on the string C you are given.

An explanation of the sets: each set is a candidate for C' or N. If you find that A and X are in the same group, then {A, X} is a subset of either N or C'. If you later find the same about Y and B, then {Y, B} is also a subset of one of them. Later still, finding a substring YAXAXY means that Y is in the same group as A and X, and so is B, because it is with Y. In the end you are left with two groups, one for C' and one for N, which you cannot tell apart.

elyashiv's method is the right one.

It is very fast.

I have produced the two sequences C' and N, which are interchangeable (either one could be the Nulls). The two subsets S1 and S2 of S that produce C' and N satisfy S = S1 U S2, as expected.

Thank you.

