
Reduce binary string to an empty string by removing subsequences with alternating characters

This was a question asked in the coding round for a NASDAQ internship.

Program description:

The program takes a binary string as input. We have to successively remove subsequences whose characters all alternate, until the string is empty. The task is to find the minimum number of steps required to do so.

Example 1:
let the string be: 0111001
Removed-0101, Remaining-110
Removed-10, Remaining-1
Removed-1
No of steps = 3

Example 2:
let the string be: 111000111
Removed-101, Remaining-110011
Removed-101, Remaining-101
Removed-101
No of steps = 3

Example 3:
let the string be: 11011
Removed-101, Remaining-11
Removed-1, Remaining-1
Removed-1
No of steps = 3

Example 4:
let the string be: 10101
Removed-10101
No of steps = 1

The solution I tried treated the first character of the binary string as the first character of my subsequence. I then built a new string, appending each following character that did not continue the alternating sequence. The new string becomes the binary string for the next pass, and the loop continues until the new string is empty (roughly an O(n^2) algorithm). As expected, it gave me a timeout error. Below is C++ code similar to what I had tried, which was in Java.

    #include <iostream>
    #include <string>
    using namespace std;

    int main() {
        string str, newStr;
        int count = 0;
        getline(cin, str);

        // Each pass removes one greedy alternating subsequence; repeat until the string is empty.
        while (!str.empty()) {
            char c = str[0];  // last character taken into the current subsequence
            for (size_t i = 1; i < str.size(); i++) {
                // If the character alternates with c, take it into the subsequence.
                if (c != str[i])
                    c = str[i];
                // Otherwise keep the character in newStr for the next pass.
                else
                    newStr.push_back(str[i]);
            }
            str = newStr;
            newStr.clear();
            count++;
        }
        cout << count << endl;
        return 0;
    }

I also tried methods like finding the length of the longest run of identical consecutive characters, which obviously didn't satisfy every case, such as Example 3.

I hope somebody can help me with the most optimized solution for this question, preferably with code in C, C++ or Python. Even just the algorithm would do.

I found a more optimal O(N log N) solution by maintaining a min-heap and a lookup hash map.

We start with the initial array as alternating counts of 0s and 1s.

That is, for string = 0111001, let's assume our input array is S = [1, 3, 2, 1].
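As an illustration of how such a count array could be built from the input string, here is a minimal sketch. The function name `run_lengths` is made up for this example and is not part of the answer's code. Note that the counts alone do not record which character the first run holds, so a caller would have to track that separately.

```python
from itertools import groupby

def run_lengths(s: str) -> list:
    """Return the lengths of the maximal runs of equal characters in s."""
    # groupby yields one group per run of consecutive equal characters.
    return [len(list(run)) for _, run in groupby(s)]

print(run_lengths("0111001"))  # [1, 3, 2, 1]
```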

Basic idea:

  1. Heapify the count array
  2. Extract the minimum-count node => add to num_steps
  3. Now extract both its neighbours (maintained in the Node class) from the heap using the lookup map
  4. Merge both these neighbours and insert the result into the heap
  5. Repeat steps 2-4 until no entries remain in the heap

Code implementation in Python:

class Node:
    def __init__(self, node_type: int, count: int):
        self.prev = None
        self.next = None
        self.node_type = node_type
        self.node_count = count

    @staticmethod
    def compare(node1, node2) -> bool:
        return node1.node_count < node2.node_count


def get_num_steps(S: list): ## Example: S = [2, 1, 2, 3]
    heap = []
    node_heap_position_map = {} ## Map[Node] -> Heap-index
    prev = None
    type = 0
    for s in S:
        node: Node = Node(type, s)
        node.prev = prev
        if prev is not None:
            prev.next = node
        prev = node
        type = 1 - type

        # Add element to the map and also maintain the updated positions of the elements for easy lookup
        addElementToHeap(heap, node_heap_position_map, node)

    num_steps = 0
    last_val = 0
    while len(heap) > 0:
        # Extract top-element and also update the positions in the lookup-map
        top_heap_val: Node = extractMinFromHeap(heap, node_heap_position_map)
        num_steps += top_heap_val.node_count - last_val
        last_val = top_heap_val.node_count

        # If its the corner element, no merging is required
        if top_heap_val.prev is None or top_heap_val.next is None:
            continue

        # Merge the nodes adjacent to the extracted-min-node:
        prev_node = top_heap_val.prev
        next_node = top_heap_val.next

        removeNodeFromHeap(prev_node, node_heap_position_map)
        removeNodeFromHeap(next_node, node_heap_position_map)
        del node_heap_position_map[prev_node]
        del node_heap_position_map[next_node]
        
        # Created the merged-node for neighbours and add to the Heap; and update the lookup-map
        merged_node = Node(prev_node.node_type, prev_node.node_count + next_node.node_count)
        merged_node.prev = prev_node.prev
        merged_node.next = next_node.next
        addElementToHeap(heap, node_heap_position_map, merged_node)

    return num_steps

PS: I haven't implemented the min-heap operations above, but the function names are fairly self-explanatory.

I won't write the full code, but I have an idea for an approach that will probably be fast enough (certainly faster than building all of the intermediate strings).

Read the input and change it to a representation that consists of the lengths of runs of the same character. So 11011 is represented with a structure something like [{length: 2, value: 1}, {length: 1, value: 0}, {length: 2, value: 1}]. With some cleverness you can drop the values entirely and represent it as [2, 1, 2]; I'll leave that as an exercise for the reader.

With that representation, you know that in each "step" you can remove one character from each of the identified runs. You can do this a number of times equal to the smallest length of any of those runs.

So you identify the minimum run length, add it to the total number of operations that you're tracking, then subtract it from every run's length.

After doing that, you need to deal with runs of length 0: remove them, then if any adjacent runs now have the same value, merge them (add the lengths together, remove one). This merging step is the one that requires some care if you're going for the representation that forgets the values.

Keep repeating this until there's nothing left. It should run somewhat faster than manipulating strings.
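The procedure described above could be sketched roughly as follows. This version keeps the character value next to each length, so it sidesteps the "forget the values" variant. The name `min_steps_by_runs` is hypothetical, and the sketch has only been checked against the four examples from the question.

```python
from itertools import groupby

def min_steps_by_runs(s: str) -> int:
    """Repeatedly subtract the shortest run length from every run, then merge."""
    # Each entry is (character, run length).
    runs = [(ch, len(list(group))) for ch, group in groupby(s)]
    steps = 0
    while runs:
        shortest = min(length for _, length in runs)
        steps += shortest
        merged = []
        for ch, length in runs:
            length -= shortest
            if length == 0:
                continue  # this run is used up and disappears
            if merged and merged[-1][0] == ch:
                # The run between two equal-valued runs vanished: merge them.
                merged[-1] = (ch, merged[-1][1] + length)
            else:
                merged.append((ch, length))
        runs = merged
    return steps
```

Each round scans all the remaining runs, so the cost depends on how many rounds are needed rather than being a single pass.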

There's probably an even better approach that doesn't iterate through the steps at all after building this representation, and instead just examines the run lengths in one pass from start to end. I haven't worked out exactly what that approach is, but I'm reasonably confident that it exists. After trying what I've outlined above, working that out is a good idea. I have a feeling it's something like this: start a total at 0 and keep track of the minimum and maximum values the total reaches. Scan each character from the start of the string, adding 1 to the total for each 1 encountered and subtracting 1 for each 0 encountered. The answer is the greater of the absolute values of the minimum and maximum reached by the total. I haven't verified that; it's just a hunch. Comments have led to further speculation that doing this but adding together the maximum and the absolute value of the minimum may be more realistic.
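A minimal sketch of the variant suggested in the comments, assuming the minimum and maximum both include the starting total of 0, so that "maximum plus absolute value of minimum" is simply their difference. The function name is made up for illustration. On the question's four examples this variant gives the expected answers, whereas the "greater of the absolute values" variant gives 2 for 0111001 (the total ranges from -1 to 2) instead of 3.

```python
def min_steps_by_running_total(s: str) -> int:
    """Range of the running total: +1 for each '1', -1 for each '0'."""
    total = lowest = highest = 0  # the starting total 0 counts toward min and max
    for ch in s:
        total += 1 if ch == "1" else -1
        lowest = min(lowest, total)
        highest = max(highest, total)
    return highest - lowest  # equals highest + abs(lowest), since lowest <= 0
```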

We can solve this in O(n) time and O(1) space.

This isn't about order at all. The actual task, when you think about it, is how to divide the string into the smallest number of subsequences that consist of alternating characters (where a single character is allowed). Just maintain two queues or stacks, one for 1s and the other for 0s, where each character pops its immediate alternate predecessor. Keep a record of how long each queue gets at any point during the iteration (not including the replacement moves).

Examples:

(1)

0111001

   queues
1  1   -
0  -   0
0  -   00
1  1   0
1  11  -
1  111 -  <- max 3
0  11  0

For O(1) space, the queues can just be two numbers representing the current counts.

(2)

111000111
   queues (count of 1s and count of 0s)
1  1  0
1  2  0
1  3  0  <- max 3
0  2  1
0  1  2
0  0  3  <- max 3
1  1  2
1  2  1
1  3  0  <- max 3

(3)

11011
   queues
1  1  0
1  2  0
0  1  1
1  2  0
1  3  0  <- max 3

(4)

10101

   queues
1  1  0  <- max 1
0  0  1  <- max 1
1  1  0  <- max 1
0  0  1  <- max 1
1  1  0  <- max 1
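To make the two-stack idea concrete, here is a rough sketch that also returns the subsequences themselves rather than only their count. The stacks hold indices of the subsequences that currently end in each character. The function name `split_into_alternating` and the dictionary layout are assumptions made for this illustration, and the input is assumed to contain only the characters 0 and 1.

```python
def split_into_alternating(s: str) -> list:
    """Greedily partition a binary string into alternating subsequences."""
    groups = []                      # characters collected for each subsequence
    ending_in = {"0": [], "1": []}   # stacks of group indices, keyed by last character
    for ch in s:
        other = "1" if ch == "0" else "0"
        if ending_in[other]:
            g = ending_in[other].pop()  # extend a subsequence ending in the other character
        else:
            g = len(groups)             # nothing to extend: start a new subsequence
            groups.append([])
        groups[g].append(ch)
        ending_in[ch].append(g)
    return ["".join(g) for g in groups]
```

The number of returned subsequences is the number of steps; keeping only the two stack sizes instead of the stacks reduces the extra space to O(1).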

Time complexity - O(n)

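#include <iostream>
#include <string>
using namespace std;

// Prints the minimum number of alternating subsequences needed to empty s.
void solve(const string& s) {
    // One / zero: number of subsequences built so far that end in '1' / '0'.
    int zero = 0, One = 0, res = 0;

    for (char ch : s)
    {
        if (ch == '1')
        {
            if (zero > 0)
                zero--;   // extend a subsequence that ends in '0'
            else
                res++;    // nothing to extend: start a new subsequence

            One++;
        }
        else
        {
            if (One > 0)
                One--;    // extend a subsequence that ends in '1'
            else
                res++;    // nothing to extend: start a new subsequence

            zero++;
        }
    }
    cout << res << endl;
}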
