
Interview question: remove duplicates from an unsorted linked list

I'm reading Cracking the Coding Interview, Fourth Edition: 150 Programming Interview Questions and Solutions and I'm trying to solve the following question:

2.1 Write code to remove duplicates from an unsorted linked list. FOLLOW UP: How would you solve this problem if a temporary buffer is not allowed?

I'm solving it in C#, so I made my own Node class:

public class Node<T> where T : class
{
    public Node<T> Next { get; set; }
    public T Value { get; set; }

    public Node(T value)
    {
        Next = null;
        Value = value;
    }
}

My solution is to iterate through the list, and then for each node to iterate through the remainder of the list and remove any duplicates (note that I haven't actually compiled or tested this, as instructed by the book):

public void RemoveDuplicates(Node<T> head)
{
    // Iterate through the list
    Node<T> iter = head;
    while(iter != null)
    {
        // Iterate to the remaining nodes in the list
        Node<T> current = iter;
        while(current != null && current.Next != null)
        {
            if(iter.Value == current.Next.Value)
            {
                current.Next = current.Next.Next;
            }

            current = current.Next;
        }    

        iter = iter.Next;
    }
}

Here is the solution from the book (the author wrote it in Java):

Without a buffer, we can iterate with two pointers: “current” does a normal iteration, while “runner” iterates through all prior nodes to check for dups. Runner will only see one dup per node, because if there were multiple duplicates they would have been removed already.

public static void deleteDups2(LinkedListNode head) 
{
    if (head == null) return;

    LinkedListNode previous = head;
    LinkedListNode current = previous.next;

    while (current != null) 
    {
        LinkedListNode runner = head;

        while (runner != current) { // Check for earlier dups
            if (runner.data == current.data) 
            {
                LinkedListNode tmp = current.next; // remove current
                previous.next = tmp;
                current = tmp; // update current to next node
                break; // all other dups have already been removed
            }
            runner = runner.next;
        }
        if (runner == current) { // current not updated - update now
            previous = current;
            current = current.next;
        }
    }
}

So my solution always looks for duplicates from the current node to the end, while their solution looks for duplicates from the head to the current node. I feel like both solutions would suffer performance issues depending on how many duplicates there are in the list and how they're distributed (density and position). But in general: is my answer nearly as good as the one in the book, or is it significantly worse?

If you give a person a fish, they eat for a day. If you teach a person to fish...

My measures for the quality of an implementation are:

  • Correctness: If you aren't getting the right answer in all cases, then it isn't ready
  • Readability/maintainability: Look at code repetition, understandable names, the number of lines of code per block/method (and the number of things each block does), and how difficult it is to trace the flow of your code. Look at any of the many books focused on refactoring, programming best practices, coding standards, etc. if you want more information on this
  • Theoretical performance (worst-case and amortized): Big-O is a metric you can use. CPU and memory consumption should both be measured
  • Complexity: Estimate how long it would take an average professional programmer to implement (if they already know the algorithm). See if that is in line with how difficult the problem actually is

As for your implementation:

  • Correctness: I suggest writing unit tests to determine this for yourself, and/or debugging it (on paper) from start to finish with interesting sample/edge cases: null, one item, two items, various numbers of duplicates, etc. (see the test sketch after this list)
  • Readability/maintainability: It looks mostly fine, though your last two comments don't add anything. What your code does is a bit more obvious than with the code in the book
  • Performance: I believe both are N-squared. Whether the amortized cost is lower for one or the other I'll let you figure out :)
  • Time to implement: An average professional should be able to code this algorithm in their sleep, so it's looking good
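
As an illustration of the first point, here is a minimal test sketch (mine, not from the book) for the question's RemoveDuplicates. It assumes the Node<T> class from the question and that RemoveDuplicates is reachable as a static generic method; BuildList, Render, and AssertEqual are hypothetical helpers:

using System;
using System.Collections.Generic;

static class NodeTests
{
    // Assumes the question's Node<T> and a static generic
    // RemoveDuplicates<T>(Node<T> head) are in scope (an assumption).
    static Node<string> BuildList(params string[] values)
    {
        Node<string> head = null;
        for (int i = values.Length - 1; i >= 0; i--)
            head = new Node<string>(values[i]) { Next = head };
        return head;
    }

    static string Render(Node<string> head)
    {
        var parts = new List<string>();
        for (var n = head; n != null; n = n.Next) parts.Add(n.Value);
        return string.Join(",", parts);
    }

    static void AssertEqual(string expected, Node<string> head)
    {
        string actual = Render(head);
        if (actual != expected)
            Console.WriteLine($"FAIL: expected [{expected}] but got [{actual}]");
    }

    public static void Run()
    {
        RemoveDuplicates<string>(null);      // null head: must not throw

        var one = BuildList("a");
        RemoveDuplicates(one);
        AssertEqual("a", one);

        var dups = BuildList("a", "b", "a", "a", "c");
        RemoveDuplicates(dups);
        AssertEqual("a,b,c", dups);          // consecutive duplicates: the tricky case
    }
}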

There's not much of a difference. If I've done my math right, yours is on average N/16 slower than the author's, but plenty of cases exist where your implementation will be faster.

Edit:

I'll call your implementation Y and the author's A.

Both proposed solutions have O(N^2) as the worst case, and both have a best case of O(N) when all elements have the same value.

EDIT: This is a complete rewrite. Inspired by the debate in the comments, I tried to find the average case for N random numbers, that is, a sequence with a random size and a random distribution. What would the average case be?

Y will always run U times, where U is the number of unique numbers. For each iteration it will do N-X comparisons, where X is the number of elements removed prior to the iteration (+1). The first time no element will have been removed, and on average by the second iteration N/U will have been removed.

That means on average ½N will be left to iterate over. We can express the average cost as U*½N. The average U can be expressed in terms of N as well.

Expressing A becomes more difficult. Let's say we use I iterations before we've encountered all unique values. After that it will run between 1 and U comparisons (on average that's U/2) and will do that N-I times:

I*c + U/2*(N-I)

But what's the average number of comparisons (c) we run for the first I iterations? On average we need to compare against half of the elements already visited, and on average we've visited I/2 elements, i.e. c = I/4:

(I^2)/4 + U/2*(N-I).

I can be expressed in terms of N: on average we'll need to visit half of N to find all the unique values, so I = N/2, yielding an average of

(I^2)/4 + U/2*(N-I), which with I = N/2 (and U on average N/2) reduces to (3*N^2)/16.

That is of course assuming my estimates of the averages are correct. On average, over all potential sequences, A does N/16 fewer comparisons than Y, but plenty of cases exist where Y is faster than A. So I'd say they are equal when compared on the number of comparisons.
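
If you'd rather check these estimates than trust my algebra, a quick simulation works. Here is a rough C# sketch (mine, not from any answer here) that counts value comparisons for both strategies on identical random lists; the list size, trial count, and value range are arbitrary assumptions:

using System;

static class ComparisonCounter
{
    class N { public int Data; public N Next; }

    static N Build(int[] values)
    {
        N head = null;
        for (int i = values.Length - 1; i >= 0; i--)
            head = new N { Data = values[i], Next = head };
        return head;
    }

    // Y: a forward scan like the question's, advancing only past
    // survivors so no node is skipped after a removal.
    static long CountY(N head)
    {
        long cmps = 0;
        for (var iter = head; iter != null; iter = iter.Next)
        {
            var cur = iter;
            while (cur != null && cur.Next != null)
            {
                cmps++;
                if (iter.Data == cur.Next.Data) cur.Next = cur.Next.Next;
                else cur = cur.Next;
            }
        }
        return cmps;
    }

    // A: the book's runner, checking each node against the earlier
    // (surviving) part of the list.
    static long CountA(N head)
    {
        long cmps = 0;
        if (head == null) return 0;
        N previous = head, current = head.Next;
        while (current != null)
        {
            var runner = head;
            while (runner != current)
            {
                cmps++;
                if (runner.Data == current.Data)
                {
                    previous.Next = current.Next;
                    current = current.Next;
                    break;
                }
                runner = runner.Next;
            }
            if (runner == current) { previous = current; current = current.Next; }
        }
        return cmps;
    }

    static void Main()
    {
        var rng = new Random(1);
        const int n = 1000, trials = 50;                 // arbitrary sizes
        long y = 0, a = 0;
        for (int t = 0; t < trials; t++)
        {
            var values = new int[n];
            for (int i = 0; i < n; i++) values[i] = rng.Next(n / 2);
            y += CountY(Build(values));
            a += CountA(Build(values));
        }
        Console.WriteLine($"avg comparisons  Y: {y / trials}  A: {a / trials}");
    }
}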

How about using a HashMap? This way it will take O(n) time and O(n) space. I will write pseudocode.

function removeDup(LinkedList list){
  HashMap map = new HashMap()
  i = 0
  while i < list.length
      if list.get(i) not in map
        map.add(list.get(i))
        i = i + 1
      else
        list.remove(i)   // keep i: the next element has shifted into slot i
      end
  end
end

Of course we assume that the HashMap has O(1) read and write.

Another solution is to use mergesort and remove duplicates from the start to the end of the list. This takes O(n log n).

Mergesort is O(n log n), and removing duplicates from a sorted list is O(n) (do you know why?), so the entire operation takes O(n log n).
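
The reason the second pass is linear: after sorting, equal values can only sit next to each other, so one forward scan that unlinks equal neighbours suffices. A minimal sketch, assuming a hypothetical int-valued node type:

class IntNode { public int Data; public IntNode Next; }

// One O(n) pass over a *sorted* list: equal values are always adjacent.
static void RemoveDupsSorted(IntNode head)
{
    var current = head;
    while (current != null && current.Next != null)
    {
        if (current.Data == current.Next.Data)
            current.Next = current.Next.Next;   // unlink the equal neighbour, stay put
        else
            current = current.Next;             // advance only past distinct values
    }
}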

Here's the implementation using HashSet in O(n) time.

I have used a HashSet to store unique values and two node pointers to traverse the linked list. If a duplicate is found, the previous node's Next is pointed past the current node.

This will ensure removal of duplicate records.

    /// <summary>
    /// Write code to remove duplicates from an unsorted linked list.
    /// </summary>
    class RemoveDups<T>
    {
        private class Node
        {
            public Node Next;
            public T Data;
            public Node(T value)
            {
                this.Data = value;
            }
        }

        private Node head = null;

        public static void MainMethod()
        {
            RemoveDups<int> rd = new RemoveDups<int>();
            rd.AddNode(15);
            rd.AddNode(10);
            rd.AddNode(15);
            rd.AddNode(10);
            rd.AddNode(10);
            rd.AddNode(20);
            rd.AddNode(30);
            rd.AddNode(20);
            rd.AddNode(30);
            rd.AddNode(35);

            rd.PrintNodes();
            rd.RemoveDuplicates();

            Console.WriteLine("Duplicates Removed!");
            rd.PrintNodes();
        }

        private void RemoveDuplicates()
        {
            //use a hash set to remove duplicates
            HashSet<T> hs = new HashSet<T>();
            Node current = head;
            Node prev = null;

            //loop through the linked list
            while (current != null)
            {
                if (hs.Contains(current.Data))
                {
                    //remove the duplicate record
                    prev.Next = current.Next;
                }
                else
                {
                    //insert element into hashset
                    hs.Add(current.Data);
                    prev = current;
                }
                current = current.Next;

            }
        }

        /// <summary>
        /// Add Node at the beginning
        /// </summary>
        /// <param name="val"></param>
        public void AddNode(T val)
        {
            Node newNode = new Node(val); // the constructor already sets Data
            newNode.Next = head;
            head = newNode;
        }

        /// <summary>
        /// Print nodes
        /// </summary>
        public void PrintNodes()
        {
            Node current = head;
            while (current != null)
            {
                Console.WriteLine(current.Data);
                current = current.Next;
            }
        }
    }

Heapsort is an in-place sort. You could modify the "siftUp" or "siftDown" function to simply remove the element if it encounters a parent that is equal. This would be O(n log n).

function siftUp(a, start, end) is
 input:  start represents the limit of how far up the heap to sift.
               end is the node to sift up.
 child := end 
 while child > start
     parent := floor((child - 1) ÷ 2)
     if a[parent] < a[child] then (out of max-heap order)
         swap(a[parent], a[child])
         child := parent (repeat to continue sifting up the parent now)
     else if a[parent] == a[child] then
         remove a[parent]
     else
         return

Code in Java:

public static void dedup(Node head) {
    Node cur = null;
    HashSet encountered = new HashSet();

    while (head != null) {
        encountered.add(head.data);
        cur = head;
        while (cur.next != null) {
            if (encountered.contains(cur.next.data)) {
                cur.next = cur.next.next;
            } else {
                break;
            }
        }
        head = cur.next;
    }
}

Tried the same in C++. Please let me know your comments on this.

// ConsoleApplication2.cpp : Defines the entry point for the console application.

#include "stdafx.h"
#include <stdlib.h>
struct node
{
    int data;
    struct node *next;
};
struct node *head = NULL;  /* allocating here would just leak; start empty */
struct node *tail = NULL;

struct node* createNode(int data)
{
    struct node *newNode = (node*)malloc(sizeof(node));
    newNode->data = data;
    newNode->next = NULL;
    head = newNode;
    return newNode;
}

bool insertAfter(node * list, int data)
{
    //case 1 - null position: insert at the head
    struct node *newNode = (node*)malloc(sizeof(node));
    if (!list)
    {

        newNode->data = data;
        newNode->next = head;
        head = newNode;
        return true;
    }

    struct node *curpos = head;  /* walk the existing list; no allocation needed */
    //case 2- middle, tail of list
    while (curpos)
    {
        if (curpos == list)
        {
            newNode->data = data;
            if (curpos->next == NULL)
            {
            newNode->next = NULL;
            tail = newNode;
            }
            else
            {
                newNode->next = curpos->next;
            }
            curpos->next = newNode;
            return true;
        }
        curpos = curpos->next;
    }
    return false;  /* position not found in the list */
}

void deleteNode(node *runner)
{
    /* runner->next is the duplicate; unlink and free it */
    struct node *dup = runner->next;
    if (dup->next == NULL)  /* deleting at the tail */
        tail = runner;
    runner->next = dup->next;
    free(dup);
}


void removedups(node * list)
{
    struct node *curr = head;  /* walk the list; no allocation needed */
    struct node *runner;
    while (curr != NULL){
        runner = curr;
        while (runner->next != NULL){
            if (curr->data == runner->next->data){
                /* unlink runner->next; do not advance, so consecutive
                   duplicates are also caught */
                deleteNode(runner);
            }
            else
                runner = runner->next;
        }
        curr = curr->next;
    }
}
int _tmain(int argc, _TCHAR* argv[])
{
    struct node *list = createNode(1);  /* mallocing first would just leak */
    insertAfter(list,2);
    insertAfter(list, 2);
    insertAfter(list, 3);   
    removedups(list);
    return 0;
}

Code in C:

    void removeduplicates(N **r)
    {
        N *temp1=*r;
        N *temp2=NULL;
        N *temp3=NULL;
        if(temp1==NULL) return;  /* guard against an empty list */
        while(temp1->next!=NULL)
        {
            temp2=temp1;
            while(temp2!=NULL)
            {
                temp3=temp2;
                temp2=temp2->next;
                if(temp2==NULL)
                {
                    break;
                }
                if((temp2->data)==(temp1->data))
                {
                    temp3->next=temp2->next;
                    free(temp2);
                    temp2=temp3;
                    printf("\na dup deleted");
                }
            }
            temp1=temp1->next;
        }

    }

Your solution is as good as the author's, except it has a bug in the implementation :) Try tracing it on a list of two nodes with the same data.

Your approach is simply the mirror image of the book's! You go forward, the book goes backward. There is no difference, as both of you scan all elements. And yes, since no buffer is allowed, there are performance issues. You usually don't have to worry about performance with such constrained questions, and when it's not explicitly required.

Interview questions are made to test your open-mindedness. I have doubts about Mark's answer: it definitely is the best solution in real-world examples, but even if these algorithms use constant space, the constraint that no temporary buffer is allowed must be respected.

Otherwise, I guess that the book would have adopted such an approach. Mark, please forgive me for being critical of you.

Anyway, just to go deeper into the matter, yours and the book's approaches both require Theta(n^2) time, while Mark's approach requires Theta(n log n) + Theta(n) time, which results in Theta(n log n). Why Theta? Because compare-swap algorithms are Omega(n log n) too, remember!
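
To make that concrete: a singly linked list can be merge-sorted by relinking nodes, so no temporary buffer is needed beyond the O(log n) recursion stack, and the linear adjacent-dedup pass sketched earlier finishes the job. A rough sketch reusing the hypothetical IntNode from before, not the book's code; note that sorting destroys the original node order, which an interviewer may or may not accept:

// Theta(n log n) dedup without a temporary buffer: merge-sort the list
// in place (relinking nodes), then drop adjacent duplicates.
class IntNode { public int Data; public IntNode Next; }  // same hypothetical node as before

static IntNode SortList(IntNode head)
{
    if (head == null || head.Next == null) return head;

    // Split the list in half with slow/fast pointers.
    IntNode slow = head, fast = head.Next;
    while (fast != null && fast.Next != null)
    {
        slow = slow.Next;
        fast = fast.Next.Next;
    }
    IntNode second = slow.Next;
    slow.Next = null;

    return Merge(SortList(head), SortList(second));
}

static IntNode Merge(IntNode a, IntNode b)
{
    var dummy = new IntNode();  // sentinel node simplifies the splicing
    var tail = dummy;
    while (a != null && b != null)
    {
        if (a.Data <= b.Data) { tail.Next = a; a = a.Next; }
        else { tail.Next = b; b = b.Next; }
        tail = tail.Next;
    }
    tail.Next = (a != null) ? a : b;
    return dummy.Next;
}

// Usage: head = SortList(head); RemoveDupsSorted(head);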

C# code for removing duplicates left after the first set of iterations:

 public Node removeDuplicates(Node head) 
    {
        if (head == null)
            return head;

        var current = head;
        while (current != null)
        {
            if (current.next != null && current.data == current.next.data)
            {
                current.next = current.next.next;
            }
            else { current = current.next; }
        }

        return head;
    }

HackerRank Day 24: More Linked Lists, removing duplicate nodes in C#.

static Node RemoveDuplicateNode(Node head)
    {
        Node Link = head;
        Node Previous;
        Node DuplicateNode;
        int count = 0, temp;
        while (Link != null)
        {
            temp = Link.data;
            DuplicateNode = Link;
            Previous = Link;
            while (DuplicateNode != null)
            {
                if (DuplicateNode.data == temp)
                {
                    Previous.data = DuplicateNode.data;
                    Previous.next = DuplicateNode.next;
                    ++count;
                }
                if (count >= 2)
                {
                    if (DuplicateNode.next != null)
                    {
                        DuplicateNode.data = DuplicateNode.next.data;
                        DuplicateNode.next = DuplicateNode.next.next;
                    }
                    else
                        DuplicateNode = null;
                }
                else
                    DuplicateNode = DuplicateNode.next;
            }
            count = 0;
            Link = Link.next;
        }

        return head;
    }
