Interview question: remove duplicates from an unsorted linked list

Question

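I am reading Cracking the Coding Interview, Fourth Edition: 150 Programming Interview Questions and Solutions, and I'm trying to solve the following question:

2.1 Write code to remove duplicates from an unsorted linked list. FOLLOW UP: How would you solve this problem if a temporary buffer is not allowed?

I'm solving it in C#, so I made my own Node class: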

public class Node<T> where T : class
{
    public Node<T> Next { get; set; }
    public T Value { get; set; }

    public Node(T value)
    {
        Next = null;
        Value = value;
    }
}

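My solution is to iterate through the list and, for each node, iterate through the remainder of the list and remove any duplicates (note that I haven't actually compiled or tested this, as instructed by the book):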

public void RemoveDuplicates(Node<T> head)
{
    // Iterate through the list
    Node<T> iter = head;
    while(iter != null)
    {
        // Iterate to the remaining nodes in the list
        Node<T> current = iter;
        while(current!= null && current.Next != null)
        {
            if(iter.Value == current.Next.Value)
            {
                current.Next = current.Next.Next;
            }

            current = current.Next;
        }    

        iter = iter.Next;
    }
}

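Here is the solution from the book (the author wrote it in Java):

Without a buffer, we can iterate with two pointers: "current" does a normal iteration, while "runner" iterates through all prior nodes to check for dups. Runner will only see one dup per node, because if there were multiple duplicates they would have been removed already.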

public static void deleteDups2(LinkedListNode head) 
{
    if (head == null) return;

    LinkedListNode previous = head;
    LinkedListNode current = previous.next;

    while (current != null) 
    {
        LinkedListNode runner = head;

        while (runner != current) { // Check for earlier dups
            if (runner.data == current.data) 
            {
                LinkedListNode tmp = current.next; // remove current
                previous.next = tmp;
                current = tmp; // update current to next node
                break; // all other dups have already been removed
            }
            runner = runner.next;
        }
        if (runner == current) { // current not updated - update now
            previous = current;
            current = current.next;
        }
    }
}

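So my solution always looks for duplicates from the current node to the end, while theirs looks for duplicates from the head up to the current node. I feel that both solutions would suffer performance issues depending on how many duplicates there are in the list and how they are distributed (density and position). But in general: is my answer nearly as good as the one in the book, or is it significantly worse?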

Answer 1

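If you give a person a fish, they eat for a day. If you teach a person to fish...

My measures for the quality of an implementation are:

Correctness: If you aren't getting the right answer in all cases, then it isn't ready.
Readability/maintainability: Look at code repetition, understandable names, the number of lines of code per block/method (and the number of things each block does), and how difficult it is to trace the flow of your code. If you want more information on this, look at any number of books focused on refactoring, programming best practices, coding standards, etc.
Theoretical performance (worst case and amortized): Big-O is a metric you can use. Both CPU and memory consumption should be measured.
Complexity: Estimate how long it would take an average professional programmer to implement (if they already know the algorithm). See if that is in line with how difficult the problem actually is.

As for your implementation:

Correctness: I suggest writing unit tests to determine this for yourself, and/or debugging it (on paper) from start to finish with interesting sample/edge cases: null, one item, two items, various numbers of duplicates, etc.
Readability/maintainability: It looks mostly fine, though your last two comments don't add anything. It is a bit more obvious what your code does than the code in the book.
Performance: I believe both are N-squared. Whether the amortized cost is lower for one or the other, I'll let you figure out :)
Time to implement: An average professional should be able to code this algorithm in their sleep, so it looks good.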
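
The edge cases listed above can be collected into a small check harness. What follows is only a sketch in Java: the class `DedupEdgeCases`, its `Node` type and the helper names are hypothetical and do not come from the question or the book. It assumes the routine under test works in place and keeps the first occurrence of each value, so the head node never changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical harness; none of these names come from the question or the book.
final class DedupEdgeCases {
    static final class Node {
        final int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Builds a singly linked list holding the values in order; null when empty.
    static Node fromValues(int... values) {
        Node head = null, tail = null;
        for (int v : values) {
            Node n = new Node(v);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // Copies the list contents out so they can be compared with equals().
    static List<Integer> toList(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) out.add(n.data);
        return out;
    }

    // Runs dedup on a fresh list and reports whether the result matches expected.
    static boolean check(Consumer<Node> dedup, int[] input, Integer... expected) {
        Node head = fromValues(input);
        dedup.accept(head); // assumption: the first occurrence, and so the head, is kept
        return toList(head).equals(Arrays.asList(expected));
    }

    static boolean passesEdgeCases(Consumer<Node> dedup) {
        return check(dedup, new int[] {})                        // empty list
            && check(dedup, new int[] {7}, 7)                    // one item
            && check(dedup, new int[] {7, 7}, 7)                 // two equal items
            && check(dedup, new int[] {7, 7, 7}, 7)              // run of three
            && check(dedup, new int[] {1, 2, 1, 3, 2}, 1, 2, 3); // scattered dups
    }
}
```

Any in-place routine can be passed as a lambda or method reference. A scan that always advances after unlinking a node passes the two-equal-items case but fails the run of three, which is why both cases are in the list.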

Answer 2

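There's not much of a difference. If I've done my math right, yours is on average N/16 slower than the author's, but there are plenty of cases where your implementation will be faster.

Edit:

I'll call your implementation Y and the author's A.

Both proposed solutions have O(N^2) as their worst case, and both have a best case of O(N) when all elements have the same value.

EDIT: This is a complete rewrite. Inspired by the debate in the comments, I tried to find the average case for N random numbers, that is, a sequence of random size with a random distribution. What would the average case be?

Y will always run U times, where U is the number of unique numbers. For each iteration it will do N-X comparisons, where X is the number of elements removed prior to the iteration (+1). The first time no element will have been removed, and by the second iteration N/U will have been removed on average.

That is, on average ½N will be iterated. We can express the average cost as U*½N. The average U can be expressed in terms of N as well: 0 ...

Expressing A is more difficult. Suppose we use I iterations before we have encountered all unique values. After that, between 1 and U comparisons will be made (U/2 on average), and this will be done N-I times.

I*c + U/2(N-I)

But what is the average number of comparisons (c) we run for the first I iterations? On average we need to compare against half of the elements already visited, and on average we have visited I/2 elements, i.e. c = I/4.

I*(I/4) + U/2(N-I).

I can be expressed in terms of N. On average we need to go through half of N to find the unique values, so I = N/2, which yields an average of

(I^2)/4 + U/2(N-I), which can be reduced to (3*N^2)/16.

That is, of course, if my estimates of the averages are correct. So on average, over any potential sequence, A makes N/16 fewer comparisons than Y, but plenty of cases exist where Y is faster than A. So I'd say they are equal when compared on the number of comparisons.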
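
The estimates above can be checked empirically by counting comparisons directly. The Java sketch below is one way to do that. `countForward` stands in for Y and `countBackward` for A; the class, the `Node` type and the method names are hypothetical. One assumption differs from the question's code: the forward scan only advances when it has not just unlinked a node, so that every duplicate is actually removed.

```java
// Hypothetical instrumentation; names are illustrative only.
final class ComparisonCount {
    static final class Node {
        final int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Builds a singly linked list holding the values in order; null when empty.
    static Node fromValues(int... values) {
        Node head = null, tail = null;
        for (int v : values) {
            Node n = new Node(v);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // "Y": each node is compared with every node that still follows it.
    static long countForward(Node head) {
        long comparisons = 0;
        for (Node iter = head; iter != null; iter = iter.next) {
            Node current = iter;
            while (current.next != null) {
                comparisons++;
                if (current.next.data == iter.data) {
                    current.next = current.next.next; // unlink, stay in place
                } else {
                    current = current.next;
                }
            }
        }
        return comparisons;
    }

    // "A": each node is compared with the earlier nodes until a match is found.
    static long countBackward(Node head) {
        if (head == null) return 0;
        long comparisons = 0;
        Node previous = head;
        Node current = head.next;
        while (current != null) {
            Node runner = head;
            while (runner != current) {
                comparisons++;
                if (runner.data == current.data) {
                    previous.next = current.next; // unlink current
                    current = current.next;
                    break;
                }
                runner = runner.next;
            }
            if (runner == current) { // no earlier duplicate: keep the node
                previous = current;
                current = current.next;
            }
        }
        return comparisons;
    }
}
```

Both counters destroy their input, so each call needs a fresh list. For N equal values both return N-1, and for N distinct values both return N(N-1)/2, which matches the best and worst cases stated above. Feeding both methods the same random sequences would give a direct measurement of the average case.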

Answer 3

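How about using a HashMap? That way it takes O(n) time and O(n) space. In Java, with a HashSet in place of the map since only the keys matter:

import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Set;

static <T> void removeDup(LinkedList<T> list) {
  Set<T> seen = new HashSet<>();
  // Use an iterator: list.get(i) is O(n) on a linked list, and removing by
  // index while still incrementing i would skip the element that shifts down.
  for (Iterator<T> it = list.iterator(); it.hasNext(); ) {
    if (!seen.add(it.next())) {   // add returns false if the value was already seen
      it.remove();
    }
  }
}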

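Of course, we assume that the HashMap has O(1) reads and writes.

Another solution is to use mergesort and then remove duplicates in a pass from the start to the end of the list. This takes O(n log n).

Mergesort is O(n log n), and removing duplicates from a sorted list is O(n). Do you know why? Therefore the entire operation takes O(n log n).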
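
The sort-then-scan idea can be sketched in Java as follows. All names here (`SortThenDedup`, `Node`, `sortAndDedup` and the helpers) are hypothetical, and the node holds an `int` for simplicity. The linear pass works because, once the list is sorted, equal values sit next to each other, so comparing each node with its immediate successor is enough.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative only.
final class SortThenDedup {
    static final class Node {
        final int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Top-down merge sort on the links themselves; returns the new head.
    static Node mergeSort(Node head) {
        if (head == null || head.next == null) return head;
        Node slow = head, fast = head.next;
        while (fast != null && fast.next != null) { // slow stops at the midpoint
            slow = slow.next;
            fast = fast.next.next;
        }
        Node second = slow.next;
        slow.next = null; // cut the list in two
        return merge(mergeSort(head), mergeSort(second));
    }

    // Merges two sorted lists by relinking nodes.
    static Node merge(Node a, Node b) {
        Node dummy = new Node(0);
        Node tail = dummy;
        while (a != null && b != null) {
            if (a.data <= b.data) { tail.next = a; a = a.next; }
            else { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }

    // Single O(n) pass: in a sorted list duplicates are always neighbours.
    static void removeAdjacentDuplicates(Node head) {
        Node current = head;
        while (current != null && current.next != null) {
            if (current.data == current.next.data) current.next = current.next.next;
            else current = current.next;
        }
    }

    static Node sortAndDedup(Node head) {
        Node sorted = mergeSort(head);
        removeAdjacentDuplicates(sorted);
        return sorted;
    }

    // Helpers for building and inspecting lists.
    static Node fromValues(int... values) {
        Node head = null, tail = null;
        for (int v : values) {
            Node n = new Node(v);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    static List<Integer> toList(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) out.add(n.data);
        return out;
    }
}
```

Two trade-offs are worth noting: the original order of the elements is lost, and this recursive form uses call-stack space proportional to log n even though it allocates no buffer for the values.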

Answer 4

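Here is an implementation that uses a HashSet and runs in O(n) time.

I used a HashSet to store the unique values and two node pointers to traverse the linked list. If a duplicate is found, the previous node's Next is pointed at the current node's Next.

This ensures that the duplicate records are removed.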

    /// <summary>
    /// Write code to remove duplicates from an unsorted linked list.
    /// </summary>
    class RemoveDups<T>
    {
        private class Node
        {
            public Node Next;
            public T Data;
            public Node(T value)
            {
                this.Data = value;
            }
        }

        private Node head = null;

        public static void MainMethod()
        {
            RemoveDups<int> rd = new RemoveDups<int>();
            rd.AddNode(15);
            rd.AddNode(10);
            rd.AddNode(15);
            rd.AddNode(10);
            rd.AddNode(10);
            rd.AddNode(20);
            rd.AddNode(30);
            rd.AddNode(20);
            rd.AddNode(30);
            rd.AddNode(35);

            rd.PrintNodes();
            rd.RemoveDuplicates();

            Console.WriteLine("Duplicates Removed!");
            rd.PrintNodes();
        }

        private void RemoveDuplicates()
        {
            //use a hashtable to remove duplicates
            HashSet<T> hs = new HashSet<T>();
            Node current = head;
            Node prev = null;

            //loop through the linked list
            while (current != null)
            {
                if (hs.Contains(current.Data))
                {
                    //remove the duplicate record
                    prev.Next = current.Next;
                }
                else
                {
                    //insert element into hashset
                    hs.Add(current.Data);
                    prev = current;
                }
                current = current.Next;

            }
        }

        /// <summary>
        /// Add Node at the beginning
        /// </summary>
        /// <param name="val"></param>
        public void AddNode(T val)
        {
            Node newNode = new Node(val); // the constructor already stores val in Data
            newNode.Next = head;
            head = newNode;
        }

        /// <summary>
        /// Print nodes
        /// </summary>
        public void PrintNodes()
        {
            Node current = head;
            while (current != null)
            {
                Console.WriteLine(current.Data);
                current = current.Next;
            }
        }
    }

Answer 5

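Heapsort is an in-place sort. You could modify the "siftUp" or "siftDown" function to simply remove the element when it encounters a parent that is equal. This would be O(n log n).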

function siftUp(a, start, end) is
 input:  start represents the limit of how far up the heap to sift.
               end is the node to sift up.
 child := end 
 while child > start
     parent := floor((child - 1) ÷ 2)
     if a[parent] < a[child] then (out of max-heap order)
         swap(a[parent], a[child])
         child := parent (repeat to continue sifting up the parent now)
     else if a[parent] == a[child] then
         remove a[parent]
     else
         return

Answer 6

Code in Java:

public static void dedup(Node head) {
    Node cur = null;
    HashSet encountered = new HashSet();

    while (head != null) {
        encountered.add(head.data);
        cur = head;
        while (cur.next != null) {
            if (encountered.contains(cur.next.data)) {
                cur.next = cur.next.next;
            } else {
                break;
            }
        }
        head = cur.next;
    }
}

Answer 7

Tried the same thing in cpp. Please let me know your comments on this.

// ConsoleApplication2.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <stdlib.h>
struct node
{
    int data;
    struct node *next;
};
struct node *head = NULL;   // set by createNode; a placeholder allocation would only leak
struct node *tail = NULL;

struct node* createNode(int data)
{
    struct node *newNode = (node*)malloc(sizeof(node));
    newNode->data = data;
    newNode->next = NULL;
    head = newNode;
    return newNode;
}

bool insertAfter(node * list, int data)
{
    struct node *newNode = (node*)malloc(sizeof(node));
    newNode->data = data;

    //case 1 - no position given: insert at head
    if (!list)
    {
        newNode->next = head;
        head = newNode;
        return true;
    }

    //case 2 - middle, tail of list
    for (struct node *curpos = head; curpos; curpos = curpos->next)
    {
        if (curpos == list)
        {
            newNode->next = curpos->next;   // NULL when inserting at the tail
            if (curpos->next == NULL)
            {
                tail = newNode;
            }
            curpos->next = newNode;
            return true;
        }
    }

    //list is not a node of this list: nothing inserted
    free(newNode);
    return false;
}

// Unlinks and frees the node that follows runner.
void deleteNode(node *runner)
{
    struct node *dup = runner->next;
    runner->next = dup->next;   // works for both the middle and the tail
    if (dup == tail)
    {
        tail = runner;
    }
    free(dup);
}


void removedups(node * list)
{
    for (struct node *curr = list; curr != NULL; curr = curr->next)
    {
        struct node *runner = curr;
        while (runner->next != NULL)
        {
            if (curr->data == runner->next->data)
            {
                // stay on runner: a new node now follows it and must be checked too
                deleteNode(runner);
            }
            else
            {
                runner = runner->next;
            }
        }
    }
}
int _tmain(int argc, _TCHAR* argv[])
{
    struct node * list = createNode(1);   // createNode allocates; a separate malloc would leak
    insertAfter(list,2);
    insertAfter(list, 2);
    insertAfter(list, 3);   
    removedups(list);
    return 0;
}

Answer 8

Code in C:

    void removeduplicates(N **r)
    {
        N *temp1=*r;
        N *temp2=NULL;
        N *temp3=NULL;
        /* test temp1 itself: removals can leave it as the last node, and *r may be NULL */
        while(temp1!=NULL)
        {
            temp2=temp1;
            while(temp2!=NULL)
            {
                temp3=temp2;
                temp2=temp2->next;
                if(temp2==NULL)
                {
                    break;
                }
                if((temp2->data)==(temp1->data))
                {
                    temp3->next=temp2->next;
                    free(temp2);
                    temp2=temp3;
                    printf("\na dup deleted");
                }
            }
            temp1=temp1->next;
        }

    }

Answer 9

Here is the answer in C:

    void removeduplicates(N **r)
    {
        N *temp1=*r;
        N *temp2=NULL;
        N *temp3=NULL;
        /* test temp1 itself: removals can leave it as the last node, and *r may be NULL */
        while(temp1!=NULL)
        {
            temp2=temp1;
            while(temp2!=NULL)
            {
                temp3=temp2;
                temp2=temp2->next;
                if(temp2==NULL)
                {
                    break;
                }
                if((temp2->data)==(temp1->data))
                {
                    temp3->next=temp2->next;
                    free(temp2);
                    temp2=temp3;
                    printf("\na dup deleted");
                }
            }
            temp1=temp1->next;
        }

    }

Answer 10

Your solution is as good as the author's, only it has a bug in the implementation :) Try tracing it on a list of two nodes with the same data.
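
One failure that can be traced by hand in the question's loop involves a run of three equal nodes: after unlinking the second node, the loop still advances `current`, so the third node is never compared and survives. The Java sketch below shows one possible correction, advancing only when nothing was unlinked. The class and method names are hypothetical, and this is not the asker's or the book's code. It also compares values with `Objects.equals`, on the assumption that value equality is what is wanted; in the C# original, `==` on a type parameter constrained to `class` compares references.

```java
import java.util.Objects;

// Hypothetical port of the forward-scan idea; names are illustrative only.
final class ForwardScanDedup {
    static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    // Keeps the first occurrence of each value; O(n^2) time, no extra buffer.
    static <T> void removeDuplicates(Node<T> head) {
        for (Node<T> iter = head; iter != null; iter = iter.next) {
            Node<T> current = iter;
            while (current.next != null) {
                if (Objects.equals(iter.value, current.next.value)) {
                    // Unlink the duplicate and stay put, so the node that
                    // moved into current.next is checked on the next pass.
                    current.next = current.next.next;
                } else {
                    current = current.next;
                }
            }
        }
    }
}
```

Because `current` never moves past a node it has not compared, the inner loop needs only the `current.next != null` test.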

Answer 11

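Your approach is just the mirror image of the book's! You go forward, the book goes backward. There is no difference, since you both scan all the elements. And yes, because no buffer is allowed, there are performance issues. You usually don't have to think about performance with such constrained questions when it is not explicitly required.

Interview questions are meant to test your open-mindedness. I have doubts about Mark's answer: it is definitely the best solution for real-world cases, but even if these algorithms use constant space, the constraint that no temporary buffer is allowed must be respected.

Otherwise, I guess the book would have adopted that approach. Mark, please forgive me for criticizing you.

Anyway, just to go deeper into the matter: your approach and the book's both require Theta(n^2) time, while Mark's approach requires Theta(n logn) + Theta(n) time, which results in Theta(n logn). Why Theta? Because comparison-swap algorithms are Omega(n logn) too, remember!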

Answer 12

C# code for removing the duplicates left after the first set of iterations:

 public Node removeDuplicates(Node head) 
    {
        if (head == null)
            return head;

        var current = head;
        while (current != null)
        {
            if (current.next != null && current.data == current.next.data)
            {
                current.next = current.next.next;
            }
            else { current = current.next; }
        }

        return head;
    }

Answer 13

Hacker Rank Day 24: More Linked Lists, removing duplicate nodes in C#.

static Node RemoveDuplicateNode(Node head)
    {
        // For each node, unlink every later node that carries the same data.
        // Previous is reset for each Link; updating it as the scan moves on
        // keeps the nodes between Link and a duplicate from being dropped.
        for (Node Link = head; Link != null; Link = Link.next)
        {
            Node Previous = Link;
            while (Previous.next != null)
            {
                if (Previous.next.data == Link.data)
                {
                    // drop the duplicate and re-check the node that replaced it
                    Previous.next = Previous.next.next;
                }
                else
                {
                    Previous = Previous.next;
                }
            }
        }

        return head;
    }

13 solutions

Solution 1: 9, accepted, 2010-12-27 23:46:06
Solution 2: 4, 2010-12-27 23:42:14
Solution 3: 3, 2010-12-28 10:04:26
Solution 4: 2, 2019-08-22 15:57:56
Solution 5: 1
Solution 6: 0, 2012-09-13 20:33:41
Solution 7: 0, 2013-10-09 19:43:45
Solution 8: 0, 2013-10-20 18:31:32
Solution 9: 0, 2013-10-20 18:36:52
Solution 10: 0, 2010-12-27 23:41:00
Solution 11: 0, 2010-12-27 23:41:23
Solution 12: 0, 2018-07-31 15:01:53
Solution 13: 0, 2022-01-21 11:07:03
