Interview question: remove duplicates from an unsorted linked list
I'm reading Cracking the Coding Interview, Fourth Edition: 150 Programming Interview Questions and Solutions and I'm trying to solve the following question:

2.1 Write code to remove duplicates from an unsorted linked list.
FOLLOW UP: How would you solve this problem if a temporary buffer is not allowed?
I'm solving it in C#, so I made my own Node class:
public class Node<T> where T : class
{
    public Node<T> Next { get; set; }
    public T Value { get; set; }

    public Node(T value)
    {
        Next = null;
        Value = value;
    }
}
My solution is to iterate through the list, then for each node to iterate through the remainder of the list and remove any duplicates (note that I haven't actually compiled or tested this, as instructed by the book):
public void RemoveDuplicates(Node<T> head)
{
    // Iterate through the list
    Node<T> iter = head;
    while (iter != null)
    {
        // Iterate through the remaining nodes in the list
        Node<T> current = iter;
        while (current != null && current.Next != null)
        {
            if (iter.Value == current.Next.Value)
            {
                current.Next = current.Next.Next;
            }
            current = current.Next;
        }
        iter = iter.Next;
    }
}
Here is the solution from the book (the author wrote it in Java):
Without a buffer, we can iterate with two pointers: "current" does a normal iteration, while "runner" iterates through all prior nodes to check for dups. Runner will only see one dup per node, because if there were multiple duplicates they would have been removed already.
public static void deleteDups2(LinkedListNode head)
{
    if (head == null) return;
    LinkedListNode previous = head;
    LinkedListNode current = previous.next;
    while (current != null)
    {
        LinkedListNode runner = head;
        while (runner != current) { // Check for earlier dups
            if (runner.data == current.data)
            {
                LinkedListNode tmp = current.next; // remove current
                previous.next = tmp;
                current = tmp; // update current to next node
                break; // all other dups have already been removed
            }
            runner = runner.next;
        }
        if (runner == current) { // current not updated - update now
            previous = current;
            current = current.next;
        }
    }
}
So my solution always looks for duplicates from the current node to the end, while their solution looks for duplicates from the head to the current node. I feel like both solutions would suffer performance issues depending on how many duplicates there are in the list and how they're distributed (density and position). But in general: is my answer nearly as good as the one in the book, or is it significantly worse?
If you give a person a fish, they eat for a day. If you teach a person to fish...
My measures for the quality of an implementation are:
As for your implementation:
There's not much of a difference. If I've done my math right, yours is on average N/16 slower than the author's, but plenty of cases exist where your implementation will be faster.
Edit:
I'll call your implementation Y and the author's A.
Both proposed solutions have O(N^2) as the worst case, and they both have a best case of O(N) when all elements are the same value.

EDIT: This is a complete rewrite. Inspired by the debate in the comments, I tried to find the average case for N random numbers, that is, a sequence with a random size and a random distribution. What would the average case be?

Y will always run U times, where U is the number of unique numbers. For each iteration it will do N - X comparisons, where X is the number of elements removed prior to the iteration (+1). The first time, no elements will have been removed, and on average, by the second iteration, N/U will have been removed.

That is, on average N/2 elements will be left to iterate over. We can express the average cost as U * N/2. The average U can be expressed in terms of N as well: 0 < U <= N.
Expressing A becomes more difficult. Let's say we use I iterations before we've encountered all unique values. After that it will run between 1 and U comparisons (on average that's U/2) and will do that N - I times:

I*c + U/2*(N - I)

But what's the average number of comparisons (c) we run for the first I iterations? On average we need to compare against half of the elements already visited, and on average we've visited I/2 elements, i.e. c = I/4:

I^2/4 + U/2*(N - I)

I can be expressed in terms of N: on average we'll need to visit half of N to find all the unique values, so I = N/2, yielding an average of I^2/4 + U/2*(N - I), which can be reduced to (3*N^2)/16.

That is of course if my estimation of the averages is correct. On average, over all potential sequences, A does N/16 fewer comparisons than Y, but plenty of cases exist where Y is faster than A. So I'd say they are equal when compared on the number of comparisons.
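As a sanity check on these averages, here's a small simulation sketch that counts comparisons for the two scan orders (Y scans forward from each node; A checks the prefix). It models the intended behavior rather than the exact posted code, and the class name `CompareCounts` and the array-backed lists are my own scaffolding:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class CompareCounts {
    // Y: for each element, scan the remainder and delete later duplicates.
    static long countY(int[] data) {
        List<Integer> list = new ArrayList<>();
        for (int d : data) list.add(d);
        long comparisons = 0;
        for (int i = 0; i < list.size(); i++) {
            for (int j = i + 1; j < list.size(); ) {
                comparisons++;
                if (list.get(j).equals(list.get(i))) list.remove(j); // drop dup, stay at j
                else j++;
            }
        }
        return comparisons;
    }

    // A: for each element, scan the prefix; delete the element on the first match.
    static long countA(int[] data) {
        List<Integer> list = new ArrayList<>();
        for (int d : data) list.add(d);
        long comparisons = 0;
        for (int i = 1; i < list.size(); ) {
            boolean removed = false;
            for (int j = 0; j < i; j++) {
                comparisons++;
                if (list.get(j).equals(list.get(i))) {
                    list.remove(i); // at most one earlier dup can match
                    removed = true;
                    break;
                }
            }
            if (!removed) i++;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int trials = 200, n = 200;
        long totalY = 0, totalA = 0;
        for (int t = 0; t < trials; t++) {
            int[] data = new int[n];
            for (int i = 0; i < n; i++) data[i] = rng.nextInt(50); // ~4 copies per value
            totalY += countY(data);
            totalA += countA(data);
        }
        System.out.println("avg comparisons Y = " + totalY / trials);
        System.out.println("avg comparisons A = " + totalA / trials);
    }
}
```

Running it over many random sequences lets you compare the two averages directly instead of trusting my algebra.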
How about using a HashMap? This way it will take O(n) time and O(n) space. I will write pseudocode.
function removeDup(LinkedList list) {
    HashMap map = new HashMap();
    for (i = 0; i < list.length; i++)
        if list.get(i) not in map
            map.add(list.get(i))
        else
            list.remove(i)
        end
    end
end
Of course we assume that the HashMap has O(1) read and write.
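On an actual linked list the same idea is better expressed with node pointers, since `get(i)` is itself O(n) per call on a list. A minimal runnable sketch (the `Node` and `HashDedup` names are my own):

```java
import java.util.HashSet;

class Node {
    int data;
    Node next;
    Node(int data) { this.data = data; }
}

public class HashDedup {
    // Single pass: remember every value seen so far and unlink any node
    // whose value is already in the set.
    static void removeDup(Node head) {
        HashSet<Integer> seen = new HashSet<>();
        Node prev = null;
        for (Node cur = head; cur != null; cur = cur.next) {
            if (seen.contains(cur.data)) {
                prev.next = cur.next; // unlink duplicate; prev stays put
            } else {
                seen.add(cur.data);
                prev = cur;
            }
        }
    }

    public static void main(String[] args) {
        Node head = new Node(1);
        head.next = new Node(2);
        head.next.next = new Node(1);
        head.next.next.next = new Node(3);
        removeDup(head);
        for (Node n = head; n != null; n = n.next)
            System.out.print(n.data + " "); // prints: 1 2 3
    }
}
```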
Another solution is to use a merge sort and remove duplicates from the start to the end of the list. This takes O(n log n): merge sort is O(n log n), and removing duplicates from a sorted list is O(n) (do you know why?). Therefore the entire operation takes O(n log n).
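That route can be sketched like this for a singly linked list; note it reorders the nodes, which the interviewer may or may not accept. The class and method names are my own:

```java
public class SortDedup {
    static class Node {
        int data;
        Node next;
        Node(int d) { data = d; }
    }

    // Merge sort: split at the midpoint (slow/fast pointers), sort halves, merge.
    static Node sort(Node head) {
        if (head == null || head.next == null) return head;
        Node slow = head, fast = head.next;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        Node mid = slow.next;
        slow.next = null;
        return merge(sort(head), sort(mid));
    }

    static Node merge(Node a, Node b) {
        Node dummy = new Node(0), tail = dummy;
        while (a != null && b != null) {
            if (a.data <= b.data) { tail.next = a; a = a.next; }
            else                  { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }

    // After sorting, duplicates are adjacent, so one O(n) pass removes them.
    static Node dedup(Node head) {
        head = sort(head);
        for (Node cur = head; cur != null; ) {
            if (cur.next != null && cur.next.data == cur.data) cur.next = cur.next.next;
            else cur = cur.next;
        }
        return head;
    }

    public static void main(String[] args) {
        Node head = new Node(3);
        head.next = new Node(1);
        head.next.next = new Node(3);
        head.next.next.next = new Node(2);
        for (Node n = dedup(head); n != null; n = n.next)
            System.out.print(n.data + " "); // prints: 1 2 3
    }
}
```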
Here's an implementation using a HashSet, in O(n) time. I have used a hashset to store the unique values, and two node pointers to traverse the linked list. When a duplicate is found, the previous node is linked past the current one, which removes the duplicate record from the list.
/// <summary>
/// Write code to remove duplicates from an unsorted linked list.
/// </summary>
class RemoveDups<T>
{
    private class Node
    {
        public Node Next;
        public T Data;

        public Node(T value)
        {
            this.Data = value;
        }
    }

    private Node head = null;

    public static void MainMethod()
    {
        RemoveDups<int> rd = new RemoveDups<int>();
        rd.AddNode(15);
        rd.AddNode(10);
        rd.AddNode(15);
        rd.AddNode(10);
        rd.AddNode(10);
        rd.AddNode(20);
        rd.AddNode(30);
        rd.AddNode(20);
        rd.AddNode(30);
        rd.AddNode(35);
        rd.PrintNodes();
        rd.RemoveDuplicates();
        Console.WriteLine("Duplicates Removed!");
        rd.PrintNodes();
    }

    private void RemoveDuplicates()
    {
        // use a hashset to track values already seen
        HashSet<T> hs = new HashSet<T>();
        Node current = head;
        Node prev = null;

        // loop through the linked list
        while (current != null)
        {
            if (hs.Contains(current.Data))
            {
                // remove the duplicate record
                prev.Next = current.Next;
            }
            else
            {
                // insert element into hashset
                hs.Add(current.Data);
                prev = current;
            }
            current = current.Next;
        }
    }

    /// <summary>
    /// Add Node at the beginning
    /// </summary>
    /// <param name="val"></param>
    public void AddNode(T val)
    {
        Node newNode = new Node(val);
        newNode.Next = head;
        head = newNode;
    }

    /// <summary>
    /// Print nodes
    /// </summary>
    public void PrintNodes()
    {
        Node current = head;
        while (current != null)
        {
            Console.WriteLine(current.Data);
            current = current.Next;
        }
    }
}
Heapsort is an in-place sort. You could modify the "siftUp" or "siftDown" function to simply remove the element if it encounters a parent that is equal. This would be O(n log n).
function siftUp(a, start, end) is
    input: start represents the limit of how far up the heap to sift.
           end is the node to sift up.

    child := end
    while child > start
        parent := floor((child - 1) ÷ 2)
        if a[parent] < a[child] then (out of max-heap order)
            swap(a[parent], a[child])
            child := parent (repeat to continue sifting up the parent now)
        else if a[parent] == a[child] then
            remove a[parent]
        else
            return
Code in Java:
public static void dedup(Node head) {
    Node cur = null;
    HashSet encountered = new HashSet();
    while (head != null) {
        encountered.add(head.data);
        cur = head;
        while (cur.next != null) {
            if (encountered.contains(cur.next.data)) {
                cur.next = cur.next.next;
            } else {
                break;
            }
        }
        head = cur.next;
    }
}
Tried the same in C++. Please let me know your comments on this.
// ConsoleApplication2.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <stdlib.h>

struct node
{
    int data;
    struct node *next;
};

struct node *head = NULL;
struct node *tail = NULL;

struct node* createNode(int data)
{
    struct node *newNode = (node*)malloc(sizeof(node));
    newNode->data = data;
    newNode->next = NULL;
    head = newNode;
    return newNode;
}

bool insertAfter(node *list, int data)
{
    struct node *newNode = (node*)malloc(sizeof(node));
    newNode->data = data;

    // case 1 - no list node given: insert before head
    if (!list)
    {
        newNode->next = head;
        head = newNode;
        return true;
    }

    // case 2 - middle or tail of list
    struct node *curpos = head;
    while (curpos)
    {
        if (curpos == list)
        {
            if (curpos->next == NULL)
            {
                newNode->next = NULL;
                tail = newNode;
            }
            else
            {
                newNode->next = curpos->next;
            }
            curpos->next = newNode;
            return true;
        }
        curpos = curpos->next;
    }
    free(newNode); // list node not found
    return false;
}

// Unlink and free runner->next (the duplicate that was just found).
void deleteNode(node *runner)
{
    node *dup = runner->next;
    runner->next = dup->next;
    free(dup);
}

void removedups(node *list)
{
    struct node *curr = head;
    while (curr != NULL)
    {
        struct node *runner = curr;
        while (runner->next != NULL)
        {
            if (curr->data == runner->next->data)
                deleteNode(runner); // don't advance: the new runner->next still needs checking
            else
                runner = runner->next;
        }
        curr = curr->next;
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    struct node *list = createNode(1);
    insertAfter(list, 2);
    insertAfter(list, 2);
    insertAfter(list, 3);
    removedups(list);
    return 0;
}
Code in C:
void removeduplicates(N **r)
{
    N *temp1 = *r;
    N *temp2 = NULL;
    N *temp3 = NULL;
    if (temp1 == NULL)
        return;
    while (temp1->next != NULL)
    {
        temp2 = temp1;
        while (temp2 != NULL)
        {
            temp3 = temp2;
            temp2 = temp2->next;
            if (temp2 == NULL)
            {
                break;
            }
            if ((temp2->data) == (temp1->data))
            {
                temp3->next = temp2->next;
                free(temp2);
                temp2 = temp3;
                printf("\na dup deleted");
            }
        }
        temp1 = temp1->next;
    }
}
Your solution is just as good as the author's, only it has a bug in the implementation :) Try tracing through it on a list with two nodes that have the same data.
Your approach is simply the mirror image of the book's! You go forward; the book goes backward. There is no difference, as both of you scan all elements. And yes, since no buffer is allowed, there are performance issues. You usually don't have to worry about performance with such constrained questions, and when it's not explicitly required.
Interview questions are made to test your open-mindedness. I have doubts about Mark's answer: it definitely is the best solution in real-world examples, but even if these algorithms use constant space, the constraint that no temporary buffer is allowed must be respected. Otherwise, I guess that the book would have adopted such an approach. Mark, please forgive me for being critical of you.
Anyway, just to go deeper into the matter, yours and the book's approach both require Theta(n^2) time, while Mark's approach requires Theta(n log n) + Theta(n) time, which results in Theta(n log n). Why Theta? Because comparison-based sort algorithms are Omega(n log n) too, remember!
C# code for removing the duplicates that are left after the first set of iterations:
public Node removeDuplicates(Node head)
{
    if (head == null)
        return head;

    var current = head;
    while (current != null)
    {
        if (current.next != null && current.data == current.next.data)
        {
            current.next = current.next.next;
        }
        else
        {
            current = current.next;
        }
    }
    return head;
}
Hacker Rank Day 24: More Linked Lists, removing a duplicate node in C#.
static Node RemoveDuplicateNode(Node head)
{
    Node Link = head;
    Node Previous;
    Node DulicateNode;
    int count = 0, temp;
    while (Link != null)
    {
        temp = Link.data;
        DulicateNode = Link;
        Previous = Link;
        while (DulicateNode != null)
        {
            if (DulicateNode.data == temp)
            {
                Previous.data = DulicateNode.data;
                Previous.next = DulicateNode.next;
                ++count;
            }
            if (count >= 2)
            {
                if (DulicateNode.next != null)
                {
                    DulicateNode.data = DulicateNode.next.data;
                    DulicateNode.next = DulicateNode.next.next;
                }
                else
                    DulicateNode = null;
            }
            else
                DulicateNode = DulicateNode.next;
        }
        count = 0;
        Link = Link.next;
    }
    return head;
}