Real world implementations of “classical algorithms”

Original post: 2008-11-30 12:21:30 8 12 algorithm / language-agnostic

I wonder how many of you have implemented one of computer science's "classical algorithms", like Dijkstra's algorithm, or data structures (e.g. binary search trees) in a real-world, non-academic project?

Is there a benefit to our day jobs in knowing these algorithms and data structures when there are tons of libraries, frameworks and APIs which give you the same functionality?

12 answers

Is there a benefit to our day jobs in knowing these algorithms and data structures when there are tons of libraries, frameworks and APIs which give you the same functionality?

The library doesn't know what your problem domain is and won't be able to choose the correct algorithm to do the job. That is why I think it is important to know about them: then YOU can make the correct choice of algorithms to solve YOUR problem.

Is there a benefit to understanding your tools, rather than simply knowing that they exist?

Yes, of course there is. Taking a trivial example, don't you think there's a benefit to knowing what the difference is between List (or your language's equivalent dynamic array implementation) and LinkedList (or your language's equivalent)? It's pretty important to know that one has constant random access time, while the other is linear. And one requires N copies if you insert a value in the middle of the sequence, while the other can do it in constant time.
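To make that difference concrete, here is a minimal Python sketch. Python's built-in `list` stands in for the dynamic array; the `Node`, `insert_after`, `nth` and `to_list` names are hypothetical helpers written for this illustration, not part of any library.

```python
class Node:
    """One cell of a singly linked list (hypothetical helper for illustration)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def insert_after(node, value):
    """Splice a new value in after `node`: O(1), no other element moves."""
    node.next = Node(value, node.next)
    return node.next


def nth(head, index):
    """Random access in a linked list: O(index), we must walk from the head."""
    node = head
    for _ in range(index):
        node = node.next
    return node


def to_list(head):
    """Collect the linked list's values into a Python list for display."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


# Dynamic array: indexing is O(1), but insert() shifts every later element.
array = [10, 20, 40, 50]
array.insert(2, 30)

# Linked list: reaching the position is O(n), the splice itself is O(1).
head = Node(10, Node(20, Node(40, Node(50))))
insert_after(nth(head, 1), 30)

print(array)          # [10, 20, 30, 40, 50]
print(to_list(head))  # [10, 20, 30, 40, 50]
```

Note that the constant-time insert only pays off when you already hold a reference to the node; finding the position is still a linear walk.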

Don't you think there's an advantage to understanding that the same sorting algorithm isn't always optimal? That for almost-sorted data, quicksort sucks, for example? Naively just calling Sort() and hoping for the best can become ridiculously expensive if you don't understand what's happening under the hood.
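As a rough illustration of the quicksort point, the sketch below counts pivot comparisons for a deliberately naive quicksort that always picks the first element as its pivot. That pivot rule is an assumption chosen to show the worst case; real library sorts usually pick pivots more carefully or use a different algorithm altogether (Python's own `sorted` uses Timsort, which handles presorted runs well). The function name `quicksort_comparisons` is made up for this example, and fully sorted input is used as the extreme case of "almost sorted".

```python
import random


def quicksort_comparisons(items):
    """Sort a copy of `items` with a first-element-pivot quicksort.

    Returns (sorted_list, comparisons), counting one pivot comparison
    per element in each partition step.
    """
    comparisons = 0

    def sort(seq):
        nonlocal comparisons
        if len(seq) <= 1:
            return seq
        pivot, rest = seq[0], seq[1:]
        comparisons += len(rest)  # every remaining element is checked against the pivot
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return sort(smaller) + [pivot] + sort(larger)

    return sort(list(items)), comparisons


n = 200
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

_, sorted_cost = quicksort_comparisons(already_sorted)
_, shuffled_cost = quicksort_comparisons(shuffled)

print(sorted_cost)    # 19900, i.e. n * (n - 1) / 2: the quadratic worst case
print(shuffled_cost)  # far fewer, on the order of n * log(n)
```

On sorted input every partition is maximally lopsided, so the recursion is also n levels deep, which is why the example keeps n small.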

Of course there are a lot of algorithms you probably won't need, but even so, just understanding how they work may make it easier for you to come up with efficient algorithms to solve other, unrelated problems.

Knowing, or being able to understand, these algorithms is important; these are the tools of your trade. It does not mean you have to be able to implement A* in an hour from memory. But you should be able to figure out what the advantages of using a red-black tree as opposed to a normal unbalanced tree are, so you can decide if you need it or not. You need to be able to judge the fitness of an algorithm for solving your problem.

This might sound too school-masterish, but these "classical algorithms" were not invented to give college students exam questions; they were invented to solve problems or improve on current solutions. Just as the array, the linked list or the stack are building blocks for writing a program, so are some of these. Just like in math, where you move from addition and subtraction to integration and differentiation, these are advanced techniques that will help you solve problems that are out there.

They might not be directly applicable to your problems or work situation, but in the long run knowing them will help you as a professional software engineer.

To answer your question, I did an implementation of A* recently for a game.

Well, someone has to write the libraries. While working at a mapping software company, I implemented Dijkstra's, as well as binary search trees, B-trees, n-ary trees, BK-trees and hidden Markov models.
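For readers who have not seen it written out, Dijkstra's algorithm is short when a binary heap does the bookkeeping. This is a textbook-style sketch using Python's standard `heapq` module, not the poster's mapping code; the `roads` graph and its weights are invented for illustration, and edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source` in a graph with non-negative weights.

    `graph` maps each node to a list of (neighbor, weight) pairs.
    Only nodes reachable from `source` appear in the result.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical road network: node -> [(neighbor, travel cost), ...]
roads = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 3)],
    "C": [("B", 2), ("D", 7)],
    "D": [],
}

distances = dijkstra(roads, "A")
print(distances["B"])  # 3, via A -> C -> B rather than the direct edge of cost 4
print(distances["D"])  # 6, via A -> C -> B -> D
```

Pushing duplicate entries and skipping stale ones is a common way to get by without a decrease-key operation, which `heapq` does not provide.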

Besides, if all you want is a single 'well known' algorithm, and you also want the freedom to specialise and optimise it if it becomes critical to performance, including a whole library seems like a poor choice.

In my previous workplace, which was an EDA company, we implemented versions of Prim's and Dijkstra's algorithms, disjoint set data structures, A* search and more. All of these had real world significance. I believe this is dependent on problem domain - some domains are more algorithm-intensive and some less so.

Having said that, there is a fine line to walk - I see no business reason for re-implementing STL or Java Generics. In many cases, a standard library is better than "inventing a wheel". The closer you are to your core application, the more it may be necessary to implement a textbook algorithm or data structure.

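We use a home-grown implementation of a pseudo-random number generator from Knuth's Seminumerical Algorithms as an aid in some statistical processing.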

If you never work with performance-critical code, consider yourself lucky. However, I consider this scenario unrealistic. Performance problems could occur anywhere. And then it's necessary to know how to fix that problem. Obviously, merely knowing a few algorithm names isn't enough here, unless you want to implement them all and try them out one after the other.

No, knowing (at least some of) the inner workings of different algorithms is important for gauging their strengths and weaknesses and for analyzing how they would handle your situation.

Obviously, if there's a library already implementing exactly what you need, you're incredibly lucky. But let's face it, even if there is such a library, using it is often not completely straightforward (at the very least, interfaces and data representation often have to be adapted), so it's still good to know what to expect.

A* for a Pac-Man clone. It took me weeks to really get it, but to this day I consider it a thing of beauty.

I've had to implement some of the classical algorithms from numerical analysis. It was easier to write my own than to connect to an existing library. Also, I've had to write variations on classical algorithms because the textbook case didn't fit my application.

For classical data structures, I nearly always use the standard libraries, such as STL for C++. The one time recently when I thought STL didn't have the structure I needed (a heap), I rolled my own, only to have someone point out almost immediately that I didn't need to do that.

Classical algorithms I have used in actual work:

A topological sort
A red-black tree (although I will confess that I only had to implement insertions for that application and it only got used in a prototype). This was used to implement an 'ordered dict' type structure in Python.
A priority queue
State machines of various sorts
Probably one or two others I can't remember.
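To give a flavour of the first item in the list above, here is a minimal sketch of a topological sort using Kahn's algorithm. It is not the poster's code; the `build` dependency map is invented for illustration, and ties are broken alphabetically only to make the output repeatable. Recent Python versions (3.9+) also ship `graphlib.TopologicalSorter` in the standard library, which may be all you need.

```python
from collections import deque


def topological_sort(dependencies):
    """Kahn's algorithm.

    `dependencies` maps an item to the items it depends on. Returns a list
    in which every item comes after all of its dependencies, or raises
    ValueError if the dependencies contain a cycle.
    """
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    unmet = {node: 0 for node in nodes}        # count of unmet dependencies
    dependents = {node: [] for node in nodes}  # reverse edges
    for item, deps in dependencies.items():
        for dep in set(deps):
            unmet[item] += 1
            dependents[dep].append(item)

    ready = deque(sorted(node for node in nodes if unmet[node] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dependent in sorted(dependents[node]):
            unmet[dependent] -= 1
            if unmet[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical build graph: item -> things it depends on
build = {"app": ["lib", "config"], "lib": ["core"], "config": ["core"], "core": []}
build_order = topological_sort(build)
print(build_order)  # ['core', 'config', 'lib', 'app']
```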

As to the second part of the question:

An understanding of how the algorithms work, their complexity and semantics gets used on a fairly regular basis. They also inform the design of systems. Occasionally one has to do things involving parsing or protocol handling, or some computation that's slightly clever. Having a working knowledge of what the algorithms do, how they work, how expensive they are and where one might find them lying around in library code goes a long way to knowing how to avoid reinventing the wheel poorly.

Classical algorithms are usually associated with something glamorous, like games, or Web search, or scientific computation. However, I had to use some of the classical algorithms for a mere enterprise application.

I was building a metadata migration tool, and I had to use topological sort for dependency resolution, various forms of graph traversals for queries on metadata, and a modified variation of Tarjan's union-find data structure to partition forest-like structured metadata into trees.
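The poster's modified variant isn't described, so the following is only a plain union-find sketch (union by size with path halving) showing how such a structure can split a forest of linked items into separate trees. The `DisjointSet` and `partition` names and the sample metadata names are hypothetical.

```python
class DisjointSet:
    """Plain union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, item):
        """Return the representative of `item`'s set, registering it if new."""
        if item not in self.parent:
            self.parent[item] = item
            self.size[item] = 1
        while self.parent[item] != item:
            # Path halving: point each visited node at its grandparent.
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        """Merge the sets containing `a` and `b`."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a  # attach the smaller set under the larger
        self.size[root_a] += self.size[root_b]


def partition(edges, nodes=()):
    """Group items into connected components given (parent, child) edges."""
    sets = DisjointSet()
    for node in nodes:
        sets.find(node)  # register isolated items too
    for a, b in edges:
        sets.union(a, b)
    groups = {}
    for item in list(sets.parent):
        groups.setdefault(sets.find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical metadata links: (container, contained item)
links = [("schema", "table_a"), ("schema", "table_b"), ("report", "chart")]
trees = partition(links, nodes=["orphan"])
print(trees)
# [['chart', 'report'], ['orphan'], ['schema', 'table_a', 'table_b']]
```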

That was a really satisfying experience. Most of those algorithms had been implemented before, but the existing implementations lacked something that I needed for my task. That's why it's important to understand their internals.

I use the Levenshtein distance algorithm to help implement a 'Did you mean [suggested word]?' feature in our website search.

Works quite well when combined with our 'tagging' system, which allows us to associate extra words (other than those in title/description/etc) with items in the database.

It's not perfect by any means, but it's way better than most corporate site searches, if I do say so myself ;)
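A 'Did you mean' feature along those lines can be sketched in a few lines of Python. This is a guess at the general shape, not the poster's implementation: `levenshtein` is the standard dynamic-programming edit distance, while `did_you_mean`, its `max_distance` cutoff and the sample vocabulary are assumptions made up for the example.

```python
def levenshtein(a, b):
    """Edit distance between two strings, keeping only two rows of the table."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def did_you_mean(query, vocabulary, max_distance=2):
    """Suggest the closest known word, or None if nothing is close enough.

    `max_distance` is an arbitrary cutoff chosen for this sketch.
    """
    best = min(vocabulary, key=lambda word: levenshtein(query, word), default=None)
    if best is None or levenshtein(query, best) > max_distance:
        return None
    return best


# Hypothetical vocabulary, e.g. words drawn from titles and tags
known_words = ["algorithm", "logarithm", "array"]

print(levenshtein("kitten", "sitting"))        # 3
print(did_you_mean("algoritm", known_words))   # algorithm
print(did_you_mean("zzzzzz", known_words))     # None
```

Scanning the whole vocabulary per query is fine for small word lists; for larger ones, an index such as the BK-tree mentioned in an earlier answer is the usual next step.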