What is the most efficient method to search a large collection of strings for the closest match?

I have a large file (400K lines of English sentences) and need to be able to search it and compare each sentence to an "input" string, which is also an English sentence. I'm not concerned about the memory footprint this application would have; I'm looking for the fastest way to do this. Currently, I store the sentences as a large list of strings, and the program iterates through all of them, one at a time, computing the Hamming distance between each string and the input. The string that "matches" is the one with the shortest distance. Is there something faster than this?
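For reference, the approach described in the question can be sketched as follows. This is a minimal sketch, not the asker's actual code: the class name `LinearScan` is hypothetical, and because Hamming distance is only defined for strings of equal length, the sketch assumes the length difference is counted as extra mismatches. Each query compares the input against all N stored sentences, so it costs roughly O(N · L) for sentences of length about L. That linear cost is what a better data structure has to beat.

```java
import java.util.List;

/** Baseline (hypothetical class): compare the input with every stored sentence. */
public class LinearScan {

    // Hamming distance extended to unequal lengths (assumption): mismatched
    // positions over the shorter length, plus the difference in length.
    static int hamming(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int d = Math.abs(a.length() - b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                d++;
            }
        }
        return d;
    }

    // Linear scan: every sentence is visited once; the first one with the
    // smallest distance wins ties.
    static String closest(List<String> sentences, String input) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String s : sentences) {
            int d = hamming(s, input);
            if (d < bestDist) {
                bestDist = d;
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("the cat sat", "the dog sat", "a bird flew");
        System.out.println(closest(sentences, "the cot sat")); // the cat sat
    }
}
```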

The best data structure to use here is a tree. In a tree, or better still a search trie (yes, it really is spelled "trie"), finding a string is much faster than scanning a list: a balanced search tree needs only O(log n) string comparisons, and a trie needs one step per character of the query, no matter how many strings it stores. You could use Java's TreeSet implementation, or write your own tree implementation. A search trie, or prefix tree, is a search tree in which every node represents one character. A small example: you can find an image of such a tree at https://i.stack.imgur.com/pmVCl.png
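The trie described above can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the class `Trie` and its methods are hypothetical (not a library API), each character is stored as the key in a `HashMap` from a node to its child (equivalent to labelling that child node with the character), and the example words are made up rather than taken from the linked image.

```java
import java.util.HashMap;
import java.util.Map;

/** A minimal prefix tree (hypothetical class): one node per character of a stored prefix. */
public class Trie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored string ends at this node
    }

    private final Node root = new Node();

    // Walks down one node per character, creating nodes that are missing.
    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Exact lookup: one step per character of the query.
    public boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    // Prefix lookup: true if any stored string starts with the prefix.
    public boolean startsWith(String prefix) {
        return find(prefix) != null;
    }

    private Node find(String s) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null; // no stored string has this prefix
            }
        }
        return node;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        for (String w : new String[] {"apple", "app", "apply", "ape"}) {
            trie.insert(w);
        }
        System.out.println(trie.contains("app"));    // true
        System.out.println(trie.contains("appl"));   // false
        System.out.println(trie.startsWith("appl")); // true
    }
}
```

Looking up "app" here touches exactly three nodes below the root, one for each of `a`, `p`, `p`, regardless of how many other words share or do not share that prefix.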

In this case, if you want to find or match the word "app", you need only 3 steps through the whole tree data structure, one per character. This is the most efficient way I know.
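A trie by itself answers exact and prefix queries like the "app" lookup above, while the question asks for the *closest* match. One known way to combine the two, swapped in here and not part of the answer above, is to walk the trie while computing rows of the Levenshtein (edit-distance) table: the row for a prefix is computed once and reused by every sentence that shares that prefix, and a whole subtree is skipped once the smallest value in its row can no longer beat the best match found so far. This is a minimal sketch under stated assumptions: the class `TrieNearest` is hypothetical, and it uses Levenshtein distance instead of the question's Hamming distance because it handles sentences of different lengths.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical class: closest-match search over a trie using Levenshtein distance. */
public class TrieNearest {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String word; // non-null if a stored string ends at this node
    }

    private final Node root = new Node();
    private String bestWord;
    private int bestDist;

    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.word = word;
    }

    /** Returns a stored string with minimal edit distance (ties broken arbitrarily), or null if empty. */
    public String closest(String query) {
        bestWord = null;
        bestDist = Integer.MAX_VALUE;
        // Row for the empty prefix: distance to each query prefix is its length.
        int[] firstRow = new int[query.length() + 1];
        for (int i = 0; i <= query.length(); i++) {
            firstRow[i] = i;
        }
        if (root.word != null) { // the empty string was stored
            bestWord = root.word;
            bestDist = query.length();
        }
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            search(e.getValue(), e.getKey(), query, firstRow);
        }
        return bestWord;
    }

    // Computes the row for the prefix ending in character c from its parent's row.
    private void search(Node node, char c, String query, int[] prevRow) {
        int cols = query.length() + 1;
        int[] row = new int[cols];
        row[0] = prevRow[0] + 1;
        int rowMin = row[0];
        for (int i = 1; i < cols; i++) {
            int fromLeft = row[i - 1] + 1;  // skip a query character
            int fromAbove = prevRow[i] + 1; // skip a trie character
            int diagonal = prevRow[i - 1] + (query.charAt(i - 1) == c ? 0 : 1); // match or substitute
            row[i] = Math.min(fromLeft, Math.min(fromAbove, diagonal));
            rowMin = Math.min(rowMin, row[i]);
        }
        if (node.word != null && row[cols - 1] < bestDist) {
            bestDist = row[cols - 1];
            bestWord = node.word;
        }
        // Every string in this subtree has distance >= rowMin; descend only if it could win.
        if (rowMin < bestDist) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                search(e.getValue(), e.getKey(), query, row);
            }
        }
    }

    public static void main(String[] args) {
        TrieNearest trie = new TrieNearest();
        for (String s : new String[] {"the cat sat", "the dog sat", "a bird flew"}) {
            trie.insert(s);
        }
        System.out.println(trie.closest("the cot sat"));  // the cat sat
        System.out.println(trie.closest("the cats sat")); // the cat sat
    }
}
```

In the worst case this still visits the whole trie, so how much faster it is than the linear scan on 400K real sentences depends on how many prefixes the sentences share and how close the best match is. The sketch does not measure this.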

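Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.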

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM