
PHP Library for fuzzy searching?

Original post: 2013-03-12 19:46:16 7 1 php

I'm looking for a PHP library that will allow users to enter commands in plain English. Basically, I want a user to be able to do something like:

"Please search for all users in Europe" (which would equate to select * where users = 'Europe')

Another example of what I have in mind:

Look up (find) the email address for John Smith

Note: it would also be nice to be able to say "for John Smith, Jane Smith, and John Doe".

Ideally, if such a library exists, I'd like it to be extensible, so that I can program what should happen when certain keyword combinations show up, such as find,email or search,users.

Is anyone aware of a PHP library that can do something like the above?

1 answer

As far as I know, there are currently no libraries for searching with natural-language queries, in PHP or in any other programming language (I assume you can't use IBM Watson :) ).

I think the feasible approaches are a grammar-based parser and fuzzy search:

With a parser generator like Jison, you can parse and "understand", in the user's browser, every statement the grammar can generate. You then send only the generated query, or an intermediate representation, to the server.

This is better than a PHP parser because the user gets immediate feedback while typing, which is less frustrating than submitting a form and getting an error. In this case the interpretation of accepted queries would be 99% correct. However, in many cases a query that is perfectly valid from a human point of view will be rejected because the grammar did not foresee it.
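To make the grammar-based idea concrete, here is a minimal sketch of a tiny hand-written grammar. Jison would generate a parser like this for the browser; this sketch uses PHP regular expressions instead, only to stay in the question's language. The function name parseCommand() and the two sentence shapes it accepts are hypothetical.

The function turns a sentence into an intermediate representation (an array). For anything the grammar does not foresee, it returns null.

```php
<?php
// Minimal sketch of a grammar-based command parser. The grammar is
// hypothetical and hand-written with regexes rather than generated by Jison.
// Accepted sentence shapes:
//   [please] (find | look up | lookup) [the] email [address] (for | of) NAME_LIST
//   [please] search [for] [all] users in PLACE
// Anything else is rejected by returning null.

function parseCommand(string $input): ?array
{
    // Normalise: trim, drop trailing punctuation and a leading "please".
    $s = rtrim(trim($input), '.?!');
    $s = preg_replace('/^please\s+/i', '', $s);

    // e.g. "Look up the email address for John Smith, Jane Smith, and John Doe"
    if (preg_match('/^(?:find|look ?up)\s+(?:the\s+)?email(?:\s+address(?:es)?)?\s+(?:for|of)\s+(.+)$/i', $s, $m)) {
        // Split the name list on commas (optionally followed by "and") or on " and ".
        $names = preg_split('/\s*,\s*(?:and\s+)?|\s+and\s+/i', $m[1]);
        $names = array_values(array_filter(array_map('trim', $names), fn($n) => $n !== ''));
        return ['action' => 'find', 'field' => 'email', 'names' => $names];
    }

    // e.g. "Please search for all users in Europe"
    if (preg_match('/^search\s+(?:for\s+)?(?:all\s+)?users\s+in\s+([a-z ]+)$/i', $s, $m)) {
        return ['action' => 'search', 'entity' => 'users', 'region' => trim($m[1])];
    }

    return null; // not foreseen by the grammar
}
```

For example, parseCommand("What is John Smith's email?") returns null, even though a person would understand the question perfectly well.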

With the other approach, you do some preprocessing (removing stop words, lowercasing the text, stemming and so on) and then search with a full-text search engine. Lucene is probably the most powerful, but it's written in Java. PostgreSQL supports full-text search, and MySQL also has some full-text search capabilities. It's also possible to build a primitive engine on top of a basic RDBMS, using an index and tokenizing the text on whitespace and punctuation.
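The "primitive engine" option can be sketched in a few lines of PHP. This is a minimal illustration, not a replacement for Lucene or the database engines mentioned above. Several parts are assumptions:

The stop-word list is a small hand-picked sample.
Stemming covers only step 1a of the Porter algorithm (plural endings).
The names preprocess(), stemPlural() and TinyIndex are hypothetical.

In a real application the inverted index would live in database tables rather than a PHP array.

```php
<?php
// Minimal sketch of a "primitive engine": preprocess text, keep an inverted
// index (term => set of record ids), and rank records by how many query
// terms they contain. The stop-word list is an illustrative assumption, and
// stemming is only Porter's step 1a (plural endings), not a full stemmer.

const STOP_WORDS = ['a', 'an', 'the', 'and', 'for', 'of', 'in', 'to', 'all', 'please'];

// Porter step 1a: sses -> ss, ies -> i, ss -> ss, s -> (removed).
function stemPlural(string $w): string
{
    if (substr($w, -4) === 'sses') { return substr($w, 0, -2); }
    if (substr($w, -3) === 'ies')  { return substr($w, 0, -2); }
    if (substr($w, -2) === 'ss')   { return $w; }
    if (substr($w, -1) === 's')    { return substr($w, 0, -1); }
    return $w;
}

// Lowercase, tokenize on anything that is not a letter or digit,
// drop stop words and one-character tokens, then stem.
function preprocess(string $text): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $terms = [];
    foreach ($tokens as $t) {
        if (strlen($t) < 2 || in_array($t, STOP_WORDS, true)) {
            continue;
        }
        $terms[] = stemPlural($t);
    }
    return $terms;
}

final class TinyIndex
{
    /** @var array<string, array<int, bool>> term => set of record ids */
    private array $postings = [];

    public function add(int $id, string $text): void
    {
        foreach (preprocess($text) as $term) {
            $this->postings[$term][$id] = true;
        }
    }

    // Returns matching ids: most matched query terms first, ties by ascending id.
    public function search(string $query): array
    {
        $scores = [];
        foreach (array_unique(preprocess($query)) as $term) {
            foreach (array_keys($this->postings[$term] ?? []) as $id) {
                $scores[$id] = ($scores[$id] ?? 0) + 1;
            }
        }
        uksort($scores, fn($a, $b) => [$scores[$b], $a] <=> [$scores[$a], $b]);
        return array_keys($scores);
    }
}
```

Suppose "John Doe, support users team, Europe" is indexed as a record. The query "users in Europe" then matches it on both user (stemmed from users) and europe, while "in" is discarded as a stop word.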

Which approach to choose depends on how diverse and noisy your data is, and how varied the expected queries are. You can also try a hybrid approach: parse the text with the grammar and, if that fails, fall back to a full-text search.
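The hybrid approach could be wired up roughly as follows. This is a minimal sketch under several assumptions:

The names parseStrict(), fuzzySearch() and interpret() are hypothetical.
The grammar covers a single sentence shape.
The fallback is a naive typo-tolerant keyword match using PHP's built-in levenshtein(), standing in for a real full-text engine such as MySQL FULLTEXT or PostgreSQL full-text search.

```php
<?php
// Minimal sketch of the hybrid approach: try a strict grammar first and,
// if the sentence is not covered, fall back to a fuzzy keyword search.
// Both halves are hypothetical stand-ins. The grammar knows one sentence
// shape, and the fallback uses levenshtein() instead of a full-text engine.

// Strict part: "(find | look up) [the] email [address] for NAME".
function parseStrict(string $input): ?array
{
    $pattern = '/^\s*(?:find|look ?up)\s+(?:the\s+)?email\s+(?:address\s+)?for\s+(.+?)[.?!]?\s*$/i';
    if (preg_match($pattern, $input, $m)) {
        return ['action' => 'find_email', 'name' => $m[1]];
    }
    return null;
}

// Fallback: score each record by how many query words (3+ letters) match one
// of its words, allowing one edit for words of 5+ letters to tolerate typos.
function fuzzySearch(string $query, array $records): array
{
    $tokenize = fn(string $s): array =>
        preg_split('/[^a-z0-9]+/', strtolower($s), -1, PREG_SPLIT_NO_EMPTY);

    $queryWords = array_filter($tokenize($query), fn($w) => strlen($w) >= 3);
    $scores = [];
    foreach ($records as $id => $text) {
        $recordWords = $tokenize($text);
        $score = 0;
        foreach ($queryWords as $q) {
            $maxEdits = strlen($q) >= 5 ? 1 : 0;
            foreach ($recordWords as $w) {
                if (levenshtein($q, $w) <= $maxEdits) {
                    $score++;
                    break;
                }
            }
        }
        if ($score > 0) {
            $scores[$id] = $score;
        }
    }
    // Best score first, ties broken by ascending id.
    uksort($scores, fn($a, $b) => [$scores[$b], $a] <=> [$scores[$a], $b]);
    return array_keys($scores);
}

// Use the grammar when it matches; otherwise fall back to the fuzzy search.
function interpret(string $input, array $records): array
{
    $parsed = parseStrict($input);
    if ($parsed !== null) {
        return ['mode' => 'grammar', 'result' => $parsed];
    }
    return ['mode' => 'fallback', 'result' => fuzzySearch($input, $records)];
}
```

For instance, "Anyone in Erope?" is not covered by the grammar, so it falls through to the fuzzy search. There, erope is one edit away from europe, so records mentioning Europe are returned.

Running levenshtein() over every word of every record is fine for a demo but scales poorly. With real data the fallback would be the database's own full-text search.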