PHP algorithm for creating a list of keywords from a string of keywords

I have a string of keywords to search for, in the format: A,B+C,D+E,B+F,E+G+H,...
Each letter represents a keyword, and + joins keywords that must all appear together.
There is no guaranteed order to the keywords.
Later I will search a DB for names that contain these keywords.
So, following my example, I am interested in names that contain:
A or (B and C) or (D and E) or (B and F) or (E and G and H), etc.
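Parsing the string is the first step. Below is a minimal PHP sketch. It assumes commas and `+` are the only separators and that surrounding spaces should be ignored. `parseKeywordGroups` is a hypothetical helper name, not an existing function.

```php
<?php
declare(strict_types=1);

/**
 * Split a string such as "A,B+C,D+E" into groups of keywords (hypothetical helper).
 * Commas separate alternatives (OR); "+" joins keywords that must all
 * appear together (AND). Whitespace is trimmed and empty pieces are skipped.
 *
 * @return list<list<string>>
 */
function parseKeywordGroups(string $input): array
{
    $groups = [];
    foreach (explode(',', $input) as $part) {
        // Split one alternative into its AND-ed keywords, dropping blanks
        // (an explicit !== '' check keeps a keyword like "0").
        $keywords = array_values(array_filter(
            array_map('trim', explode('+', $part)),
            fn (string $k): bool => $k !== ''
        ));
        if ($keywords !== []) {
            $groups[] = $keywords;
        }
    }
    return $groups;
}
```

For the example string, `parseKeywordGroups('A,B+C,D+E,B+F,E+G+H')` returns `[['A'], ['B', 'C'], ['D', 'E'], ['B', 'F'], ['E', 'G', 'H']]`.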

The problem is that I can only query the DB for names that contain a single keyword (it is an API, not my DB). So I need to make a list of keywords to retrieve names for, and then check each name to see whether it contains all the relevant keywords.
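Each name that comes back from a single-keyword query then has to be checked against the keyword groups. Below is a sketch of that check. It assumes "contains" means a case-insensitive substring match. `nameMatchesAnyGroup` is a hypothetical helper, and the API call itself is not shown because its interface isn't given.

```php
<?php
declare(strict_types=1);

/**
 * Return true if $name contains every keyword of at least one group
 * (hypothetical helper). Assumption: "contains" is a case-insensitive
 * substring match; stripos is byte-based, so it only folds ASCII case.
 * An empty group counts as satisfied (no extra keywords required).
 *
 * @param list<list<string>> $groups
 */
function nameMatchesAnyGroup(string $name, array $groups): bool
{
    foreach ($groups as $group) {
        $allPresent = true;
        foreach ($group as $keyword) {
            if (stripos($name, $keyword) === false) {
                $allPresent = false; // one keyword missing: try the next group
                break;
            }
        }
        if ($allPresent) {
            return true;
        }
    }
    return false;
}
```

For example, `nameMatchesAnyGroup('Big Cat Farm', [['big', 'farm']])` is `true`. If the keywords are whole words rather than substrings, a word-boundary regex would be the stricter choice.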

I need an algorithm to parse the string and create the list of keywords that minimizes the number of queries to the DB.

For instance, in my example, I would get names for:
A
B, and check whether they also contain C or F
E, and check whether they also contain D or (G and H)

So the algorithm should create the keyword list A,B,E to query for, and attach to each keyword the other keywords that must appear with it in the name.
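We want the smallest set of keywords such that every `+` group contains at least one of them. This is the minimum hitting set problem, which is NP-hard in general. The sketch below therefore uses a common greedy heuristic instead: repeatedly query the keyword found in the most still-uncovered groups. It is not guaranteed to find the true minimum. The input is already-parsed groups, and `buildQueryPlan` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Greedy choice of single keywords to query (hypothetical helper).
 * Every group of "+"-joined keywords must contain at least one queried
 * keyword; at each step the keyword found in the most still-uncovered
 * groups is picked. This is not guaranteed to give the true minimum.
 *
 * Returns: queried keyword => list of the other keywords each covered
 * group still needs. An empty inner list means the keyword alone suffices.
 *
 * @param list<list<string>> $groups
 * @return array<string, list<list<string>>>
 */
function buildQueryPlan(array $groups): array
{
    $plan = [];
    // Ignore empty groups; they could never be covered.
    $remaining = array_values(array_filter($groups, fn (array $g): bool => $g !== []));

    while ($remaining !== []) {
        // Count how many uncovered groups each keyword appears in.
        $counts = [];
        foreach ($remaining as $group) {
            foreach (array_unique($group) as $keyword) {
                $counts[$keyword] = ($counts[$keyword] ?? 0) + 1;
            }
        }

        // Pick the most frequent keyword; ties go to the one seen first.
        // Cast back to string because PHP turns numeric-string keys into ints.
        $best = null;
        foreach ($counts as $keyword => $count) {
            if ($best === null || $count > $counts[$best]) {
                $best = (string) $keyword;
            }
        }

        // Attach every group containing $best to its query; keep the rest.
        $stillUncovered = [];
        foreach ($remaining as $group) {
            if (in_array($best, $group, true)) {
                $plan[$best][] = array_values(array_diff($group, [$best]));
            } else {
                $stillUncovered[] = $group;
            }
        }
        $remaining = $stillUncovered;
    }

    return $plan;
}
```

For the example groups, this picks B, then E, then A. The resulting plan is `['B' => [['C'], ['F']], 'E' => [['D'], ['G', 'H']], 'A' => [[]]]`, which is the same three queries described in the question. The inner lists are what each returned name still has to be checked for.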

I am working in PHP, so I prefer answers in PHP, but pseudocode is fine as well. I hope this is clear.

Make an empty array B.
Pass over the given array A (the comma-separated conjunctions); for each word X in each conjunction:
    If X does not appear in B, add X to B as a key and set its value to 1/(number of parts in this conjunction).
    Otherwise, add 1/(number of parts in this conjunction) to the existing value.
Sort B by value, largest first.
Make your queries from start to end.

The idea is to rate each word by its importance across all the conjunctions.

A word that appears on its own is fairly important (score 1), but a word that appears four times, each time alongside one other word, is more important (score 4 × 1/2 = 2).
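The scoring pseudocode above translates fairly directly to PHP. This is a sketch that takes already-parsed groups (one array per conjunction). `rankKeywords` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Score keywords as in the pseudocode (hypothetical helper): every
 * conjunction containing a keyword adds 1 / (number of parts in that
 * conjunction). Returns keyword => score, highest score first.
 *
 * @param list<list<string>> $groups
 * @return array<string, float>
 */
function rankKeywords(array $groups): array
{
    $scores = [];
    foreach ($groups as $group) {
        $parts = count($group);
        foreach ($group as $keyword) {
            // First occurrence starts from 0.0; later ones add to the existing value.
            $scores[$keyword] = ($scores[$keyword] ?? 0.0) + 1.0 / $parts;
        }
    }
    arsort($scores); // sort by value, largest first, keeping the keys
    return $scores;
}
```

For the question's string, the scores are A = 1, B = 1/2 + 1/2 = 1, E = 1/2 + 1/3 ≈ 0.83, C, D, F = 1/2, and G, H = 1/3. The ranking alone does not say when to stop querying. One reading of "make your queries from start to end" is to skip any keyword whose conjunctions are already covered by an earlier query. Under that reading, A, B and E suffice here. Ties (A and B) keep insertion order only on PHP 8.0+, where sorting is stable.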
