How can I bundle search terms into more efficient queries?

I need to convert a list of search terms into the most efficient set of combined search terms. Any word or quoted phrase can be separated by an OR. Many terms can be combined within parentheses. ANDs can also be used.

For example, foo bar and boo bar share bar, so instead of two different search terms, they can be combined as (foo OR boo) AND bar.

Here's what the algorithm needs to do. Given this data set:

foo bar
boo bar
goo bar
hoo doo
foo manchu
moo bar
too bar
foo fighters
"blue kazoo" bar
baz
qux
quux

I want to get the following back:

(foo OR boo OR goo OR moo OR too OR "blue kazoo") AND bar
foo AND (manchu OR fighters)
hoo doo
baz OR qux OR quux

This does not work:

(foo bar) OR (boo bar) OR (goo bar) OR (foo manchu)

I'll be working in PHP, but I'll take the answer in pseudo-code or PHP, or I'll convert it from any major language.
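Before any bundling can happen, each input line has to be split into words and quoted phrases. The following is a minimal sketch of that step, assuming double quotes are the only phrase delimiter and that quotes are balanced; the function name tokenizeTerms is hypothetical and not part of the original question.

```php
<?php
/**
 * Split one search line into tokens, keeping "quoted phrases" intact.
 * Assumption: phrases are delimited by plain double quotes only.
 */
function tokenizeTerms(string $line): array
{
    // Try a quoted phrase first, otherwise take a run of non-space characters.
    preg_match_all('/"[^"]+"|\S+/', $line, $matches);
    return $matches[0];
}

$lines = ['foo bar', '"blue kazoo" bar', 'baz'];
$rows = array_map('tokenizeTerms', $lines);
// $rows is [['foo', 'bar'], ['"blue kazoo"', 'bar'], ['baz']]
```

The quotes are kept on the phrase token so that it can be written back into the combined query unchanged.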

I got the following code: 我得到以下代码:

function keyMultiSort(&$array,
                      $key,
                      $reverse = false,
                      $priority_last = false,
                      $save_key = true,
                      ?callable $func = null)
{
    if ($func === null)
    {
        $func = function ($first, $second) use ($key, $reverse, $priority_last)
        {
            if (!isset($first[$key]))
            {
                return ($reverse === false) ? -1 : 1;
            }
            if (!isset($second[$key]))
            {
                return ($reverse === false) ? 1 : -1;
            }

            if ($first[$key] > $second[$key])
            {
                return ($reverse === false) ? 1 : -1;
            }
            if ($first[$key] < $second[$key])
            {
                return ($reverse === false) ? -1 : 1;
            }
            if ($first[$key] === $second[$key])
            {
                return ($priority_last === false) ? 1 : -1;
            }

            return 0;
        };
    }

    if ($save_key)
    {
        uasort($array, $func);
    }
    else
    {
        usort($array, $func);
    }
}

$array = [
    ['foo', 'bar'],
    ['boo', 'bar'],
    ['goo', 'bar'],
    ['hoo', 'doo'],
    ['foo', 'manchu'],
    ['moo', 'bar'],
    ['too', 'bar'],
    ['foo', 'fighters'],
    ['blue kazoo', 'bar'],
];

$pairs = [];
$str = '';
foreach($array as $item)
{
    if(!isset($pairs[$item[0]]['count']))
    {
        $pairs[$item[0]]['count'] = 1;
    }
    else
    {
        $pairs[$item[0]]['count']++;
    }
    $pairs[$item[0]]['elements'][] = $item[1];

    if(!isset($pairs[$item[1]]['count']))
    {
        $pairs[$item[1]]['count'] = 1;
    }
    else
    {
        $pairs[$item[1]]['count']++;
    }
    $pairs[$item[1]]['elements'][] = $item[0];
    keyMultiSort($pairs, 'count', true);
}

$remove = [];
foreach($pairs as $elm=>$item)
{
    $remove[] = $elm;
    $elements = array_diff($item['elements'], $remove);
    if(empty($elements))
    {
        if (in_array($elm, $remove))
        {
            continue;
        }
        $str .= $elm.PHP_EOL;
    }
    else
    {
        $str .= $elm.' AND ('.implode(' OR ', $elements).')'.PHP_EOL;
    }
    $remove = array_merge($remove, $elements);
}
var_dump($str);

Result:

string(99) "bar AND (foo OR boo OR goo OR moo OR too OR blue kazoo)
foo AND (manchu OR fighters)
hoo AND (doo)
"
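Note that the result above prints blue kazoo without quotes, so a search engine would likely read it as two separate words. A small helper can restore the quoting when the query is assembled. This is only a sketch: quoteTerm is a hypothetical name, and stripping embedded double quotes is an assumption about how the target search syntax should be handled.

```php
<?php
/**
 * Hypothetical helper: wrap a multi-word term in double quotes so the
 * phrase stays a single search term. Already-quoted terms and single
 * words are returned unchanged.
 */
function quoteTerm(string $term): string
{
    $alreadyQuoted = strlen($term) >= 2
        && $term[0] === '"'
        && substr($term, -1) === '"';

    if ($alreadyQuoted || preg_match('/\s/', $term) !== 1) {
        return $term;
    }

    // Assumption: embedded double quotes are dropped rather than escaped.
    return '"' . str_replace('"', '', $term) . '"';
}

$elements = ['foo', 'boo', 'blue kazoo'];
echo 'bar AND (' . implode(' OR ', array_map('quoteTerm', $elements)) . ')', PHP_EOL;
// prints: bar AND (foo OR boo OR "blue kazoo")
```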

It can be optimized, depending on the objectives...
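One possible optimization, shown here only as a sketch: the listing above re-sorts $pairs on every loop iteration, whereas the counts can be collected first and sorted once at the end. The function name termFrequencies is hypothetical, and it counts rows per term rather than reproducing the exact structure of $pairs.

```php
<?php
/**
 * Count how many rows mention each term, then sort once.
 * Returns term => count, highest count first.
 */
function termFrequencies(array $rows): array
{
    $counts = [];
    foreach ($rows as $row) {
        // array_unique() avoids counting a term twice within the same row.
        foreach (array_unique($row) as $term) {
            $counts[$term] = ($counts[$term] ?? 0) + 1;
        }
    }
    arsort($counts); // a single sort instead of one per input row
    return $counts;
}

$rows = [['foo', 'bar'], ['boo', 'bar'], ['goo', 'bar'], ['foo', 'manchu']];
$counts = termFrequencies($rows);
// bar has the highest count (3), followed by foo (2)
```

The order among terms with equal counts depends on the sort implementation, so code that relies on a particular tie order should add an explicit secondary key.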

I understand the logic, but you really need to make the question clearer.

Anyway, I see this as a graph problem where we want to find the set of nodes that have the highest degree and can span the whole graph.

I believe that if you picture it this way, you can use any data structure you like to serve the purpose. You could create an adjacency list, find the nodes with the highest degree, and then check whether all elements are covered by those nodes. Adding the AND and OR operators is simple afterwards.
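The graph idea above can be sketched as a greedy loop: treat every two-term row as an edge, repeatedly pick the term with the highest remaining degree, and bundle its partners into one OR group. This is an illustration under stated assumptions, not a definitive implementation: the name bundleTerms is hypothetical, every row is assumed to hold one or two tokens, and picking the highest degree first is a heuristic that is not guaranteed to produce the smallest possible number of queries.

```php
<?php
/**
 * Greedy sketch of the "highest degree first" idea.
 * Assumption: each row holds one or two tokens (hypothetical input shape).
 */
function bundleTerms(array $rows): array
{
    $singles = [];
    $edges = [];
    foreach ($rows as $row) {
        if (count($row) === 1) {
            $singles[] = $row[0];
        } else {
            $edges[] = [$row[0], $row[1]];
        }
    }

    $queries = [];
    while ($edges !== []) {
        // Degree = number of remaining rows that mention the term.
        $degree = [];
        foreach ($edges as [$a, $b]) {
            $degree[$a] = ($degree[$a] ?? 0) + 1;
            $degree[$b] = ($degree[$b] ?? 0) + 1;
        }

        $hub = null;
        $best = 0;
        foreach ($degree as $term => $count) {
            if ($count > $best) {      // strict ">" keeps the first term on ties
                $hub = (string) $term; // numeric-looking keys come back as int
                $best = $count;
            }
        }

        // Collect the hub's partners and keep the rows it does not cover.
        $partners = [];
        $hubFirst = null;
        $rest = [];
        foreach ($edges as [$a, $b]) {
            if ($a === $hub) {
                $partners[] = $b;
                $hubFirst ??= true;
            } elseif ($b === $hub) {
                $partners[] = $a;
                $hubFirst ??= false;
            } else {
                $rest[] = [$a, $b];
            }
        }
        $edges = $rest;
        $partners = array_values(array_unique($partners));

        if (count($partners) === 1) {
            // Nothing to bundle: emit the original two-term row.
            $queries[] = $hubFirst ? "$hub {$partners[0]}" : "{$partners[0]} $hub";
        } else {
            $group = '(' . implode(' OR ', $partners) . ')';
            $queries[] = $hubFirst ? "$hub AND $group" : "$group AND $hub";
        }
    }

    if ($singles !== []) {
        $queries[] = implode(' OR ', array_unique($singles));
    }
    return $queries;
}

$rows = [
    ['foo', 'bar'], ['boo', 'bar'], ['goo', 'bar'], ['hoo', 'doo'],
    ['foo', 'manchu'], ['moo', 'bar'], ['too', 'bar'], ['foo', 'fighters'],
    ['"blue kazoo"', 'bar'], ['baz'], ['qux'], ['quux'],
];
echo implode(PHP_EOL, bundleTerms($rows)), PHP_EOL;
// (foo OR boo OR goo OR moo OR too OR "blue kazoo") AND bar
// foo AND (manchu OR fighters)
// hoo doo
// baz OR qux OR quux
```

For the data set in the question this reproduces the requested output. Rows with more than two terms would need a different model (for example, a hypergraph or pairwise co-occurrence counts), which this sketch does not attempt.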

Code for processing more than 2 values

<?php
function keyMultiSort(&$array,
                      $key,
                      $reverse = false,
                      $priority_last = false,
                      $save_key = true,
                      ?callable $func = null)
{
    if ($func === null)
    {
        $func = function ($first, $second) use ($key, $reverse, $priority_last)
        {
            if (!isset($first[$key]))
            {
                return ($reverse === false) ? -1 : 1;
            }
            if (!isset($second[$key]))
            {
                return ($reverse === false) ? 1 : -1;
            }

            if ($first[$key] > $second[$key])
            {
                return ($reverse === false) ? 1 : -1;
            }
            if ($first[$key] < $second[$key])
            {
                return ($reverse === false) ? -1 : 1;
            }
            if ($first[$key] === $second[$key])
            {
                return ($priority_last === false) ? 1 : -1;
            }

            return 0;
        };
    }

    if ($save_key)
    {
        uasort($array, $func);
    }
    else
    {
        usort($array, $func);
    }
}

$array = [
    ['foo', 'bar', 'test'],
    ['boo', 'bar'],
    ['goo', 'bar'],
    ['hoo', 'doo', 'test', 'test2'],
    ['foo', 'manchu'],
    ['moo', 'bar'],
    ['too', 'bar'],
    ['foo', 'fighters'],
    ['blue kazoo', 'bar', 'test'],
];

$pairs = [];
$str = '';
foreach($array as $item)
{
    foreach($item as $key=>$elm)
    {
        foreach($item as $key2=>$elm2)
        {
            if($key !== $key2)
            {
                if(!isset($pairs[$elm]['count']))
                {
                    $pairs[$elm]['count'] = 1;
                }
                else
                {
                    $pairs[$elm]['count']++;
                }
                $pairs[$elm]['elements'][] = $elm2;
            }
        }
    }

    keyMultiSort($pairs, 'count', true);
}
//var_dump($pairs);
$remove = [];
foreach($pairs as $elm=>$item)
{
    $remove[] = $elm;
    $elements = array_diff($item['elements'], $remove);
    if(empty($elements))
    {
        if (in_array($elm, $remove))
        {
            continue;
        }
        $str .= $elm.PHP_EOL;
    }
    else
    {
        $str .= $elm.' AND ('.implode(' OR ', array_unique($elements)).')'.PHP_EOL;
    }
}
var_dump($str);

Response:

string(184) "bar AND (foo OR test OR boo OR goo OR moo OR too OR blue kazoo)
test AND (foo OR hoo OR doo OR test2 OR blue kazoo)
foo AND (manchu OR fighters)
hoo AND (doo OR test2)
doo AND (test2)
"

PS I hope I have understood the task correctly...

UPDATE: Added code that does not ignore "single values". I changed the logic:

...

['"yellow balloon"', 'foo', 'bar', 'baz', 'qut'],

...

returns:

...

qut AND ("yellow balloon" OR baz)
baz AND ("yellow balloon")

...

It seems to me that, for this task, that's correct (given the condition of combining more than 2 values).

function keyMultiSort(&$array,
                      $key,
                      $reverse = false,
                      $priority_last = false,
                      $save_key = true,
                      ?callable $func = null)
{
    if ($func === null)
    {
        $func = function ($first, $second) use ($key, $reverse, $priority_last)
        {
            if (!isset($first[$key]))
            {
                return ($reverse === false) ? -1 : 1;
            }
            if (!isset($second[$key]))
            {
                return ($reverse === false) ? 1 : -1;
            }

            if ($first[$key] > $second[$key])
            {
                return ($reverse === false) ? 1 : -1;
            }
            if ($first[$key] < $second[$key])
            {
                return ($reverse === false) ? -1 : 1;
            }
            if ($first[$key] === $second[$key])
            {
                return ($priority_last === false) ? 1 : -1;
            }

            return 0;
        };
    }

    if ($save_key)
    {
        uasort($array, $func);
    }
    else
    {
        usort($array, $func);
    }
}

$array = [
    ['foo', 'bar', 'test'],
    ['boo', 'bar'],
    ['goo', 'bar'],
    ['hoo', 'doo', 'test', 'test2'],
    ['foo', 'manchu'],
    ['moo', 'bar'],
    ['too', 'bar'],
    ['foo', 'fighters'],
    ['"blue kazoo"', 'bar', 'test'],
    ['"red panda"', 'bar', 'test'],
    ['"yellow balloon"', 'foo', 'bar', 'baz', 'qut'],
    ['"red panda"', 'fighters', 'moo'],
    ['"foo fighters"'],
    ['foo'],
    ['bar'],
];

$pairs = [];
$singles = [];
$str = '';
foreach ($array as $item)
{
    foreach ($item as $key => $elm)
    {
        if(count($item) === 1)
        {
            $singles[$elm] = 1;
        }
        else
        {
            if (!isset($pairs[$elm]))
            {
                $pairs[$elm]['count'] = 0;
                $pairs[$elm]['elements'] = [];
            }
            foreach ($item as $key2 => $elm2)
            {
                if ($key !== $key2)
                {
                    $pairs[$elm]['count']++;
                    $pairs[$elm]['elements'][] = $elm2;
                }
            }
        }
    }

    keyMultiSort($pairs, 'count', true);
}
//var_dump($pairs);exit;
$remove = [];
foreach ($pairs as $elm => $item)
{
    $remove[] = $elm;
    $elements = array_diff($item['elements'], $remove);
    $elements = array_unique($elements);
    if (!empty($elements)){
        $str .= $elm.' AND ('.implode(' OR ', $elements).')'.PHP_EOL;
    }
}
foreach ($singles as $elm => $item)
{
    $str .= $elm.PHP_EOL;
}
var_dump($str);

Response:

string(421) "bar AND (foo OR test OR boo OR goo OR moo OR too OR "blue kazoo" OR "red panda" OR "yellow balloon" OR baz OR qut)
test AND (foo OR hoo OR doo OR test2 OR "blue kazoo" OR "red panda")
foo AND (manchu OR fighters OR "yellow balloon" OR baz OR qut)
"red panda" AND (fighters OR moo)
qut AND ("yellow balloon" OR baz)
baz AND ("yellow balloon")
test2 AND (hoo OR doo)
fighters AND (moo)
doo AND (hoo)
"foo fighters"
foo
bar
"

PS In my opinion, this problem does not apply to reality.
