Grouping array based on the key value similarity
Suppose I have an array like this:
$data[0]['name'] = 'product 1 brandX';
$data[0]['id_product'] = '77777777';
$data[1]['name'] = 'brandX product 1';
$data[1]['id_product'] = '77777777';
$data[2]['name'] = 'brandX product 1 RED';
$data[2]['id_product'] = '77777777';
$data[3]['name'] = 'product 1 brandX';
$data[3]['id_product'] = '';
$data[4]['name'] = 'product 2 brandY';
$data[4]['id_product'] = '8888888';
$data[5]['name'] = 'product 2 brandY RED';
$data[5]['id_product'] = '';
I'm trying to group them by their similarity (name or id_product).
This would be the expected final array:
$uniques[0]['name'] = 'product 1 brandX'; // the shortest name for the product
$uniques[0]['count'] = 4; // number of entries that contain all the words of the shortest name or share the same id_product
$uniques[1]['name'] = 'product 2 brandY';
$uniques[1]['count'] = 2;
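This is what I've tried so far:
$uniques = [];
foreach ($data as $t) {
    $id = $t['id_product'];
    // keep the shortest name seen for this id_product
    if (!isset($uniques[$id]['name']) || mb_strlen($uniques[$id]['name']) > mb_strlen($t['name'])) {
        $uniques[$id]['name'] = $t['name'];
    }
    // count every entry, not only those that replaced the name
    $uniques[$id]['count'] = isset($uniques[$id]['count']) ? $uniques[$id]['count'] + 1 : 1;
}
But I can't group on id_product alone, because sometimes two entries are the same product but one has an id and the other doesn't. I also have to check the name, but I haven't managed to do that.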
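Here is a minimal sketch that makes the problem concrete. It is an illustration only and not part of the original question. It buckets the sample names by id_product alone, and the variable name $byId is hypothetical. Both entries without an id share the empty-string key, so the '' bucket mixes a "product 1" entry with a "product 2" entry. That is why id_product cannot be the only key.

```php
<?php
// Sample data from the question, written as an array literal.
$data = [
    ['name' => 'product 1 brandX',     'id_product' => '77777777'],
    ['name' => 'brandX product 1',     'id_product' => '77777777'],
    ['name' => 'brandX product 1 RED', 'id_product' => '77777777'],
    ['name' => 'product 1 brandX',     'id_product' => ''],
    ['name' => 'product 2 brandY',     'id_product' => '8888888'],
    ['name' => 'product 2 brandY RED', 'id_product' => ''],
];

// Bucket names by id_product only (hypothetical $byId).
$byId = [];
foreach ($data as $t) {
    $byId[$t['id_product']][] = $t['name'];
}

// Entries without an id all land in the same '' bucket,
// regardless of which product they actually are.
print_r($byId['']);
```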
I don't think this will fully solve your problem, but it might get you moving again (it uses Laravel's collect() helper):
$data = [];
$data[0]['name'] = 'product 1 brandX';
$data[0]['id_product'] = '77777777';
$data[1]['name'] = 'brandX product 1';
$data[1]['id_product'] = '77777777';
$data[2]['name'] = 'brandX product 1 RED';
$data[2]['id_product'] = '77777777';
$data[3]['name'] = 'product 1 brandX';
$data[3]['id_product'] = '';
$data[4]['name'] = 'product 2 brandY';
$data[4]['id_product'] = '8888888';
$data[5]['name'] = 'product 2 brandY RED';
$data[5]['id_product'] = '';
$data = collect($data);
$tallies = [
    'brand_x' => 0,
    'brand_y' => 0,
    'other' => 0
];
// unique() keeps one item per value returned by the callback;
// the callback also counts how many items fall under each brand
$unique = $data->unique(function ($item) use (&$tallies) {
    switch (true) {
        case strpos($item['name'], 'brandX') !== false:
            $tallies['brand_x']++;
            return 'product X';
        case strpos($item['name'], 'brandY') !== false:
            $tallies['brand_y']++;
            return 'product Y';
        default:
            $tallies['other']++;
            return 'other';
    }
});
print_r($unique);
print_r($tallies);
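If you are not using Laravel, you can express roughly the same idea in plain PHP. This is a minimal sketch. The helper tallyByBrand() and its $brands map (label => substring to look for) are hypothetical names. Like the collection version, it counts every entry and keeps the first entry seen for each brand. Unlike the collection version, it keys the kept entries by label instead of preserving the original array indexes.

```php
<?php
// Hypothetical helper: count entries per brand keyword and keep the first
// entry found for each brand, without Laravel collections.
function tallyByBrand(array $data, array $brands)
{
    // One counter per brand label, plus 'other' for names matching no brand.
    $tallies = array_fill_keys(array_keys($brands), 0);
    $tallies['other'] = 0;
    $unique = [];

    foreach ($data as $item) {
        $group = 'other';
        foreach ($brands as $label => $needle) {
            if (strpos($item['name'], $needle) !== false) {
                $group = $label; // first brand whose keyword appears in the name
                break;
            }
        }
        $tallies[$group]++;
        if (!isset($unique[$group])) {
            $unique[$group] = $item; // keep only the first entry per group
        }
    }

    return ['unique' => $unique, 'tallies' => $tallies];
}
```

For example, calling tallyByBrand($data, ['brand_x' => 'brandX', 'brand_y' => 'brandY']) on the plain array (before it is wrapped with collect()) gives tallies of 4 for brand_x, 2 for brand_y and 0 for other. As with the collection version, the brand keywords must be known in advance.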
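I think the best way to solve this is to use a unique product_id. If you want to build the unique key from similarity in the name field instead, you can use preg_split to turn each name into an array of words, and then use array_diff to get the array of words that differ. If two names differ by fewer than 2 words, treat them as the same product. I wrote this function, which returns the similar name (a key of $arr), or false if none is found: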
function get_similare_key($arr, $name) {
    // split the new name into words
    $names = preg_split("/\s+/", $name);
    // return the first key of $arr that is similar to $name
    foreach ($arr as $key => $value) {
        $key_names = preg_split("/\s+/", $key);
        // words of the existing key that do not appear in $name
        $diff = array_diff($key_names, $names);
        if (count($diff) <= 1) {
            return $key;
        }
    }
    return false;
}
Here is a working demo.
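Here is a minimal sketch of how get_similare_key() could drive the grouping. The driver count_similar_names() is hypothetical. The function is repeated so that the snippet runs on its own. This sketch keys each group by the first name seen for it, which is not necessarily the shortest one.

```php
<?php
// get_similare_key() from above, repeated so this snippet runs on its own.
function get_similare_key($arr, $name) {
    $names = preg_split("/\s+/", $name);
    foreach ($arr as $key => $value) {
        $key_names = preg_split("/\s+/", $key);
        // words of the existing key that are missing from $name
        if (count(array_diff($key_names, $names)) <= 1) {
            return $key;
        }
    }
    return false;
}

// Hypothetical driver: map each group's first-seen name to its entry count.
function count_similar_names(array $data) {
    $counts = [];
    foreach ($data as $t) {
        $key = get_similare_key($counts, $t['name']);
        if ($key === false) {
            $counts[$t['name']] = 1; // no similar name yet: start a new group
        } else {
            $counts[$key]++;         // similar name found: count it in that group
        }
    }
    return $counts;
}
```

On the question's data this gives ['product 1 brandX' => 4, 'product 2 brandY' => 2]. Two caveats apply. First, array_diff() only counts the words of the existing key that are missing from the new name. Second, with the <= 1 threshold, names that differ in a single word would also be grouped together, for example 'product 1 brandX' and 'product 2 brandX'.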
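My answer is based on two assumptions about how products should be grouped:
1. id_product may be missing, but where it is present it is correct and sufficient to match two products; and
2. For two product names to match, the longer name (the one with more words) must contain all the words of the shorter name (the one with fewer words).
Based on these assumptions, here is a function that determines whether two separate products match (i.e. should be grouped together), along with a helper function that extracts the words from a name: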
function productsMatch(array $product1, array $product2)
{
    if (
        !empty($product1['id_product'])
        && !empty($product2['id_product'])
        && $product1['id_product'] === $product2['id_product']
    ) {
        // match based on id_product
        return true;
    }
    $words1 = getWordsFromProduct($product1);
    $words2 = getWordsFromProduct($product2);
    $min_word_count = min(count($words1), count($words2));
    $match_word_count = count(array_intersect_key($words1, $words2));
    if ($min_word_count >= 1 && $match_word_count === $min_word_count) {
        // match based on name similarity
        return true;
    }
    // no match
    return false;
}
function getWordsFromProduct(array $product)
{
    // lower-case the name, split it on whitespace, and flip the list
    // so that each word becomes an array key
    $name = mb_strtolower($product['name']);
    preg_match_all('/\S+/', $name, $matches);
    $words = array_flip($matches[0]);
    return $words;
}
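To see why the name check works, here is its core in isolation. The word lists are hard-coded to match what getWordsFromProduct() would produce for 'brandX product 1' and 'brandX product 1 RED'. Flipping the lists turns the words into array keys, so array_intersect_key() returns the shared words, and word order does not matter.

```php
<?php
// Lower-cased word lists, flipped so each word is an array key.
$words1 = array_flip(['brandx', 'product', '1']);
$words2 = array_flip(['brandx', 'product', '1', 'red']);

// array_intersect_key() keeps the entries of $words1 whose keys also exist in $words2.
$common = array_intersect_key($words1, $words2);

$min_word_count = min(count($words1), count($words2)); // 3
// The names match because every word of the shorter name is in the longer one.
$is_match = count($common) === $min_word_count;
var_dump($is_match); // bool(true)
```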
This function can then be used to group the products:
function groupProducts(array $data)
{
    $groups = array();
    foreach ($data as $product1) {
        foreach ($groups as $key => $products) {
            foreach ($products as $product2) {
                if (productsMatch($product1, $product2)) {
                    $groups[$key][] = $product1;
                    continue 3; // foreach ($data as $product1)
                }
            }
        }
        $groups[] = array($product1);
    }
    return $groups;
}
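The `continue 3` in groupProducts() stops a matched product from also starting a new group. It skips to the next iteration of the third enclosing loop, foreach ($data ...), so `$groups[] = array($product1);` is never reached for that product. Here is a toy example with the same three-level nesting (the arrays are arbitrary):

```php
<?php
$out = [];
foreach ([1, 2, 3] as $i) {              // 3rd enclosing loop
    foreach (['a', 'b'] as $g) {         // 2nd enclosing loop
        foreach ([0] as $unused) {       // innermost loop
            if ($i === 2) {
                continue 3;              // jump straight to the next $i
            }
            $out[] = $i . $g;
        }
    }
    $out[] = 'end' . $i;                 // skipped when continue 3 fires
}
echo implode(',', $out), "\n"; // 1a,1b,end1,3a,3b,end3
```

Because a product joins the first group that has any matching member, groups are never merged afterwards. If a later product matches members of two different groups, it is only added to the first of them.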
This function can then be used to extract the shortest name and the count for each group:
function uniqueProducts(array $groups)
{
    $uniques = array();
    foreach ($groups as $products) {
        $shortest_name = '';
        $shortest_length = PHP_INT_MAX;
        $count = 0;
        foreach ($products as $product) {
            $length = mb_strlen($product['name']);
            if ($length < $shortest_length) {
                $shortest_name = $product['name'];
                $shortest_length = $length;
            }
            $count++;
        }
        $uniques[] = array(
            'name' => $shortest_name,
            'count' => $count,
        );
    }
    return $uniques;
}
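uniqueProducts() can also be written more compactly with count() and array_reduce(). This is a sketch, and the name uniqueProductsCompact() is hypothetical. Like the original, it keeps the first name when two names have the same length.

```php
<?php
// Hypothetical compact variant of uniqueProducts(): same output shape.
function uniqueProductsCompact(array $groups)
{
    return array_map(function (array $products) {
        // Shortest name by character count; strict < keeps the first on ties.
        $shortest = array_reduce($products, function ($carry, $product) {
            $name = $product['name'];
            return ($carry === null || mb_strlen($name) < mb_strlen($carry)) ? $name : $carry;
        });
        return array('name' => $shortest, 'count' => count($products));
    }, $groups);
}
```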
So, combining all 4 functions, you can get the uniques as shown below (tested with PHP 5.6):
$data[0]['name'] = 'product 1 brandX';
$data[0]['id_product'] = '77777777';
$data[1]['name'] = 'brandX product 1';
$data[1]['id_product'] = '77777777';
$data[2]['name'] = 'brandX product 1 RED';
$data[2]['id_product'] = '77777777';
$data[3]['name'] = 'product 1 brandX';
$data[3]['id_product'] = '';
$data[4]['name'] = 'product 2 brandY';
$data[4]['id_product'] = '8888888';
$data[5]['name'] = 'product 2 brandY RED';
$data[5]['id_product'] = '';
$groups = groupProducts($data);
$uniques = uniqueProducts($groups);
var_dump($uniques);
Which gives the output:
array(2) {
  [0]=>
  array(2) {
    ["name"]=>
    string(16) "product 1 brandX"
    ["count"]=>
    int(4)
  }
  [1]=>
  array(2) {
    ["name"]=>
    string(16) "product 2 brandY"
    ["count"]=>
    int(2)
  }
}
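Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.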