
How to find the closest matching array

Customers of a furniture store website can select products and add them to a "style book". Each product belongs to a "style". The furniture store has several stylists, each of whom has made their own style book representing their style and expertise. I want to find the stylist that best matches a customer's style book. For each style book I have a count of the number of products per style.

$stylists = [
    'Nanda'     => [
        'Design'  => 20,
        'Retro'   => 0,
        'Rustiek' => 0,
    ],
    'Angelique' => [
        'Design'  => 0,
        'Retro'   => 20,
        'Rustiek' => 0,
    ],
    'Lissy'     => [
        'Design'  => 10,
        'Retro'   => 10,
        'Rustiek' => 0,
    ],
];

The same for the customer's style book:

$customer = [
    'Design'  => 15,
    'Retro'   => 10,
    'Rustiek' => 0,
];

In this case Lissy should be the best match.

The number of products isn't important since this depends on how active the stylist is. More important is that the stylist matches most of the customer's styles. For example:

'Stylist'     => [
    'Design'  => 10,
    'Retro'   => 10,
    'Rustiek' => 0,
]

should still be a better match than

'Stylist'     => [
    'Design'  => 300,
    'Retro'   => 0,
    'Rustiek' => 180,
]

I have tried giving the stylists' style books scores and percentages based on the order of importance of the styles in the customer's style book, but I still don't get the best match 100% of the time. Searching online didn't get me anywhere either.

As we have already discussed, the problem with your model is that it relies on the absolute number of products. What we need instead is an indicator of which styles the stylist actually uses. In other words, we eliminate the raw counts and replace them with relative weights (percentages in this case). For example, take a stylist with this product portfolio:

[
    style1 => 30,
    style2 => 10,
    style3 => 5
]

The total product count is 45 (30 + 10 + 5), which results in this style profile:

[
    style1 => 0.67,
    style2 => 0.22,
    style3 => 0.11
]
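A minimal sketch of this normalization step in PHP. `normalizeProfile` is a hypothetical helper, not part of the RatingModel class below, but it follows the same rule as that class's `init()` method: divide each count by the total, and return all zeros for an empty style book. It assumes PHP 7.0+ for the return type declaration.

```php
<?php
// Turn raw product counts per style into shares that sum to 1.
// An empty style book (total 0) yields all-zero shares, as in RatingModel::init().
function normalizeProfile(array $counts): array
{
    $total = array_sum($counts);
    return array_map(function ($c) use ($total) {
        return $total > 0 ? $c / $total : 0.0;
    }, $counts);
}

print_r(normalizeProfile([30, 10, 5]));  // about 0.667, 0.222, 0.111
print_r(normalizeProfile([15, 10, 0]));  // 0.6, 0.4, 0
```

Because `array_map` keeps the keys of a single input array, the same helper also works on the question's arrays keyed by style name.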

To match the stylist's style profile against the customer's style profile, we do the same thing for the customer's style book [15, 10, 0]:

[
    style1 => 0.60,
    style2 => 0.40,
    style3 => 0.00
]

The idea behind this is that we measure how strongly each style influences a stylist. A stylist whose profile is close to the customer's will probably produce results quite similar to what the customer is looking for.

If the stylist has also made products in a style the customer is not looking for, we still count that style, but only with its relative weight, e.g. 0.11. It is not decisive, but it acknowledges that the stylist's designs might be somewhat biased towards that style.

Therefore, if a stylist has a lot of products in a style we are not looking for, that style affects the outcome only according to its share of the stylist's portfolio, not according to its absolute product count.
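To check this against the two example style books from the question, here is a small sketch that normalizes both sides and takes the Euclidean distance, the same measure `scoreToModel()` uses below. `normalizeProfile` and `profileDistance` are hypothetical helper names, and the sketch assumes both arrays have the same keys.

```php
<?php
// Normalize counts to shares that sum to 1 (all zeros for an empty book).
function normalizeProfile(array $counts): array
{
    $total = array_sum($counts);
    return array_map(function ($c) use ($total) {
        return $total > 0 ? $c / $total : 0.0;
    }, $counts);
}

// Euclidean distance between two normalized profiles with the same keys.
function profileDistance(array $a, array $b): float
{
    $pa = normalizeProfile($a);
    $pb = normalizeProfile($b);
    $sum = 0.0;
    foreach ($pa as $style => $share) {
        $sum += ($share - $pb[$style]) ** 2;
    }
    return sqrt($sum);
}

$customer = ['Design' => 15, 'Retro' => 10, 'Rustiek' => 0];
$small    = ['Design' => 10, 'Retro' => 10, 'Rustiek' => 0];
$large    = ['Design' => 300, 'Retro' => 0, 'Rustiek' => 180];

printf("small: %.4f\n", profileDistance($small, $customer)); // 0.1414
printf("large: %.4f\n", profileDistance($large, $customer)); // 0.5489
```

The stylist with only 20 products is the closer match, because her 50/50 split of Design and Retro is near the customer's 60/40 split, while the large portfolio has no Retro and a big Rustiek share.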

Please let me know if this helps or if you want to change anything. From here we could also implement other options and rules.

Below is my RatingModel.

<?php

class RatingModel {
    private $name;
    private $preferences;
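    private $preferencesWeighted = [];

    // A method named after the class is not treated as a constructor in
    // PHP 8 (and is deprecated in PHP 7), so use __construct instead.
    public function __construct($name, array $preferences) {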
        $this->name = $name;
        $this->preferences = $preferences;
        $this->init();
    }

    // Convert raw product counts into shares that sum to 1.
    private function init() {
        $total = 0;
        foreach ($this->preferences as $value) {
            $total += $value;
        }
        if ($total > 0) {
            foreach ($this->preferences as $value) {
                $this->preferencesWeighted[] = $value / $total;
            }
        } else {
            $this->preferencesWeighted = array_fill(0, sizeof($this->preferences), 0);
        }
    }

    public function getName() {
        return $this->name;
    }

    public function getPreferences() {
        return $this->preferences;
    }

    public function getPreferencesWeighted() {
        return $this->preferencesWeighted;
    }

    // Per-style absolute differences between the two weighted profiles.
    public function distanceToModel($ratingModel) {
        $delta = [];
        for ($i = 0; $i < sizeof($this->preferencesWeighted); $i++) {
            $delta[] = abs($this->preferencesWeighted[$i] - $ratingModel->getPreferencesWeighted()[$i]);
        }
        return $delta;
    }

    // Euclidean distance between the weighted profiles (lower = better match).
    public function scoreToModel($ratingModel) {
        $distanceToModel = $this->distanceToModel($ratingModel);
        $score = [];
        foreach ($distanceToModel as $value) {
            $score[] = $value * $value;
        }
        return sqrt(array_sum($score));
    }
}

$customer = new RatingModel('Customer', [15, 10, 0]);
$nanda = new RatingModel('Nanda', [20, 0, 0]);
$angelique = new RatingModel('Angelique', [0, 20, 0]);
$lissy = new RatingModel('Lissy', [10, 0, 0]);
$mary = new RatingModel('Mary', [0, 0, 0]);
$max = new RatingModel('Max', [12, 0, 5]);
$simon = new RatingModel('Simon', [17, 2, 5]);
$manuel = new RatingModel('Manuel', [17, 8, 10]);
$betty = new RatingModel('Betty', [16, 9, 5]);
$sally = new RatingModel('Sally', [15, 10, 4]);
$peter = new RatingModel('Peter', [16, 9, 1]);

$stylists = [$nanda, $angelique, $lissy, $mary, $max, $simon, $manuel, $betty, $peter, $sally];

$relativeToClient = [];
foreach ($stylists as $stylist) {
    $relativeToClient[] = [
        'stylist' => $stylist->getName(),
        'distance' => $stylist->distanceToModel($customer),
        'score' => $stylist->scoreToModel($customer)
    ];
}

echo '<pre>';
print_r($stylists);
echo '<hr>';
print_r($customer);
echo '<hr>';
print_r($relativeToClient);
echo '<hr>from best fit to worst (low score means low delta)<hr>';
$results = array_column($relativeToClient, 'score', 'stylist');
asort($results);
print_r($results);
echo '</pre>';

Below are the results (lower values are better):

Array
(
    [Peter] => 0.067936622048676
    [Sally] => 0.1700528000819
    [Betty] => 0.20548046676563
    [Manuel] => 0.35225222874108
    [Simon] => 0.3942292057505
    [Max] => 0.50765762377392
    [Nanda] => 0.56568542494924
    [Lissy] => 0.56568542494924
    [Mary] => 0.7211102550928
    [Angelique] => 0.84852813742386
)

If we look at the two best-fitting stylists, we notice that Peter beats Sally, because Sally has more products in a style the customer did not pick (4 Rustiek products versus Peter's 1).

Sally: [15, 10, 4]
Peter: [16, 9, 1]
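These two scores can be checked by hand. Writing c for the customer's profile and dividing each style book by its total, the score is the Euclidean distance between the profiles (values rounded to three decimals):

```latex
\[
c = \tfrac{1}{25}(15, 10, 0) = (0.600,\ 0.400,\ 0.000)
\]
\[
p = \tfrac{1}{26}(16, 9, 1) \approx (0.615,\ 0.346,\ 0.038), \qquad
\lVert p - c \rVert_2 \approx \sqrt{0.015^2 + 0.054^2 + 0.038^2} \approx 0.068
\]
\[
s = \tfrac{1}{29}(15, 10, 4) \approx (0.517,\ 0.345,\ 0.138), \qquad
\lVert s - c \rVert_2 \approx \sqrt{0.083^2 + 0.055^2 + 0.138^2} \approx 0.170
\]
```

Sally's Rustiek term (0.138²) makes up about two thirds of her sum of squares, which is why Peter ranks first.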

You may also notice that Nanda and Lissy have the same score:

Nanda: [20, 0, 0]
Lissy: [10, 0, 0]

// relatively, for both => [1.00, 0.00, 0.00]

They are both regarded as equally good fits. Compared with the customer's 15 Design products, Nanda has 5 more and Lissy has 5 fewer, but that does not matter: both supply only one style, and what counts is how far each of them is from the ideal, which is the customer's style profile.
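The RatingModel class takes positional arrays such as [15, 10, 0], while the question stores style books keyed by style name. A minimal sketch that ranks stylists directly from that associative format, matching styles by key instead of position, could look like this. `rankStylists` and `normalizeProfile` are hypothetical names, and the distance measure is the same Euclidean distance as `scoreToModel()`.

```php
<?php
// Normalize counts to shares that sum to 1 (all zeros for an empty book).
function normalizeProfile(array $counts): array
{
    $total = array_sum($counts);
    return array_map(function ($c) use ($total) {
        return $total > 0 ? $c / $total : 0.0;
    }, $counts);
}

// Returns stylist name => distance, sorted from best (lowest) to worst.
// Only the customer's style keys are compared; a style missing from a
// stylist's book counts as 0 products.
function rankStylists(array $stylists, array $customer): array
{
    $target = normalizeProfile($customer);
    $scores = [];
    foreach ($stylists as $name => $counts) {
        $profile = normalizeProfile($counts);
        $sum = 0.0;
        foreach ($target as $style => $share) {
            $sum += ($share - ($profile[$style] ?? 0.0)) ** 2;
        }
        $scores[$name] = sqrt($sum);
    }
    asort($scores); // keeps the stylist names as keys
    return $scores;
}

$stylists = [
    'Nanda'     => ['Design' => 20, 'Retro' => 0,  'Rustiek' => 0],
    'Angelique' => ['Design' => 0,  'Retro' => 20, 'Rustiek' => 0],
    'Lissy'     => ['Design' => 10, 'Retro' => 10, 'Rustiek' => 0],
];
$customer = ['Design' => 15, 'Retro' => 10, 'Rustiek' => 0];

print_r(rankStylists($stylists, $customer));
```

With the style books exactly as given in the question (where Lissy has 10 Design and 10 Retro products), Lissy comes out first, as the question expects.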

You could also implement the logic without this bias factor and be stricter in the comparison. In that case you may want to exclude some of the parameters, for example the styles the customer has no products in.

For example, if we compare only the first two styles, [15, 10] for Sally and [16, 9] for Peter, Sally would actually win, because her preferences match the customer's exactly (zero delta):

Sally:

[
    style1 => 0.60,
    style2 => 0.40
]

Peter:

[
    style1 => 0.64,
    style2 => 0.36
]
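A sketch of this stricter comparison, assuming the rule is "keep only the styles the customer has at least one product in" (one possible reading of excluding parameters). `strictDistance` is a hypothetical helper: it drops the other styles, renormalizes both style books over the remaining ones, and then takes the Euclidean distance as before.

```php
<?php
// Stricter comparison: only the customer's non-empty styles are compared.
function strictDistance(array $stylist, array $customer): float
{
    // Keys of the styles present in the customer's style book.
    $styles = array_keys(array_filter($customer, function ($c) {
        return $c > 0;
    }));

    // Restrict a style book to those styles and renormalize to shares.
    $pick = function (array $counts) use ($styles): array {
        $subset = [];
        foreach ($styles as $s) {
            $subset[$s] = $counts[$s] ?? 0;
        }
        $total = array_sum($subset);
        foreach ($subset as $s => $c) {
            $subset[$s] = $total > 0 ? $c / $total : 0.0;
        }
        return $subset;
    };

    $a = $pick($stylist);
    $b = $pick($customer);
    $sum = 0.0;
    foreach ($styles as $s) {
        $sum += ($a[$s] - $b[$s]) ** 2;
    }
    return sqrt($sum);
}

$customer = [15, 10, 0];
printf("Sally: %.4f\n", strictDistance([15, 10, 4], $customer)); // 0.0000
printf("Peter: %.4f\n", strictDistance([16, 9, 1], $customer));  // 0.0566
```

Under this rule Sally's 4 Rustiek products are ignored entirely, so she matches the customer exactly, while Peter is off by 0.04 in each of the two remaining styles.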
