
How can I tell which objects in a list have the most in common with another object of the same type?

I have a collection of objects with properties, and I'd like to pass in another object of the same type to act as a query template. How can I sort or prioritize the objects whose property values have the most in common with that template?

More details:

        List<A> myList = new List<A>() {new A() {b="x"},
                                        new A() {c="r"},
                                        new A() {b="x",c="r"},};

        var myTemplate = new A() {b = "x", c="r"};

I'd like this example to match on the third item. If property c of the template were null or "f" instead, it should return the first and third items. If property c is "r" but b is null or "f", it should return the second and third items, because they match on c.

You'll basically have to come up with a formula for determining how similar two objects are. Pick a weight for each property, then use a simple comparison to decide whether that property counts as a match. Fuzzy matching of some kind could also be used, though that is going to be more complex.

Something simple could be:

public byte Similarity(SomeType other)
{
    byte similarity = 0;
    if (this.Property1 == other.Property1)
        similarity += 25;
    if (this.Property2 == other.Property2)
        similarity += 13;
    if (this.Property3 == other.Property3)
        similarity += 12;
    if (SomeFuzzyComparisonReturnsVerySimilar(this.Property4, other.Property4))
        similarity += 50;
    return similarity;
}

That is a simple method that returns a number from 0 to 100, where 100 means the objects are the same and 0 means they are totally different.

Once you have that, it is fairly simple to select the items that are similar enough for you to consider, e.g.:

var similarObjects = ListOfSomeTypes.Where(s => s.Similarity(templateObject) > 75);

Or to sort them:

var sortedBySimilarity = ListOfSomeTypes.OrderByDescending(s => s.Similarity(templateObject));

Ultimately, my point is that you have to come up with your own definition of "having the most in common with". Once you have that, the rest will probably be pretty easy. Coming up with the definition itself will not necessarily be easy.

With the additional details in your question, a possible formula would be:
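If the weights need to change often, the same idea can be expressed as data instead of hard-coded if statements. The following is only a sketch: WeightedMatcher, Add, and Score are hypothetical names that do not appear in the answer above, and it assumes plain equality is the right comparison for every property. Note that two null values count as a match here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the original answer): scores an item against
// a template by summing the weights of the properties whose values are equal.
public sealed class WeightedMatcher<T>
{
    private readonly List<(Func<T, object> Selector, int Weight)> _rules =
        new List<(Func<T, object> Selector, int Weight)>();

    // Registers one property (via a selector) together with its weight.
    public WeightedMatcher<T> Add(Func<T, object> selector, int weight)
    {
        _rules.Add((selector, weight));
        return this;
    }

    // Sum of the weights of all rules whose selected values are equal.
    // object.Equals treats two nulls as equal (an assumption of this sketch).
    public int Score(T item, T template)
    {
        return _rules
            .Where(r => object.Equals(r.Selector(item), r.Selector(template)))
            .Sum(r => r.Weight);
    }
}
```

A matcher built this way can be used in the same Where and OrderByDescending calls shown above by calling Score in place of Similarity.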

public byte Similarity(A other)
{
    byte similarity = 0;

    // Property b: a null on either side earns partial credit, an exact match full credit.
    if (this.b == null || other.b == null)
        similarity += 25;
    else if (this.b == other.b)
        similarity += 50;

    // Property c: same rule as for b.
    if (this.c == null || other.c == null)
        similarity += 25;
    else if (this.c == other.c)
        similarity += 50;

    return similarity;
}

This weights exact matches highest, a null value in either object half as much, and differences not at all.

I've done a ton of fuzzy matching over huge data sets, and there are lots of scenarios to consider. You seem to be approaching a simple or generic case, and for cases like that, without lots of data involved, some kind of general string-distance comparison seems appropriate.

If performance matters, my best advice is "know your data." Write your own scoring, as suggested above.

Having said that, we use Levenshtein distance for fuzzy string matching. It is very non-specific in terms of the "distance" between two strings, so it may or may not be appropriate for a given problem. Here is a quick copy/paste of the algorithm in C#. It ports to most languages very easily. This will throw an exception on null inputs, so be sure to add your own special-case handling as you see fit.
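To see how this formula behaves on the list from the question, here is a small self-contained sketch. The question does not show the definition of A, so the class below assumes two string properties named b and c; SimilarityDemo and Rank are hypothetical names added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's class: two nullable string properties.
public class A
{
    public string b { get; set; }
    public string c { get; set; }

    // Same formula as above: 50 per exact match, 25 when either side is null.
    public byte Similarity(A other)
    {
        byte similarity = 0;
        if (this.b == null || other.b == null)
            similarity += 25;
        else if (this.b == other.b)
            similarity += 50;
        if (this.c == null || other.c == null)
            similarity += 25;
        else if (this.c == other.c)
            similarity += 50;
        return similarity;
    }
}

// Hypothetical helper for illustration only.
public static class SimilarityDemo
{
    // Orders items from most to least similar to the template.
    // OrderByDescending is a stable sort, so ties keep their original order.
    public static List<A> Rank(IEnumerable<A> items, A template)
    {
        return items.OrderByDescending(a => a.Similarity(template)).ToList();
    }
}
```

With the template b = "x", c = "r", the three items from the question score 75, 75, and 100, so the third item ranks first. With c = null on the template, the first and third items tie at 75 and the second drops to 50. With c = "f", however, the scores are 75, 25, and 50: the first item outranks the third, because a null on the item earns more than a mismatch. If that is not the behaviour you want, adjust the weights.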

public static int LevenshteinDistance(string s, string t)
{
    var sLen = s.Length;
    var tLen = t.Length;

    var d = new int[sLen + 1, tLen + 1];

    for (var i = 0; i <= sLen; d[i, 0] = i++) { }
    for (var j = 0; j <= tLen; d[0, j] = j++) { }

    for (var i = 1; i <= sLen; i++)
    {
        for (var j = 1; j <= tLen; j++)
        {
            var cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1,   // a deletion
                d[i, j - 1] + 1),           // an insertion
                d[i - 1, j - 1] + cost);    // a substitution
        }
    }

    return d[sLen, tLen];
}
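One possible way to add the null handling mentioned above, and to turn the raw distance into something usable as the fuzzy check in a weighted score, is sketched below. FuzzyText, Similarity, and IsVerySimilar are hypothetical names; treating null as an empty string and using 0.8 as the default threshold are assumptions, not recommendations from the answer. The distance routine is a two-row variant of the same dynamic-programming algorithm, repeated so the block compiles on its own.

```csharp
using System;

// Hypothetical wrapper (not from the original answer).
public static class FuzzyText
{
    // Levenshtein distance, keeping only two rows of the table in memory.
    public static int Distance(string s, string t)
    {
        var previous = new int[t.Length + 1];
        var current = new int[t.Length + 1];
        for (var j = 0; j <= t.Length; j++) previous[j] = j;

        for (var i = 1; i <= s.Length; i++)
        {
            current[0] = i;
            for (var j = 1; j <= t.Length; j++)
            {
                var cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1,      // a deletion
                             current[j - 1] + 1),  // an insertion
                    previous[j - 1] + cost);       // a substitution
            }
            var swap = previous; previous = current; current = swap;
        }
        return previous[t.Length];
    }

    // Normalized similarity in [0, 1]: 1.0 for identical strings.
    // Assumption: null is treated the same as an empty string.
    public static double Similarity(string s, string t)
    {
        s = s ?? string.Empty;
        t = t ?? string.Empty;
        var maxLen = Math.Max(s.Length, t.Length);
        if (maxLen == 0) return 1.0;
        return 1.0 - (double)Distance(s, t) / maxLen;
    }

    // The threshold is an arbitrary default; tune it against your own data.
    public static bool IsVerySimilar(string s, string t, double threshold = 0.8)
    {
        return Similarity(s, t) >= threshold;
    }
}
```

Dividing by the longer length keeps the result between 0 and 1, because the distance can never exceed the length of the longer string.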


