Compare two lists or arrays of arbitrary length in C#; order is important
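Say I have two lists or arrays of strings. For example:
List 1: "a", "c", "b", "d", "f", "e"
List 2: "a", "d", "e", "f", "h"
List 1 and list 2 are of arbitrary length. List 1 may contain elements that are not in list 2, and vice versa.
I'd like to know when items in list 1 are found in list 2. More specifically, I want to know when an item in list 1 is found in list 2 but in a different order, relative to the other shared items, than the order in which it appears in list 2. (Hopefully the example below clarifies this statement.)
For example, "a" is found in both lists and is the first item in both. So far, everything is OK. "c" and "b" are found only in the first list and can therefore be ignored. "h" is found only in the second list and can also be ignored.
"d" is found in both the first and the second list. In the first list it comes after "a" (the first item). Even though its position in the first list differs from its position in the second list, that is OK, because its relative order is the same in both lists (it is the second match the lists have in common).
In the example above, "f" and "e" are in the "wrong" order in list 1, because "e" comes before "f" in the second list. So I'd like to report that "e" and "f" are in the wrong order in the first list. How would I do this?
The solution should be in C#. Thank you!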
string[] list1 = {"a", "c", "b", "d", "f", "e"};
string[] list2 = {"a", "d", "e", "f", "h"};
int i = 0;
var list1a = list1.Intersect(list2).Select(l=> new { item = l, order= i++});
int j = 0;
var list2a = list2.Intersect(list1).Select(l=> new { item = l, order= j++});
var r = from l1 in list1a join l2 in list2a on l1.order equals l2.order
where l1.item != l2.item
select new {result=string.Format("position {1} is item [{0}] on list1 but its [{2}] in list2", l1.item, l1.order, l2.item )};
r.Dump(); // Dump() is a LINQPad extension; outside LINQPad, loop over r and Console.WriteLine each result
position 2 is item [f] on list1 but its [e] in list2
position 3 is item [e] on list1 but its [f] in list2
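Levenshtein distance? I think the first solution, which you have already accepted, has a flaw. Even if one small item is inserted, it will tell you that everything is out of order: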
string[] list1 = { "a", "c", "b", "d", "j", "e", "f" };
string[] list2 = { "a", "d", "e", "f", "h", "j" };
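It reports that j, e and f are out of order, because j was inserted.
This points to the problem you face. The question of what is out of order has multiple solutions, and even more than one optimal solution. Is j in order, or are e and f? Are they all out of order? There is something called the Levenshtein distance algorithm, which finds the minimum number of insert and delete operations needed to start with set A and end up with set B. There are multiple optimal solutions; this just finds one of them.
The following algorithm correctly reports that j was inserted in list1 and that e and f are shifted but still in the correct order.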
using System;
using System.Collections.Generic;
using System.Text;
using System.Collections;
using Math = System.Math;
namespace LevCompareLists {
    class Program {
        static void Main(string[] args) {
            string[] list1 = { "a", "c", "b", "d", "j", "e", "f" };
            string[] list2 = { "a", "d", "e", "f", "h", "j" };
            int?[] aMap21 = Levenshtein(list2, list1);
            int?[] aMap12 = Levenshtein(list1, list2);
        }
        public static int?[] Levenshtein(string[] Src, string[] Dst) {
            // This finds a minimum-difference solution of inserts and deletes that maps Src to Dst.
            // It returns the map from the perspective of Dst, i.e.:
            //   each element of the returned array contains the Src index of the corresponding element in Dst;
            //   a null value means the element in Dst was inserted and never existed in Src.
            //
            // Given A = {"a", "c", "b", "d", "j", "e", "f"}
            //       B = {"a", "d", "e", "f", "h", "j"}
            //
            // Levenshtein(B, A):
            //   a  c  b  d  j  e  f          <-- A
            //   0        1     2  3          <-- aMap
            //   a        d     e  f  h  j    <-- B
            //
            // Levenshtein(A, B):
            //   a  d  e  f  h  j       <-- B
            //   0  3  5  6             <-- aMap
            //   a  c  b  d  j  e  f    <-- A
            //
            // see: http://en.wikipedia.org/wiki/Levenshtein_distance
            int cSrc = Src.Length; // number of items in Src
            int cDst = Dst.Length; // number of items in Dst
            if (cSrc == 0 || cDst == 0) return null;
            //**** create the Levenshtein matrix
            // it has 1 extra element in each dimension to contain the edges
            int[,] aLev = new int[cSrc + 1, cDst + 1]; // the matrix
            int iSrc, iDst;
            // load the horizontal and vertical edges
            for (iSrc = 0; iSrc <= cSrc; iSrc++) aLev[iSrc, 0] = iSrc;
            for (iDst = 0; iDst <= cDst; iDst++) aLev[0, iDst] = iDst;
            // load the interior; a mismatch on the diagonal costs 2 (one delete plus one insert),
            // so only inserts and deletes are ever counted
            for (iSrc = 1; iSrc <= cSrc; iSrc++)
                for (iDst = 1; iDst <= cDst; iDst++)
                    aLev[iSrc, iDst] = Math.Min(Math.Min(aLev[iSrc - 1, iDst] + 1, aLev[iSrc, iDst - 1] + 1),
                                                aLev[iSrc - 1, iDst - 1] + ((Dst[iDst - 1] == Src[iSrc - 1]) ? 0 : 2));
            DumpLevMatrix(aLev, Src, Dst); // Debug
            //**** create the return map, using the Levenshtein matrix
            int?[] aMap = new int?[cDst]; // this is the return map
            iSrc = cSrc; // start in the lower right corner of the Levenshtein matrix
            iDst = cDst;
            // work backwards to pick one best solution
            while (iSrc > 0 || iDst > 0) {
                if ((iSrc > 0) && (iDst > 0)) {
                    // enter here if iSrc and iDst are inside the matrix and not on its edge
                    int nCur = aLev[iSrc, iDst];
                    int nIns = nCur - aLev[iSrc, iDst - 1]; // moving along Dst to find a match means an insert
                    int nDel = nCur - aLev[iSrc - 1, iDst]; // moving along Src to find a match means a deletion
                    if (nIns == 1)      // this item was NOT in Src, but was inserted into Dst
                        iDst--;         // leave Dst[iDst - 1] mapped to nowhere, scan to the previous Dst item
                    else if (nDel == 1) // this item was in Src, but is missing in Dst
                        iSrc--;         // don't map any Dst item, scan to the previous Src item
                    else {              // match
                        aMap[iDst - 1] = iSrc - 1; // map the Dst item to the Src item
                        iDst--;                    // then scan to the previous item in both
                        iSrc--;
                    }
                } else if (iDst > 0) {
                    iDst--; // remaining Dst items are inserts; leave them mapped to nowhere
                } else {
                    iSrc--; // remaining Src items are deletes; deletes do nothing
                }
            }
            DumpMap(aMap, Dst); // Debug
            return aMap;
        }
        // just for debugging
        static void DumpLevMatrix(int[,] aLev, string[] Src, string[] Dst) {
            StringBuilder sb = new StringBuilder();
            int cSrc = Src.Length;
            int cDst = Dst.Length;
            int iSrc, iDst;
            sb.Append(' ', 6); // skip the row-label column and the edge column
            for (iDst = 0; iDst < cDst; ++iDst)
                sb.AppendFormat("{0,-3}", Dst[iDst]);
            Console.WriteLine(sb.ToString());
            for (iSrc = 0; iSrc <= cSrc; ++iSrc) {
                sb.Length = 0;
                if (iSrc == 0)
                    sb.Append(' ', 3); // the edge row has no label
                else
                    sb.AppendFormat("{0,-3}", Src[iSrc - 1]);
                for (iDst = 0; iDst <= cDst; ++iDst)
                    sb.AppendFormat("{0:00}", aLev[iSrc, iDst]).Append(" ");
                Console.WriteLine(sb.ToString());
            }
        }
        // just for debugging
        static void DumpMap(int?[] aMap, string[] Dst) {
            StringBuilder sb = new StringBuilder();
            for (int iMap = 0; iMap < aMap.Length; ++iMap)
                sb.AppendFormat("{0,-3}", Dst[iMap]); // Dst and map are the same size
            Console.WriteLine(sb.ToString());
            sb.Length = 0;
            for (int iMap = 0; iMap < aMap.Length; ++iMap)
                if (aMap[iMap] == null)
                    sb.Append("   "); // unmapped: keep the 3-character column width
                else
                    sb.AppendFormat("{0:00}", aMap[iMap]).Append(" ");
            Console.WriteLine(sb.ToString());
        }
    }
}
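A more compact way to get a similar report is sketched below. It is not the Levenshtein code above but a related technique, a longest common subsequence (LCS) over the shared items: anything shared that falls outside one chosen LCS is reported as out of order. The names `OrderCheck` and `OutOfOrder` are made up for this illustration, and the sketch assumes neither list contains duplicates. As with the Levenshtein version, several optimal answers may exist and this picks only one of them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderCheck
{
    // Hypothetical helper: returns the shared items that fall outside one
    // longest common subsequence of the two lists, in list1 order.
    // Assumes neither list contains duplicate items.
    public static List<string> OutOfOrder(string[] list1, string[] list2)
    {
        // Keep only the shared items, each in its own list's order.
        string[] a = list1.Intersect(list2).ToArray();
        string[] b = list2.Intersect(list1).ToArray();

        // lcs[i, j] = LCS length of the first i items of a and the first j items of b.
        var lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
                lcs[i, j] = a[i - 1] == b[j - 1]
                    ? lcs[i - 1, j - 1] + 1
                    : Math.Max(lcs[i - 1, j], lcs[i, j - 1]);

        // Walk back from the bottom-right corner; items of a that get skipped
        // are not part of the chosen LCS.
        var result = new List<string>();
        int x = a.Length, y = b.Length;
        while (x > 0 && y > 0)
        {
            if (a[x - 1] == b[y - 1]) { x--; y--; }
            else if (lcs[x - 1, y] >= lcs[x, y - 1]) result.Add(a[--x]);
            else y--;
        }
        while (x > 0) result.Add(a[--x]);
        result.Reverse();
        return result;
    }

    static void Main()
    {
        string[] list1 = { "a", "c", "b", "d", "j", "e", "f" };
        string[] list2 = { "a", "d", "e", "f", "h", "j" };
        Console.WriteLine(string.Join(", ", OutOfOrder(list1, list2))); // prints "j"
    }
}
```

For the original question's lists, "f" and "e" cannot both be kept, so the sketch reports just one of them; which one depends on how ties are broken in the walk back.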
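I don't have code, but these should be the two main steps:
It might even help to clean both lists. Then you can use one pointer per list, set each to the first item, and advance them until there is a mismatch.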
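The pointer idea could look roughly like the sketch below. It is only one reading of the suggestion: "cleaning" is taken to mean dropping items that appear in only one list, and the names `FirstMismatch` and `Find` are invented for the example. It assumes neither list contains duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FirstMismatch
{
    // Hypothetical helper: returns the index (within the cleaned lists) of the
    // first position where the shared items differ, or -1 if they never differ.
    public static int Find(string[] list1, string[] list2)
    {
        var set1 = new HashSet<string>(list1);
        var set2 = new HashSet<string>(list2);

        // "Clean" both lists: drop items that appear in only one of them.
        string[] clean1 = list1.Where(x => set2.Contains(x)).ToArray();
        string[] clean2 = list2.Where(x => set1.Contains(x)).ToArray();

        // One pointer per list, advanced together until the items differ.
        int p1 = 0, p2 = 0;
        while (p1 < clean1.Length && p2 < clean2.Length)
        {
            if (clean1[p1] != clean2[p2]) return p1;
            p1++;
            p2++;
        }
        return -1;
    }

    static void Main()
    {
        string[] list1 = { "a", "c", "b", "d", "f", "e" };
        string[] list2 = { "a", "d", "e", "f", "h" };
        Console.WriteLine(Find(list1, list2)); // prints 2 (cleaned lists: a d f e vs a d e f)
    }
}
```

This only locates where the shared items first disagree; deciding which of the following items to report is a separate step.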
How about
list1.Intersect(list2).SequenceEqual(list2.Intersect(list1))
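A small sketch of how that one-liner behaves, using the LINQ method `SequenceEqual`. The wrapper name `SharedItemsInSameOrder` is invented for the example. It relies on `Intersect` yielding distinct elements in the order of the first sequence, and it only answers yes or no; it does not say which items are out of order.

```csharp
using System;
using System.Linq;

static class SameOrderCheck
{
    // Hypothetical wrapper: true when the items shared by both lists appear
    // in the same relative order in each list.
    public static bool SharedItemsInSameOrder(string[] list1, string[] list2) =>
        list1.Intersect(list2).SequenceEqual(list2.Intersect(list1));

    static void Main()
    {
        string[] list2 = { "a", "d", "e", "f", "h" };

        // "f" and "e" are swapped relative to list2.
        Console.WriteLine(SharedItemsInSameOrder(new[] { "a", "c", "b", "d", "f", "e" }, list2)); // False

        // Same shared items, same relative order.
        Console.WriteLine(SharedItemsInSameOrder(new[] { "a", "c", "b", "d", "e", "f" }, list2)); // True
    }
}
```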
How about this:
string[] list1 = { "a", "c", "b", "d", "f", "e" };
string[] list2 = { "a", "d", "e", "f", "h" };
var indexedList1 = list1.Select((x, i) => new
{
    Index = i,
    Item = x
});
var indexedList2 = list2.Select((x, i) => new
{
    Index = i,
    Item = x
});
var intersectedWithIndexes = indexedList2
    .Join(indexedList1,
        x => x.Item,
        y => y.Item,
        (x, y) => new
        {
            ExpectedIndex = x.Index,
            ActualIndex = y.Index,
            x.Item
        })
    .Where(x => x.ActualIndex != x.ExpectedIndex)
    .ToArray();
var outOfOrder = intersectedWithIndexes
    .Select((x, i) => new
    {
        Item = x,
        index = i
    })
    .Skip(1)
    .Where(x => x.Item.ActualIndex < intersectedWithIndexes[x.index - 1].ActualIndex ||
                x.Item.ExpectedIndex < intersectedWithIndexes[x.index - 1].ExpectedIndex)
    .Select(x => new
    {
        ExpectedBefore = x.Item,
        ExpectedAfter = intersectedWithIndexes[x.index - 1]
    });
foreach (var item in outOfOrder)
{
    Console.WriteLine("'{0}' and '{1}' are out of order at index {2}",
        item.ExpectedBefore.Item,
        item.ExpectedAfter.Item,
        item.ExpectedBefore.ActualIndex);
}
Output:
'f' and 'e' are out of order at index 4