I have two lists of strings, and from each list I want the indexes of the strings that also appear in the other list. A string can match exactly, or it can be a shorthand of a string in the other list. For example, consider these two lists:
List<string> aList = new List<string> { "Id", "PartCode", "PartName", "EquipType" };
List<string> bList = new List<string> { "PartCode", "PartName", "PartShortName", "EquipmentType" };
In the above example, I want indexes 1, 2, 3 from aList and indexes 0, 1, 3 from bList.
Indexes 1 and 2 from aList are obvious, since the strings match completely. The interesting part is "EquipType" and "EquipmentType", which match because "EquipType" is a shorthand of "EquipmentType".
But "PartName" is not a shorthand of "PartShortName", so their indexes are not needed.
This is my code:
    List<string> aList = new List<string> { "Id", "PartCode", "PartName", "EquipType" }; // expected: 1, 2, 3
    List<string> bList = new List<string> { "PartCode", "PartName", "PartShortName", "EquipmentType" }; // expected: 0, 1, 3
    List<int> alistIndex = new List<int>();
    List<int> blistIndex = new List<int>();
    for (int i = 0; i < aList.Count; i++)
    {
        string a = aList[i];
        for (int j = 0; j < bList.Count; j++)
        {
            string b = bList[j];
            string bigger, smaller;
            if (a.Length > b.Length)
            {
                bigger = a; smaller = b;
            }
            else
            {
                bigger = b; smaller = a;
            }
            // Count how many characters of smaller appear, in order, inside bigger.
            int countCheck = 0;
            for (int k = 0; k < bigger.Length; k++)
            {
                if (countCheck != smaller.Length && bigger[k] == smaller[countCheck])
                    countCheck++;
            }
            // smaller is a subsequence of bigger: record the pair and
            // stop at the first match for this entry of aList.
            if (countCheck == smaller.Length)
            {
                alistIndex.Add(i);
                blistIndex.Add(j);
                break;
            }
        }
    }
    alistIndex.ForEach(i => Console.Write(i));
    Console.WriteLine(Environment.NewLine);
    blistIndex.ForEach(i => Console.Write(i));
    Console.ReadKey();
The above code works just fine and looks very similar to this solution.
But if I change the order of the second list like so:
List<string> bList = new List<string> { "PartCode", "PartShortName", "PartName", "EquipmentType" };
I will get indexes 0, 1 and 3 (but I want 0, 2 and 3).
Should I check the distance for every pair and return the lowest, or should I use a different method?
Thanks
P.S. I also found this GitHub, but I don't know if it will do the trick for me.
I do feel that what you are trying to do is a bad idea... Id is the abbreviation of Idiotic, just to give an example :-) Still... I wanted to do some experiments on Unicode.
Now, this code will split words on uppercase letters. PartName is Part + Name because the N is uppercase. It doesn't support ID as Identifier (because it should be IDentifier) but it does support NSA as NotSuchAgency :-) So full acronyms are OK, while FDA isn't equivalent to FoodAndDrugAdministration, so acronyms with conjunctions are KO.
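The word-splitting rule can be sketched on its own before reading the full version. This is only an approximation for plain ASCII identifiers: WordSplitSketch, SplitWords and IsShorthand are hypothetical names that do not appear in the answer, and the sketch ignores the Unicode handling that the real code does.

```csharp
using System;
using System.Collections.Generic;

public static class WordSplitSketch
{
    // Splits an identifier before every uppercase letter: "PartName" -> ["Part", "Name"].
    public static List<string> SplitWords(string s)
    {
        var words = new List<string>();
        int start = 0;
        for (int i = 1; i < s.Length; i++)
        {
            if (char.IsUpper(s[i]))
            {
                words.Add(s.Substring(start, i - start));
                start = i;
            }
        }
        if (s.Length > 0)
        {
            words.Add(s.Substring(start));
        }
        return words;
    }

    // Two identifiers match when they have the same number of words and,
    // word by word, one word is an ordinal prefix of the other.
    public static bool IsShorthand(string x, string y)
    {
        List<string> wx = SplitWords(x), wy = SplitWords(y);
        if (wx.Count != wy.Count)
        {
            return false;
        }
        for (int i = 0; i < wx.Count; i++)
        {
            bool prefix = wx[i].StartsWith(wy[i], StringComparison.Ordinal)
                       || wy[i].StartsWith(wx[i], StringComparison.Ordinal);
            if (!prefix)
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsShorthand("EquipType", "EquipmentType")); // True
        Console.WriteLine(IsShorthand("PartName", "PartShortName"));  // False
        Console.WriteLine(IsShorthand("NSA", "NotSuchAgency"));       // True
        Console.WriteLine(IsShorthand("ID", "Identifier"));           // False
    }
}
```

The answer's own implementation follows; it applies the same idea in a single pass over Unicode text elements instead of building word lists.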
    using System;
    using System.Globalization;

    public static bool ShorthandCompare(string str1, string str2)
    {
        if (str1 == null)
        {
            throw new ArgumentNullException(nameof(str1));
        }
        if (str2 == null)
        {
            throw new ArgumentNullException(nameof(str2));
        }
        // An empty string only matches another empty string (checked in both directions).
        if (str1 == string.Empty || str2 == string.Empty)
        {
            return str1 == str2;
        }
        if (object.ReferenceEquals(str1, str2))
        {
            return true;
        }
        var ee1 = StringInfo.GetTextElementEnumerator(str1);
        var ee2 = StringInfo.GetTextElementEnumerator(str2);
        // eos1/eos2 hold the last MoveNext() result: true while an element is available.
        bool eos1, eos2 = true;
        while ((eos1 = ee1.MoveNext()) && (eos2 = ee2.MoveNext()))
        {
            string ch1 = ee1.GetTextElement(), ch2 = ee2.GetTextElement();
            // The string.Compare does some nifty tricks with unicode
            // like string.Compare("ì", "i\u0300") == 0
            if (string.Compare(ch1, ch2) == 0)
            {
                continue;
            }
            UnicodeCategory uc1 = char.GetUnicodeCategory(ch1, 0);
            UnicodeCategory uc2 = char.GetUnicodeCategory(ch2, 0);
            if (uc1 == UnicodeCategory.UppercaseLetter)
            {
                // str1 starts a new word: skip the rest of the current word in str2.
                while (uc2 != UnicodeCategory.UppercaseLetter && (eos2 = ee2.MoveNext()))
                {
                    ch2 = ee2.GetTextElement();
                    uc2 = char.GetUnicodeCategory(ch2, 0);
                }
                if (!eos2 || string.Compare(ch1, ch2) != 0)
                {
                    return false;
                }
                continue;
            }
            else if (uc2 == UnicodeCategory.UppercaseLetter)
            {
                // str2 starts a new word: skip the rest of the current word in str1.
                while (uc1 != UnicodeCategory.UppercaseLetter && (eos1 = ee1.MoveNext()))
                {
                    ch1 = ee1.GetTextElement();
                    uc1 = char.GetUnicodeCategory(ch1, 0);
                }
                if (!eos1 || string.Compare(ch1, ch2) != 0)
                {
                    return false;
                }
                continue;
            }
            // We already know they are different!
            return false;
        }
        if (eos1)
        {
            // str2 ended first and ee1 already sits on an unread element,
            // so check it before advancing (otherwise "PartCode" would match "Part").
            do
            {
                string ch1 = ee1.GetTextElement();
                UnicodeCategory uc1 = char.GetUnicodeCategory(ch1, 0);
                if (uc1 == UnicodeCategory.UppercaseLetter)
                {
                    return false;
                }
            }
            while (ee1.MoveNext());
        }
        else if (eos2)
        {
            // str1 ended first: the rest of str2 must not start a new word.
            while (ee2.MoveNext())
            {
                string ch2 = ee2.GetTextElement();
                UnicodeCategory uc2 = char.GetUnicodeCategory(ch2, 0);
                if (uc2 == UnicodeCategory.UppercaseLetter)
                {
                    return false;
                }
            }
        }
        return true;
    }
and then:
    List<string> aList = new List<string> { "Id", "PartCode", "PartName", "EquipType" };
    List<string> bList = new List<string> { "PartCode", "PartName", "PartShortName", "EquipmentType" };
    List<List<int>> matches = new List<List<int>>();
    for (int i = 0; i < aList.Count; i++)
    {
        var lst = new List<int>();
        matches.Add(lst);
        for (int j = 0; j < bList.Count; j++)
        {
            if (ShorthandCompare(aList[i], bList[j]))
            {
                lst.Add(j);
            }
        }
    }
Note that the result is a List<List<int>>, because you could have multiple matches for a single word of aList!
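Since a word can have several matches, one way to get the single index per entry that the question asks for is to keep the candidate whose length is closest. The following is only a sketch under that assumption: BestMatchSketch, BestMatches and IsSubsequence are hypothetical names, IsSubsequence mirrors the loop from the question, and any other predicate (ShorthandCompare, for instance) could be passed instead.

```csharp
using System;
using System.Collections.Generic;

public static class BestMatchSketch
{
    // Same test as the loop in the question: is every character of the
    // shorter string found, in order, inside the longer one?
    public static bool IsSubsequence(string x, string y)
    {
        string bigger = x.Length > y.Length ? x : y;
        string smaller = x.Length > y.Length ? y : x;
        int matched = 0;
        foreach (char c in bigger)
        {
            if (matched < smaller.Length && c == smaller[matched])
            {
                matched++;
            }
        }
        return matched == smaller.Length;
    }

    // For each entry of aList, returns the index of the accepted bList entry
    // with the smallest length difference, or -1 when nothing is accepted.
    public static List<int> BestMatches(
        List<string> aList, List<string> bList, Func<string, string, bool> isMatch)
    {
        var result = new List<int>();
        foreach (string a in aList)
        {
            int best = -1;
            int bestDistance = int.MaxValue;
            for (int j = 0; j < bList.Count; j++)
            {
                int distance = Math.Abs(a.Length - bList[j].Length);
                if (distance < bestDistance && isMatch(a, bList[j]))
                {
                    best = j;
                    bestDistance = distance;
                }
            }
            result.Add(best);
        }
        return result;
    }

    public static void Main()
    {
        var aList = new List<string> { "Id", "PartCode", "PartName", "EquipType" };
        // The reordered list from the question.
        var bList = new List<string> { "PartCode", "PartShortName", "PartName", "EquipmentType" };
        List<int> best = BestMatches(aList, bList, IsSubsequence);
        Console.WriteLine(string.Join(", ", best)); // -1, 0, 2, 3
    }
}
```

With the reordered list, "PartName" now resolves to index 2 rather than 1, because the exact match has a length difference of zero. Length difference is a crude distance; an edit distance would be a stricter choice if ties matter.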
Now... the interesting part of ShorthandCompare is that it tries to be "intelligent" and handle non-BMP Unicode characters (through the use of StringInfo.GetTextElementEnumerator) and decomposed Unicode characters (the ì character can be obtained in Unicode as i + U+0300, the combining grave accent). It does so through string.Compare which, unlike string.Equals, performs a culture-sensitive comparison and is therefore Unicode-aware (string.CompareOrdinal is more similar to string.Equals and is not Unicode-aware).
bool cmp1 = ShorthandCompare("IdìoLe\u0300ss", "Idi\u0300oticLèsser"); // true
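The difference between the comparison methods can be seen in isolation. This is a small sketch, not part of the answer's code: UnicodeCompareSketch is a hypothetical name, and the culture-sensitive result assumes the runtime has its normal culture data available (it may differ if .NET runs in invariant-globalization mode).

```csharp
using System;
using System.Globalization;

public static class UnicodeCompareSketch
{
    public static void Main()
    {
        string precomposed = "\u00EC";  // ì as a single code point
        string decomposed = "i\u0300";  // i followed by a combining grave accent

        // Ordinal comparisons look at the raw UTF-16 code units, so the two differ.
        Console.WriteLine(string.Equals(precomposed, decomposed));              // False
        Console.WriteLine(string.CompareOrdinal(precomposed, decomposed) == 0); // False

        // A culture-sensitive comparison treats the two spellings as equivalent
        // (assumption: culture data is available, see the note above).
        int cultureAware = string.Compare(precomposed, decomposed, StringComparison.InvariantCulture);
        Console.WriteLine(cultureAware == 0);

        // Text elements group a base character with its combining marks, and a
        // surrogate pair into one unit, which is what the enumerator walks over.
        Console.WriteLine(decomposed.Length);                                   // 2
        Console.WriteLine(new StringInfo(decomposed).LengthInTextElements);     // 1
        Console.WriteLine("\U0001F600".Length);                                 // 2
        Console.WriteLine(new StringInfo("\U0001F600").LengthInTextElements);   // 1
    }
}
```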