I have MyObject with the fields id, a, b, c, e, f, and a List with 500 000 items. How can I remove all duplicate items that have the same values of a, c, and f?
I am looking for only the fastest and most efficient method.
UPDATE
I implemented a comparer.
Fields in my class are of different types, so I use ToString(). Is that a good way?
IdLoc, IdMet, and Ser are long?, Value is Object, and IdDataType is long.
class Comparer : IEqualityComparer<MyObject>
{
    public bool Equals(MyObject x, MyObject y)
    {
        return x.IdLoc == y.IdLoc && x.IdMet == y.IdMet && x.Ser == y.Ser &&
               x.IdDataType == y.IdDataType && x.Time == y.Time && x.Value == y.Value;
    }
    public int GetHashCode(MyObject obj)
    {
        string idLoc = obj.IdLoc.HasValue ? obj.IdLoc.ToString() : String.Empty;
        string idMet = obj.IdMet.HasValue ? obj.IdMet.ToString() : String.Empty;
        string ser = obj.Ser.HasValue ? obj.Ser.ToString() : String.Empty;
        string value = obj.Value != null ? obj.Value.ToString() : String.Empty;
        return (idLoc + idMet + ser + value + obj.IdDataType.ToString() + obj.Time.ToString()).GetHashCode();
    }
}
Removing duplicates
Elements 566 890
1) Time: 2 sec
DateTime start = DateTime.Now;
List<MyObject> removed = retValTmp.Distinct(new Comparer()).ToList();
double sec = Math.Round((DateTime.Now - start).TotalSeconds, 3);
2) Time: 1.5 sec
start = DateTime.Now;
List<MyObject> retList = new List<MyObject>();
HashSet<MyObject> removed2 = new HashSet<MyObject>(new Comparer());
foreach (var item in retValTmp)
{
    if (!removed2.Contains(item))
    {
        removed2.Add(item);
        retList.Add(item);
    }
}
double sec2 = Math.Round((DateTime.Now - start).TotalSeconds, 3);
3) I also tried it this way:
start = DateTime.Now;
var removed3 = retValTmp.Select(myObj => new { myObj.IdLoc, myObj.IdMet, myObj.Ser, myObj.Value, myObj.IdDataType, myObj.Time }).Distinct().ToList();
double sec3 = Math.Round((DateTime.Now - start).TotalSeconds, 3);
Time: 0.35 sec
However, the returned list is not of my class type. And why is the number of elements in lists 1 and 2 different from the number in list 3?
UPDATE2
public int GetDataHashCode(MyObject obj)
{
    long idLoc = obj.IdLoc.HasValue ? obj.IdLoc.Value : 0;
    long idMet = obj.IdMet.HasValue ? obj.IdMet.Value : 0;
    long ser = obj.Ser.HasValue ? obj.Ser.Value : 0;
    int valueHash = 0; // stays 0 when Value is null
    if (obj.Value != null)
        valueHash = obj.Value.GetHashCode();
    return (idLoc.GetHashCode() + idMet.GetHashCode() + ser.GetHashCode() + valueHash + obj.IdDataType.GetHashCode() + obj.Time.GetHashCode()).GetHashCode();
}
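Usage:
// clearDict declaration was not shown; the types here are inferred from how it is used
var clearDict = new Dictionary<int, MyObject>();
foreach (MyObject daItem in retValTmp)
{
    int key = GetDataHashCode(daItem);
    if (!clearDict.ContainsKey(key))
        clearDict.Add(key, daItem);
}
Elements: 750 000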
Time: 0.23 sec!
If what you are looking for is speed, and you don't mind using up some memory, then I would recommend that you use a HashSet. If you are interested in doing some custom comparison, you can make an IEqualityComparer<T>, something like this:
var original = new List<YourClass>(); // whatever your original collection is
var unique = new HashSet<YourClass>(new MyCustomEqualityComparer());
foreach (var item in original)
{
    if (!unique.Contains(item))
        unique.Add(item);
}
return unique;
The issue here is that you may end up using up to twice the original memory.
I did some more research, and I think you can achieve just what you want by simply doing:
// original is your source collection, e.g. a List<YourClass>
var unique = new HashSet<YourClass>(original, new MyCustomEqualityComparer());
That should take care of removing duplicated data, as no duplicates are allowed in a HashSet. I'd recommend that you also take a look at this question about GetHashCode implementation guidelines.
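As a small refinement of the loop above: HashSet<T>.Add returns false when an equal item is already present, so the separate Contains call can be dropped. The sketch below also keeps the first occurrence of each item in its original order, which the set alone does not guarantee. DistinctInOrder and DedupHelpers are hypothetical names, and the comparer is whatever IEqualityComparer<T> you already have.

```csharp
using System.Collections.Generic;

static class DedupHelpers
{
    // Hypothetical helper: returns the first occurrence of each distinct item,
    // in the order the items appear in the source.
    public static List<T> DistinctInOrder<T>(IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        var seen = new HashSet<T>(comparer);
        var result = new List<T>();
        foreach (var item in source)
        {
            // Add returns true only if the item was not already in the set.
            if (seen.Add(item))
                result.Add(item);
        }
        return result;
    }
}
```

This does one hash lookup per item instead of two, and the result is a List<T> rather than a set.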
If you want to know some more about the HashSet class, follow these links:
About HashSet
About IEqualityComparer constructor
IEqualityComparer documentation
Hope this helps.
One efficient method would be to first do a quicksort (or a similar n log n sort) based on a hash of (a, c, f). You can then iterate through the resulting list, picking one item every time the value of (a, c, f) changes.
That gives an O(n log n) solution, which is about the best you can do with a sort-based approach.
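A minimal sketch of the sort-then-scan idea, under two assumptions: a, c, and f are int (the question does not state their types), and the sort is on the key itself rather than on a hash of it, so that colliding hashes cannot interleave different keys. SortDedup and DistinctBySort are hypothetical names.

```csharp
using System.Collections.Generic;

class MyObject
{
    public int id, a, c, f; // field types are assumed
}

static class SortDedup
{
    public static List<MyObject> DistinctBySort(List<MyObject> items)
    {
        // Sort a copy so the caller's list is left untouched.
        var sorted = new List<MyObject>(items);
        sorted.Sort((x, y) =>
        {
            int r = x.a.CompareTo(y.a);
            if (r != 0) return r;
            r = x.c.CompareTo(y.c);
            if (r != 0) return r;
            return x.f.CompareTo(y.f);
        });

        // Equal keys are now adjacent; keep an item whenever the key changes.
        var result = new List<MyObject>();
        for (int i = 0; i < sorted.Count; i++)
        {
            if (i == 0 ||
                sorted[i].a != sorted[i - 1].a ||
                sorted[i].c != sorted[i - 1].c ||
                sorted[i].f != sorted[i - 1].f)
            {
                result.Add(sorted[i]);
            }
        }
        return result;
    }
}
```

List<T>.Sort is not a stable sort, so which of several duplicates survives is not guaranteed, and the output comes back in key order rather than the original order.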
Well, you can always use LINQ's Distinct() like this:
var matches = list.Distinct(new Comparer()).ToList();
But for Distinct() to work, you need to implement a comparer for your class:
class Comparer : IEqualityComparer<MyObject>
{
    public bool Equals(MyObject x, MyObject y)
    {
        return x.a == y.a && x.c == y.c && x.f == y.f;
    }
    public int GetHashCode(MyObject obj)
    {
        return (obj.a + obj.c + obj.f).GetHashCode();
    }
}
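One caveat with summing the fields: if they are numeric, sums collide easily (for example, (1, 2, 3) and (3, 2, 1) hash to the same value), and if they are strings the + concatenates and allocates a new string per call. A common alternative is to combine the individual hash codes with multiply-and-add. The sketch below assumes a, c, and f are int, long, and string respectively, since the question does not give their types; AcfComparer is a hypothetical name.

```csharp
using System.Collections.Generic;

class MyObject
{
    public int id;
    public int a;     // assumed type
    public long c;    // assumed type
    public string f;  // assumed type
}

class AcfComparer : IEqualityComparer<MyObject>
{
    public bool Equals(MyObject x, MyObject y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.a == y.a && x.c == y.c && string.Equals(x.f, y.f);
    }

    public int GetHashCode(MyObject obj)
    {
        unchecked // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 31 + obj.a.GetHashCode();
            hash = hash * 31 + obj.c.GetHashCode();
            hash = hash * 31 + (obj.f != null ? obj.f.GetHashCode() : 0);
            return hash;
        }
    }
}
```

It plugs into Distinct() or a HashSet<MyObject> constructor the same way as the comparer above. Whatever formula is used, objects that Equals treats as equal must return the same hash code.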
Drakko! You can use the Distinct() method to get only the items that have different values for the properties you specify.
You could do something like this:
List<MyObj> list = new List<MyObj>();
// Run the code that is going to populate your list.
var result = list.Select(myObj => new { myObj.a, myObj.c, myObj.f })
                 .Distinct().ToList();
// result contains one anonymous object per distinct (a, c, f) combination, not MyObj instances.
Code from this link worked great for me. https://nishantrana.me/2014/08/14/remove-duplicate-objects-in-list-in-c/
public class MyClass
{
    public string ID { get; set; }
    public string Value { get; set; }
}
List<MyClass> myList = new List<MyClass>();
var xrmOptionSet = new MyClass();
xrmOptionSet.ID = "1";
xrmOptionSet.Value = "100";
var xrmOptionSet1 = new MyClass();
xrmOptionSet1.ID = "2";
xrmOptionSet1.Value = "200";
var xrmOptionSet2 = new MyClass();
xrmOptionSet2.ID = "1";
xrmOptionSet2.Value = "100";
myList.Add(xrmOptionSet);
myList.Add(xrmOptionSet1);
myList.Add(xrmOptionSet2);
// Here we first group the items by ID and then pick the first item from each group.
var myDistinctList = myList.GroupBy(i => i.ID)
                           .Select(g => g.First()).ToList();
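The same GroupBy pattern extends to the composite key from the question by grouping on an anonymous type, which compares its properties by value. Unlike Select(...).Distinct() on an anonymous type, this returns the original objects. A minimal sketch, assuming int fields a, c, and f (the question does not give their types); GroupByDedup and DistinctByAcf are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

class MyObject
{
    public int id, a, c, f; // field types are assumed
}

static class GroupByDedup
{
    public static List<MyObject> DistinctByAcf(IEnumerable<MyObject> items)
    {
        return items
            .GroupBy(o => new { o.a, o.c, o.f }) // composite key, compared by value
            .Select(g => g.First())              // first object seen for each key
            .ToList();
    }
}
```

GroupBy yields groups in the order their keys first appear, so the result keeps the first occurrence of each key in source order. It does build every group in memory, so for very large lists a HashSet-based loop is likely cheaper.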