Get the highest frequency of a combination of two different values in a list in C#
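Preferably using System.Linq.
In my program I have a list of purchases, each containing a list of products. I'm trying to determine which two products are most often bought together in a single sale. For example:
Products - banana, apple, teddy bear, beer, diaper, video game, car
Purchase 1 - banana, teddy bear, diaper, beer
Purchase 2 - diaper, car, beer
Purchase 3 - banana, video game, car
Most frequently bought together = diaper and beer.
Does anyone know the best way to do this? In practice, my purchases dictionary has about 2.4 million elements and my products dictionary has 8,013 unique products.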
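For a sense of scale (a back-of-the-envelope bound, not a measurement): a purchase with k distinct products contains k(k-1)/2 unordered pairs, so Purchase 1 above contributes 6, and a pair-count dictionary can never need more keys than there are distinct pairs among the 8,013 products:

```latex
\binom{k}{2} = \frac{k(k-1)}{2}, \qquad
\binom{8013}{2} = \frac{8013 \cdot 8012}{2} = 32{,}100{,}078
```

Only pairs that actually occur together in some purchase get a key, so the real dictionary is at most this size.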
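I believe you need to create a Dictionary to keep track of your purchase pairs. Perhaps you could sort the purchase lists so that the entries are always in alphabetical order, and then iterate over the lists you have.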
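Below is a sample project I made that does what you're asking, although the products are just chars. It may not be the most efficient, but it will at least show you one way to do it.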
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    class Purchase
    {
        public List<char> Products = new List<char>();
    }

    static void Main(string[] args)
    {
        do
        {
            // here is our list of products
            char[] products = new char[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };

            // randomly populate a list of purchases
            Random random = new Random();
            List<Purchase> purchases = new List<Purchase>();
            for (int i = 0; i < 10000; i++)
            {
                var currPurchase = new Purchase();
                // products are added in array order, so every pair below is (earlier, later)
                foreach (var p in products)
                    if (random.Next(0, 2) > 0)
                        currPurchase.Products.Add(p);
                purchases.Add(currPurchase);
            }

            // dictionary counting how many purchases contain each pair
            Dictionary<(char, char), int> count = new Dictionary<(char, char), int>();

            // now for each purchase, we go through and see all the pairs that happened, and update those dictionary entries
            foreach (var purchase in purchases)
            {
                // get the pairs, update the pairs
                for (int i = 0; i < purchase.Products.Count; i++)
                {
                    for (int j = i + 1; j < purchase.Products.Count; j++)
                    {
                        var key = (purchase.Products[i], purchase.Products[j]);
                        count.TryGetValue(key, out int current); // current is 0 if the pair is new
                        count[key] = current + 1;
                    }
                }
            }

            // then get the pair that had the highest frequency
            var highest = count.Max(kvp => kvp.Value);
            // and then get all the keys that occurred that many times (several pairs may tie)
            var mostFrequent = count.Where(kvp => kvp.Value == highest).Select(x => x.Key);
            Console.WriteLine($"Most Frequent Pairs (Occurred {highest} times): ");
            foreach (var pair in mostFrequent)
                Console.Write(pair + "; ");
            Console.WriteLine();
            // type quit and hit enter to quit
        } while (Console.ReadLine() != "quit");
    }
}
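The sample gets a consistent pair order for free because products are added in array order. For string products, here is a minimal sketch of the sort-then-iterate idea from this answer, assuming each purchase is a sequence of product names; PairCounter and CountPairs are hypothetical names. De-duplicating each purchase and sorting it with StringComparer.Ordinal means ("beer", "diaper") and ("diaper", "beer") always land on the same key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class: sketch of sort-then-iterate pair counting.
public static class PairCounter
{
    // Counts how often each unordered pair of distinct products appears in the same purchase.
    public static Dictionary<(string, string), int> CountPairs(IEnumerable<IEnumerable<string>> purchases)
    {
        var counts = new Dictionary<(string, string), int>();
        foreach (var purchase in purchases)
        {
            // Remove duplicates and sort ordinally, so the smaller name always comes first.
            string[] items = purchase.Distinct().OrderBy(p => p, StringComparer.Ordinal).ToArray();
            for (int i = 0; i < items.Length; i++)
            {
                for (int j = i + 1; j < items.Length; j++)
                {
                    var key = (items[i], items[j]);
                    counts.TryGetValue(key, out int current); // current is 0 if the key is new
                    counts[key] = current + 1;
                }
            }
        }
        return counts;
    }
}
```

TryGetValue leaves `current` at 0 for a new key, so a single pattern covers both the insert and the increment case, without the separate ContainsKey check.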
Here is my attempt using LINQ:
using System;
using System.Collections.Generic;
using System.Linq;

var purchases = new List<List<string>>
{
    new List<string>() { "banana", "teddy bear", "beer", "diaper" },
    new List<string>() { "beer", "diaper", "car" },
    new List<string>() { "banana", "video game", "car" }
};

var itemsPairsOccurences = purchases
    // cross join every purchase with itself and produce item pairs, i.e. correlate every item in a purchase with every item from the same purchase
    .SelectMany(purchase1 => purchase1.SelectMany(_ => purchase1, (item1, item2) => Tuple.Create(item1, item2)))
    // filter out pairs of identical items (e.g. banana-banana) and mirrored duplicates (banana-car remains, car-banana is skipped)
    .Where(itemsPair => string.CompareOrdinal(itemsPair.Item1, itemsPair.Item2) < 0)
    // group all pairs from all purchases by unique pair and count the occurrences
    .GroupBy(itemsPair => itemsPair, (itemsPair, allItemsPairs) => KeyValuePair.Create(itemsPair, allItemsPairs.Count()))
    // materialize once, because the result is enumerated twice below
    .ToList();

// choose the pair with the most occurrences
var mostFrequent1 = itemsPairsOccurences.Aggregate((itemsPair1, itemsPair2) => itemsPair1.Value > itemsPair2.Value ? itemsPair1 : itemsPair2);
// OR order the pairs by occurrences and select the first one
var mostFrequent2 = itemsPairsOccurences.OrderByDescending(itemsPair => itemsPair.Value).FirstOrDefault();
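Assuming you can target .NET 6 or later (needed for Enumerable.MaxBy), the same query can be sketched with value tuples instead of Tuple and KeyValuePair. LinqPairs and MostFrequentPair are hypothetical names, and this is a variant rather than a drop-in replacement: it returns null instead of throwing when no purchase has two distinct products, since both Aggregate and MaxBy throw on an empty sequence of value types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class: LINQ query using value tuples and MaxBy (.NET 6+).
public static class LinqPairs
{
    // Returns the most frequent unordered pair and its count, or null if there are no pairs.
    public static (string First, string Second, int Count)? MostFrequentPair(IEnumerable<IEnumerable<string>> purchases)
    {
        var occurrences = purchases
            .SelectMany(purchase =>
            {
                // Distinct() guards against a product listed twice in one purchase
                var items = purchase.Distinct().ToList();
                // cross join the purchase with itself
                return items.SelectMany(_ => items, (a, b) => (a, b));
            })
            // keep each unordered pair once: (banana, car) survives, (car, banana) and (banana, banana) do not
            .Where(pair => string.CompareOrdinal(pair.a, pair.b) < 0)
            .GroupBy(pair => pair)
            .Select(g => (First: g.Key.a, Second: g.Key.b, Count: g.Count()))
            .ToList();

        if (occurrences.Count == 0) return null;
        return occurrences.MaxBy(x => x.Count);
    }
}
```

MaxBy returns a single pair, so if several pairs tie for first place you only see one of them; the Max-then-Where approach in the first answer lists all of them.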
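IMHO, doing all of this in RAM could be quite resource-intensive.
Although this may be premature optimization, I would use the following logic to solve the problem:
This also works for combinations of 3, 4, or more products, although you would need to switch to longs or risk integer overflow.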
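I'm not sure you can do this easily in LINQ; it's not really my cup of tea. In SQL this would be a breeze, apart from calculating the primes, I guess... (although you could probably precompute (or download) them into a static table, especially if you plan to run this more often).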
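Since this answer relies on primes, here is a minimal sketch of one reading of that idea, assuming each product is assigned its own prime and an unordered pair is keyed by the product of its two primes. Unique prime factorisation guarantees that two different pairs never share a key, and multiplication makes the key independent of order. PrimePairKeys, FirstPrimes and MostFrequentPair are hypothetical names. The sketch uses long throughout: with about 8,000 products the largest assigned primes exceed 72,000 (the n-th prime is larger than n ln n), so the product of two of them can already exceed int.MaxValue (2,147,483,647).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class: sketch of prime-product pair keys.
public static class PrimePairKeys
{
    // First n primes by trial division against the primes found so far.
    public static List<long> FirstPrimes(int n)
    {
        var primes = new List<long>(n);
        for (long candidate = 2; primes.Count < n; candidate++)
        {
            bool isPrime = true;
            foreach (long p in primes)
            {
                if (p * p > candidate) break;
                if (candidate % p == 0) { isPrime = false; break; }
            }
            if (isPrime) primes.Add(candidate);
        }
        return primes;
    }

    // Assumes product names are unique and every purchased product appears in `products`.
    public static (string First, string Second, int Count) MostFrequentPair(
        IReadOnlyList<string> products, IEnumerable<IEnumerable<string>> purchases)
    {
        List<long> primes = FirstPrimes(products.Count);
        var primeOf = new Dictionary<string, long>();
        var productOf = new Dictionary<long, string>();
        for (int i = 0; i < products.Count; i++)
        {
            primeOf[products[i]] = primes[i];
            productOf[primes[i]] = products[i];
        }

        // Key = product of the two primes; the order of the products does not matter.
        var counts = new Dictionary<long, int>();
        foreach (var purchase in purchases)
        {
            long[] keys = purchase.Select(name => primeOf[name]).Distinct().ToArray();
            for (int i = 0; i < keys.Length; i++)
            {
                for (int j = i + 1; j < keys.Length; j++)
                {
                    long key = keys[i] * keys[j];
                    counts.TryGetValue(key, out int c);
                    counts[key] = c + 1;
                }
            }
        }

        long bestKey = 0;
        int bestCount = 0;
        foreach (var kvp in counts)
        {
            if (kvp.Value > bestCount) { bestKey = kvp.Key; bestCount = kvp.Value; }
        }
        if (bestCount == 0)
            throw new InvalidOperationException("No purchase contains two distinct products.");

        // Decode: the smallest prime dividing the key is one factor, the quotient is the other.
        long smaller = primes.First(p => bestKey % p == 0);
        return (productOf[smaller], productOf[bestKey / smaller], bestCount);
    }
}
```

For pairs alone, an index-based key such as i * productCount + j (with i < j) would also be unique and needs no primes; the prime product becomes more attractive for triples and quadruples, where it still does not depend on the order of the products.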
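PS: You could split the Purchases list across n independent threads and merge the resulting dictionaries at the end; if your hardware supports it, this might make it faster.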
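A minimal sketch of that split-and-merge idea, using the thread-local overload of Parallel.ForEach instead of manually created threads: each worker counts into its own dictionary, and the per-worker dictionaries are merged into the shared result under a lock at the end. The scheduler picks the number of workers, so this is not literally n threads; ParallelPairCounter is a hypothetical name, and whether it actually speeds things up depends on your hardware and data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper class: parallel pair counting with per-worker dictionaries.
public static class ParallelPairCounter
{
    public static Dictionary<(string, string), int> CountPairs(IEnumerable<IEnumerable<string>> purchases)
    {
        var total = new Dictionary<(string, string), int>();
        var gate = new object();

        Parallel.ForEach(
            purchases,
            // localInit: each worker starts with its own private dictionary
            () => new Dictionary<(string, string), int>(),
            // body: count one purchase's pairs into the worker's dictionary, no locking needed
            (purchase, loopState, local) =>
            {
                string[] items = purchase.Distinct().OrderBy(p => p, StringComparer.Ordinal).ToArray();
                for (int i = 0; i < items.Length; i++)
                {
                    for (int j = i + 1; j < items.Length; j++)
                    {
                        var key = (items[i], items[j]);
                        local.TryGetValue(key, out int c);
                        local[key] = c + 1;
                    }
                }
                return local;
            },
            // localFinally: merge a worker's dictionary into the shared result under the lock
            local =>
            {
                lock (gate)
                {
                    foreach (var kvp in local)
                    {
                        total.TryGetValue(kvp.Key, out int c);
                        total[kvp.Key] = c + kvp.Value;
                    }
                }
            });

        return total;
    }
}
```

The lock is taken once per worker-local dictionary in localFinally, not once per pair, which keeps contention low.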