Proper use of lists and sets in Python
I am trying to write the A-Priori algorithm in Python, and I have a problem when the algorithm has to check the k-dimensional itemsets. So far, I have written this code:
import csv

def A_Priori_Algorithm_Next_Passes(file, freqk, k, s):
    input_file = open(file, 'r')
    csv_reader = csv.reader(input_file, delimiter=',')
    baskets = []
    for row in csv_reader:
        unique_row_items = set([field.strip().lower() for field in row])
        baskets.append(unique_row_items)
    input_file.close()
    all_items = []
    counts = {}
    freq = {}
    length = len(baskets)
    i = 0
    while(i < length):
        items = GetUniqueItems(baskets[i])
        items_list = list(items)
        length_1 = len(items_list)
        itemset_pairs = GetPairs(freqk)
        u = 0
        while(u < len(itemset_pairs)):
            all_items.append(tuple(itemset_pairs[u]))
            u = u + 1
        candidates = []
        q = 0
        while(q < len(itemset_pairs)):
            a1 = itemset_pairs[q][0]
            a2 = itemset_pairs[q][1]
            #print(a1)
            #print(a2)
            #candidate_sum = a1 + ',' + a2
            candidate_set = set(a1).union(set(a2))
            candidate = []
            candidate.append(candidate_set)
            if(tuple(candidate) not in candidates):
                candidates.append(tuple(candidate))
                if((len(candidate) == (k + 1)) and ((candidate < items) == True)):
                    #print(candidate)
                    if(tuple(candidate) not in counts):
                        counts[tuple(candidate)] = 1
                    else:
                        counts[tuple(candidate)] = counts[tuple(candidate)] + 1
            q = q + 1
        i = i + 1
    i = 0
    while(i < len(all_items)):
        if(all_items[i] in counts):
            if(counts[tuple(all_items[i])] >= s):
                freq[all_items[i]] = counts[all_items[i]]
        i = i + 1
    return freq
My problem is that I can't tell when to use a list and when to use a set. The program never enters this if-statement: "if((len(candidate) == (k + 1)) and ((candidate < items) == True)):". Do you have any idea what I have misunderstood?
The pseudocode for the algorithm is:
Algorithm: A-Priori algorithm, (k + 1) pass.
Input: F, a file containing baskets
Input: freqk, a table containing the frequencies of itemsets of size k in baskets above the threshold s
Input: k, the size of the itemsets in freqk
Input: s, the support
Output: freq, a table containing the frequencies of itemsets of size k + 1 with threshold s
counts ← ∅
freq ← ∅
foreach basket in F do
    items ← GetUniqueItems(basket)
    itemset_pairs = GetPairs(freqk)
    candidates ← ∅
    foreach pair in itemset_pairs do
        (fp, sp) ← pair
        candidate ← fp ∪ sp
        if not candidate in candidates then
            Add(candidates, candidate)
            if |candidate| = k + 1 and candidate ⊆ items then
                counts[candidate] ← counts[candidate] + 1
foreach itemset, count in counts do
    if count ≥ s then
        freq[itemset] = count
return freq
Thanks in advance!
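A note on why the condition is never true: in the posted code, candidate is a list that wraps a single set, so len(candidate) is always 1 and can never equal k + 1 for k ≥ 1. Because of the short-circuiting "and", the subset comparison is never even evaluated. The pseudocode can be followed more directly by keeping every itemset as a frozenset. The sketch below is only one possible reading, under several assumptions: the baskets are already loaded as sets, freqk is a dict keyed by frozensets, GetPairs is replaced by itertools.combinations (its real implementation is not shown in the question), the pairs are computed once instead of once per basket, and the count update is nested inside the "not in candidates" check. The function name and the sample data are made up for illustration.

```python
from itertools import combinations


def a_priori_next_pass(baskets, freqk, k, s):
    """Count itemsets of size k + 1 that occur in at least s baskets.

    baskets: iterable of sets of items (assumed already read from the file)
    freqk:   dict mapping frozenset itemsets of size k to their counts
    """
    # Stand-in for the GetPairs helper: all pairs of frequent k-itemsets.
    itemset_pairs = list(combinations(freqk, 2))
    counts = {}
    for items in baskets:
        candidates = set()          # candidates already handled for this basket
        for fp, sp in itemset_pairs:
            candidate = fp | sp     # union of two frozensets is a frozenset
            if candidate not in candidates:
                candidates.add(candidate)
                # <= is the subset test; < would be a proper subset
                if len(candidate) == k + 1 and candidate <= items:
                    counts[candidate] = counts.get(candidate, 0) + 1
    # Keep only the itemsets that reach the support threshold.
    return {itemset: count for itemset, count in counts.items() if count >= s}


# Hypothetical sample data.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
]
freq1 = {frozenset(["bread"]): 2, frozenset(["milk"]): 3, frozenset(["eggs"]): 2}
freq2 = a_priori_next_pass(baskets, freq1, k=1, s=2)
print(freq2)
```

The pseudocode uses ⊆, which corresponds to <= in Python. The < operator in the posted code is the proper-subset test, so it would skip a basket that is exactly equal to the candidate.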
Sets are superior for testing membership (if x in set). A set must contain hashable data, and it cannot contain duplicates (try set([1, 1, 3, 4])). Sets make available a lot of set theory functions, e.g., intersection. They are slower for adding members, are not ordered, and it's generally a good idea to use a list() if you don't have a good reason to use a set(). I encourage you to read the official Python documentation on set().
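The properties mentioned above can be tried out in a few lines. This is only an illustrative sketch with made-up item names. It also shows a point that matters for the question: a plain set is mutable and therefore not hashable, so it cannot serve as a dictionary key, while a frozenset can.

```python
basket = ["milk", "bread", "milk", "eggs"]    # a list keeps order and duplicates
unique_items = set(basket)                    # a set drops the duplicate "milk"

has_milk = "milk" in unique_items             # membership test via hashing
common = unique_items & {"milk", "tea"}       # intersection
is_subset = {"milk", "eggs"} <= unique_items  # subset test

# Using a plain set as a dictionary key raises TypeError (unhashable type).
try:
    {unique_items: 1}
    set_as_key_ok = True
except TypeError:
    set_as_key_ok = False

# A frozenset is immutable and hashable, so it works as a key.
key_counts = {frozenset(unique_items): 1}

print(len(basket), len(unique_items), has_milk, common, is_subset, set_as_key_ok)
```

In short: a list fits when order or duplicates matter, a set fits membership and subset tests, and a frozenset fits when the itemset itself has to be stored in a dictionary or inside another set.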