Counting occurrences of a word in chunks in Python (list comprehension)
Efficient counting of word occurrences in Python
Suppose I have two tables:
Table 1:
ID  CODE  DATE        value1  value2  text
-----------------------------------------------------
1   13A   2012-05-04  12.0    0.0     null
2   13B   2011-06-08  5.5     0.0     null
3   13C   2012-07-05  4.0     0.0     null
4   13D   2010-09-09  7.7     0.0     null
1   13A   .....................................
1   13D   .....................................
3   13D   .....................................
Table 2:
CODE  DESCRIPTION
------------------
13A   DISEASE1
13B   DISEASE2
13C   DISEASE3
13D   DISEASE4
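I want to find an efficient way to count how often each code occurs for each ID, and to build a count vector ordered by the codes in the second table. For example:
[2,0,0,1] is the count vector for the person with ID = 1, where each value is the number of occurrences of the corresponding code from table2.
I managed to do this, but it doesn't look very efficient. Is there a more efficient way?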
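import itertools
import operator

# `cursor` is assumed to be an open DB cursor that returns each row as a dict.
# groupby only merges consecutive rows, so the rows must be sorted by ID.
sql = "SELECT * FROM table1 ORDER BY ID"
cursor.execute(sql)
table1 = cursor.fetchall()
sql2 = "SELECT CODE FROM table2"
cursor.execute(sql2)
codes = cursor.fetchall()
list1 = []
list2 = []
countList = []
n = len(codes)
# Collect the list of codes recorded for each ID.
for id_, rows in itertools.groupby(table1, operator.itemgetter('ID')):
    list1.append([z['CODE'] for z in rows])
# Count every known code in every per-ID list, appending to one flat list.
for pat in list1:
    for code in codes:
        cnt = pat.count(code.get('CODE'))
        list2.append(cnt)
# Split the flat list into one count vector of length n per ID.
countList = [list2[i:i+n] for i in range(0, len(list2), n)]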
Using a generator may speed things up:
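import itertools
import operator

def code_counter(table, codes):
    # Rows must already be sorted by ID, since groupby only merges consecutive rows.
    for _, group in itertools.groupby(table, key=operator.itemgetter('ID')):
        group_codes = [item['CODE'] for item in group]
        # Lazily yield one count vector per ID, ordered like `codes`.
        yield [group_codes.count(code) for code in codes]

if __name__ == '__main__':
    # `cursor` is assumed to be an open DB cursor that returns rows as dicts.
    cursor.execute("SELECT * FROM table1 ORDER BY ID")
    table1 = cursor.fetchall()
    cursor.execute("SELECT CODE FROM table2")
    # The key must match the column name used above ('CODE', not 'code').
    codes = [code.get('CODE') for code in cursor.fetchall()]
    for chunk in code_counter(table1, codes):
        print(chunk)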
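Calling `count` once per code rescans each group for every code in table2. One possible variation, sketched here under the assumption that rows arrive as dicts sorted by ID, is to tally each group in a single pass with `collections.Counter`. The function name `count_vectors` and the sample rows are made up for illustration; the rows only mirror the ID and CODE columns of table 1 from the question.

```python
import itertools
import operator
from collections import Counter


def count_vectors(rows, codes):
    """Yield (ID, count vector) pairs; rows must already be sorted by ID."""
    for key, group in itertools.groupby(rows, key=operator.itemgetter('ID')):
        # One pass over the group; a code that never appears looks up as 0.
        counts = Counter(row['CODE'] for row in group)
        yield key, [counts[code] for code in codes]


# Hypothetical sample rows shaped like a dict cursor's output for table1.
rows = [
    {'ID': 1, 'CODE': '13A'},
    {'ID': 1, 'CODE': '13A'},
    {'ID': 1, 'CODE': '13D'},
    {'ID': 2, 'CODE': '13B'},
    {'ID': 3, 'CODE': '13C'},
    {'ID': 3, 'CODE': '13D'},
    {'ID': 4, 'CODE': '13D'},
]
codes = ['13A', '13B', '13C', '13D']

vectors = dict(count_vectors(rows, codes))
print(vectors[1])  # [2, 0, 0, 1]
```

Yielding the ID together with the vector keeps each result tied to its person, which the plain list of vectors does not. Whether this is actually faster depends on how many codes and rows there are, so it is worth timing on real data.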
You may also want to iterate over table1 in chunks.
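One way to read the chunking suggestion is to replace `fetchall()` with repeated `fetchmany()` calls, so that only one batch of rows is held in memory at a time. The sketch below is an assumption about what that could look like, not the answerer's code: `iter_rows` is a hypothetical helper, and an in-memory SQLite table stands in for the real database. `fetchmany` is part of the Python DB-API, so other drivers should behave similarly, but check yours.

```python
import itertools
import operator
import sqlite3


def iter_rows(cursor, chunk_size=1000):
    """Yield rows one at a time while fetching them from the cursor in chunks."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk


# Stand-in for the real database: an in-memory SQLite table (assumption).
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows support row['ID'] style access
conn.execute("CREATE TABLE table1 (ID INTEGER, CODE TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, '13A'), (2, '13B'), (3, '13C'), (4, '13D'),
     (1, '13A'), (1, '13D'), (3, '13D')],
)
codes = ['13A', '13B', '13C', '13D']

# ORDER BY ID keeps each person's rows adjacent, as groupby requires.
cursor = conn.execute("SELECT ID, CODE FROM table1 ORDER BY ID")
result = {}
for key, group in itertools.groupby(iter_rows(cursor, chunk_size=2),
                                    key=operator.itemgetter('ID')):
    group_codes = [row['CODE'] for row in group]
    result[key] = [group_codes.count(code) for code in codes]
print(result[1])  # [2, 0, 0, 1]
```

Because `iter_rows` flattens the batches into one stream, a group that straddles a chunk boundary is still counted as a single group.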