Why doesn't the number of occurrences increase?
Here is my problem: I have a dictionary (dico), and for each pair of distinct keys I want to count the number of lines in the file "file.tsv" on which the values of both keys appear. The file looks like this:
sp_345_4567 pe_645_4567876 ap_456_45678 pe_645_4556789 ...
sp_345_567 pe_645_45678 ...
pe_645_45678 ap_456_345678 ...
sp_345_56789 ap_456_345 ...
pe_645_45678 ap_456_345678 ...
sp_345_56789 ap_456_345 ...
...
For example, the values of the banana and apple keys both appear on line 1. However many times each appears on that line, the line counts once, so that is 1 line in common. I want to do this for every line of the file.
To do that, I appended the pattern '_\w+' to each value and matched it against each element with re.search.
from itertools import product
import csv
import re

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_345",
    "cherry": "ap_345",
    "coco": "sp_543",
}

counter = {}
with open("file.tsv") as file:
    reader = csv.reader(file, delimiter="\t")
    for line in reader:
        for key1, key2 in product(dico, dico):
            if key1 >= key2:
                continue
            counter[key1, key2] = 0
            k1 = k2 = False
            for el in line:
                if re.search(dico[key1]+'_\w+', el):
                    k1 = True
                elif re.search(dico[key2]+'_\w+', el):
                    k2 = True
                if k1 and k2:
                    counter[key1, key2] += 1
                    break

for key, val in counter.items():
    print(key, val)
But the counts stay at 0:
Apple banana 0
pear banana 0
pear apple 0
k1 and k2 are both initialized to False, and because of the elif at most one of them can be set to True for any given element.
elif re.search(dico[key2]+'_\w+', el):
    k2 = True
should be
if re.search(dico[key2]+'_\w+', el):
    k2 = True
Your line
counter[key1, key2] = 0
should only happen when (key1, key2) doesn't have a value yet. Otherwise the count is reset to 0 on every line of the file. For example, by adding a test:
if (key1, key2) not in counter:
    counter[key1, key2] = 0
Or you could set counter[key1, key2] to 0 for all pairs before opening the CSV file, as in:
counter = {}
for key1, key2 in product(dico, dico):
    if key1 < key2:
        counter[key1, key2] = 0
with open("file.tsv") as file:
    ....
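As a possible alternative for the initialisation step, the pairs can also be built with itertools.combinations over the sorted keys. This is only a sketch: the shortened dico below is an assumption for illustration, and the comparison with the product-based filter is there to show that both approaches give the same pairs.

```python
from itertools import combinations, product

# Shortened, illustrative version of the dictionary from the question.
dico = {"banana": "sp_345", "apple": "ap_456", "pear": "pe_345"}

# Pairs as generated in the question: product() filtered with key1 < key2.
pairs_product = sorted((k1, k2) for k1, k2 in product(dico, dico) if k1 < k2)

# The same pairs, built directly from the sorted keys.
pairs_combinations = list(combinations(sorted(dico), 2))

# Every pair starts at 0, once, before any line of the file is read.
counter = dict.fromkeys(pairs_combinations, 0)
print(counter)
```

Either form works; combinations just avoids generating and discarding the pairs where key1 >= key2.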
Also,
elif re.search(dico[key2]+'_\w+', el):
should be
if re.search(dico[key2]+'_\w+', el):
Otherwise you will never find key2 in an element where key1 was already found.
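Putting the suggestions together, a minimal sketch of a corrected version might look like the following. It is not the asker's exact script: the function name count_common_lines, the shortened dico, and the in-memory sample standing in for "file.tsv" are all assumptions made so the example runs on its own. Each pair is initialised once, and each key is tested independently on every line.

```python
import csv
import io
import re
from itertools import combinations


def count_common_lines(rows, dico):
    """Count, for each pair of keys, the rows on which both keys' values appear."""
    # One compiled pattern per key: the prefix followed by "_" and word characters.
    patterns = {
        key: re.compile(re.escape(prefix) + r"_\w+") for key, prefix in dico.items()
    }
    # Initialise every pair once, before reading any row.
    counter = {pair: 0 for pair in combinations(sorted(dico), 2)}
    for row in rows:
        # Keys whose pattern matches at least one element of this row.
        present = {
            key for key, pat in patterns.items() if any(pat.search(el) for el in row)
        }
        for key1, key2 in counter:
            # A row counts once per pair, however often each value appears on it.
            if key1 in present and key2 in present:
                counter[key1, key2] += 1
    return counter


# Hypothetical tab-separated sample standing in for "file.tsv".
sample = "sp_345_1\tap_456_2\tap_456_3\nsp_345_4\tpe_345_5\nap_456_6\n"
dico = {"banana": "sp_345", "apple": "ap_456", "pear": "pe_345"}

counts = count_common_lines(csv.reader(io.StringIO(sample), delimiter="\t"), dico)
for key, val in counts.items():
    print(key, val)
```

To run it on the real file, pass csv.reader(file, delimiter="\t") from inside the with open("file.tsv") block instead of the in-memory sample.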