Why doesn't the number of occurrences increase?

Here is my problem: I have a dictionary (dico), and for each pair of distinct keys I want to count the number of lines of the file "file.tsv" on which both of their values appear. The file looks like this:

sp_345_4567 pe_645_4567876  ap_456_45678    pe_645_4556789 ...
sp_345_567  pe_645_45678 ...
pe_645_45678    ap_456_345678 ...
sp_345_56789    ap_456_345 ...
pe_645_45678    ap_456_345678 ...
sp_345_56789    ap_456_345 ...
...

For example, the values of the banana and apple keys both appear on line 1. No matter how many times they appear on that line, it counts as 1 line in common, and I want to do this for every line of the file.

To do that, I appended the pattern '_\w+' to each value and searched each element of the line with re.search.

import csv
import re
from itertools import product

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_345",
    "cherry": "ap_345",
    "coco": "sp_543",
}

counter = {}
with open("file.tsv") as file:
    reader = csv.reader(file, delimiter="\t")
    for line in reader:
        for key1, key2 in product(dico, dico):
            if key1 >= key2:
                continue
            counter[key1, key2] = 0
            k1 = k2 = False
            for el in line:
                if re.search(dico[key1]+'_\w+', el):
                    k1 = True
                elif re.search(dico[key2]+'_\w+', el):
                    k2 = True
                if k1 and k2:
                    counter[key1, key2] += 1
                    break

for key, val in counter.items():
    print(key, val)

But the counts stay at 0:

Apple banana 0
pear banana 0
pear apple 0

For a given element, k1 and k2 can't both become True: both start as False, and the elif means at most one of them is set, because the second test is skipped whenever the first one matches.

elif re.search(dico[key2]+'_\w+', el):
    k2 = True

should be

if re.search(dico[key2]+'_\w+', el):
    k2 = True

Your line

counter[key1, key2] = 0

should only run when (key1, key2) doesn't have a value yet. As written, it resets the count to 0 on every line of the file. For example, add a test:

if (key1, key2) not in counter:
    counter[key1, key2] = 0

Or you could set counter[key1, key2] to 0 for all pairs before opening the CSV file, as in:

counter = {}
for key1, key2 in product(dico, dico):
    if key1 < key2:
        counter[key1, key2] = 0
with open("file.tsv") as file:
    ...
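
The same initialisation can also be written with itertools.combinations, which yields each unordered pair exactly once and so removes the need for the key1 < key2 filter. This is only a sketch, assuming the dico from the question. Sorting the keys is an addition here, so that each pair comes out in the same (smaller, larger) order that the key1 < key2 test produces:

```python
from itertools import combinations

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_345",
    "cherry": "ap_345",
    "coco": "sp_543",
}

# Create every unordered pair of keys once, before any line is read,
# so the counts are never reset inside the loop over the file.
counter = {pair: 0 for pair in combinations(sorted(dico), 2)}
```

With five keys this gives ten pairs, each starting at 0.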

Also

elif re.search(dico[key2]+'_\w+', el):

should be

if re.search(dico[key2]+'_\w+', el):

Otherwise key2 is never checked on an element where key1 was already found.
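
Putting both fixes together (initialise the counters once, and test every key independently), the whole count might look like the sketch below. The function name count_shared_lines and the in-memory rows are illustrative assumptions, not part of the original code. The rows contain only the elements visible in the question's sample, and re.escape is an extra precaution in case a value ever contains a regex metacharacter:

```python
import re
from itertools import combinations


def count_shared_lines(rows, dico):
    """Count, for each pair of keys, the rows in which both values occur."""
    # One compiled pattern per key: the value followed by '_' and word characters.
    patterns = {
        key: re.compile(re.escape(prefix) + r"_\w+")
        for key, prefix in dico.items()
    }
    # Initialise every pair once, outside the loop over rows.
    counter = {pair: 0 for pair in combinations(sorted(dico), 2)}
    for row in rows:
        # Keys whose pattern matches at least one element of this row.
        present = {
            key for key, pattern in patterns.items()
            if any(pattern.search(el) for el in row)
        }
        # A row counts once per pair, however often the values repeat in it.
        for key1, key2 in counter:
            if key1 in present and key2 in present:
                counter[key1, key2] += 1
    return counter


dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_345",
    "cherry": "ap_345",
    "coco": "sp_543",
}

# Sample rows: only the elements shown in the question, elided parts omitted.
rows = [
    ["sp_345_4567", "pe_645_4567876", "ap_456_45678", "pe_645_4556789"],
    ["sp_345_567", "pe_645_45678"],
    ["pe_645_45678", "ap_456_345678"],
    ["sp_345_56789", "ap_456_345"],
    ["pe_645_45678", "ap_456_345678"],
    ["sp_345_56789", "ap_456_345"],
]

result = count_shared_lines(rows, dico)
for key, val in result.items():
    print(key, val)
```

To run it on the real file, the csv.reader(file, delimiter="\t") object from the question can be passed as rows. On the sample rows only the ("apple", "banana") pair is non-zero, because no element starts with pe_345, ap_345 or sp_543.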
