How to find and store the number of occurrences of substrings in strings into a Python dictionary?

I have a problem: I don't know how to create a matrix.

I have a dictionary of this type:

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_645",
}

and a file like this:

sp_345_4567 pe_645_4567876  ap_456_45678    pe_645_4556789
sp_345_567  pe_645_45678
pe_645_45678    ap_456_345678
sp_345_56789    ap_456_345
pe_645_45678    ap_456_345678
sp_345_56789    ap_456_345
s45678  f45678  f456789 ap_456_52546135

What I want to do is create a matrix showing where a value from the dictionary is found more than n times in a line.

This is how I want to proceed:

Step 1: create a dictionary that maps each line number to the values found on that line, like this:

dictionary = {
    '1': ['sp_345_4567', 'pe_645_4567876', 'ap_456_45678', 'pe_645_4556789'],
    '2': ['sp_345_567', 'pe_645_45678'],
    '3': ['pe_645_45678', 'ap_456_345678'],
    # '4': etc.
}

Then I want to compare those values with my first dictionary, dico, and count, for example, how many times the value for the banana key appears in each line (and do the same for every key in dico). The problem is that the values in dico are not equal to the values in my dictionary, because the latter are followed by the pattern '_\w+'.

The idea would be to build a final_dict that looks like this, so that I can make a matrix at the end:

final_dict = {
    'line1': {'Banana': 1, 'Apple': 1, 'Pear': 2},
    # 'line2': etc.
}

Here is my code, which doesn't work:

import pprint
import re
import csv

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_645",
}

dictionary = {}
final_dict = {}
cnt = 0
with open("test.txt") as file :
    reader = csv.reader(file, delimiter ='\t')
    for li in reader:
        grp = li
        number = 1
        for li in reader:
            dictionary[number] = grp
            number += 1
            pprint.pprint(dictionary)
            number_fruit = {}
            for key1, val1 in dico.items():
                for key2, val2 in dictionary.items():
                     if val1 == val2+'_\w+':
                         final_dict[key1] = val2

Thanks for the help.

EDIT:

I've tried using a dict comprehension:

import csv
import re

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_645",
}

with open("test.txt") as file :
    reader = csv.reader(file, delimiter ='\t')
    for li in reader:
        pattern = re.search(dico["banana"]+"_\w+", str(li))
        if pattern:
            final_dict = {"line" + str(index + 1):{key:line.count(text) for key, text in dico.items()} for index, line in enumerate(reader)}
        print(final_dict)

But when I print my final dictionary, every count is 0, including the one for banana:

{'line1': {'banana': 0, 'apple': 0, 'pear': 0}, 'line2': {'banana': 0, 'apple': 0, 'pear': 0}, 'line3': {'banana': 0, 'apple': 0, 'pear': 0}, 'line4': {'banana': 0, 'apple': 0, 'pear': 0}, 'line5': {'banana': 0, 'apple': 0, 'pear': 0}, 'line6': {'banana': 0, 'apple': 0, 'pear': 0}}

So now it looks a bit more like what I wanted, but the occurrence counts never increase. Maybe my condition should be inside the dict comprehension?

Why it doesn't work

Your test

if val1 == val2+'_\w+':
    ...

doesn't work because == tests plain string equality. Appending '_\w+' to a string produces a literal string such as "sp_345_\w+", not a regular expression, so it can never be equal to a value like "sp_345_4567".
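To make the difference concrete, here is a small sketch comparing plain equality with an actual regular-expression match. The names prefix, token, literal, equal and matched are illustrative only and do not come from the question.

```python
import re

prefix = "sp_345"       # a value taken from dico
token = "sp_345_4567"   # a field as it appears in the file

# Concatenation only builds a literal string; nothing interprets it as a regex.
literal = prefix + r"_\w+"
equal = (token == literal)

# re.fullmatch treats the same text as a pattern and must match the whole token.
matched = re.fullmatch(re.escape(prefix) + r"_\w+", token) is not None

print(literal, equal, matched)
```

Here equal is False while matched is True, which is the behaviour the comparison in the question was presumably after.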

What you could do about it

  • You might want to test for containment, for example:
if val1 in val2:
    ...

You can check that "sp_345" in "sp_345_4567" evaluates to True.

  • You might also want to actually count the number of times "sp_345" appears in another string, which you can do using .count:
"sp_345_567  pe_645_45678".count("sp_345") # returns 1
"sp_345_567  pe_645_45678".count("_") # returns 4
  • You could also do it using regular expressions, as you tried to:
import re
pattern = "sp_345_" + "\\w+"

if re.match(pattern, "sp_345_4567"):
    # pattern was found! Do stuff here.
    pass

# alternatively:
print(re.findall(pattern, "sp_345_4567"))
# prints ['sp_345_4567']
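Note that re.match only looks at the start of the string, so on its own it cannot count several fields in one line. As a rough sketch of how the regular-expression approach could count per line, the hypothetical helper count_prefixed below (not part of the original code) splits the line on whitespace and checks each field against the pattern:

```python
import re

def count_prefixed(line, prefix):
    """Count whitespace-separated fields of the form <prefix>_<word characters>."""
    pattern = re.compile(re.escape(prefix) + r"_\w+")
    return sum(1 for field in line.split() if pattern.fullmatch(field))

# First line of the sample file from the question, with tab separators.
line = "sp_345_4567\tpe_645_4567876\tap_456_45678\tpe_645_4556789"

print(count_prefixed(line, "pe_645"))  # 2
print(count_prefixed(line, "sp_345"))  # 1
```

Compared with .count, this only counts fields that start with the prefix, which may or may not matter for your data.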

How you can apply that to build your final_dict

You can rewrite your code in a much simpler way using a dictionary comprehension:

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_645",
}

with open("test.txt") as file:
    # Iterate over the raw text lines: str.count counts substrings, whereas
    # csv.reader yields lists, and list.count only matches whole items
    # (which is why every count came out as 0).
    final_dict = {
        "line" + str(index + 1): {key: line.count(text) for key, text in dico.items()}
        for index, line in enumerate(file)
    }

This builds an outer dictionary with keys like "line1", "line2", and so on. For each of them, the value is an inner dictionary with keys like "banana" or "apple", and each inner value is the number of times the corresponding text appears on that line.
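As a quick check of the idea, the sketch below applies this kind of comprehension to the first four lines of the sample data from the question. It uses io.StringIO as a stand-in for test.txt (an assumption made only so the example is self-contained) and counts on the raw text of each line, because str.count counts substrings.

```python
import io

dico = {"banana": "sp_345", "apple": "ap_456", "pear": "pe_645"}

# Stand-in for open("test.txt"): the first four lines of the sample file.
sample = io.StringIO(
    "sp_345_4567\tpe_645_4567876\tap_456_45678\tpe_645_4556789\n"
    "sp_345_567\tpe_645_45678\n"
    "pe_645_45678\tap_456_345678\n"
    "sp_345_56789\tap_456_345\n"
)

final_dict = {
    "line" + str(index + 1): {key: line.count(text) for key, text in dico.items()}
    for index, line in enumerate(sample)
}

print(final_dict["line1"])            # {'banana': 1, 'apple': 1, 'pear': 2}
print(final_dict["line4"]["banana"])  # 1
```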

If you want to know how many times banana appears on line 4, you'd use:

print(final_dict["line4"]["banana"])

Note that I would recommend using a list rather than a dictionary to store the per-line results, since list indices (starting from 0) already identify the lines. The previous query would then become:

print(final_list[3]["banana"])
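A minimal sketch of that list-based variant, and of one possible way to turn it into the matrix mentioned in the question, might look like this. The sample lines are hard-coded here, and fruits, matrix, n and more_than_n are hypothetical names; the "more than n" step is only one reading of what the matrix should contain.

```python
dico = {"banana": "sp_345", "apple": "ap_456", "pear": "pe_645"}

# First four lines of the sample file, as raw text.
lines = [
    "sp_345_4567\tpe_645_4567876\tap_456_45678\tpe_645_4556789",
    "sp_345_567\tpe_645_45678",
    "pe_645_45678\tap_456_345678",
    "sp_345_56789\tap_456_345",
]

# One dictionary of counts per line; index 0 corresponds to line 1.
final_list = [{key: line.count(text) for key, text in dico.items()} for line in lines]

# Rows are lines, columns follow the key order of dico.
fruits = list(dico)
matrix = [[counts[fruit] for fruit in fruits] for counts in final_list]

n = 1  # hypothetical threshold
more_than_n = [[count > n for count in row] for row in matrix]

print(final_list[3]["banana"])  # 1
print(matrix[0])                # [1, 1, 2]
print(more_than_n[0])           # [False, False, True]
```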
