How can I count occurrences of an item in nested dictionaries in a .json file using Python (and possibly iterate over several files)?
I've been trying for some time, but since I'm still quite a beginner I'm having a hard time. I have JSON files that all have this structure:
{
  "cds": {
    "ENSLAFT00000035968.1": {
      "A": 407,
      "C": 312,
      "G": 320,
      "T": 320,
      "Y": 0,
      "M": 0,
      "S": 0,
      "R": 0,
      "W": 0,
      "K": 0,
      "N": 0,
      "D": 0,
      "B": 0,
      "H": 0,
      "V": 0,
      "all": 1359
    }
  },
  "cdna": {
    "ENSLAFT00000034174.1": {
      "A": 825,
      "C": 700,
      "G": 663,
      "T": 584,
      "Y": 0,
      "M": 0,
      "S": 0,
      "R": 0,
      "W": 0,
      "K": 0,
      "N": 0,
      "D": 0,
      "B": 0,
      "H": 0,
      "V": 0,
      "all": 2772
    }
  }
}
The top-level keys (cds and cdna) each have over 1000 values (genes, the ENSLAFT+number). I would like to add up all of the "N" counts (for example, if one gene has 50 and another has 10, add them together to get 60). Should I use Counter from collections, sum(), len(), or some combination of them? And how can I run such a loop for each file in my folder of JSON files with the same structure? It sounds easy, but I don't have much experience; so far I've only been able to count using a pandas DataFrame or with less complicated data.
I appreciate any help and further study recommendations!
You can use a for-loop:
data = {"cds": {"ENSLAFT00000035968.1": {"A": 407, "C": 312, "G": 320, "T": 320, "Y": 0, "M": 0, "S": 0, "R": 0, "W": 0, "K": 0, "N": 0, "D": 0, "B": 0, "H": 0, "V": 0, "all": 1359}}, "cdna": {"ENSLAFT00000034174.1": {"A": 825, "C": 700, "G": 663, "T": 584, "Y": 0, "M": 0, "S": 0, "R": 0, "W": 0, "K": 0, "N": 0, "D": 0, "B": 0, "H": 0, "V": 0, "all": 2772}}}
counter = 0
for genes in data.values():
    # data's keys are "cds" and "cdna"; each value is the dict of genes
    for gene in genes.values():
        # genes' keys are IDs such as ENSLAFT00000035968.1; each value holds the counts
        if 'N' in gene:
            counter += gene['N']
print(counter)  # prints 0 for the sample data
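Since the question asks about sum() and Counter: the same total can probably be written as a single sum() over a generator, and Counter can tally every letter at once. This is only a minimal sketch, assuming the same two-level layout as above; the shortened `data` dict, the gene names `gene1` and `gene2`, and the variable names `total_n` and `totals` are made up for illustration.

```python
from collections import Counter

# Hypothetical, shortened sample with the same two-level layout as the question.
data = {
    "cds": {"gene1": {"A": 407, "N": 50, "all": 457}},
    "cdna": {"gene2": {"A": 825, "N": 10, "all": 835}},
}

# sum() over a generator; .get("N", 0) treats a missing "N" as zero.
total_n = sum(
    gene.get("N", 0)
    for genes in data.values()
    for gene in genes.values()
)
print(total_n)  # 60

# Counter.update() with a mapping adds the counts key by key.
totals = Counter()
for genes in data.values():
    for gene in genes.values():
        totals.update(gene)
print(totals["N"], totals["A"])  # 60 1232
```

len() is not useful here, because it would count the keys rather than add up their values.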
You can check the key to count only some of the sections:
counter = 0
for key, genes in data.items():
    # key is "cds" or "cdna"; genes is the dict of genes under it
    if key == "cds":
        for gene in genes.values():
            # each gene is the dict of counts for one ID, e.g. ENSLAFT00000035968.1
            if 'N' in gene:
                counter += gene['N']
print(counter)
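For the second part of the question, looping over every file in a folder, one possible approach is to wrap the counting in a function and call it for each file found with pathlib. This is a sketch under assumptions: the function names `count_n` and `count_n_per_file`, the `section` parameter, and the folder name `my_jsons` are hypothetical, and every file is assumed to end in `.json` and share the structure shown in the question.

```python
import json
from pathlib import Path


def count_n(data, section=None):
    """Sum the "N" values over all genes, optionally for one top-level key only."""
    return sum(
        gene.get("N", 0)
        for key, genes in data.items()
        if section is None or key == section
        for gene in genes.values()
    )


def count_n_per_file(folder):
    """Return {file name: total "N"} for every *.json file in `folder`."""
    results = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = count_n(json.load(handle))
    return results


if __name__ == "__main__":
    # "my_jsons" is a placeholder folder name; replace it with your own.
    for name, total in count_n_per_file("my_jsons").items():
        print(name, total)
```

Calling `count_n(data, section="cds")` would restrict the total to the cds genes, mirroring the key check in the loop above.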
You could go about it by brute force, searching a string version of your JSON with a regex, e.g.:
import json
import re
s = {"cds": {"ENSLAFT00000035968.1": {"A": 407, "C": 312, "G": 320, "T": 320, "Y": 0, "M": 0, "S": 0, "R": 0, "W": 0, "K": 0, "N": 0, "D": 0, "B": 0, "H": 0, "V": 0, "all": 1359}, "cdna": {"ENSLAFT00000034174.1": {"A": 825, "C": 700, "G": 663, "T": 584, "Y": 0, "M": 0, "S": 0, "R": 0, "W": 0, "K": 0, "N": 0, "D": 0, "B": 0, "H": 0, "V": 0, "all": 2772}}}}
s_str = json.dumps(s)
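m = re.findall(r'"N":\s(\d+)', s_str)  # every value stored under an "N" key
print(m)  # prints ['0', '0']
print(len(m))  # prints 2, the number of "N" keys
print(sum(int(n) for n in m))  # prints 0, the total of the "N" values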
Or go the cleaner, longer route of a recursive function:
def rec_find(s):
    # Count how many dicts, at any depth, contain an "N" key
    if not isinstance(s, dict):
        return 0
    resp = 0
    if "N" in s:
        resp += 1
    for value in s.values():
        resp += rec_find(value)
    return resp

print(rec_find(s))  # prints 2
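Note that the function above counts how many dicts contain an "N" key, whereas the question asks for the "N" values added together. A recursive variant that sums the values might look like the sketch below; the name `sum_key`, its `key` parameter, and the small `sample` dict are assumptions for illustration, and lists are handled only in case the JSON contains arrays.

```python
def sum_key(obj, key="N"):
    """Recursively add up every numeric value stored under `key`."""
    if isinstance(obj, dict):
        total = 0
        for k, v in obj.items():
            if k == key and isinstance(v, (int, float)):
                total += v
            else:
                total += sum_key(v, key)
        return total
    if isinstance(obj, list):
        return sum(sum_key(item, key) for item in obj)
    return 0  # strings, numbers, None: nothing to add


# Hypothetical sample with "N" keys at different depths.
sample = {"cds": {"gene1": {"N": 50, "cdna": {"gene2": {"N": 10}}}}}
print(sum_key(sample))  # 60
```

Because it does not depend on the nesting depth, this version should give the same total whether cdna sits next to cds or inside it.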