How can I count occurrences of an item in nested dictionaries in a .json file using Python (and possibly iterate over them)?
I have been trying for a while, but since I am still a beginner, I am stuck. I have a folder of JSON files that all have the following structure:
{
    "cds": {
        "ENSLAFT00000035968.1": {
            "A": 407,
            "C": 312,
            "G": 320,
            "T": 320,
            "Y": 0,
            "M": 0,
            "S": 0,
            "R": 0,
            "W": 0,
            "K": 0,
            "N": 0,
            "D": 0,
            "B": 0,
            "H": 0,
            "V": 0,
            "all": 1359
        }
    },
    "cdna": {
        "ENSLAFT00000034174.1": {
            "A": 825,
            "C": 700,
            "G": 663,
            "T": 584,
            "Y": 0,
            "M": 0,
            "S": 0,
            "R": 0,
            "W": 0,
            "K": 0,
            "N": 0,
            "D": 0,
            "B": 0,
            "H": 0,
            "V": 0,
            "all": 2772
        }
    }
}
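The top-level keys (cds and cdna) each have about 1000 or more values (genes, named ENSLAFT plus a number). I want to count all "N" occurrences: if some genes have 50 and some have 10, they should be added together to give 60. Should I use collections.Counter, sum() or len(), or some combination of them? And how can I write a loop that does this for every file in my folder, given that all the JSON files have the same structure? This sounds easy to me, but I do not have much experience, and so far I have only been able to count with a pandas DataFrame or with less complex data.
I appreciate any help and suggestions for further learning!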
You can use a for loop:
data = {"cds": {"ENSLAFT00000035968.1": {"A": 407, "C": 312, "G": 320, "T": 320, "Y": 0, "M": 0, "S": 0, "R": 0, "W": 0, "K": 0, "N": 0, "D": 0, "B": 0, "H": 0, "V": 0, "all": 1359}}, "cdna": {"ENSLAFT00000034174.1": {"A": 825, "C": 700, "G": 663, "T": 584, "Y": 0, "M": 0, "S": 0, "R": 0, "W": 0, "K": 0, "N": 0, "D": 0, "B": 0, "H": 0, "V": 0, "all": 2772}}}
counter = 0
for value in data.values():
    # each value is the dict of genes for one section (cds or cdna)
    for gene in value.values():
        # each gene is the dict of counts for one ID such as ENSLAFT00000035968.1
        if 'N' in gene:
            counter += gene['N']
print(counter)
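The question also asks about sum(). As a minimal sketch, the same total can be written with sum() and a generator expression. The helper name `count_symbol` and the small `sample` dictionary are made up for illustration, and the code assumes the two-level section -> gene -> counts layout shown in the question.

```python
def count_symbol(data, symbol="N"):
    """Sum the counts stored under `symbol` across all genes in all sections."""
    return sum(
        gene.get(symbol, 0)           # a gene without the key contributes 0
        for section in data.values()  # the dict of genes under "cds", "cdna", ...
        for gene in section.values()  # the dict of counts for one gene
    )


# Made-up sample data (not from the question) using the 50 + 10 case described there.
sample = {
    "cds": {"gene1": {"A": 3, "N": 50}},
    "cdna": {"gene2": {"A": 1, "N": 10}, "gene3": {"A": 2}},
}
print(count_symbol(sample))  # prints 60
```

len() and collections.Counter are not needed for this task: len() would count keys rather than add up their values, and Counter is mainly useful when you want to tally many different keys at once.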
You can check the key to count only some of the sections:
counter = 0
for key, value in data.items():
    # key would be cds or cdna, value is the dict of genes
    if key == "cds":
        for gene in value.values():
            # each gene is the dict of counts for one ID such as ENSLAFT00000035968.1
            if 'N' in gene:
                counter += gene['N']
print(counter)
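For the second part of the question (repeating this for every file in a folder), one possible approach is to wrap the loop in a function and call it for each file found by pathlib. This is only a sketch: the function names, the `sections` parameter, and the `*.json` file pattern are assumptions, not something given in the question.

```python
import json
from pathlib import Path


def count_symbol_in_file(path, symbol="N", sections=None):
    """Sum the `symbol` counts in one JSON file.

    `sections` is an optional collection such as {"cds"}; None means all sections.
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    total = 0
    for name, genes in data.items():
        if sections is not None and name not in sections:
            continue  # skip sections that were not requested
        for gene in genes.values():
            total += gene.get(symbol, 0)
    return total


def count_symbol_in_folder(folder, symbol="N", sections=None):
    """Return a dict mapping each *.json file name in `folder` to its total."""
    return {
        path.name: count_symbol_in_file(path, symbol, sections)
        for path in sorted(Path(folder).glob("*.json"))
    }
```

Calling `count_symbol_in_folder("path/to/your/folder")` (a placeholder path) would give one total per file, and `sum(totals.values())` over the returned dict would give the grand total across all files.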
You can go the brute-force route and search the string version of the JSON with a regular expression, for example:
import json
import re
s = {"cds": {"ENSLAFT00000035968.1": {"A": 407, "C": 312, "G": 320, "T": 320, "Y": 0, "M": 0, "S": 0, "R": 0, "W": 0, "K": 0, "N": 0, "D": 0, "B": 0, "H": 0, "V": 0, "all": 1359}, "cdna": {"ENSLAFT00000034174.1": {"A": 825, "C": 700, "G": 663, "T": 584, "Y": 0, "M": 0, "S": 0, "R": 0, "W": 0, "K": 0, "N": 0, "D": 0, "B": 0, "H": 0, "V": 0, "all": 2772}}}}
s_str = json.dumps(s)
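m = re.findall(r'"N":\s*(\d+)', s_str)
print(m)  # prints ['0', '0']
print(len(m))  # prints 2 (the number of "N" keys found)
print(sum(int(n) for n in m))  # prints 0 (the sum of the "N" values)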
Or go the cleaner, longer route of a recursive function:
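def rec_find(s):
    # anything that is not a dict cannot contain an "N" key
    if not isinstance(s, dict):
        return 0
    # count this level's "N" key if present (this counts keys, it does not sum values)
    resp = 1 if "N" in s else 0
    # recurse into every value to find "N" keys at deeper levels
    for v in s.values():
        resp += rec_find(v)
    return resp
print(rec_find(s))  # prints 2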
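The recursive function above counts how many "N" keys exist. Since the question asks for the values to be added together, a variant that sums them may be closer to what is wanted. The sketch below is one way to do that under the assumption that the counts are integers; `sum_key` and the `nested` sample are hypothetical names and data, not part of the original answer.

```python
def sum_key(obj, key="N"):
    """Recursively add up every integer stored under `key`, at any nesting depth."""
    if isinstance(obj, dict):
        total = 0
        for k, v in obj.items():
            if k == key and isinstance(v, int):
                total += v                # found a count for this key
            else:
                total += sum_key(v, key)  # look deeper
        return total
    if isinstance(obj, list):
        # JSON arrays become Python lists, so search their items as well
        return sum(sum_key(item, key) for item in obj)
    return 0  # numbers, strings, None: nothing to add


# Made-up sample with irregular nesting.
nested = {"cds": {"g1": {"N": 50, "A": 3}, "cdna": {"g2": {"N": 10}}}}
print(sum_key(nested))  # prints 60
```

Because it does not depend on a fixed number of levels, this version also works when the sections are nested inside each other, as in the `s` dictionary used above.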