Trying to group different values that have some similarities in a dictionary
I'm parsing a JSON file that looks roughly like this:
[{"acc":P1,"Lenght":855,..."MBDB-1":{"source_id":"2btp_A","regions":[[70,73],[231,234]],"content_fraction":0.033,"content_count":8},"MBDB-2":{...},"MDB-2":{...}},\
{"acc":P2,"Lenght":145,...,"MBDB-14":{...},...}]
I'm trying to build a dictionary containing only the information I want (i.e., "acc" and "Lenght") plus everything INSIDE the keys that start with "MBDB", no matter what comes after that prefix. The actual file is huge and contains a lot of information I don't need.
The first two items are fairly easy. This is what I have so far:
import json
my_dict = dict.fromkeys(['ID', 'MISSING', 'LENGHT'])
with open("...\mypath\Json1.json") as f:
    data = json.loads(f.read())
for i in data:
    if "acc" in i:
        my_dict["ID"] = i["acc"]
But I'm lost on how to append each of the values of "MBDB-something" to the MISSING key. As far as I understand, I can't use startswith(), because I'm working with a dict (generated by json.loads()).
This is what the result should look like:
   ID  LENGHT  source_id  regions              content_count
0  P1  855     2btp_A     [[70,73],[231,234]]  8
1  P1  855     ...        [...]                #
2  P2  145     ...        [...]                #
That way I can later use .explode and perform different operations on some of the information these keys hold. I feel out of my league on this one, so any advice is welcome.
EDIT: I've edited the desired output to be the content of the different keys INSIDE all the "MBDB" keys.
Since the keys are consistent across the JSON objects, you can append one item to a list for every "MBDB" key you find. Note that the keys of a dict produced by json.loads() are plain strings, so str.startswith() works on them.
import json

# load data
with open(r"...\mypath\Json1.json") as f:
    data = json.load(f)

out = []  # final output: one dict per "MBDB" key
for d in data:
    for k, v in d.items():
        # keys are strings, so startswith() matches only the "MBDB" prefix
        if k.startswith("MBDB"):
            out.append({
                "ID": d["acc"],
                "LENGTH": d["Lenght"],
                "source_id": v["source_id"],
                "regions": v["regions"],
                "content_count": v["content_count"],
            })
The final output here is a list of dicts. You can use pandas to convert it into a DataFrame.
import pandas
df = pandas.DataFrame(out)
# output
ID LENGTH source_id regions content_count
0 P1 855 2btp_A [[70, 73], [231, 234]] 8
1 P1 855 2btp_B [[70, 73], [231, 234]] 8
2 P2 855 2btp_A [[70, 73], [231, 234]] 8
3 P2 855 2btp_B [[70, 73], [231, 234]] 8
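If you want to try the idea without the real file, here is a minimal self-contained sketch of the same loop wrapped in a function. The function name collect_mbdb_rows, the prefix parameter, and the sample records (including the "xyz_B" source id) are made up for illustration; only the key names "acc", "Lenght", "source_id", "regions" and "content_count" come from the question. It uses .get() so a record with a missing field yields None instead of raising, which is an assumption about how you would want gaps handled.

```python
def collect_mbdb_rows(records, prefix="MBDB"):
    """Return one flat dict per key starting with `prefix` in each record."""
    rows = []
    for record in records:
        for key, value in record.items():
            # JSON object keys are always str, so startswith() is available.
            # The isinstance check skips scalar fields such as "acc".
            if isinstance(value, dict) and key.startswith(prefix):
                rows.append({
                    "ID": record.get("acc"),
                    "LENGTH": record.get("Lenght"),
                    "source_id": value.get("source_id"),
                    "regions": value.get("regions"),
                    "content_count": value.get("content_count"),
                })
    return rows


# Hypothetical sample data shaped like the JSON in the question
sample = [
    {
        "acc": "P1",
        "Lenght": 855,
        "MBDB-1": {"source_id": "2btp_A", "regions": [[70, 73], [231, 234]],
                   "content_fraction": 0.033, "content_count": 8},
        # "MDB-2" does not start with "MBDB", so it should be skipped
        "MDB-2": {"source_id": "ignored", "regions": [], "content_count": 0},
    },
    {
        "acc": "P2",
        "Lenght": 145,
        "MBDB-14": {"source_id": "xyz_B", "regions": [[1, 5]], "content_count": 5},
    },
]

rows = collect_mbdb_rows(sample)
for row in rows:
    print(row)
```

If pandas is available, pandas.DataFrame(rows) should give the same column layout as above, and df.explode("regions") would then put each [start, end] pair on its own row.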