
Python: Parse JSON to find lines that match on a few keys

I have JSON data that looks like this, with one object per line:

{"processId":"p1","userId":"user1","reportName":"report1","threadId":"12234", "some_other_keys":"respective values"}
{"userId":"user1","processId":"p1","reportName":"report1","threadId":"12335", "some_other_keys":"respective values"}
{"reportName":"report2","processId":"p1","userId":"user1","threadId":"12434", "some_other_keys":"respective values"}
{"threadId":"12734", "some_other_keys":"respective values", "processId":"p1","userId":"user2","reportName":"report1"}
{"processId":"p1","reportName":"report1","threadId":"12534", "some_other_keys":"respective values","userId":"user2"}
{"processId":"p1","userId":"user1","reportName":"report2","threadId":"12934", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report1","threadId":"12834", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report2","threadId":"12634", "some_other_keys":"respective values"}

Objective: write a function that returns every distinct set of lines sharing the same values of "processId", "userId", and "reportName".

So, in this particular example, the function should return three different sets.

Set1 ("processId":"p1","userId":"user1","reportName":"report1"):

{"processId":"p1","userId":"user1","reportName":"report1","threadId":"12234", "some_other_keys":"respective values"}
{"userId":"user1","processId":"p1","reportName":"report1","threadId":"12335", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report1","threadId":"12834", "some_other_keys":"respective values"}

Set2 ("processId":"p1","userId":"user1","reportName":"report2"):

{"reportName":"report2","processId":"p1","userId":"user1","threadId":"12434", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report2","threadId":"12934", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report2","threadId":"12634", "some_other_keys":"respective values"}

Set3 ("processId":"p1","userId":"user2","reportName":"report1"):

{"threadId":"12734", "some_other_keys":"respective values", "processId":"p1","userId":"user2","reportName":"report1"}
{"processId":"p1","reportName":"report1","threadId":"12534", "some_other_keys":"respective values","userId":"user2"}

So a single function call returns three sets here (there may be more or fewer, depending on how many distinct combinations occur).

I need a solution that is (a) efficient and (b) concise, because I will be processing a large number of lines.

I already have a solution that uses multiple if conditions and for loops (I use Python's json module to parse the JSON and extract the elements), but I would like more efficient code.

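If I understand correctly, you can use itertools.groupby with operator.itemgetter. The input has to be sorted by the same key first, because groupby only groups consecutive items: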

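from itertools import groupby
from operator import itemgetter

# d is assumed to be the list of dicts parsed from the input,
# e.g. d = [json.loads(line) for line in lines]
keys = ["processId", "userId", "reportName"]

# itemgetter(*keys) returns a tuple of the three values for each dict
f = itemgetter(*keys)

# groupby only merges adjacent items, so sort by the same key first
srt = sorted(d, key=f)
for k, g in groupby(srt, key=f):
    print(k)        # the (processId, userId, reportName) tuple
    print(list(g))  # all dicts sharing that tuple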

Output:

('p1', 'user1', 'report1')
[{'some_other_keys': 'respective values', 'threadId': '12234', 'userId': 'user1', 'processId': 'p1', 'reportName': 'report1'},
 {'some_other_keys': 'respective values', 'threadId': '12335', 'userId': 'user1', 'processId': 'p1', 'reportName': 'report1'}, 
 {'some_other_keys': 'respective values', 'threadId': '12834', 'userId': 'user1', 'processId': 'p1', 'reportName': 'report1'}]
('p1', 'user1', 'report2')
[{'some_other_keys': 'respective values', 'threadId': '12434', 'userId': 'user1', 'processId': 'p1', 'reportName': 'report2'}, 
 {'some_other_keys': 'respective values', 'threadId': '12934', 'userId': 'user1', 'processId': 'p1', 'reportName': 'report2'}, 
 {'some_other_keys': 'respective values', 'threadId': '12634', 'userId': 'user1', 'processId': 'p1', 'reportName': 'report2'}]
('p1', 'user2', 'report1')
[{'threadId': '12734', 'userId': 'user2', 'reportName': 'report1', 'processId': 'p1', 'some_other_keys': 'respective values'}, 
 {'some_other_keys': 'respective values', 'threadId': '12534', 'userId': 'user2', 'processId': 'p1', 'reportName': 'report1'}]
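
Since the question stresses speed on large inputs, a single-pass alternative may be worth considering. The sketch below is not part of the answer above: it swaps groupby for a collections.defaultdict, which avoids the sort and so runs in roughly linear time. The names group_lines and sample are hypothetical, the input is assumed to be an iterable of JSON strings (one object per line), and every record is assumed to contain all three keys (a missing key would raise KeyError).

```python
import json
from collections import defaultdict
from operator import itemgetter

KEYS = ("processId", "userId", "reportName")


def group_lines(lines, keys=KEYS):
    """Group JSON-lines records by the values of `keys` in one pass."""
    get_key = itemgetter(*keys)      # builds the tuple used as the group key
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue                 # skip blank lines
        record = json.loads(line)
        groups[get_key(record)].append(record)
    return dict(groups)


# Shortened sample records (hypothetical subset of the data in the question)
sample = [
    '{"processId":"p1","userId":"user1","reportName":"report1","threadId":"12234"}',
    '{"userId":"user1","processId":"p1","reportName":"report1","threadId":"12335"}',
    '{"reportName":"report2","processId":"p1","userId":"user1","threadId":"12434"}',
    '{"threadId":"12734","processId":"p1","userId":"user2","reportName":"report1"}',
]

groups = group_lines(sample)
for key, records in groups.items():
    print(key, [r["threadId"] for r in records])
```

Because the function accepts any iterable of strings, an open file object can be passed directly, so the whole file never has to be held in memory as raw text. Groups come back in order of first appearance rather than sorted order.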


 