
Filter JSON by IDs contained in a CSV sheet using Python

I have a CSV file with some ids in an "id" column. I imported a JSON file, and I need to filter this JSON so that only the entries whose ids are in the worksheet remain. Does anyone know how to do that? I have no idea; I am very new to Python. I am using Jupyter Notebook.

How can I filter the data using the ids stored in the variable var_filter?

import json
import pandas as pd
from IPython.display import display

# Read the CSV with the ids
var_filter = pd.read_csv('file.csv')
display(var_filter)


# Load json
with open('file.json') as f:
    data = json.load(f)
print(data)

The JSON structure is:

[
    {
        "id": "179328741654819",
        "t_values": [
            {
                "t_id": "963852456741",
                "value": "499.66",
                "date_timestamp": "2020-09-22T15:18:17",
                "type": "in"
            },
            {
                "t_id": "852951753456",
                "value": "1386.78",
                "date_timestamp": "2020-10-31T14:46:44",
                "type": "in"
            }
        ]
    },
    {
        "id": "823971648264792",
        "t_values": [
            {
                "t_id": "753958561456",
                "value": "672.06",
                "date_timestamp": "2020-03-16T22:41:16",
                "type": "in"
            },
            {
                "t_id": "321147951753",
                "value": "773.88",
                "date_timestamp": "2020-05-08T18:29:31",
                "type": "out"
            },
            {
                "t_id": "258951753852",
                "value": "733.13",
                "date_timestamp": null,
                "type": "in"
            }
        ]
    }
]   

You can iterate over the elements in the data variable and check whether each element's id is in the DataFrame's id column. A simple method is shown below; see this article for other methods.

Note that I convert each JSON id to an int, because pandas reads the CSV's id column as integers (int64), and a string id would never match an integer value.
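A quick way to see why the conversion is needed: pandas infers an integer dtype for a column that contains only digits, while the ids in the JSON are strings, and a Python string never compares equal to an integer. The sketch below reads the same contents as id.csv from memory via io.StringIO instead of a file. It also shows an alternative: passing dtype={"id": str} to read_csv keeps the ids as text, so they match the JSON strings directly without int() (this also preserves any leading zeros, which integer parsing would drop).

```python
import io

import pandas as pd

# Same contents as id.csv, read from memory instead of a file
csv_text = "id\n823971648264792\n"
json_id = "823971648264792"  # ids in the JSON file are strings

# Default parsing: a column of digits becomes int64
as_int = pd.read_csv(io.StringIO(csv_text))
print(as_int["id"].dtype)                  # int64
print(json_id in set(as_int["id"]))       # False: a str never equals an int
print(int(json_id) in set(as_int["id"]))  # True

# Alternative: keep the ids as text so they match the JSON strings directly
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})
print(json_id in set(as_str["id"]))       # True
```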

code

import json
from pprint import pprint
import pandas as pd


var_filter = pd.read_csv("id.csv")

# Load json
with open("data.json") as f:
    data = json.load(f)


# Keep only the elements whose id (converted to int) is in the CSV's id column
result = []
for elem in data:
    if int(elem["id"]) in var_filter["id"].values:
        result.append(elem)
pprint(result)

id.csv

id
823971648264792

output

[{'id': '823971648264792',
  't_values': [{'date_timestamp': '2020-03-16T22:41:16',
                't_id': '753958561456',
                'type': 'in',
                'value': '672.06'},
               {'date_timestamp': '2020-05-08T18:29:31',
                't_id': '321147951753',
                'type': 'out',
                'value': '773.88'},
               {'date_timestamp': None,
                't_id': '258951753852',
                'type': 'in',
                'value': '733.13'}]}]
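Another option, assuming a flat table is what you ultimately want to display() in Jupyter, is to let pandas do the filtering: pd.json_normalize with record_path="t_values" turns each t_values entry into its own row and copies the parent id onto it (meta=["id"]), and Series.isin keeps the rows whose id appears in the CSV. The JSON and CSV are inlined and shortened here so the snippet runs on its own; in a notebook you would load the real files with json.load and pd.read_csv as in the code above.

```python
import io
import json

import pandas as pd

# Shortened in-memory stand-ins for data.json and id.csv
data = json.loads("""
[
  {"id": "179328741654819",
   "t_values": [{"t_id": "963852456741", "value": "499.66",
                 "date_timestamp": "2020-09-22T15:18:17", "type": "in"}]},
  {"id": "823971648264792",
   "t_values": [{"t_id": "753958561456", "value": "672.06",
                 "date_timestamp": "2020-03-16T22:41:16", "type": "in"},
                {"t_id": "321147951753", "value": "773.88",
                 "date_timestamp": "2020-05-08T18:29:31", "type": "out"}]}
]
""")
# Read ids as strings so they match the JSON ids without int()
var_filter = pd.read_csv(io.StringIO("id\n823971648264792\n"), dtype={"id": str})

# One row per t_values entry, with the parent "id" copied onto each row
flat = pd.json_normalize(data, record_path="t_values", meta=["id"])

# Keep only the rows whose id appears in the CSV
filtered = flat[flat["id"].isin(var_filter["id"])]
print(filtered[["id", "t_id", "value", "type"]])
```

One trade-off of this design: an element whose t_values list is empty produces no rows at all with record_path, and the result is flat rather than nested, so the loop-based filter is the better fit if you need the original JSON records back.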
