How to compare two dicts and find matching values

I'm pulling data from an API for a weather system. The API returns a single JSON object in which the readings are split into sub-nodes, one per sensor. I'm trying to line up the readings from two (or more) sensors by their timestamps. Unfortunately, not every sensor reports at every polling interval (although they're supposed to).

In effect, I have a JSON object that looks like this:

{
    "sensor_data": {
        "mbar": [{
            "value": 1012,
            "timestamp": "2019-10-31T00:15:00"
        }, {
            "value": 1011,
            "timestamp": "2019-10-31T00:30:00"
        }, {
            "value": 1010,
            "timestamp": "2019-10-31T00:45:00"
        }],
        "temperature": [{
            "value": 10.3,
            "timestamp": "2019-10-31T00:15:00"
        }, {
            "value": 10.2,
            "timestamp": "2019-10-31T00:30:00"
        }, {
            "value": 10.0,
            "timestamp": "2019-10-31T00:45:00"
        }, {
            "value": 9.8,
            "timestamp": "2019-10-31T01:00:00"
        }]
    }
}

This example has one extra temperature reading, and it is a very small sample of the real data.

How can I take this data and build a single record per timestamp, gathering as many sensor readings as I can from matching timestamps? Ultimately, I want to export the data to a CSV file, with each row representing one slice in time across the sensors, to be graphed or further analyzed later.

For lists that are exactly the same length, I have a solution:

import json

def read_json(path):
    # wrapper function for open and load json
    with open(path) as f:
        return json.load(f)

def pair_perfect_sensor_lists(sensor_id, list_a, list_b):
    # in this case, list_a will be mbar, list_b will be temperature
    if len(list_a) != len(list_b):
        return None  # pairing by index only works when the lengths match
    matches = []
    for idx, reading in enumerate(list_a):
        mbar_value = reading['value']
        timestamp = reading['timestamp']
        t_reading = list_b[idx]
        t_time = t_reading['timestamp']
        temp_value = t_reading['value']

        if t_time == timestamp:
            match = {
                'sensor_id': sensor_id,
                'mbar_index': idx,
                'temp_index': idx,
                'mbar_value': mbar_value,
                'temp_value': temp_value,
                'mbar_time': timestamp,
                'temp_time': t_time,
            }
            print('here is your match:')
            print(match)
            matches.append(match)
        else:
            print("IMPERFECT!")
            print(t_time)
            print(timestamp)
    return matches

sensor_id = '007_OHMSS'
# the file holds the JSON shown above, so the readings sit under 'sensor_data'
sensor_data = read_json('sensor_data.json')['sensor_data']
list_a = sensor_data['mbar']
list_b = sensor_data['temperature']

matches = pair_perfect_sensor_lists(sensor_id, list_a, list_b)

When there is no match, I want to put N/A in place of the missing sensor's reading (in this case, mbar has no reading at the last timestamp) and keep going.

In most cases, the offset is just one node: temp has one extra reading somewhere in the middle.

I was using the index idx to speed things up, so I don't have to loop through the second (or third, or nth) list to see whether the timestamp exists in it. But I know that isn't ideal either, because it assumes the readings are in the same order in every list. In this case, each sensor's list appears to be sorted by timestamp, so I was trying to take advantage of that.
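Since each sensor's list appears to be sorted by timestamp, one option that uses that ordering without relying on matching indexes is a merge-style walk over both lists (the same idea as the merge step of merge sort, or a sort-merge join in a database). It is linear in the total number of readings and tolerates a missing reading anywhere, not just at the end. This is a minimal sketch, assuming both lists are sorted ascending by timestamp, that each list has at most one reading per timestamp, and that the ISO 8601 strings all use the same format (so they compare correctly as plain strings). The name merge_sorted_readings and the "N/A" fill value are illustrative, not part of any library.

```python
from typing import Any, Dict, List

Reading = Dict[str, Any]


def merge_sorted_readings(
    mbar: List[Reading], temperature: List[Reading], fill: Any = "N/A"
) -> List[Dict[str, Any]]:
    """Hypothetical helper: pair two timestamp-sorted reading lists.

    Assumes both lists are sorted ascending by 'timestamp' and hold at
    most one reading per timestamp; gaps are filled with `fill`.
    """
    rows = []
    i = j = 0
    while i < len(mbar) or j < len(temperature):
        # Timestamp under each cursor, or None once that list is exhausted
        t_a = mbar[i]["timestamp"] if i < len(mbar) else None
        t_b = temperature[j]["timestamp"] if j < len(temperature) else None
        if t_b is None or (t_a is not None and t_a < t_b):
            # Only mbar has a reading at t_a
            rows.append({"timestamp": t_a, "mbar": mbar[i]["value"], "temperature": fill})
            i += 1
        elif t_a is None or t_b < t_a:
            # Only temperature has a reading at t_b
            rows.append({"timestamp": t_b, "mbar": fill, "temperature": temperature[j]["value"]})
            j += 1
        else:
            # Both sensors reported at the same timestamp
            rows.append({
                "timestamp": t_a,
                "mbar": mbar[i]["value"],
                "temperature": temperature[j]["value"],
            })
            i += 1
            j += 1
    return rows


# Example with the readings from the question
sensor_data = {
    "mbar": [
        {"value": 1012, "timestamp": "2019-10-31T00:15:00"},
        {"value": 1011, "timestamp": "2019-10-31T00:30:00"},
        {"value": 1010, "timestamp": "2019-10-31T00:45:00"},
    ],
    "temperature": [
        {"value": 10.3, "timestamp": "2019-10-31T00:15:00"},
        {"value": 10.2, "timestamp": "2019-10-31T00:30:00"},
        {"value": 10.0, "timestamp": "2019-10-31T00:45:00"},
        {"value": 9.8, "timestamp": "2019-10-31T01:00:00"},
    ],
}

for row in merge_sorted_readings(sensor_data["mbar"], sensor_data["temperature"]):
    print(row)
```

If the sorted-order assumption ever fails, this silently produces wrong pairings, so a dict keyed by timestamp is the safer fallback when ordering isn't guaranteed.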

Is this a common problem? If so, just point me to the terminology. I've searched already and can't find a reasonable, efficient answer besides "loop through each sub-dict and look for a match".

I'm open to any ideas, because I'll have to do this often, and on large JSON objects (sometimes 25 MB files or larger). The full dump is over 300 MB, but I've split it up by sensor ID so the pieces are more manageable.
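The operation described in this question is usually called a full outer join on the timestamp (the same term SQL uses, and what pandas' merge does with how="outer"). For any number of sensors, one standard-library sketch is to index every reading into a dict keyed by timestamp, then write one CSV row per timestamp with csv.DictWriter, whose restval argument fills in "N/A" for any sensor that didn't report. This assumes the JSON shape shown in the question and timestamps in one consistent ISO 8601 format (so sorting the strings sorts them chronologically); outer_join_sensors and write_csv are hypothetical names.

```python
import csv
import io
from collections import defaultdict
from typing import Any, Dict, List, TextIO

SensorData = Dict[str, List[Dict[str, Any]]]


def outer_join_sensors(sensor_data: SensorData) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: group every sensor's readings by timestamp."""
    rows: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for sensor_name, readings in sensor_data.items():
        for reading in readings:
            # One entry per timestamp; each sensor fills in its own column
            rows[reading["timestamp"]][sensor_name] = reading["value"]
    return rows


def write_csv(sensor_data: SensorData, out: TextIO, fill: str = "N/A") -> None:
    """Hypothetical helper: one CSV row per timestamp, gaps become `fill`."""
    rows = outer_join_sensors(sensor_data)
    fieldnames = ["timestamp"] + sorted(sensor_data)
    # restval is written for any fieldname missing from a row dict
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval=fill)
    writer.writeheader()
    # Assumes same-format ISO 8601 strings, which sort chronologically
    for timestamp in sorted(rows):
        writer.writerow({"timestamp": timestamp, **rows[timestamp]})


# Small example in the question's format
example = {
    "mbar": [{"value": 1010, "timestamp": "2019-10-31T00:45:00"}],
    "temperature": [
        {"value": 10.0, "timestamp": "2019-10-31T00:45:00"},
        {"value": 9.8, "timestamp": "2019-10-31T01:00:00"},
    ],
}
buf = io.StringIO()
write_csv(example, buf)
print(buf.getvalue())
```

For a real file, pass a handle opened with open("out.csv", "w", newline=""), as the csv module documentation recommends. Every sensor becomes a column, so a third or nth sensor needs no extra code; the whole joined table is held in memory, which should be fine for files in the tens of megabytes.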

You can index each sensor's readings by timestamp and use .get() so that a missing reading returns None instead of raising a KeyError. That gives output like this:

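import json

# load the JSON from the question
with open('sensor_data.json') as f:
    st = json.load(f)

# index each sensor's readings by timestamp
mbar = {}
for item in st['sensor_data']['mbar']:
    mbar[item['timestamp']] = item['value']

temperature = {}
for item in st['sensor_data']['temperature']:
    temperature[item['timestamp']] = item['value']

# .get() returns None when mbar has no reading at this timestamp
for timestamp in temperature:
    print("Timestamp:", timestamp, "Sensor Reading: ", mbar.get(timestamp), "Temperature Reading: ", temperature[timestamp])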

This produces the following output:

Timestamp: 2019-10-31T00:15:00 Sensor Reading:  1012 Temperature Reading:  10.3
Timestamp: 2019-10-31T00:30:00 Sensor Reading:  1011 Temperature Reading:  10.2
Timestamp: 2019-10-31T00:45:00 Sensor Reading:  1010 Temperature Reading:  10.0
Timestamp: 2019-10-31T01:00:00 Sensor Reading:  None Temperature Reading:  9.8

Does that help?
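One limitation of looping over temperature alone is that a timestamp reported only by mbar would never be printed. A small variation, sketched under the same assumption of one dict per sensor keyed by timestamp, loops over the sorted union of both key sets and uses .get() with a default on both sides. The function name joined_rows and the "N/A" default are illustrative choices.

```python
from typing import Any, Dict, Iterator, Tuple


def joined_rows(
    mbar: Dict[str, Any], temperature: Dict[str, Any], fill: Any = "N/A"
) -> Iterator[Tuple[str, Any, Any]]:
    """Hypothetical helper: yield (timestamp, mbar, temperature) for every
    timestamp that either sensor reported."""
    # The union (|) of the key views keeps timestamps seen by only one sensor
    for timestamp in sorted(mbar.keys() | temperature.keys()):
        # .get() with a default fills the gap instead of raising KeyError
        yield timestamp, mbar.get(timestamp, fill), temperature.get(timestamp, fill)


mbar = {
    "2019-10-31T00:15:00": 1012,
    "2019-10-31T00:30:00": 1011,
    "2019-10-31T00:45:00": 1010,
}
temperature = {
    "2019-10-31T00:15:00": 10.3,
    "2019-10-31T00:30:00": 10.2,
    "2019-10-31T00:45:00": 10.0,
    "2019-10-31T01:00:00": 9.8,
}

for row in joined_rows(mbar, temperature):
    print(row)
```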

Try something like the following:

dict1 = {'red':[1,2,3],'blue':[2,3,4],'orange':[3,4,5]}

dict2 = {'green':[3,4,5],'yellow':[2,3,4],'red':[5,2,6]}

# compare every key of dict1 with every key of dict2 and keep pairs whose values are equal
matches = []
for key1 in dict1:
    for key2 in dict2:
        if dict1[key1] == dict2[key2]:
            matches.append((key1, key2))

print(matches)

The output will be:

[('blue', 'yellow'), ('orange', 'green')]
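The nested loop compares every key of dict1 against every key of dict2, so its cost grows with the product of their sizes. One alternative sketch indexes dict2 by value first, then does a single lookup per dict1 entry. Lists aren't hashable, so each value is converted to a tuple for use as a dict key; this assumes the values are lists of hashable items, as in the example.

```python
from collections import defaultdict

dict1 = {'red': [1, 2, 3], 'blue': [2, 3, 4], 'orange': [3, 4, 5]}
dict2 = {'green': [3, 4, 5], 'yellow': [2, 3, 4], 'red': [5, 2, 6]}

# Map each dict2 value (as a hashable tuple) to the keys that hold it
keys_by_value = defaultdict(list)
for key2, value in dict2.items():
    keys_by_value[tuple(value)].append(key2)

# One lookup per dict1 entry instead of a scan over all of dict2
matches = [
    (key1, key2)
    for key1, value in dict1.items()
    for key2 in keys_by_value.get(tuple(value), [])
]

print(matches)
```

This prints the same list as the nested loop, [('blue', 'yellow'), ('orange', 'green')].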

You could build a dict for each sensor, keyed by the timestamps of its readings, like this:

mbar = {s['timestamp']:s['value'] for s in sensor_data['mbar']}
temp = {s['timestamp']:s['value'] for s in sensor_data['temperature']}

Now it is easy to compare them using the difference of the key sets:

mbar.keys() - temp.keys()
temp.keys() - mbar.keys()
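With the question's data, the two differences identify the unmatched timestamps, and the intersection (&) of the key views gives the timestamps where both sensors reported. A sketch using the same two comprehensions:

```python
sensor_data = {
    'mbar': [
        {'value': 1012, 'timestamp': '2019-10-31T00:15:00'},
        {'value': 1011, 'timestamp': '2019-10-31T00:30:00'},
        {'value': 1010, 'timestamp': '2019-10-31T00:45:00'},
    ],
    'temperature': [
        {'value': 10.3, 'timestamp': '2019-10-31T00:15:00'},
        {'value': 10.2, 'timestamp': '2019-10-31T00:30:00'},
        {'value': 10.0, 'timestamp': '2019-10-31T00:45:00'},
        {'value': 9.8, 'timestamp': '2019-10-31T01:00:00'},
    ],
}

mbar = {s['timestamp']: s['value'] for s in sensor_data['mbar']}
temp = {s['timestamp']: s['value'] for s in sensor_data['temperature']}

only_temp = temp.keys() - mbar.keys()  # timestamps with no mbar reading
only_mbar = mbar.keys() - temp.keys()  # timestamps with no temperature reading
both = mbar.keys() & temp.keys()       # timestamps where both sensors reported

print(sorted(only_temp))  # ['2019-10-31T01:00:00']
print(sorted(only_mbar))  # []
print(sorted(both))
```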

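Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM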