
How to synchronize or merge a dataframe with JSON data based on timestamp

There are many examples of how to merge two pandas DataFrames, but my problem is that I have two kinds of data. data1 is CSV data that I read with pandas into a DataFrame, and data2 is in JSON format.

Here is an example of the JSON data:

[{'timestamp': 1572430625231, 'url': 'brakePressure', 'value': 10},
 {'timestamp': 1572430625275, 'url': 'lateralAcceleration', 'value': 120},
 {'timestamp': 1572430625290, 'url': 'longitudinalAcceleration', 'value': 110},
 {'timestamp': 1572430625299, 'url': 'acceleratorPosition', 'value': 1230},
 {'timestamp': 1572430625310, 'url': 'currentTorque', 'value': 10}]

As you can see, every feature value sits inside a dictionary with its own timestamp. If I convert this directly to a DataFrame, the columns would be timestamp, url and value, with one row per record, but I don't want that. I want the columns (features) to be brakePressure, lateralAcceleration, and so on, with each column holding all the values from the JSON that belong to that feature.
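The reshaping described above (one column per feature) can be sketched with `DataFrame.pivot`. This is only a minimal illustration, assuming each (timestamp, url) pair occurs at most once; the names `records`, `long_df` and `wide_df` are hypothetical and not part of the original code.

```python
import pandas as pd

# Records copied from the JSON example in the question.
records = [
    {"timestamp": 1572430625231, "url": "brakePressure", "value": 10},
    {"timestamp": 1572430625275, "url": "lateralAcceleration", "value": 120},
    {"timestamp": 1572430625290, "url": "longitudinalAcceleration", "value": 110},
    {"timestamp": 1572430625299, "url": "acceleratorPosition", "value": 1230},
    {"timestamp": 1572430625310, "url": "currentTorque", "value": 10},
]

# Long format: one row per record, with columns timestamp, url, value.
long_df = pd.DataFrame(records)

# Wide format: one row per timestamp, one column per feature name.
# Cells with no reading for that timestamp are filled with NaN.
wide_df = long_df.pivot(index="timestamp", columns="url", values="value")
print(wide_df)
```

`pivot` raises an error if the same (timestamp, url) pair appears twice; in that case `pivot_table` with an explicit `aggfunc` may be a better fit.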

My goal is to merge the two datasets based on timestamp. This is hard because in the JSON a timestamp is associated with every single feature value, whereas in the CSV data a timestamp corresponds to a whole row (that is, n feature values rather than a single one). I tried hard to do this without success, so I thought I could search for the closest timestamp and then replace one value at a time. Here is my attempt:

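def sync_vehicle_gps_data(dataset=vehicle_data, gps_data=gps_data):
    # Work on a copy so the caller's DataFrame is left untouched.
    gps = gps_data.copy()

    # `record` is used as the loop name to avoid shadowing the json module.
    for record in dataset:
        # Read the keys by name instead of relying on dictionary order.
        timestamp, feature, val = record['timestamp'], record['url'], record['value']
        # Label of the GPS row whose timestamp is closest to this record's.
        index = (gps['timestamp'] - timestamp).abs().idxmin()
        print("closest value index = ", index)
        gps.at[index, feature] = val

    return gps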

vehicle_data is the JSON data and gps_data is a pandas DataFrame. As you can see, I search the whole dataset for the timestamp closest to each single feature's timestamp and then update that specific value, but this didn't work well for me: I ended up with messed-up data. Is there any way to do this in Python? I can also use other libraries if a suitable one exists, so I'm not restricted to pandas.

The expected output is that the values in the JSON above are appended to the existing DataFrame. In this example, new columns 'brakePressure', 'lateralAcceleration', etc. would be added, and the value of each feature (as it appears in the JSON above; the numbers are only an example) would be put in the row whose timestamp is nearest to the timestamp key of that feature in the JSON. I know it is a very complex problem and not easy to explain, but I hope you understand what I mean. Here is an example. Let's say this is the GPS data:

      timestamp        X      Y     Z 
      1572430510880  595.00  179.00 -60.00
      1572430510890   -0.23   -0.09   0.01
      1572430510900   -0.11   -0.02   0.04
      1572430510910   -1.96   -5.19  -6.10

I want this output (I'll show it for only one feature):

      timestamp        X      Y        Z    brakePressure
      1572430510880  595.00  179.00 -60.00   10
      1572430510890   -0.23   -0.09   0.01   nan
      1572430510900   -0.11   -0.02   0.04   nan
      1572430510910   -1.96   -5.19  -6.10   nan

The value of the brakePressure feature in the dictionary was put in the first row because the GPS timestamp closest to the brakePressure timestamp in the dictionary above is also in the first row. Basically, I want to do the same thing for all the features in the JSON: I want to synchronize all those feature values with the GPS data.
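One possible way to sketch the nearest-timestamp synchronisation described above is `pandas.merge_asof` with `direction="nearest"`, followed by a pivot and a left merge back onto the GPS frame. This is an illustration under assumptions, not a definitive solution: the function name `sync_nearest`, the helper column `gps_timestamp`, and the vehicle timestamps below are made up so that they fall inside the GPS time range. If several readings of the same feature map to the same GPS row, this sketch keeps the latest one.

```python
import pandas as pd

# GPS data copied from the question.
gps = pd.DataFrame({
    "timestamp": [1572430510880, 1572430510890, 1572430510900, 1572430510910],
    "X": [595.00, -0.23, -0.11, -1.96],
    "Y": [179.00, -0.09, -0.02, -5.19],
    "Z": [-60.00, 0.01, 0.04, -6.10],
})

# Hypothetical vehicle records (timestamps invented for this example).
vehicle = [
    {"timestamp": 1572430510881, "url": "brakePressure", "value": 10},
    {"timestamp": 1572430510893, "url": "lateralAcceleration", "value": 120},
    {"timestamp": 1572430510899, "url": "longitudinalAcceleration", "value": 110},
    {"timestamp": 1572430510908, "url": "brakePressure", "value": 12},
]


def sync_nearest(gps_df, records):
    """Attach each record's value to the GPS row with the nearest timestamp."""
    long_df = pd.DataFrame(records).sort_values("timestamp")
    gps_sorted = gps_df.sort_values("timestamp")

    # merge_asof needs both keys sorted; it picks the nearest GPS timestamp per record.
    keys = gps_sorted[["timestamp"]].rename(columns={"timestamp": "gps_timestamp"})
    matched = pd.merge_asof(
        long_df, keys,
        left_on="timestamp", right_on="gps_timestamp",
        direction="nearest",
    )

    # One column per feature, indexed by the matched GPS timestamp.
    wide = matched.pivot_table(
        index="gps_timestamp", columns="url", values="value", aggfunc="last"
    )
    wide.columns.name = None
    wide.index.name = "timestamp"

    # A left merge keeps every GPS row; rows without a reading get NaN.
    return gps_sorted.merge(wide.reset_index(), on="timestamp", how="left")


synced = sync_nearest(gps, vehicle)
print(synced)
```

`merge_asof` also accepts a `tolerance` argument, which could be used to discard readings that are too far from any GPS timestamp.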

Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html

You can create 0's and 1's as column values:

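import json
from io import StringIO

import pandas as pd

json_data = [{}, ]  # placeholder for the list of records shown in the question
# Newer pandas versions expect a file-like object rather than a raw JSON string.
df1 = pd.read_json(StringIO(json.dumps(json_data)))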

which gives:

                timestamp                       url  value
0 2019-10-30 10:17:05.231             brakePressure     10
1 2019-10-30 10:17:05.275       lateralAcceleration    120
2 2019-10-30 10:17:05.290  longitudinalAcceleration    110
3 2019-10-30 10:17:05.299       acceleratorPosition   1230
4 2019-10-30 10:17:05.310             currentTorque     10

And then:

# dtype=int keeps the indicator columns as 0/1 (pandas 2.x defaults to True/False).
ready_to_join_df = pd.get_dummies(df1, prefix="", prefix_sep="", dtype=int)

which results in:

                timestamp  value  acceleratorPosition  brakePressure  currentTorque  lateralAcceleration  longitudinalAcceleration
0 2019-10-30 10:17:05.231     10                    0              1              0                    0                         0
1 2019-10-30 10:17:05.275    120                    0              0              0                    1                         0
2 2019-10-30 10:17:05.290    110                    0              0              0                    0                         1
3 2019-10-30 10:17:05.299   1230                    1              0              0                    0                         0
4 2019-10-30 10:17:05.310     10                    0              0              1                    0                         0

And now you can join the two DataFrames on the timestamp key.
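A join on the timestamp key could look roughly like the sketch below. It assumes the GPS timestamps are integer milliseconds (as in the question) and converts them to datetimes first, so that both key columns have the same dtype. The two small frames are hypothetical stand-ins, and the first GPS timestamp was chosen to match a vehicle timestamp exactly.

```python
import pandas as pd

# Hypothetical GPS frame with integer millisecond timestamps.
gps = pd.DataFrame({
    "timestamp": [1572430625231, 1572430625241],
    "X": [595.00, -0.23],
})

# Hypothetical stand-in for the get_dummies result (datetime timestamps).
ready_to_join_df = pd.DataFrame({
    "timestamp": pd.to_datetime([1572430625231, 1572430625275], unit="ms"),
    "value": [10, 120],
    "brakePressure": [1, 0],
    "lateralAcceleration": [0, 1],
})

# Bring both keys to the same dtype before merging.
gps["timestamp"] = pd.to_datetime(gps["timestamp"], unit="ms")

# A left join keeps every GPS row; rows without an exact match get NaN.
joined = gps.merge(ready_to_join_df, on="timestamp", how="left")
print(joined)
```

Note that an exact join like this only matches rows whose timestamps are identical. When the two sources are sampled at different instants, `pandas.merge_asof` with `direction="nearest"` is one option for matching nearby timestamps instead.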

