
Merge and plot multiple pandas dataframes

I would like help accomplishing the following:

Summary

  1. Re-index a data frame
  2. Merge multiple data frames
  3. Plot the new data frame based on a time column

More Details

I have a data set (raw_data) that looks like this:

id  timestamp       key             value
1   1576086899000   temperature     70
2   1576086899000   sleep           8
3   1576086899000   heartrate       65
4   1576086876000   temperature     72
5   1576086876000   sleep           7.5
6   1576086876000   heartrate       62
7   1576086866000   temperature     74
8   1576086866000   sleep           7.8
9   1576086866000   heartrate       64

I pivoted it using the following:

df = raw_data.pivot(index='timestamp', columns='key', values='value')

This made the index a timestamp value, and each column a key name with its corresponding value.

Because not every row contains a value for every key, I created a new data frame for each specific key and dropped any NaN values:

sleep_df = pd.DataFrame({'date': df.index, 'value': df.sleep}).dropna()

This kept timestamp as the index but created a duplicate column called date. I then formatted the date column as a year-month-day string with:

sleep_df['date'] = pd.to_datetime(sleep_df['date'], unit='ms').dt.strftime('%Y-%m-%d')

Therefore, the resulting data set for each of these tables looks like the following:

timestamp       date         sleep
1576086899000   2020-04-05   8
1576086876000   2020-04-04   7.5
1576086866000   2020-04-03   7.8

My end goal would be to:

  1. Merge each of these tables and plot them against the time column. I believe the index should be kept as timestamp for this reason: values recorded at the same timestamp can then be merged into the same row.
  2. In future analysis, I'd love to figure out whether I could merge data based on the date rather than on the timestamp, since some data may not have been recorded at exactly the same time. Would I have to make "date" the index instead? I assume I'd have to make sure there is only one entry for each individual date, otherwise merging tables could get funky.
  3. I think I figured out the plotting for this data. I made the date field the index of the table and converted all the values to type int, and it plotted against the date just fine. Is there a better way to do this?

Thank you for the help in advance, SO has been so great as a learning tool.

If your goal is to plot the time series for each key, I would suggest not pivoting, splitting the data into separate DataFrames, and re-merging them. Instead, work on the initial DataFrame directly: you can draw a plot where each line represents a specific key, for instance with seaborn:

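import pandas as pd
import seaborn as sns

# Sample data in long format: one row per (timestamp, key) reading
df = pd.DataFrame({"timestamp": pd.date_range("2020-04-20 00:00:00", periods=8, freq="D"),
                   "key": ["temperature", "sleep", "heartrate", "temperature", "sleep", "heartrate", "temperature", "sleep"],
                   "value": [70, 8, 65, 72, 7.5, 62, 74, 7.8]})
# Make sure the x-axis column is a datetime (already the case for this sample)
df["date"] = pd.to_datetime(df["timestamp"])
# One line per key, coloured by the "key" column
g = sns.relplot(x="date", y="value", hue="key", kind="line", data=df)
# Rotate the date tick labels so they do not overlap
g.fig.autofmt_xdate()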
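
The example above builds its own timestamps with pd.date_range. For data shaped like the question's raw_data, where timestamp holds epoch milliseconds, the conversion step might look roughly like the sketch below. The rows are copied from the question; the names long_df and date are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question (timestamp is in epoch milliseconds)
raw_data = pd.DataFrame({
    "id": range(1, 10),
    "timestamp": [1576086899000] * 3 + [1576086876000] * 3 + [1576086866000] * 3,
    "key": ["temperature", "sleep", "heartrate"] * 3,
    "value": [70, 8, 65, 72, 7.5, 62, 74, 7.8, 64],
})

# Epoch milliseconds -> datetime64; the data stays in long format
raw_data["date"] = pd.to_datetime(raw_data["timestamp"], unit="ms")

# Missing readings can be dropped directly in long format,
# so no per-key DataFrame is needed
long_df = raw_data.dropna(subset=["value"])
```

long_df can then be passed to sns.relplot in place of df.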

It is not necessary to index by date. Hope it helps.
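
On the question's second goal (merging by date rather than by exact timestamp), one possible approach is to aggregate while pivoting, assuming it is acceptable to average several readings that fall on the same day. This is a minimal pandas-only sketch; the sample rows, the name daily, and the choice of mean are assumptions for illustration.

```python
import pandas as pd

# Illustrative rows: two temperature readings on the same day, at different times
raw_data = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-04-03 08:00", "2020-04-03 20:00", "2020-04-03 09:30",
        "2020-04-04 08:15", "2020-04-04 09:45",
    ]),
    "key": ["temperature", "temperature", "sleep", "temperature", "sleep"],
    "value": [70.0, 72.0, 7.8, 74.0, 7.5],
})

# Truncate each timestamp to midnight so readings from the same day share a date
raw_data["date"] = raw_data["timestamp"].dt.normalize()

# pivot_table (unlike pivot) accepts duplicate (date, key) pairs
# and aggregates them, here with the mean
daily = raw_data.pivot_table(index="date", columns="key", values="value", aggfunc="mean")

# One line per key against the date index (requires matplotlib):
# daily.plot()
```

Here daily has one row per date and one column per key, so readings taken at different times on the same day end up in the same row without a separate merge step.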

