Would like help to accomplish the following:
Summary
More Details
I have a data set (raw_data) that looks like this:
id timestamp key value
1 1576086899000 temperature 70
2 1576086899000 sleep 8
3 1576086899000 heartrate 65
4 1576086876000 temperature 72
5 1576086876000 sleep 7.5
6 1576086876000 heartrate 62
7 1576086866000 temperature 74
8 1576086866000 sleep 7.8
9 1576086866000 heartrate 64
I pivoted it using the following:
df = raw_data.pivot(index='timestamp', columns='key', values='value')
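Here is a minimal, self-contained sketch of the pivot step. It assumes raw_data holds exactly the nine rows shown in the table above, rebuilt here by hand. It also shows pivot_table. When the same (timestamp, key) pair appears twice, pivot raises ValueError, but pivot_table aggregates the duplicates instead. Whether duplicates can occur in the real data is an assumption worth checking.

```python
import pandas as pd

# Long-format sample rebuilt from the question's raw_data table
raw_data = pd.DataFrame({
    "id": range(1, 10),
    "timestamp": [1576086899000] * 3 + [1576086876000] * 3 + [1576086866000] * 3,
    "key": ["temperature", "sleep", "heartrate"] * 3,
    "value": [70, 8, 65, 72, 7.5, 62, 74, 7.8, 64],
})

# One row per timestamp, one column per key; a missing (timestamp, key) pair becomes NaN
wide = raw_data.pivot(index="timestamp", columns="key", values="value")

# pivot raises ValueError on duplicate (timestamp, key) pairs;
# pivot_table averages them instead (aggfunc="mean")
wide_agg = raw_data.pivot_table(index="timestamp", columns="key", values="value", aggfunc="mean")
```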
This made the timestamp the index and turned each key into a column holding its corresponding values.
Because not every row contains a value for every key, I created a new DataFrame for each key and dropped any NaN values:
sleep_df = pd.DataFrame({'date': df.index, 'value': df.sleep}).dropna()
This kept timestamp as the index but also created a duplicate column called date. I then formatted the date column as a year-month-day value with:
sleep_df['date'] = pd.to_datetime(sleep_df['date'], unit='ms').map(lambda x: x.strftime('%Y-%m-%d'))
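As a vectorised alternative to the map(lambda ...) call, the .dt accessor formats every row in one step. The sketch below uses a small stand-in for sleep_df built from the question's sample rows. With these sample timestamps, all three values fall on 2019-12-11 (UTC). The extra day column is only illustrative: normalize() keeps a real datetime, and a datetime usually plots better than a string.

```python
import pandas as pd

# Stand-in for sleep_df: timestamp index plus a duplicate "date" column, both in epoch ms
ts = [1576086899000, 1576086876000, 1576086866000]
sleep_df = pd.DataFrame(
    {"date": ts, "value": [8, 7.5, 7.8]},
    index=pd.Index(ts, name="timestamp"),
)

# Vectorised: epoch ms -> datetime64 -> "YYYY-MM-DD" strings, no per-row lambda
sleep_df["date"] = pd.to_datetime(sleep_df["date"], unit="ms").dt.strftime("%Y-%m-%d")

# Illustrative extra column: drop the time of day but keep the datetime64 dtype
sleep_df["day"] = pd.to_datetime(sleep_df.index, unit="ms").normalize()
```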
Therefore, the resulting data set for each of these tables looks like the following:
timestamp date sleep
1576086899000 2020-04-05 8
1576086876000 2020-04-04 7.5
1576086866000 2020-04-03 7.8
My end goal would be to:
timestamp
for this reason, because it can then merge values where the timestamp the data was recorded was the same.
date field, and converted all the values to be type int, and it plotted against the date just fine. Is there a better way to do this? Thank you for the help in advance; SO has been so great as a learning tool.
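The question text is partly garbled at this point, so the following is an interpretation. It assumes the goal is one row per timestamp with one column per key, and sketches the merge approach described above. functools.reduce outer-merges the per-key frames on timestamp. The values are converted with pd.to_numeric rather than astype(int), because astype(int) would truncate 7.5 hours of sleep to 7. The result carries the same values as the pivoted frame.

```python
from functools import reduce

import pandas as pd

# Long-format sample rebuilt from the question's raw_data table
raw_data = pd.DataFrame({
    "timestamp": [1576086899000] * 3 + [1576086876000] * 3 + [1576086866000] * 3,
    "key": ["temperature", "sleep", "heartrate"] * 3,
    "value": [70, 8, 65, 72, 7.5, 62, 74, 7.8, 64],
})

# In case "value" was read as text: to_numeric keeps decimals, astype(int) would not
raw_data["value"] = pd.to_numeric(raw_data["value"])

# One frame per key: timestamp plus a column named after the key, NaN rows dropped
per_key = [
    grp[["timestamp", "value"]].rename(columns={"value": key}).dropna()
    for key, grp in raw_data.groupby("key")
]

# Outer merge on timestamp so a reading missing for one key does not drop the whole row
merged = reduce(lambda left, right: left.merge(right, on="timestamp", how="outer"), per_key)

# Convert the timestamp once, after merging
merged["date"] = pd.to_datetime(merged["timestamp"], unit="ms")
```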
If your goal is to plot the time series for each key, I would suggest not pivoting, splitting into separate DataFrames and re-merging them. Instead, work on the initial long-format DataFrame directly. You can draw a plot where each line represents a specific key, for instance with seaborn:
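import pandas as pd
import seaborn as sns

df = pd.DataFrame({"timestamp": pd.date_range("2020-04-20 00:00:00", periods=8, freq="D"),
                   "key": ["temperature", "sleep", "heartrate", "temperature", "sleep", "heartrate", "temperature", "sleep"],
                   "value": [70, 8, 65, 72, 7.5, 62, 74, 7.8]})
# "timestamp" is already datetime64 here, so this conversion is a no-op;
# with epoch-millisecond integers, as in the question, use pd.to_datetime(..., unit="ms")
df["date"] = pd.to_datetime(df["timestamp"])
# One line per key: seaborn groups the long-format rows by the hue column
g = sns.relplot(x="date", y="value", hue="key", kind="line", data=df)
g.fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap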
It is not necessary to index by date. Hope it helps.
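To apply the long-format approach to the question's own raw_data, only the epoch-millisecond timestamps need converting with unit="ms". Below is a minimal sketch of that preparation, assuming the nine sample rows from the question. The plotting call is left as a comment and mirrors the seaborn call above.

```python
import pandas as pd

# Long-format sample rebuilt from the question's raw_data table
raw_data = pd.DataFrame({
    "timestamp": [1576086899000] * 3 + [1576086876000] * 3 + [1576086866000] * 3,
    "key": ["temperature", "sleep", "heartrate"] * 3,
    "value": [70, 8, 65, 72, 7.5, 62, 74, 7.8, 64],
})

long_df = raw_data.copy()
# Epoch milliseconds -> datetime64, so the x axis is a real time axis (no index needed)
long_df["date"] = pd.to_datetime(long_df["timestamp"], unit="ms")
# Keep decimals such as 7.5 rather than casting to int
long_df["value"] = pd.to_numeric(long_df["value"])
# Sort by time so each key's line is drawn left to right
long_df = long_df.sort_values("date")

# Then plot directly from the long format:
# sns.relplot(x="date", y="value", hue="key", kind="line", data=long_df)
```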