
Pandas DataFrame index from nested values in dictionary

I am creating a pandas dataframe from historical weather data downloaded from Weather Underground.

import json
import requests
import pandas as pd
import numpy as np
import datetime
from dateutil.parser import parse
address = "http://api.wunderground.com/api/7036740167876b59/history_20060405/q/CA/San_Francisco.json"
r = requests.get(address)
wu_data = r.json()

Because I do not need all the data, I only use the list of observations. Each observation in this list has two fields, date and utcdate, whose values are themselves dictionaries.

df = pd.DataFrame.from_dict(wu_data["history"]["observations"])
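To make the nested structure concrete, here is a minimal sketch that uses two hand-written observations in place of the API response. The sample values and the `tempm` field are invented for illustration; only the `date` and `utcdate` fields and the `pretty` key come from the description above.

```python
import pandas as pd

# Hypothetical stand-in for wu_data["history"]["observations"]; values are invented.
observations = [
    {"date": {"pretty": "12:56 AM PDT on April 05, 2006"},
     "utcdate": {"pretty": "7:56 AM GMT on April 05, 2006"},
     "tempm": "10.0"},
    {"date": {"pretty": "1:56 AM PDT on April 05, 2006"},
     "utcdate": {"pretty": "8:56 AM GMT on April 05, 2006"},
     "tempm": "9.4"},
]

df = pd.DataFrame(observations)

# Each cell of the "date" column is still a plain dict, so the column dtype is object.
first_date = df["date"].iloc[0]
print(type(first_date).__name__)   # dict
print(first_date["pretty"])
```

Because the dictionaries are stored as opaque objects, pandas cannot treat `pretty` as a column until it is extracted explicitly.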

I would like to index the dataframe by the parsed date taken from the 'pretty' key inside each date dictionary. I can access this value for a single row by position, but I can't figure out how to do it for the whole column directly, without a loop. For example, for the element at index 23 I can write

pretty_date = df["date"].values[23]["pretty"]
print(pretty_date)
time = parse(pretty_date)
print(time)

And I get

11:56 PM PDT on April 05, 2006
2006-04-05 23:56:00
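Note that the parsed value printed above carries no timezone, even though the string contains "PDT". The sketch below shows two ways `dateutil` can be told how to treat that abbreviation. The UTC-7 offset for PDT is an assumption made here for illustration, not something taken from the API response.

```python
from datetime import datetime, timedelta
from dateutil.parser import parse

pretty_date = "11:56 PM PDT on April 05, 2006"

# ignoretz=True drops the "PDT" token and always returns a naive datetime.
naive = parse(pretty_date, ignoretz=True)

# tzinfos maps an abbreviation to a UTC offset in seconds.
# Assumption: PDT is treated as a fixed offset of -7 hours.
aware = parse(pretty_date, tzinfos={"PDT": -7 * 3600})

print(naive)
print(aware.utcoffset())
```

Choosing one of the two explicitly keeps the result independent of the timezone settings of the machine running the code.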

This is what I am doing at the moment

# parse the "pretty" string held in each date dictionary
g = lambda x: parse(x["pretty"])
df_dates = pd.DataFrame.from_dict(df["date"])
df.index = df_dates["date"].apply(g)

df is now reindexed. At this point I can remove the columns I do not need.

Is there a more direct way to do this?

Please note that sometimes there are multiple observations for the same date, but I deal with data cleaning, duplicates, etc. in a different part of the code.

Since the values stored under pretty are plain Python objects (the column dtype is just object), you can pull them out with a list comprehension and assign the parsed result as the index. Not sure if this is exactly what you want:

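# note: in requests 1.0 and later `json` is a method, so call r.json();
# only very old releases exposed it as a property (r.json without parentheses)
wu_data = r.json()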
df = pd.DataFrame.from_dict(wu_data["history"]["observations"])

# just index using list comprehension, getting "pretty" inside df["date"] object.
df.index = [parse(df["date"][n]["pretty"]) for n in range(len(df))]

df.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-04-05 00:56:00, ..., 2006-04-05 23:56:00]
Length: 24, Freq: None, Timezone: None
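A slightly more compact variant is sketched below. It iterates over the column values directly instead of indexing by position, and also shows `pd.json_normalize` as one possible way to flatten the nested dictionaries into columns first. The sample observations, the `tempm` field, and the index name `timestamp` are hypothetical; `ignoretz=True` is used so the result stays timezone-naive, matching the output shown above.

```python
import pandas as pd
from dateutil.parser import parse

# Hypothetical stand-in for wu_data["history"]["observations"]; values are invented.
observations = [
    {"date": {"pretty": "12:56 AM PDT on April 05, 2006"}, "tempm": "10.0"},
    {"date": {"pretty": "11:56 PM PDT on April 05, 2006"}, "tempm": "11.1"},
]

# Option 1: keep the dicts in the frame and parse them in a single pass.
df = pd.DataFrame(observations)
parsed = [parse(d["pretty"], ignoretz=True) for d in df["date"]]
df.index = pd.DatetimeIndex(parsed, name="timestamp")
df = df.drop(columns=["date"])  # the nested column is no longer needed

# Option 2: flatten nested keys into dotted column names such as "date.pretty".
flat = pd.json_normalize(observations)
flat.index = pd.DatetimeIndex(
    [parse(s, ignoretz=True) for s in flat["date.pretty"]], name="timestamp"
)

print(df)
print(flat.columns.tolist())
```

Either way the parsing itself still happens one string at a time; the gain is mainly that no intermediate dataframe or positional lookup is needed.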

Hope this helps.
