Pandas DataFrame index from nested values in a dictionary

Question

I am creating a pandas DataFrame from historical weather data downloaded from Weather Underground.

import json
import requests
import pandas as pd
import numpy as np
import datetime
from dateutil.parser import parse
address = "http://api.wunderground.com/api/7036740167876b59/history_20060405/q/CA/San_Francisco.json"
r = requests.get(address)
wu_data = r.json()

Because I do not need all the data, I only use the list of observations. Each observation in this list contains two fields, date and utcdate, that are themselves dictionaries.

df = pd.DataFrame.from_dict(wu_data["history"]["observations"])
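
For reference, nested dictionaries like these can also be flattened when the DataFrame is built. The following is only a minimal sketch, assuming pandas 1.0 or later (where `pd.json_normalize` is available). The `observations` list is a made-up stand-in for `wu_data["history"]["observations"]`: the `tempm` field and the sample strings are illustrative assumptions, not actual API output.

```python
import pandas as pd

# Made-up stand-in for wu_data["history"]["observations"];
# only the nested "pretty" key is taken from the question.
observations = [
    {
        "date": {"pretty": "11:56 PM PDT on April 05, 2006"},
        "utcdate": {"pretty": "6:56 AM GMT on April 06, 2006"},
        "tempm": "10.6",  # hypothetical measurement field
    },
]

# json_normalize expands nested dicts into dotted column names
flat = pd.json_normalize(observations)
print(sorted(flat.columns))
```

With this layout the nested value is an ordinary column, `flat["date.pretty"]`, instead of a dictionary stored in each cell.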

I would like to index the DataFrame by the parsed date taken from the 'pretty' key inside each date dictionary. I can access this value through the array index, but I can't figure out how to do it directly, without a loop. For example, for the element at index 23 I can write

pretty_date = df["date"].values[23]["pretty"]
print(pretty_date)
time = parse(pretty_date)
print(time)

And I get

11:56 PM PDT on April 05, 2006
2006-04-05 23:56:00

This is what I am doing at the moment:

# Parse the "pretty" string inside each date dictionary
g = lambda x: parse(x["pretty"])
df_dates = pd.DataFrame.from_dict(df["date"])
df.index = df_dates["date"].apply(g)

df is now indexed by the parsed dates. At this point I can remove the columns I do not need.

Is there a more direct way to do this?

Please note that there are sometimes multiple observations for the same date, but I deal with data cleaning, duplicates, etc. in a different part of the code.

Answer 1

Since the date column has dtype object (each cell is a plain dictionary), you can simply collect the parsed 'pretty' values into a list and use that list as the index. Not sure if this is what you want:

# Note: before requests 1.0, `r.json` was a property (no parentheses);
# from 1.0 on it is a method, so it must be called as `r.json()`.
wu_data = r.json()
df = pd.DataFrame.from_dict(wu_data["history"]["observations"])

# Build the index with a list comprehension, reading "pretty" from each df["date"] dictionary.
df.index = [parse(df["date"][n]["pretty"]) for n in range(len(df))]

df.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-04-05 00:56:00, ..., 2006-04-05 23:56:00]
Length: 24, Freq: None, Timezone: None
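
The same idea can be written without an explicit positional loop by mapping over the column. This is a minimal sketch rather than a definitive solution. The `observations` list is a made-up stand-in for the API response, the `tempm` field is hypothetical, and `ignoretz=True` is an added assumption so that the "PDT" abbreviation is dropped and the result stays timezone-naive, as in the output above.

```python
import pandas as pd
from dateutil.parser import parse

# Made-up stand-in for wu_data["history"]["observations"]
observations = [
    {"date": {"pretty": "10:56 PM PDT on April 05, 2006"}, "tempm": "11.1"},
    {"date": {"pretty": "11:56 PM PDT on April 05, 2006"}, "tempm": "10.6"},
]
df = pd.DataFrame(observations)

# Map each nested dict to a parsed, timezone-naive timestamp
times = df["date"].map(lambda d: parse(d["pretty"], ignoretz=True))

# Use the parsed values as the index and drop the raw dict column
df.index = pd.DatetimeIndex(times, name="time")
df = df.drop(columns="date")
print(df)
```

`Series.map` still calls `parse` once per row, so this is mainly tidier than the list comprehension, not faster.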

Hope this helps.

Solution 1
1 accepted 2014-10-27 10:18:56