A dictionary in a Pandas dataframe column in Python
I am reading a CSV file in which one column contains a dict with multiple keys. Here is an example:
import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[{'AUS': {'arv': '10:00', 'vol': 5}, 'DAL': {'arv': '9:00', 'vol': 1}}, {'DAL': {'arv': '10:00', 'vol': 6}, 'NYU': {'arv': '10:00', 'vol': 3}}, {'DAL': {'arv': '8:00', 'vol': 6}, 'DAL': {'arv': '10:00', 'vol': 1}, 'GBD': {'arv': '12:00', 'vol': 1}}]})
What I am trying to do is query column b of the above dataframe and return the corresponding values, as shown below. However, I would like to know whether there is a more intuitive and more efficient way to perform this kind of operation on a large dataset without looping through the dict.
#convert column b of df to a dict
df_dict = df.b.to_dict()
print(df_dict)
{0: {'AUS': {'arv': '10:00', 'vol': 5}, 'DAL': {'arv': '9:00', 'vol': 1}}, 1: {'DAL': {'arv': '10:00', 'vol': 6}, 'NYU': {'arv': '10:00', 'vol': 3}}, 2: {'DAL': {'arv': '10:00', 'vol': 1}, 'GBD': {'arv': '12:00', 'vol': 1}}}
def get_value(my_str, my_time):
    total = 0
    for key in df_dict:
        if my_str in df_dict[key].keys():
            if df_dict[key].get(my_str).get('arv') == my_time:
                total = total + df_dict[key].get(my_str).get('vol')
    return total
print("total vol is at 10:00 is: ", get_value('DAL', '10:00'))
total vol is at 10:00 is: 7
I suggest reorganizing how your data is represented in the DataFrame:
>>> from collections import defaultdict, Counter
>>> import pandas as pd
>>> input_data = {0: {"AUS": {"arv": "10:00", "vol": 5}, "DAL": {"arv": "9:00", "vol": 1}}, 1: {"DAL": {"arv": "10:00", "vol": 6}, "NYU": {"arv": "10:00", "vol": 3}}, 2: {"DAL": {"arv": "10:00", "vol": 1}, "GBD": {"arv": "12:00", "vol": 1}}}
>>> data = defaultdict(Counter)
>>> for value in input_data.values():
...     for name in value:
...         data[value[name]["arv"]][name] += value[name]["vol"]
...
>>> data
defaultdict(<class 'collections.Counter'>, {'10:00': Counter({'DAL': 7, 'AUS': 5, 'NYU': 3}), '9:00': Counter({'DAL': 1}), '12:00': Counter({'GBD': 1})})
>>> frame = pd.DataFrame(data).T
>>> frame
       AUS  DAL  NYU  GBD
10:00  5.0  7.0  3.0  NaN
9:00   NaN  1.0  NaN  NaN
12:00  NaN  NaN  NaN  1.0
>>> frame[frame.index == "10:00"]["DAL"]
10:00 7.0
Name: DAL, dtype: float64
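For a single lookup, a label-based .loc access on this frame is a little more direct than a boolean mask. Here is a minimal sketch that rebuilds the same frame (accumulating with +=) and assumes a missing (time, name) pair should count as 0. The helper name total_vol is hypothetical, not part of the answer above:

```python
from collections import Counter, defaultdict

import pandas as pd

input_data = {
    0: {"AUS": {"arv": "10:00", "vol": 5}, "DAL": {"arv": "9:00", "vol": 1}},
    1: {"DAL": {"arv": "10:00", "vol": 6}, "NYU": {"arv": "10:00", "vol": 3}},
    2: {"DAL": {"arv": "10:00", "vol": 1}, "GBD": {"arv": "12:00", "vol": 1}},
}

# Accumulate volumes per arrival time and name.
data = defaultdict(Counter)
for row in input_data.values():
    for name, info in row.items():
        data[info["arv"]][name] += info["vol"]

# Rows are arrival times, columns are names; absent pairs become 0.
frame = pd.DataFrame(data).T.fillna(0)


def total_vol(name, time):
    """Hypothetical helper: total volume for a name at a time, 0 if absent."""
    if time in frame.index and name in frame.columns:
        return int(frame.loc[time, name])
    return 0


print(total_vol("DAL", "10:00"))
```

The membership checks avoid a KeyError for names or times that never occur in the data.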
While dukkee's answer works, I think its layout is a bit counterintuitive if you want to manipulate your dataframe in other ways. I would also reorganize the dataframe, but like this:
import pandas as pd

input_data = {
    'a': [1, 2, 3],
    'b': [
        {'AUS': {'arv': '10:00', 'vol': 5},
         'DAL': {'arv': '9:00', 'vol': 1}},
        {'DAL': {'arv': '10:00', 'vol': 6},
         'NYU': {'arv': '10:00', 'vol': 3}},
        {'DAL': {'arv': '8:00', 'vol': 6},   # overwritten by the next 'DAL' entry
         'DAL': {'arv': '10:00', 'vol': 1},
         'GBD': {'arv': '12:00', 'vol': 1}},
    ],
}

# One row per (a, abbreviation) pair, with the nested fields as columns.
data_list = [[input_data['a'][i], key, value['arv'], value['vol']]
             for i, dic in enumerate(input_data['b'])
             for key, value in dic.items()]
df = pd.DataFrame(data_list, columns=['a', 'abr', 'arv', 'vol'])
Which results in:
>>> df
   a  abr    arv  vol
0  1  AUS  10:00    5
1  1  DAL   9:00    1
2  2  DAL  10:00    6
3  2  NYU  10:00    3
4  3  DAL  10:00    1
5  3  GBD  12:00    1
I believe that's the way you should organize your data. Having dictionaries as values in a dataframe seems counterintuitive to me. This way you can use loc to solve your problem:
>>> df.loc[(df['arv']=='10:00') & (df['abr']=='DAL')]
   a  abr    arv  vol
2  2  DAL  10:00    6
4  3  DAL  10:00    1
>>> vol_sum = sum(df.loc[(df['arv']=='10:00') & (df['abr']=='DAL')]['vol'])
>>> print(f"total vol at 10:00 is: {vol_sum}")
total vol at 10:00 is: 7
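If you need more than one total, grouping the long-format frame gives every (abr, arv) sum in one pass instead of one filter per query. Here is a minimal sketch using the same rows as the table above. The variable name totals is only illustrative:

```python
import pandas as pd

# Same long-format rows as shown above.
df = pd.DataFrame(
    [
        [1, "AUS", "10:00", 5],
        [1, "DAL", "9:00", 1],
        [2, "DAL", "10:00", 6],
        [2, "NYU", "10:00", 3],
        [3, "DAL", "10:00", 1],
        [3, "GBD", "12:00", 1],
    ],
    columns=["a", "abr", "arv", "vol"],
)

# One pass over the data: total volume for each (abr, arv) pair.
totals = df.groupby(["abr", "arv"])["vol"].sum()

# Look up a single pair; .get returns the default when the pair is absent.
print(totals.get(("DAL", "10:00"), 0))
```

The result is a Series indexed by (abr, arv), so repeated queries become plain lookups rather than repeated scans of the frame.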
A small plus compared to dukkee's answer: there is no need to use collections, and list comprehensions are usually a bit faster than the equivalent for-loops! Note that in one of your dictionaries 'DAL' appears twice as a key, so the first entry gets overwritten.
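The overwrite happens when Python builds the dict literal, before pandas ever sees the data, so the '8:00' entry cannot be recovered afterwards. Here is a short illustration. If both entries matter, one option is to store each row as a list of (key, value) pairs instead. That is an assumption about how the data could be restructured, not something taken from the question:

```python
# A repeated key in a dict literal keeps only the last value.
row = {
    'DAL': {'arv': '8:00', 'vol': 6},
    'DAL': {'arv': '10:00', 'vol': 1},
    'GBD': {'arv': '12:00', 'vol': 1},
}
print(len(row))           # only two keys survive
print(row['DAL']['arv'])  # the later entry wins

# A list of pairs preserves repeated names (hypothetical alternative layout).
pairs = [
    ('DAL', {'arv': '8:00', 'vol': 6}),
    ('DAL', {'arv': '10:00', 'vol': 1}),
    ('GBD', {'arv': '12:00', 'vol': 1}),
]
dal_total = sum(v['vol'] for k, v in pairs if k == 'DAL')
print(dal_total)
```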