How to convert a column in a dataframe to a nested dictionary in Python?

I have a column with named work records like this:

Records
Name: hours on date, Name: hours on date
Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021

I want to restructure this column so that when I aggregate over a range of dates (say from 1/9/2021 until 30/9/2021), I get the total hours spent by each name.

I tried changing the column to a list and then to a dictionary, but it is not working.

How can I change this column structure in Python? Should I use regex?

{18/9/2021: {Aya:20}, 20/9/2021: {Asmaa:10}, 20/9/2021: {Aya:20} }

You can use a dict here, but it will have to be nested, because you have multiple entries per date.

import pandas as pd
df = pd.DataFrame({'Records': ['Name: hours on date, Name: hours on date',
  'Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021']})

# Keep only rows that have the actual data
data = df.loc[~df['Records'].str.contains('Name')]

# Split on the comma delimiter and explode into a unique row per employee
data = data['Records'].str.split(',').explode()

# Use regex to capture the relevant data and construct the dictionary
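# Raw string avoids invalid-escape warnings; [a-zA-Z] replaces [a-zA-z], which also matched punctuation
data = data.str.extract(r'([a-zA-Z]+):\s(\d{1,2})\son\s(\d{1,2}/\d{1,2}/\d{4})').reset_index(drop=True)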

# Group by date (column 2) and map each name (column 0) to its hours (column 1)
data.groupby(2).apply(lambda x: dict(zip(x[0],x[1]))).to_dict()

Output

{'18/9/2021': {'Aya': '20'}, '20/9/2021': {'Asmaa': '10', 'Aya': '20'}}
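
Note that the hours in this dictionary are still strings, and `dict(zip(...))` keeps only the last entry if the same name appears twice on one date. If the end goal is the total hours per name over a date range, as the question describes, a long-format table may be easier to work with than a nested dict. The following is a minimal sketch of that alternative, not part of the answer above: the column names `name`, `hours`, `date` and the helper `total_hours` are made up for illustration, and it assumes every date is written day/month/year.

```python
import pandas as pd

# Sample data taken from the question (header row omitted)
records = pd.Series(
    ["Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021"]
)

# Named groups become column names; extractall returns one row per match,
# so no explicit split/explode step is needed
pattern = r"(?P<name>[A-Za-z]+):\s*(?P<hours>\d+)\s+on\s+(?P<date>\d{1,2}/\d{1,2}/\d{4})"
long_df = records.str.extractall(pattern).reset_index(drop=True)

# Convert to proper types so hours can be summed and dates compared
long_df["hours"] = long_df["hours"].astype(int)
long_df["date"] = pd.to_datetime(long_df["date"], format="%d/%m/%Y")


def total_hours(frame, start, end):
    """Sum hours per name for dates between start and end (both inclusive)."""
    mask = frame["date"].between(pd.Timestamp(start), pd.Timestamp(end))
    sums = frame.loc[mask].groupby("name")["hours"].sum()
    # Cast to plain Python ints for a clean dictionary
    return {name: int(hours) for name, hours in sums.items()}


totals = total_hours(long_df, "2021-09-01", "2021-09-30")
print(totals)  # {'Asmaa': 10, 'Aya': 40}
```

Because each record stays on its own row, repeated entries for the same name and date are summed rather than overwritten.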
