
How to append entire rows based on matching conditions in a pandas dataframe?

I have a dataframe that looks like:

import pandas as pd
df_ref = pd.DataFrame({'district':['A Nzo DM','A Nzo DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM'],
'visit_date':['2021-07-31','2021-07-31','2021-07-31','2021-07-31','2021-08-31','2021-08-31','2021-08-31'],
'province':['EC','EC','NC','NC','NC','NC','NC'],
'age_group':['35-49','50-59','18-34','35-49','18-34','35-49','Unidentified'],
'sex':['Male','Female','Female','Male','Female','Male','Female'],
'vaccinations':[1,5,6,8,9,10,14]})

The data will be used in data visualization software. For each district and each visit_date (already resampled to month), I need every sex (Male and Female) to have all of these age groups mapped to it: 18-34, 35-49, 50-59, 60+, Unidentified. The result would be:

maz = {'district':['A Nzo DM','A Nzo DM','A Nzo DM','A Nzo DM','A Nzo DM',
'A Nzo DM','A Nzo DM','A Nzo DM','A Nzo DM','A Nzo DM',
'uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM',
'uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM',
'uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM',
'uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM','uMgungundlovu DM'],
'visit_date':['2021-07-31','2021-07-31','2021-07-31','2021-07-31','2021-07-31',
                '2021-07-31','2021-07-31','2021-07-31','2021-07-31','2021-07-31',
                '2021-07-31','2021-07-31','2021-07-31','2021-07-31','2021-07-31',
                '2021-07-31','2021-07-31','2021-07-31','2021-07-31','2021-07-31',
                '2021-08-31','2021-08-31','2021-08-31','2021-08-31','2021-08-31',
                '2021-08-31','2021-08-31','2021-08-31','2021-08-31','2021-08-31'],
'province':['EC','EC','EC','EC','EC',
            'EC','EC','EC','EC','EC',
            'NC','NC','NC','NC','NC',
            'NC','NC','NC','NC','NC',
            'NC','NC','NC','NC','NC',
            'NC','NC','NC','NC','NC'],
'age_group':['18-34','35-49','50-59','60+','Unidentified',
                '18-34','35-49','50-59','60+','Unidentified',
                '18-34','35-49','50-59','60+','Unidentified',
                '18-34','35-49','50-59','60+','Unidentified',
                '18-34','35-49','50-59','60+','Unidentified',
                '18-34','35-49','50-59','60+','Unidentified'],
'sex':['Male','Female','Male','Female',
       'Male','Female','Male','Female',
       'Male','Female','Male','Female',
       'Male','Female','Male','Female',
       'Male','Female','Male','Female',
       'Male','Female','Male','Female',
       'Male','Female','Male','Female',
       'Male','Female'],}
df_output = pd.DataFrame(maz)


IIUC, you need the product of the "age_group" and "sex" columns, then a "cross" merge with the rest of the columns, and finally drop the duplicates:

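import itertools

# Every (age_group, sex) pair that can be formed from the input, de-duplicated
t = pd.DataFrame(
    itertools.product(df_ref["age_group"], df_ref["sex"]), columns=["age_group", "sex"]
).drop_duplicates(ignore_index=True)
# Cross-join the pairs onto each (district, visit_date, province) row, then de-duplicate
out = pd.merge(
    df_ref[["district", "visit_date", "province"]], t, how="cross"
).drop_duplicates(ignore_index=True)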

Prints (note that 60+ is missing, because the input dataframe does not contain it):

            district  visit_date province     age_group     sex
0           A Nzo DM  2021-07-31       EC         35-49    Male
1           A Nzo DM  2021-07-31       EC         35-49  Female
2           A Nzo DM  2021-07-31       EC         50-59    Male
3           A Nzo DM  2021-07-31       EC         50-59  Female
4           A Nzo DM  2021-07-31       EC         18-34    Male
5           A Nzo DM  2021-07-31       EC         18-34  Female
6           A Nzo DM  2021-07-31       EC  Unidentified    Male
7           A Nzo DM  2021-07-31       EC  Unidentified  Female
8   uMgungundlovu DM  2021-07-31       NC         35-49    Male
9   uMgungundlovu DM  2021-07-31       NC         35-49  Female
10  uMgungundlovu DM  2021-07-31       NC         50-59    Male
11  uMgungundlovu DM  2021-07-31       NC         50-59  Female
12  uMgungundlovu DM  2021-07-31       NC         18-34    Male
13  uMgungundlovu DM  2021-07-31       NC         18-34  Female
14  uMgungundlovu DM  2021-07-31       NC  Unidentified    Male
15  uMgungundlovu DM  2021-07-31       NC  Unidentified  Female
16  uMgungundlovu DM  2021-08-31       NC         35-49    Male
17  uMgungundlovu DM  2021-08-31       NC         35-49  Female
18  uMgungundlovu DM  2021-08-31       NC         50-59    Male
19  uMgungundlovu DM  2021-08-31       NC         50-59  Female
20  uMgungundlovu DM  2021-08-31       NC         18-34    Male
21  uMgungundlovu DM  2021-08-31       NC         18-34  Female
22  uMgungundlovu DM  2021-08-31       NC  Unidentified    Male
23  uMgungundlovu DM  2021-08-31       NC  Unidentified  Female
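
If the 60+ rows are needed as in the expected output, one possible variation is to build the product from explicit category lists instead of from the values present in df_ref. This is only a sketch: the names age_groups, sexes, pairs, keys and out_full are made up for illustration, and the category lists are assumed to be exactly the ones listed in the question.

```python
import itertools

import pandas as pd

# Same input as in the question, written more compactly.
df_ref = pd.DataFrame({
    "district": ["A Nzo DM"] * 2 + ["uMgungundlovu DM"] * 5,
    "visit_date": ["2021-07-31"] * 4 + ["2021-08-31"] * 3,
    "province": ["EC"] * 2 + ["NC"] * 5,
    "age_group": ["35-49", "50-59", "18-34", "35-49", "18-34", "35-49", "Unidentified"],
    "sex": ["Male", "Female", "Female", "Male", "Female", "Male", "Female"],
    "vaccinations": [1, 5, 6, 8, 9, 10, 14],
})

# Assumption: the full category lists are known up front (taken from the question text).
age_groups = ["18-34", "35-49", "50-59", "60+", "Unidentified"]
sexes = ["Male", "Female"]

# Every (age_group, sex) pair, whether or not it occurs in df_ref.
pairs = pd.DataFrame(itertools.product(age_groups, sexes), columns=["age_group", "sex"])

# One row per distinct (district, visit_date, province) combination.
keys = df_ref[["district", "visit_date", "province"]].drop_duplicates(ignore_index=True)

# Cross join (needs pandas 1.2 or later): each key row is paired with every (age_group, sex) pair.
out_full = keys.merge(pairs, how="cross")
```

De-duplicating the key columns before the merge keeps the cross join small and makes the final drop_duplicates unnecessary. The row order differs from df_output, so sort both frames on all columns before comparing them.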
