
Extracting nested elements in column and storing into new columns

I have some data that I want to expand into new columns. The data looks like:

    id  d
0   403 {'cases': 1, 'suspects': 22, 'negative': 0, 's', ...}
1   402 {'cases': 0, 'suspects': 18, 'negative': 0, 's', ...}
2   401 {'cases': 0, 'suspects': 31, 'negative': 0, 's', ...}

I want to spread the nested column d out into new columns. I can get some of the data from d using:

rows = []
for i, row in myDF.iterrows():
    for stat in row['d']['stats']:
        new_row = {
            **row.to_dict(),
            **stat,
        }
        rows.append(new_row)

However, I cannot get all of it. How can I extract the objects so that each one becomes a new column, with its corresponding cases value as the observations?

Expected output (column names do not have to be exact):

cases   suspects   negative   diag_casesELISA_sex_F   diag_suspects_sex_M   diag_suspects_sex_F   diag_suspectsPCR_sex_F diag_suspectsPCR_sex_M
  1         22         0                1                      11                     10                   1                           NA
  0         18         0                NA                     9                       9                   NA                          NA
  0         31         0                NA                     12                     18                   NA                          1   

Data:

import pandas as pd

myDF = pd.DataFrame.from_dict({'id': {0: '403', 1: '402', 2: '401'}, 'd': {0: {'cases': 1, 'suspects': 22, 'negative': 0, 'stats': [{'diag': 'casesELISA', 'sex': 'F', 'cases': 1}, {'diag': 'suspects', 'sex': 'M', 'cases': 11}, {'diag': 'suspects', 'sex': 'F', 'cases': 10}, {'diag': 'suspectsPCR', 'sex': 'F', 'cases': 1}]}, 1: {'cases': 0, 'suspects': 18, 'negative': 0, 'stats': [{'diag': 'suspects', 'sex': 'M', 'cases': 9}, {'diag': 'suspects', 'sex': 'F', 'cases': 9}]}, 2: {'cases': 0, 'suspects': 31, 'negative': 0, 'stats': [{'diag': 'suspects', 'sex': 'M', 'cases': 12}, {'diag': 'suspects', 'sex': 'F', 'cases': 18}, {'diag': 'suspectsPCR', 'sex': 'M', 'cases': 1}]}}})

You can write a custom function here and use pd.Series.apply.

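def transform_dict(d):
    new = {}
    for k, v in d.items():
        if isinstance(v, list):
            # v is the "stats" list: build one column per entry
            for _dict in v:
                # e.g. {"diag": "suspects", "sex": "M"} -> "diag_suspects_sex_M"
                name = "_".join(
                    [key + "_" + val for key, val in _dict.items() if key != "cases"]
                )
                new[name] = _dict["cases"]
        else:
            # scalar totals (cases, suspects, negative) pass through unchanged
            new[k] = v
    return pd.Series(new)


out = myDF["d"].apply(transform_dict)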

#out
   cases  suspects  negative  ...  diag_suspects_sex_F  diag_suspectsPCR_sex_F  diag_suspectsPCR_sex_M
0    1.0      22.0       0.0  ...                 10.0                     1.0                     NaN
1    0.0      18.0       0.0  ...                  9.0                     NaN                     NaN
2    0.0      31.0       0.0  ...                 18.0                     NaN                     1.0
#out.columns
Index(
    [
        "cases",
        "suspects",
        "negative",
        "diag_casesELISA_sex_F",
        "diag_suspects_sex_M",
        "diag_suspects_sex_F",
        "diag_suspectsPCR_sex_F",
        "diag_suspectsPCR_sex_M",
    ],
    dtype="object",
)
# out.values
array([[ 1., 22.,  0.,  1., 11., 10.,  1., nan],
       [ 0., 18.,  0., nan,  9.,  9., nan, nan],
       [ 0., 31.,  0., nan, 12., 18., nan,  1.]])

Explanation:

transform_dict(myDF['d'][0])

cases                      1
suspects                  22
negative                   0
diag_casesELISA_sex_F      1
diag_suspects_sex_M       11
diag_suspects_sex_F       10
diag_suspectsPCR_sex_F     1
dtype: int64

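transform_dict turns each dict in column d into a Series. apply then stacks those Series into a DataFrame, aligning them on their keys and filling any key missing from a row with NaN.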
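The output above shows counts as floats (1.0, 22.0) because a column containing NaN is upcast to float64. As a possible variation, not part of the original answer, the sketch below builds the wide frame from a list of plain dicts instead of per-row Series, converts it to pandas' nullable Int64 dtype, and joins id back on. The names frame, flatten, wide, and result are illustrative, and the two-row data set is a shortened stand-in for the question's data.

```python
import pandas as pd

# Shortened stand-in for the question's data (same shape as column `d`).
frame = pd.DataFrame(
    {
        "id": ["403", "402"],
        "d": [
            {
                "cases": 1,
                "suspects": 22,
                "negative": 0,
                "stats": [
                    {"diag": "casesELISA", "sex": "F", "cases": 1},
                    {"diag": "suspects", "sex": "M", "cases": 11},
                ],
            },
            {
                "cases": 0,
                "suspects": 18,
                "negative": 0,
                "stats": [{"diag": "suspects", "sex": "M", "cases": 9}],
            },
        ],
    }
)


def flatten(d):
    """Flatten one dict: scalars pass through, each stats entry becomes one key."""
    flat = {}
    for name, value in d.items():
        if isinstance(value, list):
            for entry in value:
                label = "_".join(f"{k}_{v}" for k, v in entry.items() if k != "cases")
                flat[label] = entry["cases"]
        else:
            flat[name] = value
    return flat


# Build the wide frame from a list of dicts; keys missing from a row become NaN.
wide = pd.DataFrame([flatten(d) for d in frame["d"]], index=frame.index)

# Nullable integers keep counts as 1 rather than 1.0 and show <NA> for gaps.
wide = wide.astype("Int64")

# Put `id` back next to the expanded columns (aligned on the shared index).
result = frame[["id"]].join(wide)
print(result)
```

Building from a list of dicts avoids constructing one Series per row, which may matter on larger frames; the column naming is the same as in transform_dict.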

Using apply

def parse_row_dfields(x):
    """
    Flatten one row of myDF; meant for DataFrame.apply(..., axis=1), where x is a row.
    """
    dico = {}
    d = x["d"]
    dico["id"] = x["id"]

    dico["cases"] = d["cases"]
    dico["suspects"] = d["suspects"]
    dico["negative"] = d["negative"]
    stats_list = d["stats"]
    for dico_stat in stats_list:
        # one column per (diag, sex) pair, e.g. "suspects_M"
        dico[dico_stat["diag"] + "_" + dico_stat["sex"]] = dico_stat["cases"]
    return dico

# apply returns a Series of dicts; rebuild a DataFrame with one row per record
x = myDF.apply(parse_row_dfields, axis=1)
pd.DataFrame.from_dict(dict(x)).T
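A different route, offered only as a sketch and not taken from either answer: the loop in the question already produces a long table with one row per stats entry, and that table can be pivoted to the wide layout. The example below uses pd.json_normalize with record_path and meta to build the long table, then DataFrame.pivot to widen it. The names frame, records, long, wide, totals, and result are illustrative, and the data is a shortened stand-in for the question's.

```python
import pandas as pd

# Shortened stand-in for the question's data.
frame = pd.DataFrame(
    {
        "id": ["403", "402"],
        "d": [
            {
                "cases": 1,
                "suspects": 22,
                "negative": 0,
                "stats": [
                    {"diag": "casesELISA", "sex": "F", "cases": 1},
                    {"diag": "suspects", "sex": "M", "cases": 11},
                ],
            },
            {
                "cases": 0,
                "suspects": 18,
                "negative": 0,
                "stats": [{"diag": "suspects", "sex": "M", "cases": 9}],
            },
        ],
    }
)

# Merge `id` into each nested dict so every record is self-describing.
records = [{"id": i, **d} for i, d in zip(frame["id"], frame["d"])]

# Long format: one row per stats entry, carrying `id` along as metadata.
long = pd.json_normalize(records, record_path="stats", meta=["id"])

# One wide column per (diag, sex) pair, e.g. "suspects_M".
long["col"] = long["diag"] + "_" + long["sex"]
wide = long.pivot(index="id", columns="col", values="cases")

# Top-level totals, indexed by id, then joined with the pivoted stats.
totals = pd.DataFrame(records).drop(columns="stats").set_index("id")
result = totals.join(wide)
print(result)
```

Joining from totals rather than from wide keeps any id whose stats list is empty, since such rows never appear in the long table.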

