Extracting nested elements in column and storing into new columns
I have some data that I want to expand into new columns. The data looks like:
id d
0 403 {'cases': 1, 'suspects': 22, 'negative': 0, 's', ...}
1 402 {'cases': 0, 'suspects': 18, 'negative': 0, 's', ...}
2 401 {'cases': 0, 'suspects': 31, 'negative': 0, 's', ...}
I am trying to spread the nested column d out into new columns. I can get some of the data from d using:
rows = []
for i, row in myDF.iterrows():
    # one output row per entry in the nested 'stats' list
    for stat in row['d']['stats']:
        new_row = {
            **row.to_dict(),
            **stat,
        }
        rows.append(new_row)
However, I cannot get all of it. How can I extract the objects so that each one becomes a new column, with the corresponding cases value as the observations?
Expected Output (column names do not have to be exact):
cases suspects negative diag_casesELISA_sex_F diag_suspects_sex_M diag_suspects_sex_F diag_suspectsPCR_sex_F diag_suspectsPCR_sex_M
1 22 0 1 11 10 1 NA
0 18 0 NA 9 9 NA NA
0 31 0 NA 12 18 NA 1
Data:
import pandas as pd
myDF = pd.DataFrame.from_dict({'id': {0: '403', 1: '402', 2: '401'}, 'd': {0: {'cases': 1, 'suspects': 22, 'negative': 0, 'stats': [{'diag': 'casesELISA', 'sex': 'F', 'cases': 1}, {'diag': 'suspects', 'sex': 'M', 'cases': 11}, {'diag': 'suspects', 'sex': 'F', 'cases': 10}, {'diag': 'suspectsPCR', 'sex': 'F', 'cases': 1}]}, 1: {'cases': 0, 'suspects': 18, 'negative': 0, 'stats': [{'diag': 'suspects', 'sex': 'M', 'cases': 9}, {'diag': 'suspects', 'sex': 'F', 'cases': 9}]}, 2: {'cases': 0, 'suspects': 31, 'negative': 0, 'stats': [{'diag': 'suspects', 'sex': 'M', 'cases': 12}, {'diag': 'suspects', 'sex': 'F', 'cases': 18}, {'diag': 'suspectsPCR', 'sex': 'M', 'cases': 1}]}}})
You can write a custom function here and use pd.Series.apply.
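def transform_dict(d):
    new = {}
    for k, v in d.items():
        if isinstance(v, list):
            # Each entry of the 'stats' list becomes its own column,
            # named from every key/value pair except 'cases'.
            for _dict in v:
                key = "_".join(
                    [key + "_" + val for key, val in _dict.items() if key != "cases"]
                )
                new[key] = _dict["cases"]
        else:
            # Scalar fields are copied over unchanged.
            new[k] = v
    return pd.Series(new)


out = myDF["d"].apply(transform_dict)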
#out
cases suspects negative ... diag_suspects_sex_F diag_suspectsPCR_sex_F diag_suspectsPCR_sex_M
0 1.0 22.0 0.0 ... 10.0 1.0 NaN
1 0.0 18.0 0.0 ... 9.0 NaN NaN
2 0.0 31.0 0.0 ... 18.0 NaN 1.0
#out.columns
Index(
    [
        "cases",
        "suspects",
        "negative",
        "diag_casesELISA_sex_F",
        "diag_suspects_sex_M",
        "diag_suspects_sex_F",
        "diag_suspectsPCR_sex_F",
        "diag_suspectsPCR_sex_M",
    ],
    dtype="object",
)
# out.values
array([[ 1., 22., 0., 1., 11., 10., 1., nan],
[ 0., 18., 0., nan, 9., 9., nan, nan],
[ 0., 31., 0., nan, 12., 18., nan, 1.]])
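The values above come back as floats because the missing combinations are filled with NaN. If integer counts are preferred, something along these lines might work. This is only a sketch on a small hand-made frame: the names as_nullable and as_zero are made up for illustration, and whether a missing combination should stay missing or count as 0 depends on the data.

```python
import numpy as np
import pandas as pd

# Small stand-in for `out`: float columns with NaN where a combination is absent.
out = pd.DataFrame(
    {
        "cases": [1.0, 0.0, 0.0],
        "diag_suspectsPCR_sex_M": [np.nan, np.nan, 1.0],
    }
)

# Option 1: pandas' nullable integer dtype keeps the gaps as <NA>.
as_nullable = out.astype("Int64")

# Option 2: treat an absent combination as a count of zero (an assumption).
as_zero = out.fillna(0).astype(int)

print(as_nullable)
print(as_zero)
```

The first option stays closer to the expected output in the question, which shows NA for absent combinations.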
Explanation:
transform_dict(myDF['d'][0])
cases 1
suspects 22
negative 0
diag_casesELISA_sex_F 1
diag_suspects_sex_M 11
diag_suspects_sex_F 10
diag_suspectsPCR_sex_F 1
dtype: int64
We are transforming every dict in column d to a Series.
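The flattened columns no longer carry the id column. One way to put it back, sketched here on a shortened two-row sample, is to flatten each dict to a plain dict, build a frame from the list, and join it to id on the shared index. The helper flatten_record and the names sample, wide and result are hypothetical, and this variant returns dicts rather than a Series per row.

```python
import pandas as pd


def flatten_record(d):
    """Flatten one nested record into a flat dict (hypothetical helper)."""
    flat = {}
    for k, v in d.items():
        if isinstance(v, list):
            # Name each column from every key/value pair except 'cases'.
            for item in v:
                name = "_".join(
                    f"{key}_{val}" for key, val in item.items() if key != "cases"
                )
                flat[name] = item["cases"]
        else:
            flat[k] = v
    return flat


# Shortened sample in the same shape as the question's data.
sample = pd.DataFrame(
    {
        "id": ["403", "402"],
        "d": [
            {"cases": 1, "suspects": 22, "negative": 0,
             "stats": [{"diag": "casesELISA", "sex": "F", "cases": 1},
                       {"diag": "suspects", "sex": "M", "cases": 11}]},
            {"cases": 0, "suspects": 18, "negative": 0,
             "stats": [{"diag": "suspects", "sex": "M", "cases": 9}]},
        ],
    }
)

# A list of dicts becomes one row per dict; missing keys turn into NaN.
wide = pd.DataFrame([flatten_record(d) for d in sample["d"]], index=sample.index)

# Join on the shared index so every row keeps its id.
result = sample[["id"]].join(wide)
print(result)
```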
Using apply
def parse_row_dfields(x):
    """
    Flatten one row; meant for DataFrame.apply with axis=1, so x is a row.
    """
    dico = {}
    d = x["d"]
    dico["id"] = x["id"]
    dico["cases"] = d["cases"]
    dico["suspects"] = d["suspects"]
    dico["negative"] = d["negative"]
    stats_list = d['stats']
    # one new key per diag/sex combination, holding its 'cases' count
    for dico_stat in stats_list:
        dico[dico_stat["diag"] + "_" + dico_stat["sex"]] = dico_stat["cases"]
    return dico


# x is a Series holding one dict per row
x = myDF.apply(parse_row_dfields, axis=1)
pd.DataFrame.from_dict(dict(x)).T
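Since apply returns one dict per row here, the list of dicts can probably also be handed straight to the DataFrame constructor, which skips the transpose. A minimal sketch on a trimmed sample follows; parse_row is a cut-down, hypothetical version of the function above, and sample, records and frame are illustrative names.

```python
import pandas as pd


def parse_row(row):
    """Cut-down, hypothetical row parser: returns one flat dict per row."""
    d = row["d"]
    flat = {"id": row["id"], "cases": d["cases"]}
    for stat in d["stats"]:
        flat[stat["diag"] + "_" + stat["sex"]] = stat["cases"]
    return flat


# Trimmed sample in the same shape as the question's data.
sample = pd.DataFrame(
    {
        "id": ["403", "402"],
        "d": [
            {"cases": 1, "stats": [{"diag": "suspects", "sex": "M", "cases": 11}]},
            {"cases": 0, "stats": [{"diag": "suspects", "sex": "F", "cases": 9}]},
        ],
    }
)

records = sample.apply(parse_row, axis=1)  # Series holding one dict per row

# Each dict becomes a row; keys missing from a dict are filled with NaN.
frame = pd.DataFrame(records.tolist(), index=records.index)
print(frame)
```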