Change a column containing list of dict to columns in a DataFrame
I have the following DataFrame which contains a column that is a list of dict items:
d = pd.DataFrame([
['Green', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Apply', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Range', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Peop', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]]
], columns=['Name', 'Legal Description'])
and I want to transform it to a simple DataFrame like so:
d = pd.DataFrame([
['Green', 'STERLING GREEN SO', '01', 'L0038', 'B0008'],
['Apply', 'STERLING GREEN SO', '01', 'L0038', 'B0008'],
['Range', 'STERLING GREEN SO', '01', 'L0038', 'B0008'],
['Peop', 'STERLING GREEN SO', '01', 'L0038', 'B0008']
], columns=['Name', 'Desc', 'Sec', 'Lot', 'Block'])
IMO, the ideal solution would be to act upstream and get a properly formatted dictionary or DataFrame.
The issue with your list of single-keyed dictionaries is that you have to merge them. You can use a dictionary comprehension for that and convert each merged dictionary to a Series:
d2 = d['Legal Description'].apply(lambda c:
    pd.Series({next(iter(x.keys())).strip(':'):
               next(iter(x.values())) for x in c})
    )
Then join to the original DataFrame:
d.drop(columns='Legal Description').join(d2)
Output:
    Name               Desc  Sec    Lot  Block
0  Green  STERLING GREEN SO   01  L0038  B0008
1  Apply  STERLING GREEN SO   01  L0038  B0008
2  Range  STERLING GREEN SO   01  L0038  B0008
3   Peop  STERLING GREEN SO   01  L0038  B0008
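As a variation on the approach above, here is a minimal sketch that merges each list of single-key dictionaries with a plain loop and builds all new columns in one DataFrame call instead of one Series per row. The helper name merge_dicts is made up for illustration, and the sketch assumes each key appears at most once per row (a repeated key would silently keep the last value).

```python
import pandas as pd

d = pd.DataFrame(
    [
        ['Green', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
        ['Apply', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
    ],
    columns=['Name', 'Legal Description'],
)


def merge_dicts(dicts):
    """Merge a list of single-key dicts into one dict (hypothetical helper)."""
    merged = {}
    for single in dicts:
        for key, value in single.items():
            # Drop the trailing colon so 'Desc:' becomes 'Desc'.
            merged[key.rstrip(':')] = value
    return merged


# Build the new columns in one go, reusing the original index so join() aligns rows.
expanded = pd.DataFrame(
    [merge_dicts(cell) for cell in d['Legal Description']],
    index=d.index,
)
result = d.drop(columns='Legal Description').join(expanded)
print(result)
```

Reusing d.index for the expanded frame matters when the original index is not the default 0..n-1 range, because join() aligns on index labels.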
If possible, you should wrangle your data before creating the DataFrame. It's faster than reshaping the DataFrame after it has been created. For instance, something like:
data = [
['Green', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Apply', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Range', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Peop', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]]
]
records = []
for row in data:
    rec = {}
    name, legal_desc = row
    rec['Name'] = name
    # flatten the list of single-key dicts into one record
    rec.update(x for d in legal_desc for x in d.items())
    records.append(rec)
d = pd.DataFrame(records)
Output:
>>> d
    Name              Desc:  Sec:   Lot:  Block:
0  Green  STERLING GREEN SO    01  L0038   B0008
1  Apply  STERLING GREEN SO    01  L0038   B0008
2  Range  STERLING GREEN SO    01  L0038   B0008
3   Peop  STERLING GREEN SO    01  L0038   B0008
>>> records
[{'Name': 'Green', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}, {'Name': 'Apply', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}, {'Name': 'Range', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}, {'Name': 'Peop', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}]
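Note that the column names above keep their trailing colons, while the question asks for Desc, Sec, Lot and Block. One possible way to get there, sketched here as an add-on rather than as part of the answer above, is to rename the columns after building the DataFrame. The two-record list is a shortened stand-in for the records shown above.

```python
import pandas as pd

# Shortened stand-in for the `records` list built by the loop above.
records = [
    {'Name': 'Green', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'},
    {'Name': 'Apply', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'},
]

d = pd.DataFrame(records)

# rename() accepts a callable that is applied to every column label;
# rstrip(':') leaves labels without a trailing colon (such as 'Name') unchanged.
d = d.rename(columns=lambda label: label.rstrip(':'))
print(d.columns.tolist())
```

Stripping the colon inside the loop (when the keys are copied into rec) would work just as well; renaming afterwards simply keeps the loop unchanged.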
You can also use the following (here df is the original DataFrame, named d in the question):
df.set_index('Name', inplace=True)
df = df['Legal Description'].explode().apply(pd.Series).groupby(level=0).sum().reset_index()
Output:
    Name              Desc:  Sec:   Lot:  Block:
0  Apply  STERLING GREEN SO    01  L0038   B0008
1  Green  STERLING GREEN SO    01  L0038   B0008
2   Peop  STERLING GREEN SO    01  L0038   B0008
3  Range  STERLING GREEN SO    01  L0038   B0008
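The explode approach sorts the rows by Name (a side effect of groupby) and relies on sum() to collapse the missing values. A hedged variant, offered as a sketch rather than a drop-in replacement, uses first() to pick the single non-null value per column and sort=False to keep the original row order. It assumes the Name values are unique, since rows sharing a name would be merged into one.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['Green', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
        ['Apply', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
    ],
    columns=['Name', 'Legal Description'],
)

# One row per dict, still labelled by Name.
exploded = df.set_index('Name')['Legal Description'].explode()

# Turn each single-key dict into a sparse row (missing keys become NaN).
wide = pd.DataFrame(exploded.tolist(), index=exploded.index)

# first() returns the first non-null entry per column within each Name group;
# sort=False keeps the groups in order of first appearance.
out = wide.groupby(level=0, sort=False).first().reset_index()
print(out)
```

Series.explode requires pandas 0.25 or later.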