Python: how to split a list of dicts stored in a DataFrame cell into columns of the original DataFrame?
Original DataFrame (2 columns):
matchNum accumulatedscore
78 [{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '0', 'away': '0'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '1', 'away': '0'}]
56 [{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '2', 'away': '1'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '4', 'away': '3'}]
How can I turn it into the DataFrame I am hoping for?
matchNum home1 away1 home2 away2
78 0 0 1 0
56 2 1 4 3
I am finding this really difficult.
The simplest way, without lambdas and using only transformations, since accumulatedscore actually contains JSON values:
import pandas as pd
import json
d = {
    "matchNum": [78, 56],
    "accumulatedscore": [
        '[{"periodvalue": "FirstHalf", "periodstatus": "ResultFinal", "home": "0", "away": "0"}, {"periodvalue": "SecondHalf", "periodstatus": "ResultFinal", "home": "1", "away": "0"}]',
        '[{"periodvalue": "FirstHalf", "periodstatus": "ResultFinal", "home": "2", "away": "1"}, {"periodvalue": "SecondHalf", "periodstatus": "ResultFinal", "home": "4", "away": "3"}]',
    ],
}
df = pd.DataFrame(d)
# Parse each JSON string into a list of dicts; the two list items land in columns 0 and 1.
dfa = (
    df.join(pd.json_normalize(df["accumulatedscore"].apply(json.loads)))
    .rename(columns={0: "dict1", 1: "dict2"})
    .drop("accumulatedscore", axis=1)
)
# Expand each dict column into its own columns; rsuffix="2" marks the second period's keys.
dfb = (
    dfa.join(pd.json_normalize(dfa["dict1"]))
    .join(pd.json_normalize(dfa["dict2"]), rsuffix="2")
    .rename(columns={"home": "home1", "away": "away1"})[["matchNum", "home1", "away1", "home2", "away2"]]
)
dfb
matchNum home1 away1 home2 away2
0 78 0 0 1 0
1 56 2 1 4 3
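One caveat about the snippet above: it works because the sample cells were rewritten with double quotes, which makes them valid JSON. The cells shown in the question use single quotes, and json.loads rejects those. A small sketch of the difference, using a shortened one-period cell as an assumed example (the variable names are only for illustration):

```python
import ast
import json

# A shortened cell in the single-quoted form shown in the question (assumed sample).
cell = "[{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '0', 'away': '0'}]"

# json.loads requires double-quoted keys and strings, so this input is rejected.
try:
    json.loads(cell)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# ast.literal_eval reads the same text as a Python literal (a list of dicts).
records = ast.literal_eval(cell)

print(parsed_as_json)
print(records[0]["home"])
```

If the column already holds real Python lists rather than strings, neither parsing step is needed.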
Assuming your pandas DataFrame looks like this:
d = {'matchNum': [78, 56],
'accumulatedscore':["[{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '0', 'away': '0'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '1', 'away': '0'}]",
"[{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '2', 'away': '1'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '4', 'away': '3'}]"
]}
import pandas as pd
df = pd.DataFrame(d)
You can simply convert each string, which has the form of a Python literal (here a list of dictionaries), with ast.literal_eval:
import ast
df['home1']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[0]['home'] , axis=1)
df['away1']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[0]['away'], axis=1)
df['home2']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[1]['home'], axis=1)
df['away2']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[1]['away'], axis=1)
df = df.drop(columns = 'accumulatedscore')
Your df would then look like this:
matchNum home1 away1 home2 away2
0 78 0 0 1 0
1 56 2 1 4 3
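The four apply calls above each parse the same string again. As a rough sketch (not the answerer's code), the cell can be parsed once and the homeN/awayN columns generated for however many periods the list holds. The names parsed, flat and result are made up for this illustration, and the scores stay strings, as in the input:

```python
import ast
import pandas as pd

# Same sample data as in the answer above: each cell is a string holding a list of dicts.
d = {
    "matchNum": [78, 56],
    "accumulatedscore": [
        "[{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '0', 'away': '0'}, "
        "{'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '1', 'away': '0'}]",
        "[{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '2', 'away': '1'}, "
        "{'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '4', 'away': '3'}]",
    ],
}
df = pd.DataFrame(d)

# Parse every cell exactly once.
parsed = df["accumulatedscore"].apply(ast.literal_eval)

# Build one flat dict per match: home1, away1, home2, away2, ...
flat = pd.DataFrame(
    [
        {
            f"{key}{i}": period[key]
            for i, period in enumerate(periods, 1)
            for key in ("home", "away")
        }
        for periods in parsed
    ],
    index=df.index,
)

# Replace the original column with the flattened score columns.
result = df.drop(columns="accumulatedscore").join(flat)
print(result)
```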
You can extract only the relevant key-value pairs from each dictionary in df['accumulatedscore'], explode the Series, convert it to a DataFrame, and combine duplicate indices:
df1 = (df.merge(df['accumulatedscore']
                .apply(lambda lst: tuple({'home'+str(i): d['home'], 'away'+str(i): d['away']}
                                         for i, d in enumerate(lst, 1)))
                .explode()
                .apply(pd.Series)
                .groupby(level=0).first(),
                left_index=True, right_index=True)
       .drop('accumulatedscore', axis=1))
Output:
matchNum home1 away1 home2 away2
0 78 0 0 1 0
1 56 2 1 4 3
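To make the chain above easier to follow, here is a sketch that runs the same steps one at a time. It assumes the column already holds real lists of dicts (not strings), which is what the chained version needs as well. The intermediate names tagged, exploded, wide and combined are only for illustration, and pd.DataFrame(...) is swapped in for .apply(pd.Series) as the dict-to-columns step:

```python
import pandas as pd

# Assumed input: cells are actual lists of dicts, as the chained version expects.
df = pd.DataFrame({
    "matchNum": [78, 56],
    "accumulatedscore": [
        [{"periodvalue": "FirstHalf", "periodstatus": "ResultFinal", "home": "0", "away": "0"},
         {"periodvalue": "SecondHalf", "periodstatus": "ResultFinal", "home": "1", "away": "0"}],
        [{"periodvalue": "FirstHalf", "periodstatus": "ResultFinal", "home": "2", "away": "1"},
         {"periodvalue": "SecondHalf", "periodstatus": "ResultFinal", "home": "4", "away": "3"}],
    ],
})

# Step 1: keep only home/away and tag each key with its period number.
tagged = df["accumulatedscore"].apply(
    lambda lst: [
        {f"home{i}": d["home"], f"away{i}": d["away"]}
        for i, d in enumerate(lst, 1)
    ]
)

# Step 2: one dict per row; the original index label is repeated for each period.
exploded = tagged.explode()

# Step 3: turn the dicts into columns; each row fills only its own period's columns.
wide = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Step 4: collapse the repeated index labels, taking the first non-null value per column.
combined = wide.groupby(level=0).first()

result = df[["matchNum"]].join(combined)
print(result)
```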