Key/Value Pairs in Pandas Dataframe
I have a dataframe that I created by merging multiple MATLAB .mat files and then loading the merged list of dictionaries into pandas.
KEY_COLUMN VALUE_COLUMN
0 [[[KEY1]], [[KEY2]], [[KEY3]], [[KEY4]]] [[VALUE], [VALUE], [VALUE], [VALUE]]
1 [[[KEY2]], [[KEY3]], [[KEY1]], [[KEY4]]] [[VALUE], [VALUE], [VALUE], [VALUE]]
2 [[[KEY1]], [[KEY3]], [[KEY4]], [[KEY2]]] [[VALUE], [VALUE], [VALUE], [VALUE]]
{'TYPE': {0: array([[array(['START'], dtype='<U5')],
[array(['DIST'], dtype='<U6')],
[array(['DISTFALSE'], dtype='<U7')],
[array(['DISTTRUE'], dtype='<U7')],
[array(['ENCFALSE'], dtype='<U11')],
[array(['ENCTRUE'], dtype='<U12')]], dtype=object),
1: array([[array(['DISTFALSE'], dtype='<U5')],
[array(['START'], dtype='<U10')],
[array(['DIST'], dtype='<U11')],
[array(['DISTTRUE'], dtype='<U11')],
[array(['ENCTRUE'], dtype='<U10')],
[array(['ENCFALSE'], dtype='<U11')]], dtype=object)},
'TIME': {0: array([[ 24413],
[ 27481],
[ 29382],
[ 31923],
[ 31249],
[ 34690]]),
1: array([[ 364582],
[ 31234],
[ 43123],
[ 24444],
[ 55551],
[ 12355]])}}
Now I want the KEYS to be the columns and the VALUES to be the rows of the dataframe, like this:
KEY1 KEY2 KEY3 KEY4
0 VALUE VALUE VALUE VALUE
1 VALUE VALUE VALUE VALUE
2 VALUE VALUE VALUE VALUE
The issue is that the order of the keys (and consequently of the values) is not the same; it differs between the current rows.
How can I achieve that? Many thanks!
I used the following approach to solve this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'TYPE': {0: np.array([[np.array(['START'], dtype='<U5')],[np.array(['DIST'], dtype='<U6')],[np.array(['DISTFALSE'], dtype='<U7')],[np.array(['DISTTRUE'], dtype='<U7')],[np.array(['ENCFALSE'], dtype='<U11')],[np.array(['ENCTRUE'], dtype='<U12')]], dtype=object),
1: np.array([[np.array(['DISTFALSE'], dtype='<U5')],[np.array(['START'], dtype='<U10')],[np.array(['DIST'], dtype='<U11')],[np.array(['DISTTRUE'], dtype='<U11')],[np.array(['ENCTRUE'], dtype='<U10')],[np.array(['ENCFALSE'], dtype='<U11')]], dtype=object)},
'TIME': {0: np.array([[ 24413],[ 27481],[ 29382],[ 31923],[ 31249],[ 34690]]),
1: np.array([[ 364582],[ 31234],[ 43123],[ 24444],[ 55551],[ 12355]])}})
# Assuming a df as shown in the problem statement
# Initialize an empty dictionary to hold the extracted keys and values
keyvals = {}
for i in range(df.shape[0]):
    # Flatten the nested arrays of the current row into 1-D sequences
    keyrow = df.iloc[i, 0].flatten()
    valrow = df.iloc[i, 1].flatten()
    for j, k in zip(keyrow, valrow):
        # Start a new list the first time a key is seen, then append the value
        keyvals.setdefault(j, []).append(k)
# Wrapping each list in a Series pads columns of unequal length with NaN
finDf = pd.DataFrame({k: pd.Series(v) for k, v in keyvals.items()})
finDf ends up in this form:
DISTF START DIST DISTTRUE ENCTRUE ENCFALSE DISTFAL DISTTRU
0 364582.0 24413 27481 24444.0 34690 31249 29382.0 31923.0
1 NaN 31234 43123 NaN 55551 12355 NaN NaN
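A note on the column names above: labels such as DISTF and DISTFAL, and the NaN entries that come with them, appear to stem from the sample data rather than from the loop. In the sample, 'DISTFALSE' is stored once with dtype='<U5' and once with dtype='<U7', and a fixed-width NumPy string dtype truncates anything longer than its width, so the two rows end up with different keys. A minimal sketch of that effect (the variable names are only illustrative):

```python
import numpy as np

# A fixed-width unicode dtype keeps only the first N characters.
short = np.array(['DISTFALSE'], dtype='<U5')   # width 5
medium = np.array(['DISTFALSE'], dtype='<U7')  # width 7
full = np.array(['DISTFALSE'], dtype='<U9')    # wide enough for the whole string

print(short[0], medium[0], full[0])
```

If the real .mat data carries the full strings, both rows should share the same keys and the extra columns should not appear.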
Let's create a new dataframe by mapping the key/value pairs inside a list comprehension, using np.squeeze to remove the single-element dimensions:
df1 = pd.DataFrame([dict(zip(*map(np.squeeze, v))) for v in df.to_numpy()])
Result:
# for sample data
KEY1 KEY2 KEY3 KEY4
0 VALUE VALUE VALUE VALUE
1 VALUE VALUE VALUE VALUE
2 VALUE VALUE VALUE VALUE
# for actual data
START DIST DISTFAL DISTTRU ENCFALSE ENCTRUE DISTF DISTTRUE
0 24413 27481 29382.0 31923.0 31249 34690 NaN NaN
1 31234 43123 NaN NaN 12355 55551 364582.0 24444.0
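To make the one-liner easier to follow, here is a step-by-step sketch of the same idea on small made-up data. The helpers `cell` and `object_column`, the numeric values, and the name `wide` are assumptions for illustration only; they are not part of the original question.

```python
import numpy as np
import pandas as pd

def cell(values):
    """Hypothetical helper: an (n, 1) column vector, similar to a MATLAB array."""
    return np.array(values).reshape(-1, 1)

def object_column(cells):
    """Hypothetical helper: a 1-D object array holding one ndarray per row."""
    col = np.empty(len(cells), dtype=object)
    for i, c in enumerate(cells):
        col[i] = c
    return col

# Two rows whose keys come in a different order (made-up values).
df = pd.DataFrame({
    'KEY_COLUMN': object_column([cell(['KEY1', 'KEY2', 'KEY3']),
                                 cell(['KEY3', 'KEY1', 'KEY2'])]),
    'VALUE_COLUMN': object_column([cell([10, 20, 30]),
                                   cell([31, 11, 21])]),
})

records = []
for key_cell, value_cell in df.to_numpy():
    keys = np.squeeze(key_cell)      # shape (3, 1) -> (3,)
    values = np.squeeze(value_cell)  # shape (3, 1) -> (3,)
    # Pair each key with its value; the order inside the row stops mattering.
    records.append(dict(zip(keys, values)))

# pandas aligns the dictionaries by key, one row per dictionary.
wide = pd.DataFrame(records)
print(wide)
```

One caveat: if a cell holds only a single key, np.squeeze returns a 0-d array that cannot be iterated by zip; calling .ravel() instead of np.squeeze is one way to avoid that case.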