Key/value pairs in a Pandas dataframe

Question

I have a dataframe that I created by merging multiple MATLAB .mat files and then loading the merged list of dictionaries into pandas. It looks like this:

    KEY_COLUMN                                  VALUE_COLUMN
0   [[[KEY1]], [[KEY2]], [[KEY3]], [[KEY4]]]    [[VALUE], [VALUE], [VALUE], [VALUE]]
1   [[[KEY2]], [[KEY3]], [[KEY1]], [[KEY4]]]    [[VALUE], [VALUE], [VALUE], [VALUE]]
2   [[[KEY1]], [[KEY3]], [[KEY4]], [[KEY2]]]    [[VALUE], [VALUE], [VALUE], [VALUE]]

The actual data looks like this in dictionary form:

{'TYPE': {0: array([[array(['START'], dtype='<U5')],
         [array(['DIST'], dtype='<U6')],
         [array(['DISTFALSE'], dtype='<U7')],
         [array(['DISTTRUE'], dtype='<U7')],
         [array(['ENCFALSE'], dtype='<U11')],
         [array(['ENCTRUE'], dtype='<U12')]], dtype=object),
  1: array([[array(['DISTFALSE'], dtype='<U5')],
         [array(['START'], dtype='<U10')],
         [array(['DIST'], dtype='<U11')],
         [array(['DISTTRUE'], dtype='<U11')],
         [array(['ENCTRUE'], dtype='<U10')],
         [array(['ENCFALSE'], dtype='<U11')]], dtype=object)},
 'TIME': {0: array([[ 24413],
         [ 27481],
         [ 29382],
         [ 31923],
         [ 31249],
         [ 34690]]),
  1: array([[ 364582],
         [ 31234],
         [ 43123],
         [ 24444],
         [ 55551],
         [ 12355]])}}

Now I want the keys to become the columns and the values to become the rows of the dataframe, like this:

    KEY1     KEY2     KEY3     KEY4
0   VALUE    VALUE    VALUE    VALUE
1   VALUE    VALUE    VALUE    VALUE
2   VALUE    VALUE    VALUE    VALUE

The issue is that the order of the keys (and consequently of the values) is not the same: it differs from row to row.

How can I achieve that? Many thanks!

Answer 1

I used the following approach to solve this:

import numpy as np
import pandas as pd

# Rebuild the sample data from the question. Note that the fixed-width dtypes
# truncate some strings: 'DISTFALSE' stored as '<U5' becomes 'DISTF'.
df = pd.DataFrame({'TYPE': {0: np.array([[np.array(['START'], dtype='<U5')],[np.array(['DIST'], dtype='<U6')],[np.array(['DISTFALSE'], dtype='<U7')],[np.array(['DISTTRUE'], dtype='<U7')],[np.array(['ENCFALSE'], dtype='<U11')],[np.array(['ENCTRUE'], dtype='<U12')]], dtype=object),
  1: np.array([[np.array(['DISTFALSE'], dtype='<U5')],[np.array(['START'], dtype='<U10')],[np.array(['DIST'], dtype='<U11')],[np.array(['DISTTRUE'], dtype='<U11')],[np.array(['ENCTRUE'], dtype='<U10')],[np.array(['ENCFALSE'], dtype='<U11')]], dtype=object)},
 'TIME': {0: np.array([[ 24413],[ 27481],[ 29382],[ 31923],[ 31249],[ 34690]]),
  1: np.array([[ 364582],[ 31234],[ 43123],[ 24444],[ 55551],[ 12355]])}})

# Assuming a df as shown in the problem statement

# Initialize an empty dictionary to hold the extracted keys and values
keyvals = {}

for i in range(df.shape[0]):
    keyrow = df.iloc[i, 0].flatten()
    valrow = df.iloc[i, 1].flatten()
    for key, val in zip(keyrow, valrow):
        # Create the list the first time a key is seen, then append the value
        keyvals.setdefault(key, []).append(val)

# Wrapping each list in a Series lets columns of unequal length be padded with NaN
finDf = pd.DataFrame({k: pd.Series(v) for k, v in keyvals.items()})

finDf ends up in this form:

      DISTF  START   DIST  DISTTRUE  ENCTRUE  ENCFALSE  DISTFAL  DISTTRU
0  364582.0  24413  27481   24444.0    34690     31249  29382.0  31923.0
1       NaN  31234  43123       NaN    55551     12355      NaN      NaN
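
One thing to watch in the output above: because each key's values are collected in a plain list, a key that is missing from some rows loses its row position (DISTF came from row 1 but shows up in row 0). Below is a minimal sketch of a variant that keeps one record per original row. The sample frame and the helper name rows_to_frame are made up for illustration, and it assumes every cell flattens to hashable scalars:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's frame: key "C" only occurs in row 0
# and key "D" only occurs in row 1.
df = pd.DataFrame({
    "TYPE": {0: np.array([["A"], ["B"], ["C"]], dtype=object),
             1: np.array([["B"], ["D"], ["A"]], dtype=object)},
    "TIME": {0: np.array([[1], [2], [3]]),
             1: np.array([[4], [5], [6]])},
})


def rows_to_frame(key_arrays, value_arrays, index=None):
    """Build one dict per source row so values stay on the row they came from."""
    records = [
        dict(zip(np.ravel(keys).tolist(), np.ravel(vals).tolist()))
        for keys, vals in zip(key_arrays, value_arrays)
    ]
    return pd.DataFrame(records, index=index)


aligned = rows_to_frame(df["TYPE"], df["TIME"], index=df.index)
print(aligned)
```

Here the value for "D" stays in row 1 and row 0 gets NaN for it, since pandas fills in keys that a given record lacks.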

Answer 2

Let's create a new dataframe by mapping the key/value pairs inside a list comprehension, using np.squeeze to remove the single-element dimensions:

df1 = pd.DataFrame([dict(zip(*map(np.squeeze, v))) for v in df.to_numpy()])

Result:

# for sample data
    KEY1   KEY2   KEY3   KEY4
0  VALUE  VALUE  VALUE  VALUE
1  VALUE  VALUE  VALUE  VALUE
2  VALUE  VALUE  VALUE  VALUE

# for actual data
   START   DIST  DISTFAL  DISTTRU  ENCFALSE  ENCTRUE     DISTF  DISTTRUE
0  24413  27481  29382.0  31923.0     31249    34690       NaN       NaN
1  31234  43123      NaN      NaN     12355    55551  364582.0   24444.0
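
To make the one-liner easier to follow, here is a sketch that unpacks it step by step for a single row. The keys and values are made up, and the arrays are assumed to have the same (n, 1) layout as in the question:

```python
import numpy as np
import pandas as pd

# One made-up row: a (3, 1) array of keys and a (3, 1) array of values.
row = [np.array([["B"], ["C"], ["A"]], dtype=object),
       np.array([[10], [20], [30]])]

# Step 1: np.squeeze drops the length-1 axes, leaving two flat arrays.
keys, vals = map(np.squeeze, row)
print(keys.shape, vals.shape)  # (3,) (3,)

# Step 2: zip pairs each key with its value; dict() turns the pairs into a record.
record = dict(zip(keys, vals))

# Step 3: one record per row; pandas lines the columns up by key name,
# so the order of the keys inside each row no longer matters.
other_row = [np.array([["A"], ["B"], ["C"]], dtype=object),
             np.array([[1], [2], [3]])]
df1 = pd.DataFrame([dict(zip(*map(np.squeeze, r))) for r in (row, other_row)])
print(df1)
```

One caveat, as far as I can tell: a row holding a single key/value pair would squeeze down to a 0-d array, which zip cannot iterate over. Using np.ravel instead of np.squeeze should avoid that case.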
