Convert arbitrarily many columns into key-value pairs using python/pandas

I'm trying to convert a very wide CSV file with r rows and c columns into a dict or dataframe with r*c rows and three columns of the form row_id, col_name, col_value. Since the number of columns is very large (more than 10,000), this can't be done manually.

Say, for example, I start with a pandas dataframe:

import pandas as pd

df = pd.DataFrame({'id': {0: '1',  1: '2',  2: '3'},
 'c1': {0: 'S', 1: 'S', 2: 'D'},
 'c2': {0: 'XX',  1: 'WX',  2: 'WX'},
 'c3': {0: '32',  1: '63',  2: '32'}})

df = df.set_index('id')

that looks like this (shown before set_index is applied):

    id  c1  c2  c3
0   1   S   XX  32
1   2   S   WX  63
2   3   D   WX  32

Keep in mind that this example dataframe has only three columns, but the solution needs to work for a very large number of columns.

The objective is to convert this to a dict or dataframe that looks like this:

    id  key     value
0   1   c1  S
1   1   c2  XX
2   1   c3  32
3   2   c1  S
4   2   c2  WX
5   2   c3  63
6   3   c1  D
7   3   c2  WX
8   3   c3  32

I have written something that achieves the desired output by iterating over the rows and columns of the dataframe and building a new dataframe:

data = []

# df is indexed by 'id', so i is the row id and j is the column name
for i, row in df.iterrows():
    for j, value in row.items():  # Series.iteritems() was removed in pandas 2.0
        data.append((i, j, value))

df_out = pd.DataFrame(data, columns=['id', 'key', 'value'])

But I've read that one can and should avoid using for loops in pandas and Python. So what would a proper solution look like?

Have you considered using pd.melt?

import pandas as pd
df = pd.DataFrame({'id': {0: '1',  1: '2',  2: '3'},
 'c1': {0: 'S', 1: 'S', 2: 'D'},
 'c2': {0: 'XX',  1: 'WX',  2: 'WX'},
 'c3': {0: '32',  1: '63',  2: '32'}})

out = pd.melt(df,
              id_vars=['id'],
              value_vars=df.columns[1:])
print(out)

  id variable value
0  1       c1     S
1  2       c1     S
2  3       c1     D
3  1       c2    XX
4  2       c2    WX
5  3       c2    WX
6  1       c3    32
7  2       c3    63
8  3       c3    32
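
A small variation on the melt call above, offered as a sketch rather than as part of the original answer: passing var_name and value_name names the output columns key and value as in the question, and leaving out value_vars makes melt use every column that is not listed in id_vars, which is convenient when there are thousands of columns. The stable sort is an addition of mine to reproduce the row order the question asks for; note that the ids here are strings, so they sort lexicographically.

```python
import pandas as pd

# Same example data as in the question, with 'id' kept as a regular column.
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'c1': ['S', 'S', 'D'],
    'c2': ['XX', 'WX', 'WX'],
    'c3': ['32', '63', '32'],
})

out = (
    # With value_vars omitted, melt unpivots every column not in id_vars.
    df.melt(id_vars='id', var_name='key', value_name='value')
      # mergesort is stable, so within each id the original column order
      # (c1, c2, c3) is preserved.
      .sort_values('id', kind='mergesort')
      .reset_index(drop=True)
)
print(out)
```

If the row order does not matter, the sort_values and reset_index steps can be dropped, which avoids sorting r*c rows.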

You can do this:

In [212]: df.stack(dropna=False)\
            .reset_index(name='Value')\
            .rename(columns={'level_1': 'key'})
Out[212]:
  id key Value
0  1  c1     S
1  1  c2    XX
2  1  c3    32
3  2  c1     S
4  2  c2    WX
5  2  c3    63
6  3  c1     D
7  3  c2    WX
8  3  c3    32
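
The stack call above relies on df already being indexed by id, as in the question. A self-contained sketch of the same idea follows; it is my own variation, not the answerer's code. Naming the column axis with rename_axis before stacking means no 'level_1' rename is needed afterwards. It also leaves out the dropna argument, because newer pandas releases are changing how stack treats that argument, so check the documentation for your version if your data contains missing values. Since the question also asks about a dict, the last line converts the result with to_dict('records').

```python
import pandas as pd

# Same example data as in the question, with 'id' kept as a regular column.
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'c1': ['S', 'S', 'D'],
    'c2': ['XX', 'WX', 'WX'],
    'c3': ['32', '63', '32'],
})

out = (
    df.set_index('id')                # row ids become the index
      .rename_axis(columns='key')     # name the column axis 'key'
      .stack()                        # Series indexed by (id, key)
      .reset_index(name='value')      # back to columns: id, key, value
)
print(out)

# One dict per (id, key) pair, if a plain Python structure is preferred.
records = out.to_dict('records')
```

Unlike melt, stack walks the frame row by row, so the result already comes out grouped by id without an extra sort.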
