
Convert a DataFrame of lists in columns to rows

I have a pandas DataFrame of this type:

col1  col2           col3
 1   [blue]         [in,out]
 2   [green, green] [in]
 3   [green]        [in]

and I need to convert it to a DataFrame that keeps the first column and distributes all the values in the other columns as rows:

col1 value
1    blue
1    in
1    out
2    green
2    green
2    in
3    green
3    in
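To try the answers below, the sample input can be rebuilt like this. It assumes each cell in col2 and col3 is a plain Python list of strings; the question only shows the printed form.

```python
import pandas as pd

# Sample input from the question. Assumption: the col2/col3 cells
# are ordinary Python lists of strings.
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['blue'], ['green', 'green'], ['green']],
    'col3': [['in', 'out'], ['in'], ['in']],
})
print(df)
```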

Use DataFrame.stack with Series.explode to expand the lists, then clean up the index with DataFrame.reset_index:

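df1 = (df.set_index('col1')                # keep col1 as the row key
         .stack()                          # Series indexed by (col1, column name); values are lists
         .explode()                        # one row per list element
         .reset_index(level=1, drop=True)  # drop the column-name level
         .reset_index(name='value'))       # move col1 back to a column, name the values 'value'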
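A self-contained version of the same chain, split into steps so the intermediate stacked Series can be inspected. The sample frame is the one from the question, assuming list-valued cells; depending on the pandas version, stack() may emit a FutureWarning about its new implementation, which does not change the result here.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['blue'], ['green', 'green'], ['green']],
    'col3': [['in', 'out'], ['in'], ['in']],
})

# Step 1: col1 becomes the index; stacking the remaining columns gives a
# Series indexed by (col1, column name) whose values are still lists.
stacked = df.set_index('col1').stack()
print(stacked)

# Step 2: explode gives each list element its own row, repeating the index.
exploded = stacked.explode()

# Step 3: drop the column-name level, then turn col1 back into a column.
df1 = exploded.reset_index(level=1, drop=True).reset_index(name='value')
print(df1)
```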

Alternative with DataFrame.melt and DataFrame.explode:

df1 = (df.melt('col1')
         .explode('value')
         # a stable sort keeps col2 values before col3 values within each col1
         .sort_values('col1', kind='stable')[['col1','value']]
         .reset_index(drop=True)
)

print (df1)
   col1  value
0     1   blue
1     1     in
2     1    out
3     2  green
4     2  green
5     2     in
6     3  green
7     3     in
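The sort step is needed because melt emits all col2 rows before any col3 rows, so rows with the same col1 are not adjacent. The sketch below prints the melted frame and then sorts with kind='stable'; the default sort algorithm (quicksort) is not guaranteed to keep tied rows in their original order, so a stable sort is assumed to be the safer choice here.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['blue'], ['green', 'green'], ['green']],
    'col3': [['in', 'out'], ['in'], ['in']],
})

# Long format: one row per (col1, original column); 'value' still holds lists.
long = df.melt('col1')
print(long)

# Explode the lists, then a stable sort by col1 groups the keys together
# while preserving melt order (col2 values first, then col3) inside each group.
df1 = (long.explode('value')
           .sort_values('col1', kind='stable')[['col1', 'value']]
           .reset_index(drop=True))
print(df1)
```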

Or a list comprehension solution:

L = [(k, x) for k, v in df.set_index('col1').to_dict('index').items() 
            for k1, v1 in v.items() 
            for x in v1]

df1 = pd.DataFrame(L, columns=['col1','value'])
print (df1)
   col1  value
0     1   blue
1     1     in
2     1    out
3     2  green
4     2  green
5     2     in
6     3  green
7     3     in
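The comprehension walks the nested dict returned by to_dict('index'), which maps each col1 key to a {column name: cell} dict. Printing it makes the three loops easier to follow (keys, then columns, then list elements). This variant iterates .values() because the column name is not used; that change is an editorial simplification, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['blue'], ['green', 'green'], ['green']],
    'col3': [['in', 'out'], ['in'], ['in']],
})

# {1: {'col2': ['blue'], 'col3': ['in', 'out']}, 2: {...}, 3: {...}}
records = df.set_index('col1').to_dict('index')
print(records)

# Outer loop: col1 keys; middle loop: cells of that row; inner loop: list items.
L = [(k, x) for k, row in records.items()
            for cell in row.values()
            for x in cell]
print(L)
```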

Another solution could consist of:

  • a list comprehension that builds col1 by repeating each key once per value, and
  • list concatenation of the values in df['col2'] and df['col3'] to build the value column.

The code is as follows:

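# concatenate the two list columns row by row: ['blue'] + ['in', 'out']
combined = df['col2'] + df['col3']

df_final = pd.DataFrame(
    {
        # repeat each col1 key once per element of its combined list
        'col1': [i for i, sublist in zip(df['col1'], combined) for _ in sublist],
        # flatten all combined lists into one list
        'value': sum(combined, [])
    }
)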
print(df_final)
   col1   value
0     1    blue
1     1      in
2     1     out
3     2   green
4     2   green
5     2      in
6     3   green
7     3      in
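Adding two Series of lists concatenates the lists row by row, which is what the value column relies on. One design note: sum(lists, []) copies the growing accumulator at every step, so it is quadratic in the total number of elements; itertools.chain.from_iterable flattens in a single pass. The sketch below swaps in chain for the flattening and is otherwise the same approach.

```python
import itertools
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['blue'], ['green', 'green'], ['green']],
    'col3': [['in', 'out'], ['in'], ['in']],
})

# Row-wise list concatenation: ['blue'] + ['in', 'out'] -> ['blue', 'in', 'out']
combined = df['col2'] + df['col3']
print(combined.tolist())

# Linear-time flattening instead of sum(combined, [])
values = list(itertools.chain.from_iterable(combined))
# Repeat each key once per element of its combined list
keys = [k for k, lst in zip(df['col1'], combined) for _ in lst]

df_final = pd.DataFrame({'col1': keys, 'value': values})
print(df_final)
```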

Or with a plain loop:

c = []  # col1 key, repeated once per list element
d = []  # flattened values from col2 and col3
for i in range(len(df)):
    values = list(df['col2'].iloc[i]) + list(df['col3'].iloc[i])
    d.extend(values)
    # repeat the key itself, not its string form, so keys like 10 stay intact
    c.extend([df['col1'].iloc[i]] * len(values))

df1 = pd.DataFrame()
df1['col1'] = c
df1['value'] = d
df = df1
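Repeating the key itself matters once a key has more than one digit: repeating its string form, as in str(10) * 2, gives '1010', which list() splits into four separate characters. A quick check of the loop logic with a hypothetical two-digit key 10 (this frame is made up for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical frame with a two-digit key.
df = pd.DataFrame({
    'col1': [10, 2],
    'col2': [['blue'], ['green']],
    'col3': [['in', 'out'], ['in']],
})

# String repetition splits a multi-digit key into characters.
print(list(str(10) * 2))

c = []  # col1 key, repeated once per value
d = []  # flattened values from col2 and col3
for key, a, b in zip(df['col1'], df['col2'], df['col3']):
    values = list(a) + list(b)
    d.extend(values)
    c.extend([key] * len(values))  # repeat the key object itself

df1 = pd.DataFrame({'col1': c, 'value': d})
print(df1)
```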

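Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM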