How do I extract row values of varying length from a pandas Series into new columns?

Question

Suppose I have a pandas DataFrame with a Series C in which each value is a list. Since the lists have different lengths, how do I slice this Series and append the pieces to the DataFrame as new columns?

Additional findings: when I try this, every character, starting with [ and ', ends up in its own column across the whole list (including the blank spaces that separate the words).

What should I do to combine the letters into single words and then apply the solutions?

Sample df -

id   A     B    C                       
0    1     2    ['Alan', 'Rod', 'Ben']  
1    1     3    ['Jeff']                  
2    4     6    ['Pete', 'Joe']

Intermediate df -

id   A     B    C                       N1   N2   N3  N4  ....
0    1     2    ['Alan', 'Rod', 'Ben']  [    '    A   l
1    1     3    ['Jeff']                [    '    J   e
2    4     6    ['Pete', 'Joe']         [    '    P   e

Expected df -

id   A     B    C                       N1      N2     N3
0    1     2    ['Alan', 'Rod', 'Ben']  'Alan'  'Rod'  'Ben'
1    1     3    ['Jeff']                'Jeff'  NaN    NaN
2    4     6    ['Pete', 'Joe']         'Pete'  'Joe'  NaN
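
The intermediate result, with one character per column, is what one would see if C holds the text of a list (for example after reading a CSV file) rather than real list objects. That cause is an assumption, since the question does not show how the DataFrame was built. Under that assumption, a minimal sketch is to parse each string with `ast.literal_eval` before expanding it:

```python
import ast
import pandas as pd

# Assumption: C holds strings such as "['Alan', 'Rod', 'Ben']", not list objects.
df = pd.DataFrame({
    "A": [1, 1, 4],
    "B": [2, 3, 6],
    "C": ["['Alan', 'Rod', 'Ben']", "['Jeff']", "['Pete', 'Joe']"],
})

# Iterating over such a string yields single characters,
# which matches the N1..N4 values of the intermediate df.
first_chars = list(df["C"].iloc[0])[:4]
print(first_chars)

# Parse every string into a real Python list.
df["C"] = df["C"].map(ast.literal_eval)

# Expand the lists into columns; shorter rows are padded with missing values.
names = pd.DataFrame(df["C"].tolist(), index=df.index)
names.columns = [f"N{i + 1}" for i in range(names.shape[1])]
result = df.join(names)
print(result)
```

Once C contains real lists, any of the answers below can be applied unchanged.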

Answer 1

Convert the Series to a list so that you have a list of lists, then turn that into a DataFrame with pandas.DataFrame(listoflists). You can then append or merge the new DataFrame to the old one.
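
The steps described here could be sketched as follows. The non-default index and the N1..N3 column names are illustrative assumptions; passing `index=df.index` is one way to keep the new columns aligned with the original rows when joining.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1, 4],
        "B": [2, 3, 6],
        "C": [["Alan", "Rod", "Ben"], ["Jeff"], ["Pete", "Joe"]],
    },
    index=[10, 20, 30],  # hypothetical non-default index
)

# Series -> list of lists -> DataFrame; shorter lists are padded with missing values.
list_of_lists = df["C"].tolist()
names = pd.DataFrame(list_of_lists, index=df.index)
names.columns = [f"N{i + 1}" for i in range(names.shape[1])]

# join aligns on the index, so reusing df.index keeps the rows matched.
result = df.join(names)
print(result)
```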

Answer 2

df.join(pd.DataFrame(df["C"].apply(pd.Series))).rename(columns={0:"N1",1:"N2",2:"N3"})

   A  B                 C    N1   N2   N3
0  1  2  [Alan, Rod, Ben]  Alan  Rod  Ben
1  1  3            [Jeff]  Jeff  NaN  NaN
2  4  6       [Pete, Joe]  Pete  Joe  NaN

Answer 3

The solution is a greatly simplified version of this question. Just pass the lists of unequal length to the pd.DataFrame() constructor, and the number of new columns is determined automatically.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    [[1, 2,['Alan', 'Rod', 'Ben']],
     [1, 3,['Jeff']],
     [4, 6,['Pete', 'Joe']]],
    columns=['A', 'B','C']
)

# 1. unpack and reconstruct a dataframe   
df_unpack = pd.DataFrame(df["C"].to_list())
# optional: None to NaN
# df_unpack.fillna(np.nan)    

print(df_unpack)
      0     1     2
0  Alan   Rod   Ben
1  Jeff  None  None
2  Pete   Joe  None

# 2. concatenate the results
df_out = pd.concat([df, df_unpack], axis=1)

# 3. determine names
df_out.index.name = "id"
df_out.columns = ['A','B','C'] + [f"N{i+1}" for i in range(df_unpack.shape[1])]

print(df_out)
    A  B                 C    N1    N2    N3
id                                          
0   1  2  [Alan, Rod, Ben]  Alan   Rod   Ben
1   1  3            [Jeff]  Jeff  None  None
2   4  6       [Pete, Joe]  Pete   Joe  None

Answer 4

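Iterate over the rows and create new columns:

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect the rows and build once.
rows = []
for _, row in df.iterrows():
    # add one new entry per list element: ncol0, ncol1, ...
    for j, value in enumerate(row['C']):
        row['ncol{}'.format(j)] = value
    rows.append(row)
newdf = pd.DataFrame(rows).reset_index(drop=True)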
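
If only a fixed number of leading elements is needed, the explicit row loop can be avoided. This is not one of the answers above, only a sketch that assumes C holds real lists and that two columns are wanted (`n_cols` is a hypothetical name). It relies on `Series.str.get`, which also indexes into list elements and yields a missing value where a list is too short.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 4],
    "B": [2, 3, 6],
    "C": [["Alan", "Rod", "Ben"], ["Jeff"], ["Pete", "Joe"]],
})

n_cols = 2  # hypothetical: keep only the first two names
for i in range(n_cols):
    # .str.get(i) takes element i of each list; rows that are too short get NaN
    df[f"N{i + 1}"] = df["C"].str.get(i)

print(df)
```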
