How to extract row values of different lengths to new columns from a Pandas Series?

Suppose I have a pandas DataFrame with a Series C where each value is a list. Since the lists have different lengths, how do I slice this Series and append its elements to new columns of this DataFrame?

Additional findings: when I try this, every character ends up in its own column, starting with [ and ', and the blank spaces that separate the words are included too.

What should I do to combine the letters back into whole words and then apply the solutions?

Sample df:

id   A     B    C                       
0    1     2    ['Alan', 'Rod', 'Ben']  
1    1     3    ['Jeff']                  
2    4     6    ['Pete', 'Joe']  

Intermediate df (what I currently get):

id   A     B    C                       N1   N2   N3  N4  ....
0    1     2    ['Alan', 'Rod', 'Ben']  [    '    A   l
1    1     3    ['Jeff']                [    '    J   e
2    4     6    ['Pete', 'Joe']         [    '    P   e

Expected df:

id   A     B    C                       N1      N2      N3
0    1     2    ['Alan', 'Rod', 'Ben']  'Alan'  'Rod'   'Ben'
1    1     3    ['Jeff']                'Jeff'  NaN     NaN
2    4     6    ['Pete', 'Joe']         'Pete'  'Joe'   NaN
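The intermediate df suggests that C may hold strings that only look like lists, for example "['Alan', 'Rod', 'Ben']" (an assumption; this often happens after saving to and reloading from CSV). Treating such a string as a sequence yields one character at a time, which would produce columns like the ones above. A minimal sketch, assuming that is the case, parses each string back into a real list with the standard library's ast.literal_eval before expanding it:

```python
import ast

import pandas as pd

# Assumed input: C holds list-like *strings*, e.g. after a to_csv()/read_csv() round trip
df = pd.DataFrame(
    {
        "A": [1, 1, 4],
        "B": [2, 3, 6],
        "C": ["['Alan', 'Rod', 'Ben']", "['Jeff']", "['Pete', 'Joe']"],
    }
)

# Parse each string into a real list; literal_eval only evaluates Python literals,
# which makes it safer than eval() on untrusted text
df["C"] = df["C"].apply(ast.literal_eval)

# Each value is now a list, so the lists can be expanded into columns N1..Nk
expanded = pd.DataFrame(df["C"].to_list(), index=df.index)
expanded.columns = [f"N{i + 1}" for i in expanded.columns]
df_out = df.join(expanded)
print(df_out)
```

After the parsing step, any of the answers below can be applied unchanged.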

Convert the Series to a list, so that you have a list of lists, and then convert it to a DataFrame with pandas.DataFrame(listoflists). You can then append or merge the new DataFrame to the old one.
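A minimal sketch of that recipe on the sample data from the question (the names listoflists and new_cols are only illustrative). It merges on the index with pd.merge; df.join(new_cols) would give the same result here:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1, 4],
        "B": [2, 3, 6],
        "C": [["Alan", "Rod", "Ben"], ["Jeff"], ["Pete", "Joe"]],
    }
)

# Series -> list of lists; the DataFrame constructor pads shorter lists with missing values
listoflists = df["C"].tolist()
new_cols = pd.DataFrame(listoflists, index=df.index)
new_cols.columns = [f"N{i + 1}" for i in new_cols.columns]

# Merge on the shared index so each row keeps its own names
result = pd.merge(df, new_cols, left_index=True, right_index=True)
print(result)
```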

df.join(pd.DataFrame(df["C"].apply(pd.Series))).rename(columns={0:"N1",1:"N2",2:"N3"})

   A  B                 C    N1   N2   N3
0  1  2  [Alan, Rod, Ben]  Alan  Rod  Ben
1  1  3            [Jeff]  Jeff  NaN  NaN
2  4  6       [Pete, Joe]  Pete  Joe  NaN
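The rename above hard-codes three columns. A small variation, sketched under the assumption that the numbering should simply follow the longest list, renames the expanded columns before joining so any number of names works. The fourth row here is a hypothetical addition to show a longer list:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1, 4, 7],
        "B": [2, 3, 6, 8],
        # Last row is hypothetical, added to show a list longer than three
        "C": [["Alan", "Rod", "Ben"], ["Jeff"], ["Pete", "Joe"], ["Ann", "Bo", "Cy", "Di"]],
    }
)

# Each list becomes one row; shorter lists are padded with NaN
expanded = df["C"].apply(pd.Series)
# Columns come out as 0..k-1, so map them to N1..Nk instead of listing them by hand
expanded = expanded.rename(columns=lambda i: f"N{i + 1}")
result = df.join(expanded)
print(result)
```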

The solution is a greatly simplified version of this question. Just pass the lists of unequal length to the pd.DataFrame() constructor, and the number of new columns will be determined automatically from the longest list.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    [[1, 2,['Alan', 'Rod', 'Ben']],
     [1, 3,['Jeff']],
     [4, 6,['Pete', 'Joe']]],
    columns=['A', 'B','C']
)

# 1. unpack and reconstruct a dataframe   
df_unpack = pd.DataFrame(df["C"].to_list())
# optional: convert None to NaN (fillna returns a new frame, so assign it back)
# df_unpack = df_unpack.fillna(np.nan)

print(df_unpack)
      0     1     2
0  Alan   Rod   Ben
1  Jeff  None  None
2  Pete   Joe  None

# 2. concatenate the results
df_out = pd.concat([df, df_unpack], axis=1)

# 3. determine names
df_out.index.name = "id"
df_out.columns = ['A','B','C'] + [f"N{i+1}" for i in range(df_unpack.shape[1])]

print(df_out)
    A  B                 C    N1    N2    N3
id                                          
0   1  2  [Alan, Rod, Ben]  Alan   Rod   Ben
1   1  3            [Jeff]  Jeff  None  None
2   4  6       [Pete, Joe]  Pete   Joe  None
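One caveat with step 1: pd.DataFrame(df["C"].to_list()) always gets a fresh 0..n-1 index, so pd.concat(axis=1) would misalign rows if df has any other index (for example after filtering). A minimal sketch, using a hypothetical index of 10, 20, 30, passes index=df.index to keep the rows lined up:

```python
import pandas as pd

# Same data as above, but with a hypothetical non-default index (e.g. after filtering)
df = pd.DataFrame(
    {
        "A": [1, 1, 4],
        "B": [2, 3, 6],
        "C": [["Alan", "Rod", "Ben"], ["Jeff"], ["Pete", "Joe"]],
    },
    index=[10, 20, 30],
)

# Reuse df's index so concat pairs each list with its own row
df_unpack = pd.DataFrame(df["C"].to_list(), index=df.index)
df_unpack.columns = [f"N{i + 1}" for i in range(df_unpack.shape[1])]

df_out = pd.concat([df, df_unpack], axis=1)
print(df_out)

# Without index=df.index, the indexes 10/20/30 and 0/1/2 do not match,
# and concat produces 6 rows padded with missing values
misaligned = pd.concat([df, pd.DataFrame(df["C"].to_list())], axis=1)
print(misaligned.shape)
```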

Iterate over the rows and create new columns:

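import pandas as pd

rows = []
for i, row in df.iterrows():
    for j in range(len(row['C'])):
        row['ncol{}'.format(j)] = row['C'][j]
    rows.append(row)
# DataFrame.append was removed in pandas 2.0, so build the result once from the collected rows;
# rows with shorter lists get NaN in the ncol columns they lack
newdf = pd.DataFrame(rows).reset_index(drop=True)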

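Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.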

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM