How to extract row values of different lengths into new columns from a Pandas Series?
Suppose I have a pandas DataFrame with a column C where each value is a list. Since the lists have different lengths, how do I split this column and append the pieces as new columns of the DataFrame?
Additional findings: when I try the solutions, every character of the value, starting with [ and ', is placed in its own column (blank spaces included), instead of each whole name.
What should I do to combine the letters into whole words before applying the solutions?
Sample df -
id  A  B  C
0   1  2  ['Alan', 'Rod', 'Ben']
1   1  3  ['Jeff']
2   4  6  ['Pete', 'Joe']
Intermediate df -
id  A  B  C                       N1  N2  N3  N4  ....
0   1  2  ['Alan', 'Rod', 'Ben']  [   '   A   l
1   1  3  ['Jeff']                [   '   J   e
2   4  6  ['Pete', 'Joe']         [   '   P   e
Expected df -
id  A  B  C                       N1      N2     N3
0   1  2  ['Alan', 'Rod', 'Ben']  'Alan'  'Rod'  'Ben'
1   1  3  ['Jeff']                'Jeff'  NaN    NaN
2   4  6  ['Pete', 'Joe']         'Pete'  'Joe'  NaN
Convert the series to a list, so that you have a list of lists, and then convert it to a DataFrame with pandas.DataFrame(listoflists). You can then append or merge the new DataFrame to the old one.
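A minimal sketch of that idea, assuming the sample data from the question and assuming that C holds real Python lists (not strings). The variable names `expanded` and `result` are illustrative, and the N1, N2, ... column names are taken from the expected output in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1, 4],
        "B": [2, 3, 6],
        "C": [["Alan", "Rod", "Ben"], ["Jeff"], ["Pete", "Joe"]],
    }
)

# Series -> list of lists -> DataFrame; shorter lists are padded with missing values
expanded = pd.DataFrame(df["C"].tolist(), index=df.index)

# Name the new columns N1, N2, ... to match the expected output
expanded.columns = [f"N{i + 1}" for i in range(expanded.shape[1])]

# Attach the new columns to the original frame (join aligns on the index)
result = df.join(expanded)
print(result)
```

Passing `index=df.index` keeps the new rows aligned with the original ones even if the original index is not the default 0, 1, 2.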
df.join(pd.DataFrame(df["C"].apply(pd.Series))).rename(columns={0:"N1",1:"N2",2:"N3"})
   A  B                 C    N1   N2   N3
0  1  2  [Alan, Rod, Ben]  Alan  Rod  Ben
1  1  3            [Jeff]  Jeff  NaN  NaN
2  4  6       [Pete, Joe]  Pete  Joe  NaN
The solution is a greatly simplified version of this question. Just put the lists of unequal length into the pd.DataFrame() constructor, and the number of new columns will be determined automatically.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    [[1, 2, ['Alan', 'Rod', 'Ben']],
     [1, 3, ['Jeff']],
     [4, 6, ['Pete', 'Joe']]],
    columns=['A', 'B', 'C']
)
# 1. unpack and reconstruct a dataframe
df_unpack = pd.DataFrame(df["C"].to_list())
# optional: None to NaN (fillna returns a new frame, so assign the result)
# df_unpack = df_unpack.fillna(np.nan)
print(df_unpack)
      0     1     2
0  Alan   Rod   Ben
1  Jeff  None  None
2  Pete   Joe  None
# 2. concatenate the results
df_out = pd.concat([df, df_unpack], axis=1)
# 3. determine names
df_out.index.name = "id"
df_out.columns = ['A','B','C'] + [f"N{i+1}" for i in range(df_unpack.shape[1])]
print(df_out)
    A  B                 C    N1    N2    N3
id
0   1  2  [Alan, Rod, Ben]  Alan   Rod   Ben
1   1  3            [Jeff]  Jeff  None  None
2   4  6       [Pete, Joe]  Pete   Joe  None
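The "additional findings" in the question (every character, starting with [ and ', landing in its own column) suggest that C may hold strings such as "['Alan', 'Rod', 'Ben']" rather than real lists, for example after reading the data from a CSV file. That is an assumption about the asker's data, not something the question confirms. Under that assumption, one possible sketch is to parse each string with `ast.literal_eval` from the standard library first and then unpack as before; the names `parsed` and `names` are illustrative.

```python
import ast

import pandas as pd

# Assumption: C holds strings that merely look like lists
df = pd.DataFrame(
    {
        "A": [1, 1, 4],
        "B": [2, 3, 6],
        "C": ["['Alan', 'Rod', 'Ben']", "['Jeff']", "['Pete', 'Joe']"],
    }
)

# Parse each string into a real Python list
parsed = df["C"].apply(ast.literal_eval)

# Unpack the parsed lists into columns N1, N2, ...
names = pd.DataFrame(parsed.tolist(), index=df.index)
names.columns = [f"N{i + 1}" for i in range(names.shape[1])]

# Concatenate the new columns next to the original ones
df_out = pd.concat([df, names], axis=1)
print(df_out)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for this kind of input, but it raises an error on values that are not valid literals (empty cells, for instance), which would need separate handling.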
Iterate over the rows and create new columns:
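rows = []
for i, row in df.iterrows():
    row = row.copy()
    # add one new entry per list element: ncol0, ncol1, ...
    for j, name in enumerate(row['C']):
        row['ncol{}'.format(j)] = name
    rows.append(row)
# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
newdf = pd.DataFrame(rows)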