Python pandas split column list into multiple columns
I have a pandas DataFrame as shown below, with an index and two columns. The "image_main" column consists of a list of URLs.
What I want to do is separate each item in the list in the "image_main" column into new columns in the same dataframe. The length of the list is different in each row. For example, the list in row 1 has 4 URLs, while row 3 has only 2 URLs.
index image_main referenceID
0 ['https://x.com/1.jpg','https://x.com/2.jpg',... 3.297439e+10
1 ['https://y.com/1.jpg','https://y.com/2.jpg',... 4.000220e+12
2 ['https://z.com/1.jpg','https://z.com/2.jpg',... 4.000130e+12
3 ['https://v.com/1.jpg','https://v.com/2.jpg',... 3.296914e+10
4 ['https://a.com/1.jpg','https://a.com/2.jpg',... 4.000080e+12
So far, I have tried the code below, based on the answers to the following question: Pandas: split column of lists of unequal length into multiple columns. However, it does not seem to work, since I get the same result as before:
df['image_main'] = pd.DataFrame(df['image_main'].values.tolist()).add_prefix('code_')
print(df)
image_main referenceID
0 ['https://x.com/1.jpg','https://x.com/2.jpg',... 3.297439e+10
1 ['https://y.com/1.jpg','https://y.com/2.jpg',... 4.000220e+12
2 ['https://z.com/1.jpg','https://z.com/2.jpg',... 4.000130e+12
3 ['https://v.com/1.jpg','https://v.com/2.jpg',... 3.296914e+10
4 ['https://a.com/1.jpg','https://a.com/2.jpg',... 4.000080e+12
How can I split each of the items in the column image_main into new separate columns in the same dataframe?
The desired result would be something similar to the below:
image_main referenceID image_1 image_2 ....
0 ...,... 3.297439e+10 'https://x.com/1.jpg' 'https://x.com/2.jpg'
1 ...,... 3.297439e+10 'https://y.com/1.jpg' 'https://y.com/2.jpg'
2 ...,... 3.297439e+10 'https://z.com/1.jpg' 'https://z.com/2.jpg'
3 ...,... 3.297439e+10 'https://v.com/1.jpg' 'https://v.com/2.jpg'
4 ...,... 3.297439e+10 'https://a.com/1.jpg' 'https://a.com/2.jpg'
The solution in the thread you linked worked fine when I tried it.
Instead of assigning the transformation back to a single column, join it with the main dataframe:
df.join(pd.DataFrame(df["image_main"].values.tolist()).add_prefix('image_'))
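To make the join concrete, here is a minimal sketch on made-up data. The sample URLs and the names expanded and result are only for illustration; it assumes image_main already holds real Python lists. Note that join returns a new dataframe, so the result has to be assigned to something.

```python
import pandas as pd

# Hypothetical sample data: lists of unequal length, as in the question.
df = pd.DataFrame({
    "image_main": [
        ["https://x.com/1.jpg", "https://x.com/2.jpg", "https://x.com/3.jpg"],
        ["https://y.com/1.jpg"],
    ],
    "referenceID": [3.297439e10, 4.000220e12],
})

# One column per list position; shorter lists are padded with missing values.
# Passing index=df.index keeps the rows aligned even if the index is not 0..n-1.
expanded = pd.DataFrame(df["image_main"].tolist(), index=df.index).add_prefix("image_")

# join does not modify df in place, so keep the returned dataframe.
result = df.join(expanded)
print(result.columns.tolist())
```

Rows with fewer URLs simply end up with missing values in the trailing image_ columns.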
To convert the image_main string values to lists, use the following:
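# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df["image_main"] = df["image_main"].str.replace(r"\[|\]|'", "", regex=True).str.split(",")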
df.join(pd.DataFrame(df["image_main"].values.tolist()).add_prefix('image_'))
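If the cells really are strings that look like Python lists (an assumption, since the question does not show the dtype), another option is to parse them with ast.literal_eval from the standard library instead of stripping characters by hand. A rough sketch on hypothetical sample data:

```python
import ast
import pandas as pd

# Hypothetical sample: each cell is a string that looks like a list.
df = pd.DataFrame({
    "image_main": [
        "['https://x.com/1.jpg','https://x.com/2.jpg']",
        "['https://y.com/1.jpg']",
    ],
})

# Parse each string into a real Python list.
df["image_main"] = df["image_main"].apply(ast.literal_eval)

# Same expansion and join as above, now that the column holds lists.
result = df.join(
    pd.DataFrame(df["image_main"].tolist(), index=df.index).add_prefix("image_")
)
print(result)
```

This also copes with spaces after the commas, which a plain split(",") would leave attached to the URLs.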
I think what you're missing is a pd.merge:
df:
A
0 [1, 2, 3, 4]
1 [1, 2, 3, 4]
2 [1, 2, 3, 4]
merge into new df:
pd.merge(df, pd.DataFrame(df['A'].values.tolist()).add_prefix('code_'), on=df.index)
output:
key_0 A code_0 code_1 code_2 code_3
0 0 [1, 2, 3, 4] 1 2 3 4
1 1 [1, 2, 3, 4] 1 2 3 4
2 2 [1, 2, 3, 4] 1 2 3 4
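The extra key_0 column appears because an array was passed to on. If you would rather not have it, one possible variant is to merge on the index of both frames with left_index and right_index. A small sketch using the same toy data (the names expanded and merged are just for illustration):

```python
import pandas as pd

# Same toy frame as above: one column holding equal-length lists.
df = pd.DataFrame({"A": [[1, 2, 3, 4]] * 3})

# Expand the lists into columns code_0 .. code_3, aligned on df's index.
expanded = pd.DataFrame(df["A"].tolist(), index=df.index).add_prefix("code_")

# Merging on both indexes avoids the extra key_0 column.
merged = pd.merge(df, expanded, left_index=True, right_index=True)
print(merged.columns.tolist())
```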