
Python pandas split column list into multiple columns

I have a pandas DataFrame, shown below, with an index and two columns. The "image_main" column contains a list of URLs in each row.

What I want to do is split each item of the list in the "image_main" column into its own new column in the same DataFrame. The length of the list differs from row to row. For example, the list in row 1 has 4 URLs, while the list in row 3 has only 2.

index   image_main                                      referenceID
0     ['https://x.com/1.jpg','https://x.com/2.jpg',...  3.297439e+10
1     ['https://y.com/1.jpg','https://y.com/2.jpg',...  4.000220e+12
2     ['https://z.com/1.jpg','https://z.com/2.jpg',...  4.000130e+12
3     ['https://v.com/1.jpg','https://v.com/2.jpg',...  3.296914e+10
4     ['https://a.com/1.jpg','https://a.com/2.jpg',...  4.000080e+12

So far, I have tried the code below, based on the answers to the question "Pandas: split column of lists of unequal length into multiple columns". However, it does not seem to work, since I get the same result as before:

df['image_main'] = pd.DataFrame(df['image_main'].values.tolist()).add_prefix('code_')
print(df)

    image_main                                         referenceID
0    ['https://x.com/1.jpg','https://x.com/2.jpg',...   3.297439e+10
1    ['https://y.com/1.jpg','https://y.com/2.jpg',...   4.000220e+12
2    ['https://z.com/1.jpg','https://z.com/2.jpg',...   4.000130e+12
3    ['https://v.com/1.jpg','https://v.com/2.jpg',...   3.296914e+10
4    ['https://a.com/1.jpg','https://a.com/2.jpg',...   4.000080e+12
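One plausible reason the attempt changes nothing, and the one the EDIT in the first answer below addresses, is that each cell of image_main holds the string form of a list (for example after reading a CSV) rather than an actual Python list. A minimal check, with hypothetical sample values:

```python
import pandas as pd

# Assumption: image_main holds the *string* form of each list, not real lists.
# These sample values are hypothetical.
df = pd.DataFrame({
    "image_main": [
        "['https://x.com/1.jpg','https://x.com/2.jpg']",
        "['https://y.com/1.jpg']",
    ],
    "referenceID": [3.297439e10, 4.000220e12],
})

print(type(df["image_main"].iloc[0]))  # <class 'str'>

# A list of plain strings becomes a DataFrame with a single column (0),
# so there is nothing to split and the result looks like the original column.
expanded = pd.DataFrame(df["image_main"].values.tolist()).add_prefix("code_")
print(expanded.columns.tolist())  # ['code_0']
```

If the first line prints `<class 'list'>` instead, the values are real lists and the join shown in the answers can be applied directly.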

How can I split each item in the image_main column into separate new columns in the same DataFrame?

The desired result would be something similar to this:

    image_main     referenceID      image_1                  image_2                ....
0   ...,...        3.297439e+10     'https://x.com/1.jpg'    'https://x.com/2.jpg'
1   ...,...        3.297439e+10     'https://y.com/1.jpg'    'https://y.com/2.jpg'
2   ...,...        3.297439e+10     'https://z.com/1.jpg'    'https://z.com/2.jpg'
3   ...,...        3.297439e+10     'https://v.com/1.jpg'    'https://v.com/2.jpg'
4   ...,...        3.297439e+10     'https://a.com/1.jpg'    'https://a.com/2.jpg'


The solution in the thread you linked worked fine when I tried it.

Don't assign the transformation to a single column; instead, join it with the main DataFrame:


# join returns a new DataFrame, so assign the result back to df
df = df.join(pd.DataFrame(df["image_main"].values.tolist(), index=df.index).add_prefix('image_'))
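A self-contained sketch of the join approach on lists of unequal length, using hypothetical sample URLs. Two details are my own additions rather than part of the answer: `index=df.index` keeps rows aligned if df has a non-default index, and the columns are renamed to 1-based `image_1`, `image_2`, ... to match the desired output instead of using `add_prefix('image_')`, which would start at `image_0`.

```python
import pandas as pd

# Hypothetical sample: lists of unequal length in image_main.
df = pd.DataFrame({
    "image_main": [
        ["https://x.com/1.jpg", "https://x.com/2.jpg", "https://x.com/3.jpg"],
        ["https://y.com/1.jpg", "https://y.com/2.jpg"],
        ["https://z.com/1.jpg"],
    ],
    "referenceID": [3.297439e10, 4.000220e12, 4.000130e12],
})

# One column per list position; shorter lists are padded with missing values.
# index=df.index keeps the new rows aligned with the original ones.
expanded = pd.DataFrame(df["image_main"].tolist(), index=df.index)

# Rename 0, 1, 2, ... to image_1, image_2, image_3, ... (1-based).
expanded.columns = [f"image_{i + 1}" for i in expanded.columns]

# join aligns on the index and returns a new DataFrame, so assign it back.
df = df.join(expanded)
print(df.columns.tolist())
```

The number of new columns equals the length of the longest list; cells past the end of a shorter list are missing, which `pd.isna` detects.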

EDIT:

To convert the image_main string values into lists first, use the following:

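# Strip the brackets, quotes and whitespace, then split the remaining text on commas
df["image_main"] = df["image_main"].str.replace(r"[\[\]'\s]", "", regex=True).str.split(",")
df = df.join(pd.DataFrame(df["image_main"].values.tolist(), index=df.index).add_prefix('image_'))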
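As an alternative to stripping characters with a regex (this swaps in a different technique from the answer's), the standard-library `ast.literal_eval` can parse each string back into a real list. It evaluates only Python literals, tolerates spaces after commas, and keeps any comma that sits inside a quoted URL. A sketch with hypothetical sample values:

```python
import ast

import pandas as pd

# Hypothetical sample: each cell is the *string* form of a Python list.
df = pd.DataFrame({
    "image_main": [
        "['https://x.com/1.jpg', 'https://x.com/2.jpg']",
        "['https://y.com/1.jpg']",
    ],
    "referenceID": [3.297439e10, 4.000220e12],
})

# Parse each string into a list of str; literal_eval does not run arbitrary code.
df["image_main"] = df["image_main"].apply(ast.literal_eval)

# Expand into image_0, image_1, ... and join back on the index.
expanded = pd.DataFrame(df["image_main"].tolist(), index=df.index).add_prefix("image_")
df = df.join(expanded)
print(df.columns.tolist())
```

`ast.literal_eval` raises `ValueError` or `SyntaxError` on a cell that is not a valid literal, which surfaces malformed rows instead of silently splitting them.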

I think what you're missing is a pd.merge:

df:

     A
0   [1, 2, 3, 4]
1   [1, 2, 3, 4]
2   [1, 2, 3, 4]

Merge the expanded columns into a new DataFrame:

pd.merge(df, pd.DataFrame(df['A'].values.tolist()).add_prefix('code_'), on=df.index)

Output:

    key_0   A             code_0    code_1  code_2  code_3
0   0       [1, 2, 3, 4]    1         2       3      4
1   1       [1, 2, 3, 4]    1         2       3      4
2   2       [1, 2, 3, 4]    1         2       3      4
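A variant of the same merge, offered as a suggestion rather than the answer's exact code: merging on both indexes with `left_index=True, right_index=True` aligns the rows the same way but avoids the extra `key_0` column that `on=df.index` produces.

```python
import pandas as pd

df = pd.DataFrame({"A": [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]})

# One column per list position, sharing df's index.
codes = pd.DataFrame(df["A"].tolist(), index=df.index).add_prefix("code_")

# Merge on the indexes of both frames; no key_0 helper column is created.
result = pd.merge(df, codes, left_index=True, right_index=True)
print(result.columns.tolist())  # ['A', 'code_0', 'code_1', 'code_2', 'code_3']
```

For an index-on-index combination like this, `df.join(codes)` gives the same columns; `merge` is mainly useful when you later need other options such as `how`.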
