Split a column in a Pandas DataFrame into n columns

Question

In a column of a Pandas DataFrame, I have strings like this:

column_name_1	column_name_2
a^b^c	j
e^f^g	k^l
h^i	m

I need to split these strings into columns in the same data frame, like this:

column_name_1	column_name_2	column_name_1_1	column_name_1_2	column_name_1_3	column_name_2_1	column_name_2_2
a^b^c	j	a	b	c	j	
e^f^g	k^l	e	f	g	k	l
h^i	m	h	i		m	

I cannot figure out how to do this without knowing in advance how many occurrences of the delimiter there are in the data.

My best effort so far is something like:

df[["column_name_1_1","column_name_1_2","column_name_1_3"]] = df["column_name_1"].str.split('^',n=2, expand=True)

But it fails with a

ValueError: The columns in the computed data do not match the columns in the provided metadata

Answer 1

Let's try it with stack + str.split + unstack + join.

The idea is to split each column by ^ and expand the resulting pieces into separate columns. stack lets us run a single str.split on a Series object, and unstack creates a DataFrame with the same index as the original.

tmp = df.stack().str.split('^', expand=True).unstack(level=1).sort_index(level=1, axis=1)
tmp.columns = [f'{y}_{x+1}' for x, y in tmp.columns]
out = df.join(tmp).dropna(how='all', axis=1).fillna('')

Output:

  column_name_1 column_name_2 column_name_1_1 column_name_1_2 column_name_1_3 column_name_1_4 column_name_2_1 column_name_2_2  
0       a^b^c^d             j               a               b               c               d               j                  
1         e^f^g           k^l               e               f               g                               k               l  
2           h^i             m               h               i                                               m
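
For readers who want to run the approach end to end, here is a minimal step-by-step sketch of the same pipeline. The sample frame is an assumption reconstructed from the output above (note the extra ^d in row 0), and the intermediate names stacked and pieces are only illustrative.

```python
import pandas as pd

# Sample data assumed from the output shown above.
df = pd.DataFrame({
    "column_name_1": ["a^b^c^d", "e^f^g", "h^i"],
    "column_name_2": ["j", "k^l", "m"],
})

# stack(): one long Series indexed by (row label, original column name).
stacked = df.stack()

# A single split over every string; shorter rows are padded with missing values.
pieces = stacked.str.split("^", expand=True)

# unstack(level=1): move the original column name back into the columns,
# giving (piece position, original column) pairs, then group by column name.
tmp = pieces.unstack(level=1).sort_index(level=1, axis=1)
tmp.columns = [f"{name}_{pos + 1}" for pos, name in tmp.columns]

# Attach to the original frame, drop piece columns that are entirely empty,
# and show the remaining gaps as empty strings.
out = df.join(tmp).dropna(how="all", axis=1).fillna("")
print(out.columns.tolist())
```

The dropna(how='all', axis=1) step matters because every original column is padded to the widest split; without it, column_name_2 would also get empty _3 and _4 columns.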

Answer 2

One-liner:

new_df = pd.concat([df] + [pd.DataFrame([pd.Series(s) for s in df[col].str.split('^')]).add_prefix(col + '_') for col in df], axis=1).fillna('')

Output:

>>> new_df
  column_name_1 column_name_2 column_name_1_0 column_name_1_1 column_name_1_2 column_name_1_3 column_name_2_0 column_name_2_1
0       a^b^c^d             j               a               b               c               d               j
1         e^f^g           k^l               e               f               g                               k               l
2           h^i             m               h               i                                               m
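
If the one-liner is hard to read, the same idea can be sketched as an explicit loop. This is an alternative formulation, not the answerer's code: it assumes the sample data from the question and uses str.split(..., expand=True) per column, numbering the new columns from 1 to match the layout the question asks for.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    "column_name_1": ["a^b^c", "e^f^g", "h^i"],
    "column_name_2": ["j", "k^l", "m"],
})

parts = [df]
for col in df.columns:
    # expand=True yields as many columns as the longest split in this column,
    # so the number of delimiters does not need to be known in advance.
    split = df[col].str.split("^", expand=True)
    # Rename 0, 1, 2, ... to <col>_1, <col>_2, <col>_3, ...
    split.columns = [f"{col}_{i + 1}" for i in split.columns]
    parts.append(split)

# Missing pieces (rows with fewer delimiters) become empty strings.
result = pd.concat(parts, axis=1).fillna("")
print(result)
```

Because each column is split on its own, no all-empty columns are created and no dropna step is needed.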
