
Split column in a Pandas Dataframe into n number of columns

In the columns of a Pandas Dataframe, I have strings like this:

column_name_1  column_name_2
a^b^c          j
e^f^g          k^l
h^i            m

I need to split these strings into columns in the same data frame, like this:

column_name_1  column_name_2  column_name_1_1  column_name_1_2  column_name_1_3  column_name_2_1  column_name_2_2
a^b^c          j              a                b                c                j
e^f^g          k^l            e                f                g                k                l
h^i            m              h                i                                 m

I cannot figure out how to do this without knowing in advance how many occurrences of the delimiter there are in the data.

My best effort so far looks something like this:

df[["column_name_1_1","column_name_1_2 ","column_name_1_3"]] = df["column_name_1"].str.split('^',n=2, expand=True)

But it fails with:

ValueError: The columns in the computed data do not match the columns in the provided metadata

Let's try it with stack + str.split + unstack + join.

The idea is to split each column on ^ and expand the resulting pieces into separate columns. stack lets us run a single str.split on one Series object, and unstack then rebuilds a DataFrame with the same index as the original.

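# stack to one Series, split once, then unstack back to one block of columns per original column
tmp = df.stack().str.split('^', expand=True).unstack(level=1).sort_index(level=1, axis=1)
# flatten the (position, column name) MultiIndex into names like column_name_1_1
tmp.columns = [f'{y}_{x+1}' for x, y in tmp.columns]
# attach to the original frame, drop columns that are entirely empty, blank out the remaining gaps
out = df.join(tmp).dropna(how='all', axis=1).fillna('')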

Output:

  column_name_1 column_name_2 column_name_1_1 column_name_1_2 column_name_1_3 column_name_1_4 column_name_2_1 column_name_2_2  
0       a^b^c^d             j               a               b               c               d               j                  
1         e^f^g           k^l               e               f               g                               k               l  
2           h^i             m               h               i                                               m                  
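To make the chain above easier to follow, here is a self-contained sketch that runs the same steps one at a time. The sample data is an assumption reconstructed from the output table, and the intermediate names (stacked, parts, wide) are only illustrative.

```python
import pandas as pd

# Sample data assumed from the output table above.
df = pd.DataFrame({
    "column_name_1": ["a^b^c^d", "e^f^g", "h^i"],
    "column_name_2": ["j", "k^l", "m"],
})

# Step 1: one long Series indexed by (row label, original column name).
stacked = df.stack()

# Step 2: a single split; expand=True yields as many columns as the
# longest row needs and pads shorter rows with missing values.
parts = stacked.str.split("^", expand=True)

# Step 3: move the original column names back onto the column axis.
# Columns are now a MultiIndex of (piece position, original column name),
# sorted so that pieces of the same original column sit together.
wide = parts.unstack(level=1).sort_index(level=1, axis=1)

# Step 4: flatten the MultiIndex into names such as column_name_1_1.
wide.columns = [f"{name}_{pos + 1}" for pos, name in wide.columns]

# Step 5: join, drop piece columns that are empty for every row, blank the rest.
out = df.join(wide).dropna(how="all", axis=1).fillna("")
print(out)
```

Because the number of piece columns comes from the data itself, nothing about the delimiter count has to be known in advance.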

One-liner:

new_df = pd.concat([df] + [pd.DataFrame([pd.Series(s) for s in df[col].str.split('^')]).add_prefix(col + '_') for col in df], axis=1).fillna('')

Output:

>>> new_df
  column_name_1 column_name_2 column_name_1_0 column_name_1_1 column_name_1_2 column_name_1_3 column_name_2_0 column_name_2_1
0       a^b^c^d             j               a               b               c               d               j
1         e^f^g           k^l               e               f               g                               k               l
2           h^i             m               h               i                                               m
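If the one-liner is hard to read, the same idea can be written as a plain loop. This is only a sketch: split_columns is a hypothetical helper name, the sample data is assumed from the output above, and the pieces are numbered from 1 to match the naming asked for in the question.

```python
import pandas as pd

# Sample data assumed from the output table above.
df = pd.DataFrame({
    "column_name_1": ["a^b^c^d", "e^f^g", "h^i"],
    "column_name_2": ["j", "k^l", "m"],
})

def split_columns(frame, sep="^"):
    """Hypothetical helper: split every column on sep into numbered columns."""
    pieces = [frame]
    for col in frame.columns:
        # expand=True returns one column per piece; the width is decided by
        # the row with the most delimiters, so it need not be known up front.
        # A single-character sep is matched literally by pandas.
        parts = frame[col].str.split(sep, expand=True)
        # Name the pieces from however many columns actually came back.
        parts.columns = [f"{col}_{i + 1}" for i in range(parts.shape[1])]
        pieces.append(parts)
    # The pieces share the original index, so concat lines them up row by row.
    return pd.concat(pieces, axis=1).fillna("")

new_df = split_columns(df)
print(new_df)
```

Naming the new columns from parts.shape[1] instead of a fixed list avoids the mismatch that a hard-coded three-name assignment runs into when a row has more or fewer pieces.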
