
Split pandas dataframe column of type string into multiple columns based on number of ',' characters

Let's say I have a pandas dataframe that looks like this:

import pandas as pd
data = {'name': ['Tom, Jeffrey, Henry', 'Nick, James', 'Chris', 'David, Oscar']}
df = pd.DataFrame(data)
df
    name
0   Tom, Jeffrey, Henry
1   Nick, James
2   Chris
3   David, Oscar

I know I can split the names into separate columns using the comma as separator, like so:

df[["name1", "name2", "name3"]] = df["name"].str.split(", ", expand=True)
df
    name                name1   name2   name3
0   Tom, Jeffrey, Henry Tom     Jeffrey Henry
1   Nick, James         Nick    James   None
2   Chris               Chris   None    None
3   David, Oscar        David   Oscar   None

However, if the name column has a row that contains 4 names, like below, the code above raises ValueError: Columns must be same length as key

data = {'name': ['Tom, Jeffrey, Henry', 'Nick, James', 'Chris', 'David, Oscar', 'Jim, Jones, William, Oliver']}
  
# Create DataFrame
df = pd.DataFrame(data)
df
    name
0   Tom, Jeffrey, Henry
1   Nick, James
2   Chris
3   David, Oscar
4   Jim, Jones, William, Oliver
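
A minimal sketch of why the assignment fails on this data: the split produces one column per name in the longest row, so the result has 4 columns while the key lists only 3 names. The variable names `parts`, `target`, `n_parts` and `error_message` are illustrative and not part of the original post.

```python
import pandas as pd

data = {'name': ['Tom, Jeffrey, Henry', 'Nick, James', 'Chris',
                 'David, Oscar', 'Jim, Jones, William, Oliver']}
df = pd.DataFrame(data)

# expand=True returns a DataFrame with as many columns as the longest row has names
parts = df["name"].str.split(", ", expand=True)
n_parts = parts.shape[1]

# Hard-coded key with only three names, as in the attempt above
target = ["name1", "name2", "name3"]

try:
    df[target] = parts
    error_message = None
except ValueError as exc:
    # The number of key columns and the number of split columns differ
    error_message = str(exc)

print(n_parts, len(target))
print(error_message)
```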

How can I automatically split the name column into n separate columns based on the ',' separator? The desired output would be this:

        name                          name1  name2    name3   name4
0       Tom, Jeffrey, Henry           Tom    Jeffrey  Henry   None
1       Nick, James                   Nick   James    None    None
2       Chris                         Chris  None     None    None
3       David, Oscar                  David  Oscar    None    None
4       Jim, Jones, William, Oliver   Jim    Jones    William Oliver

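Use DataFrame.join to attach the split result as a new DataFrame, and rename to generate the new column names. With expand=True, str.split returns a DataFrame whose columns are labelled 0, 1, 2, ..., so the renaming function only has to map each integer to a name: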

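# Map the integer column labels 0, 1, 2, ... to name1, name2, name3, ...
f = lambda x: f'name{x+1}'
# Split on ", " into as many columns as needed, rename them, and join back by index
df = df.join(df["name"].str.split(", ", expand=True).rename(columns=f))
print(df)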
                          name  name1    name2    name3   name4
0          Tom, Jeffrey, Henry    Tom  Jeffrey    Henry    None
1                  Nick, James   Nick    James     None    None
2                        Chris  Chris     None     None    None
3                 David, Oscar  David    Oscar     None    None
4  Jim, Jones, William, Oliver    Jim    Jones  William  Oliver
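
As a variation on the approach above, the column names can also be built from the number of columns the split actually produced, which avoids the lambda. This is only a sketch under the same assumption that names are separated by exactly ", "; the names `parts` and `result` are illustrative.

```python
import pandas as pd

data = {'name': ['Tom, Jeffrey, Henry', 'Nick, James', 'Chris',
                 'David, Oscar', 'Jim, Jones, William, Oliver']}
df = pd.DataFrame(data)

# One column per name in the longest row; shorter rows are padded with missing values
parts = df["name"].str.split(", ", expand=True)

# Build name1..nameN from however many columns the split produced
parts.columns = [f"name{i + 1}" for i in range(parts.shape[1])]

# Place the new columns next to the original one, aligned on the index
result = pd.concat([df, parts], axis=1)
print(result)
```

If the number of names per row changes later, the list of column names follows automatically because it is derived from `parts.shape[1]`.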
