Split one column into multiple columns in Pandas

Question

I want to split one existing column into three columns. In the screenshot, the builder column needs to be split into three columns: b.name, city, and country. I used the str.split() method, which works well for two columns:

ownerName = df['owner_name']
df[["ownername", "owner_country"]] = df["owner_name"].str.split("-", expand=True)

But when I try to split into three columns using two delimiters, ',' and '-':

ownerName = df['owner_name']
df[["ownername", "city", "owner_country"]] = df["owner_name"].str.split("," ,"-", expand=True)

I get this error:

File "C:\Users....\lib\site-packages\pandas\core\frame.py", line 3160, in __setitem__
    self._setitem_array(key, value)
File "C:\Users....\lib\site-packages\pandas\core\frame.py", line 3189, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

What is the best solution for two delimiters, ',' and '-'? There are also some empty rows.

Answer 1

Your exact input is unclear, but assuming the sample input kindly provided by @ArchAngelPwn, you could use str.split with a regex:

names = ['Builder_Name', 'City_Name', 'Country']
out = (df['Column1']
 .str.split(r'\s*[,-]\s*', expand=True)  # split on "," or "-" with optional spaces
 .rename(columns=dict(enumerate(names))) # rename 0/1/2 with names in order
)

output:

   Builder_Name City_Name  Country
0  Builder Name      City  Country
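
The question also mentions empty rows. A minimal sketch of the same regex split on a column that contains a missing value follows; the sample rows are assumptions built from the strings quoted on this page, not the asker's real data. The "Columns must be same length as key" error appears when the number of columns produced by the split differs from the number of names on the left-hand side, so the sketch checks the width before assigning.

```python
import pandas as pd

# Hypothetical sample data: two well-formed rows and one empty row
df = pd.DataFrame({"owner_name": [
    "SANTIERUL NAVAL CONSTANTA - CONSTANTZA, ROMANIA",
    None,
    "Builder Name - City, Country",
]})

names = ["Builder_Name", "City_Name", "Country"]

# Split on "," or "-" and swallow the spaces around each delimiter
parts = df["owner_name"].str.split(r"\s*[,-]\s*", expand=True)

# Assign only when the split produced exactly as many columns as names;
# a different width is what triggers the ValueError from the question
if parts.shape[1] == len(names):
    parts.columns = names
    df[names] = parts

print(df[names])
```

The empty row stays missing in all three new columns instead of raising. A row with extra commas or hyphens would widen the result, which is why the width check comes before the assignment.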

Answer 2

You can combine some of these lines if you like, but this is one possible option, and it should be readable for most developers on the project.

import pandas as pd

data = {
    'Column1' : ['Builder Name - City, Country']
}

df = pd.DataFrame(data)
# Text before the first "-" is the builder name
df['Builder_Name'] = df['Column1'].apply(lambda x : x.split('-', 1)[0].strip())
# Text between the first "-" and the last "," is the city
df['City_Name'] = df['Column1'].apply(lambda x : x.split('-', 1)[1].rsplit(',', 1)[0].strip())
# Text after the last "," is the country
df['Country'] = df['Column1'].apply(lambda x : x.rsplit(',', 1)[1].strip())
df = df[['Builder_Name', 'City_Name', 'Country']]
df

Answer 3

As mentioned in the question, there are two delimiters, "-" and ",". For a single delimiter we simply use str.split("-", expand=True). For two different delimiters we can use the same code with a small addition: pass a regular expression that lists the delimiters. For a column in the form name - city name, country (for example, Owner = SANTIERUL NAVAL CONSTANTA - CONSTANTZA, ROMANIA), the code is written as:

ownerName = df['owner_name']
df[["Owner_name", "City_Name", "owner_country"]] = df["owner_name"].str.split(r', |- |\*|\n', expand=True)
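
If some rows do not follow the name - city, country shape, a split can return more or fewer than three pieces. One alternative, not taken from any of the answers here, is str.extract with named groups, which always returns exactly one column per group. The sketch below assumes the first "-" separates the name from the city and the first "," after it separates the city from the country; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one well-formed row, one row without delimiters,
# and one empty row
df = pd.DataFrame({"owner_name": [
    "SANTIERUL NAVAL CONSTANTA - CONSTANTZA, ROMANIA",
    "NO DELIMITERS HERE",
    None,
]})

# One named group per target column; rows that do not match become NaN
pattern = (
    r"^\s*(?P<Owner_name>.*?)\s*-\s*"
    r"(?P<City_Name>.*?)\s*,\s*"
    r"(?P<owner_country>.*?)\s*$"
)

out = df["owner_name"].str.extract(pattern)
print(out)
```

Because the result always has three columns, it can be joined back with df.join(out) without a length mismatch; rows that do not fit the pattern are left as missing values to inspect separately.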

3 answers

Answer 1
Score 1, accepted, 2022-05-24 12:19:29

Answer 2
Score 0, 2022-05-24 02:07:09

Answer 3
Score 0, 2022-05-25 17:24:40
