Split a column containing a number and a name into two separate columns, "ID" and "Name"

Question

I am converting a text file to CSV. In the CSV file I'm getting a column that has a number and a name in it (e.g. 1: Aki), and I want to separate them into two different columns.

Sample data

1: Aki 
2: Aki
3: Kano

Code tried

df_output.columns = ['Name', 'date', 'Description']

###df_output['ID'],df_output['Name_'] = df_output['Name'].str[:1],df_output['Name'].str[1:]

obj = df_output['Name']
obj = obj.str.strip()
obj = obj.str.split(':/s*')
df_output['Name'] = obj.str[-1]
df_output['idx'] = obj.str[0]
df_output = df_output.set_index('idx')

Answer 1

Use str.extract here:

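# expand=False returns a Series, so each result fits a single column
df_output['ID'] = df_output['Name'].str.extract(r'^(\d+)', expand=False)
df_output['Name'] = df_output['Name'].str.extract(r'^\d+: (.*)$', expand=False)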
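
Both columns can also be pulled out in a single call by giving the pattern two named capture groups. The following is a minimal sketch, not a definitive solution: it assumes the column is called Name as in the question, the sample frame is rebuilt here for illustration, and the group names ID and Name are chosen to match the desired output columns.

```python
import pandas as pd

# Sample frame, assumed to mirror the question's data (note the trailing space)
df_output = pd.DataFrame({"Name": ["1: Aki ", "2: Aki", "3: Kano"]})

# Named groups become the column names of the result;
# the \s* parts tolerate stray whitespace around the colon and at the ends
parts = df_output["Name"].str.extract(r"^\s*(?P<ID>\d+)\s*:\s*(?P<Name>.*?)\s*$")

df_output["ID"] = parts["ID"]
df_output["Name"] = parts["Name"]
print(df_output)
```

The extracted ID values are still strings at this point; cast them with astype(int) if numeric IDs are needed.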

Answer 2

You're very close; you just need to change some of the syntax. Try this:

Create data

import pandas as pd

df = pd.DataFrame({"column": ["1: Aki", "2: Aki", "3: Kano"]})

print(df)
    column
0   1: Aki
1   2: Aki
2  3: Kano

Clean data

Let's strip the surrounding whitespace, then split the column on ": " (a colon followed by a space):

clean_df = (df["column"].str.strip()                     # remove whitespace
            .str.split(": ", expand=True)                # new df with 2 columns (0, 1)
            .rename(columns={0: "number", 1: "name"}))   # new df renamed columns

print(clean_df)
  number  name
0      1   Aki
1      2   Aki
2      3  Kano

Combine cleaned data with original

Now that our data is nice and clean, we can join it back to the original dataframe:

final_df = df.join(clean_df)

print(final_df)
    column number  name
0   1: Aki      1   Aki
1   2: Aki      2   Aki
2  3: Kano      3  Kano

All together

final_df = df.join(
            df["column"].str.strip()
            .str.split(": ", expand=True)
            .rename(columns={0: "number", 1: "name"}))

Answer 3

After fixing your code (the pattern needs \s rather than /s, and expand=True returns the pieces as separate columns):

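import pandas as pd

df = pd.DataFrame({'Name': ['1: Aki', '2: Aki', '3: Kano']})

# Split on a colon plus any following whitespace, then name the two new columns
df = df['Name'].str.split(r':\s*', expand=True).rename({0: 'idx', 1: 'Name'}, axis=1)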

Output:

>>> df
  idx  Name
0   1   Aki
1   2   Aki
2   3  Kano
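
The attempt in the question also set the number as the index. As a rough sketch of that last step, assuming every row starts with a whole number so the cast to int is safe, and using the hypothetical variable names parts and result:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["1: Aki", "2: Aki", "3: Kano"]})

# n=1 splits only at the first colon, so a colon inside a name is kept
parts = df["Name"].str.strip().str.split(r":\s*", n=1, expand=True)
parts.columns = ["idx", "Name"]

# Cast to int (assumes every row starts with digits), then use it as the index
parts["idx"] = parts["idx"].astype(int)
result = parts.set_index("idx")
print(result)
```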

Answer 4

Try this:

import pandas as pd

# add sample data
df = pd.DataFrame({'Name': ['1: Aki','2: Aki','3: Kano']}) 
   
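# split on the colon plus any following whitespace so Name has no leading space;
# n=1 keeps exactly two pieces even if a name contains a colon
df[['idx', 'Name']] = df['Name'].str.split(r":\s*", n=1, expand=True)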
   
print(df)

Answer 5

You can also use the extractall method:

import pandas as pd

df = pd.DataFrame({"col": ["1: Aki", "2: Aki", "3: Kano"]})

df = df.col.str.extractall(r"(?P<id>\d+):\s*(?P<name>\w+)").reset_index(drop=True)

Output:

  id  name
0  1   Aki
1  2   Aki
2  3  Kano
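
One difference worth keeping in mind: extractall returns one row per match, so input rows that do not match simply disappear, and reset_index(drop=True) then hides which rows they were. The sketch below illustrates this with a made-up malformed row that is not part of the original data; extract is shown alongside for comparison because it keeps one row per input row.

```python
import pandas as pd

# The second value is a made-up malformed row added for illustration
df = pd.DataFrame({"col": ["1: Aki", "no id here", "3: Kano"]})

pattern = r"(?P<id>\d+):\s*(?P<name>\w+)"

# extractall: one output row per match, rows without a match are dropped
all_matches = df["col"].str.extractall(pattern)

# extract: one output row per input row, NaN where nothing matches
first_match = df["col"].str.extract(pattern)

print(len(all_matches), len(first_match))
print(first_match)
```

When each cell holds exactly one "number: name" pair, either method works; extract keeps the result aligned with the original rows.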
