
Split a column with a number and name into two different columns 'ID' and 'Name'

I am converting a text file to CSV. In the CSV file I'm getting a column that has a number and a name in it (e.g. 1: Aki), and I want to separate them into two different columns.

Sample data

1: Aki 
2: Aki
3: Kano

Code tried

df_output.columns = ['Name', 'date', 'Description']

###df_output['ID'],df_output['Name_'] = df_output['Name'].str[:1],df_output['Name'].str[1:]

obj = df_output['Name']
obj = obj.str.strip()
obj = obj.str.split(':/s*')
df_output['Name'] = obj.str[-1]
df_output['idx'] = obj.str[0]
df_output = df_output.set_index('idx')

Use str.extract here:

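# Extract the ID first, because the next line overwrites 'Name'
df_output['ID'] = df_output['Name'].str.extract(r'^(\d+)', expand=False)
df_output['Name'] = df_output['Name'].str.extract(r'^\d+:\s*(.*)$', expand=False)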
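The same idea can also be written as a single str.extract call with two named groups, which returns both columns at once. This is only a sketch: the sample frame is made up from the data in the question, and the cast to int assumes every row matches the pattern.

```python
import pandas as pd

# Sample data modelled on the question (note the trailing space in row 0)
df_output = pd.DataFrame({"Name": ["1: Aki ", "2: Aki", "3: Kano"]})

# Two named groups -> two result columns called "ID" and "Name";
# the lazy .*? plus \s*$ leaves trailing whitespace out of the name
parts = df_output["Name"].str.extract(r"^(?P<ID>\d+):\s*(?P<Name>.*?)\s*$")

# Extracted values are strings; this cast assumes no row failed to match
parts["ID"] = parts["ID"].astype(int)

print(parts)
```

Rows that do not match the pattern come back as NaN, so the astype(int) line would need to be dropped or adjusted if the real file has such rows.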

You're very close, you just need to change some of the syntax. Try this:

Create data

import pandas as pd

df = pd.DataFrame({"column": ["1: Aki", "2: Aki", "3: Kano"]})

print(df)
    column
0   1: Aki
1   2: Aki
2  3: Kano

Clean data

Let's remove whitespace, then split our column on ": " (a colon followed by a space):

clean_df = (df["column"].str.strip()                     # remove whitespace
            .str.split(": ", expand=True)                # new df with 2 columns (0, 1)
            .rename(columns={0: "number", 1: "name"}))   # new df renamed columns

print(clean_df)
  number  name
0      1   Aki
1      2   Aki
2      3  Kano

Combine cleaned data with original

Now that our data is nice and clean, we can join it back to the original dataframe:

final_df = df.join(clean_df)

print(final_df)
    column number  name
0   1: Aki      1   Aki
1   2: Aki      2   Aki
2  3: Kano      3  Kano

All together

final_df = df.join(
            df["column"].str.strip()
            .str.split(": ", expand=True)
            .rename(columns={0: "number", 1: "name"}))
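One thing to watch for: if a name can itself contain ": ", the split above produces more than two columns. A possible workaround, sketched here with a made-up second row, is to limit the split to the first separator with n=1.

```python
import pandas as pd

# The second row is hypothetical: its name contains the separator ": "
df = pd.DataFrame({"column": ["1: Aki", "2: Kano: Jr"]})

parts = (df["column"].str.strip()
         .str.split(": ", n=1, expand=True)           # split at the first ": " only
         .rename(columns={0: "number", 1: "name"}))

print(parts)
```

With n=1 everything after the first separator stays in the name column, so the result always has exactly two columns.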

After fixing your code:

df = pd.DataFrame({'Name':['1: Aki','2: Aki','3: Kano']})

df = df['Name'].str.split(r':\s*',expand = True).rename({0:'idx',1:'Name'},axis =1)

Output:

>>> df
  idx  Name
0   1   Aki
1   2   Aki
2   3  Kano
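To see why the original attempt did not split anything, it may help to compare the two patterns side by side. The pattern ':/s*' uses a forward slash, so it looks for a colon followed by a literal "/", which never occurs in the data. A small sketch using the question's sample values:

```python
import pandas as pd

names = pd.Series(["1: Aki", "2: Aki", "3: Kano"])

# ":/s*" means ":" then "/" then zero or more "s" -> never matches here
broken = names.str.split(":/s*")

# r":\s*" means ":" followed by optional whitespace -> splits as intended
fixed = names.str.split(r":\s*")

print(broken.tolist())
print(fixed.tolist())
```

With the broken pattern each row stays a one-element list, which is why both .str[0] and .str[-1] returned the whole string.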

Try this:

import pandas as pd

# add sample data
df = pd.DataFrame({'Name': ['1: Aki','2: Aki','3: Kano']}) 
   
# split on ": " (colon plus space) so 'Name' does not keep a leading space
df[['idx','Name']] = df.Name.str.split(": ", expand=True)
   
print(df)

You can also use the extractall method:

df = pd.DataFrame({"col": ["1: Aki", "2: Aki", "3: Kano"]})

df = df.col.str.extractall(r"(?P<id>\d+):\s*(?P<name>\w+)").reset_index(drop=True)

Output:

  id  name
0  1   Aki
1  2   Aki
2  3  Kano
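The reset_index(drop=True) is there because extractall returns one row per match, indexed by a MultiIndex whose last level is named "match". A short sketch of what that looks like before and after flattening, using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"col": ["1: Aki", "2: Aki", "3: Kano"]})

# One row per regex match; index is (original row label, match number)
matches = df["col"].str.extractall(r"(?P<id>\d+):\s*(?P<name>\w+)")
print(matches.index.names)

# Dropping the MultiIndex gives a plain 0..n-1 index again
flat = matches.reset_index(drop=True)
print(flat)
```

Note that \w+ only captures the first word of the name, so this pattern assumes single-word names as in the sample data.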
