Split a column with a number and name into two different columns 'ID' and 'Name'
I am converting a text file to CSV. In the CSV file I'm getting a column that contains both a number and a name (e.g. 1: Aki), and I want to separate them into two different columns.
Sample data:
1: Aki
2: Aki
3: Kano
Code tried:
df_output.columns = ['Name', 'date', 'Description']
###df_output['ID'],df_output['Name_'] = df_output['Name'].str[:1],df_output['Name'].str[1:]
obj = df_output['Name']
obj = obj.str.strip()
obj = obj.str.split(':/s*')
df_output['Name'] = obj.str[-1]
df_output['idx'] = obj.str[0]
df_output = df_output.set_index('idx')
Use str.extract here:
df_output['ID'] = df_output['Name'].str.extract(r'^(\d+)', expand=False)         # leading digits
df_output['Name'] = df_output['Name'].str.extract(r'^\d+: (.*)$', expand=False)  # text after ": "
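As a self-contained variant of the same idea, both pieces can be captured in one pass with two named groups. This is only a sketch: the sample frame below is made up to mirror the question's data, and the group names ID and Name are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample frame mirroring the question's data
df_output = pd.DataFrame({"Name": ["1: Aki", "2: Aki", "3: Kano"]})

# One pattern, two named groups: the digits before the colon,
# and everything after the colon and optional whitespace
parts = df_output["Name"].str.extract(r"^(?P<ID>\d+):\s*(?P<Name>.*)$")

df_output["ID"] = parts["ID"]
df_output["Name"] = parts["Name"]
print(df_output)
```

Note that the extracted ID values are still strings at this point.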
You're very close, you just need to change some of the syntax. Try this:
df = pd.DataFrame({"column": ["1: Aki", "2: Aki", "3: Kano"]})
print(df)
column
0 1: Aki
1 2: Aki
2 3: Kano
Let's remove whitespace, then split our column on ": " (colon followed by space):
clean_df = (df["column"].str.strip() # remove whitespace
.str.split(": ", expand=True) # new df with 2 columns (0, 1)
.rename(columns={0: "number", 1: "name"})) # new df renamed columns
print(clean_df)
number name
0 1 Aki
1 2 Aki
2 3 Kano
Now that our data is nice and clean, we can join it back to the original dataframe:
final_df = df.join(clean_df)
print(final_df)
column number name
0 1: Aki 1 Aki
1 2: Aki 2 Aki
2 3: Kano 3 Kano
Or, as a single expression:
final_df = df.join(
df["column"].str.strip()
.str.split(": ", expand=True)
.rename(columns={0: "number", 1: "name"}))
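One thing to watch with the split approach: if a name could itself contain ": ", the split would produce more than two columns. A possible guard, assuming that situation can occur in your data, is to pass n=1 so only the first separator is used. The third row below is invented purely to show the effect.

```python
import pandas as pd

# Hypothetical data: the last name itself contains ": "
df = pd.DataFrame({"column": ["1: Aki", "2: Aki", "3: Kano: Jr"]})

clean_df = (df["column"].str.strip()                      # remove whitespace
                        .str.split(": ", n=1, expand=True)  # split at the first ": " only
                        .rename(columns={0: "number", 1: "name"}))

final_df = df.join(clean_df)
print(final_df)
```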
After fixing your code:
df = pd.DataFrame({'Name':['1: Aki','2: Aki','3: Kano']})
df = df['Name'].str.split(r':\s*',expand = True).rename({0:'idx',1:'Name'},axis =1)
Output:
>>> df
idx Name
0 1 Aki
1 2 Aki
2 3 Kano
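The question's code also went on to use idx as the index. A minimal sketch of that last step, assuming every idx value is purely numeric so the cast to int is safe:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['1: Aki', '2: Aki', '3: Kano']})
df = df['Name'].str.split(r':\s*', expand=True).rename({0: 'idx', 1: 'Name'}, axis=1)

# The split yields strings; cast to int (assumes all values are digits only)
df['idx'] = df['idx'].astype(int)
df = df.set_index('idx')
print(df)
```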
Try this:
import pandas as pd
# add sample data
df = pd.DataFrame({'Name': ['1: Aki','2: Aki','3: Kano']})
df[['idx','Name']] = df.Name.str.split(r":\s*", expand=True)  # \s* also drops the space after the colon
print(df)
You can also use the extractall method:
df = pd.DataFrame({"col": ["1: Aki", "2: Aki", "3: Kano"]})
df = df.col.str.extractall(r"(?P<id>\d+):\s*(?P<name>\w+)").reset_index(drop=True)
Output:
id name
0 1 Aki
1 2 Aki
2 3 Kano
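For context on the reset_index call: extractall returns a frame indexed by the original row label plus a match level, since a row may match more than once. If you would rather keep the original row labels and attach the result to the source frame, one option (a sketch, assuming at most one match per row) is to drop only the match level and join:

```python
import pandas as pd

df = pd.DataFrame({"col": ["1: Aki", "2: Aki", "3: Kano"]})
matches = df.col.str.extractall(r"(?P<id>\d+):\s*(?P<name>\w+)")

# The result has a two-level index: original row label and "match"
print(matches.index.names)

# Drop only the "match" level, then join on the original row labels
flat = matches.droplevel("match")
joined = df.join(flat)
print(joined)
```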