How to remove a certain number of characters at the start of a string
I have a dataset of NHL free agents; however, the players are numbered as part of the name. I am trying to make "1. Alex Ovechkin" look like "Alex Ovechkin". Basically, I'm just trying to delete the number, the period, and the space between them.
I have used the following code to successfully delete the numbers for the first 10 entries; however, at entry 11 I need to delete 4 characters instead of 3. The same goes for row 100, where I need to delete 5 characters to remove the number, period, and space.
This is the code I have been trying to use, to no avail:
free_agents['Player'] = free_agents['Player'].str[3:]
This works for the first 10 entries, but after that a leading space is left over for entries 11-100, and a period and a space for the rest.
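The number of characters to cut is the position of the first ". " plus two, so it grows with the digit count (3 for "1. ", 4 for "11. ", 5 for "100. "). A minimal sketch of slicing by that variable length, assuming every value is a non-missing string; the sample rows and the helper name `strip_number` are hypothetical:

```python
import pandas as pd

# Hypothetical sample rows in the "N. Name" format from the question
free_agents = pd.DataFrame(
    {"Player": ["1. Alex Ovechkin", "11. Player K", "100. Player Z"]}
)

def strip_number(name: str) -> str:
    """Remove a leading "N. " prefix, whatever the number of digits."""
    idx = name.find(". ")
    # Leave the name alone if there is no ". " or the part before it is not a number
    if idx == -1 or not name[:idx].isdigit():
        return name
    # The prefix is the digits plus the period and the space: idx + 2 characters
    return name[idx + 2:]

free_agents["Player"] = free_agents["Player"].map(strip_number)
print(free_agents["Player"].tolist())
# ['Alex Ovechkin', 'Player K', 'Player Z']
```

The `isdigit()` check keeps a name that merely contains ". " (and has no number in front) from being cut.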
I also tried the following code, which worked for the first 10 entries but wiped out (set to NaN) all the other entries.
free_agents['Player'] = free_agents['Player'][0:10].str[3:]
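This attempt fails because of index alignment: the right-hand side only contains the first 10 rows, and when a Series is assigned to a column pandas matches rows by index label, so every row with no matching label becomes NaN. A small sketch with made-up data (12 rows instead of the full list), using `.iloc[:10]` to select the same first 10 rows positionally:

```python
import pandas as pd

# Hypothetical 12-row column: "1. Player 1" ... "12. Player 12"
df = pd.DataFrame({"Player": [f"{i}. Player {i}" for i in range(1, 13)]})

# Right-hand side only has index labels 0-9 (the first 10 rows)
first_ten = df["Player"].iloc[:10].str[3:]

# Assignment aligns on the index, so labels 10 and 11 get NaN
df["Player"] = first_ten
print(df["Player"].isna().sum())  # 2
```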
My last attempt was a for loop, but it did not work either:
for player in free_agents['Player']:
    if player in free_agents['Player'][0:100]:
        free_agents = free_agents['Player'].str[2:]
    else:
        free_agents['Player'] = free_agents['Player'].str[4:]
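The loop has several problems: each branch rewrites the whole column on every iteration, the first branch replaces the DataFrame `free_agents` with a Series, and the `if` test never does what it looks like. The sketch below (hypothetical two-row data) shows that last pitfall: `in` on a pandas Series checks the index labels, not the values.

```python
import pandas as pd

s = pd.Series(["1. Alex Ovechkin", "2. Player B"])

# `in` on a Series tests membership in the index (0, 1), not in the values
print("1. Alex Ovechkin" in s)         # False
print(0 in s)                          # True, because 0 is an index label

# To test the values, check the underlying array instead
print("1. Alex Ovechkin" in s.values)  # True
```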
I've run out of ideas and would love some help finding the most efficient way to do this. Thanks so much!
Assuming that no name starts with a number, you could try this:
free_agents['Player'] = free_agents['Player'].str.lstrip('0123456789. ')
This strips any leading characters of the string that are:
- a digit between 0 and 9
- a period
- a space
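`str.lstrip` treats its argument as a set of characters rather than a fixed prefix, which is why one call handles 1-, 2- and 3-digit numbers alike. It is also why the "no name starts with a number" assumption matters. A short sketch with hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame({"Player": ["1. Alex Ovechkin", "11. Player K", "100. Player Z"]})

# Keeps removing leading digits, periods and spaces, in any order
df["Player"] = df["Player"].str.lstrip("0123456789. ")
print(df["Player"].tolist())  # ['Alex Ovechkin', 'Player K', 'Player Z']

# Caveat: a (hypothetical) name that starts with a digit is over-stripped
over = pd.Series(["5. 3M Example"]).str.lstrip("0123456789. ")
print(over.tolist())  # ['M Example']
```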
Split on the period and the space, and take element 1 of the result:
df["Player"] = df["Player"].str.split(r"\.\s", n=1).str[1]
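Limiting the split with `n=1` matters when a name itself contains ". " (here "Martin St. Louis", used only as an example): without it, element 1 holds only the part before the second ". ". Rows with no number at all have no element 1 and come out as NaN. A sketch with hypothetical rows:

```python
import pandas as pd

s = pd.Series(["1. Alex Ovechkin", "3. Martin St. Louis"])

# Without n, every ". " is a split point, so element 1 loses the rest of the name
print(s.str.split(r"\.\s").str[1].tolist())       # ['Alex Ovechkin', 'Martin St']

# n=1 splits only at the first ". ", keeping the full name in element 1
print(s.str.split(r"\.\s", n=1).str[1].tolist())  # ['Alex Ovechkin', 'Martin St. Louis']
```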
Based on a regex:
>>> df
Player
0 1. Player A
1 2. Player B
2 10. Player C
3 11. Player D
4 100. Player E
5 101. Player F
>>> df["Player"] = df["Player"].str.replace(r"^\d+\.\s+", repl="", regex=True)
>>> df
Player
0 Player A
1 Player B
2 Player C
3 Player D
4 Player E
5 Player F
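A self-contained version of the regex approach, with the pattern broken down piece by piece. The rows are hypothetical, and one unnumbered name is included to show that the `^` anchor leaves such values untouched:

```python
import pandas as pd

df = pd.DataFrame(
    {"Player": ["1. Player A", "10. Player C", "101. Player F", "Alex Ovechkin"]}
)

# ^     anchor at the start of the string
# \d+   one or more digits (so 1, 10 and 101 all match)
# \.    a literal period
# \s+   the whitespace after it
# A raw string (r"...") keeps Python from interpreting the backslashes itself.
df["Player"] = df["Player"].str.replace(r"^\d+\.\s+", "", regex=True)
print(df["Player"].tolist())
# ['Player A', 'Player C', 'Player F', 'Alex Ovechkin']
```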