How to remove a certain number of characters at the start of a string

I have a dataset of NHL free agents, but each player is numbered as part of the name. I am trying to make "1. Alex Ovechkin" look like "Alex Ovechkin". Basically, I just want to delete the number, the period, and the space after it.

[Image: the dataset shown here]

I used the following code to successfully delete the numbers for the first 10 entries. At entry 11, however, I need to delete 4 characters instead of 3. The same goes for row 100, where I need to delete 5 characters to remove the number, period, and space.

This is the code that I have been trying to use, to no avail.

free_agents['Player'] = free_agents['Player'].str[3:]

This works for the first 10 entries, but after that a leading space is left over for entries 11-100, and a period and a space are left over for the rest.

I also tried the following code, which worked for the first 10 entries but deleted the rest of the entries.

free_agents['Player'] = free_agents['Player'][0:10].str[3:]

My last attempt was a for loop, but it did not work.

for player in free_agents['Player']:
    if player in free_agents['Player'][0:100]:
        free_agents = free_agents['Player'].str[2:]
    else: 
        free_agents['Player'] = free_agents['Player'].str[4:]

I've run out of ideas to try, and would love some help finding the most efficient way to do this. Thanks so much!

Assuming that no name starts with a number, you could try this:

free_agents['Player'] = free_agents['Player'].str.lstrip('0123456789. ')

This strips any leading characters of each string that match one of:

  1. Any digit from 0 to 9
  2. A period (.)
  3. A space
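To make the "no name starts with a number" assumption concrete, here is a minimal sketch on made-up sample data (the names are hypothetical, not taken from the asker's dataset). It shows that `.str.lstrip` removes a run of characters from a set rather than a fixed prefix, so a name that itself begins with a digit would be damaged:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset.
players = pd.Series(
    ["1. Alex Ovechkin", "11. Player D", "100. Player E", "101. 3rd Liner"]
)

# Strip every leading character that is a digit, a period, or a space.
stripped = players.str.lstrip("0123456789. ")

print(stripped.tolist())
# ['Alex Ovechkin', 'Player D', 'Player E', 'rd Liner']
```

The last entry loses its leading "3" because that digit is also in the stripped character set, which is why the approach is only safe when names never start with a digit, period, or space.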

Split on the period and the following space, then take element 1 of the result:

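# Two equivalent spellings of the pattern; raw strings avoid invalid-escape warnings.
# n=1 splits only at the first match, so names such as "T.J. Oshie" stay intact.
df.Player = df.Player.str.split(r'\.\s', n=1).str[1]
df.Player = df.Player.str.split(r"\. ", n=1).str[1]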
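One thing to watch with the split approach is a name that itself contains a period followed by a space. The sketch below, on hypothetical sample names, compares an unlimited split with a split capped at the first match through the `n` parameter of `Series.str.split`:

```python
import pandas as pd

# Hypothetical sample data; the second name contains ". " internally.
names = pd.Series(["1. Alex Ovechkin", "12. T.J. Oshie"])

# Unlimited split: every ". " is a split point, so element 1 can be truncated.
unlimited = names.str.split(r"\.\s").str[1]

# n=1 splits only at the first ". ", keeping the rest of the name whole.
limited = names.str.split(r"\.\s", n=1).str[1]

print(unlimited.tolist())  # ['Alex Ovechkin', 'T.J']
print(limited.tolist())    # ['Alex Ovechkin', 'T.J. Oshie']
```

Note that a row with no ". " at all would produce a one-element list, so `.str[1]` would yield a missing value for it.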

Based on regex:

df["Player"] = df["Player"].str.replace(r"^\d+\.\s+", repl="", regex=True)

Example:

>>> df
          Player
0    1. Player A
1    2. Player B
2   10. Player C
3   11. Player D
4  100. Player E
5  101. Player F

>>> df["Player"] = df["Player"].str.replace(r"^\d+\.\s+", repl="", regex=True)

>>> df
     Player
0  Player A
1  Player B
2  Player C
3  Player D
4  Player E
5  Player F
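As a small supplementary sketch (sample values are hypothetical), the same pattern can be checked against a couple of edge cases. Because the pattern is anchored with `^` and requires digits, a period, and whitespace in that order, digits later in the name and rows without a numeric prefix should be left alone:

```python
import pandas as pd

# Hypothetical edge cases: trailing digits and a row with no numeric prefix.
df = pd.DataFrame({"Player": ["7. Player 99", "100. Player E", "No Prefix"]})

# ^     anchor at the start of the string
# \d+   one or more digits
# \.    a literal period
# \s+   one or more whitespace characters
cleaned = df["Player"].str.replace(r"^\d+\.\s+", "", regex=True)

print(cleaned.tolist())
# ['Player 99', 'Player E', 'No Prefix']
```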
