Delete all the text before a letter or a number

I have to delete all the text before the first letter or number using Python.

The strings I have to deal with can be:

- Presa di coscienza

-3D is better than 2D

Basi di ottica

And the result has to be:

Presa di coscienza

3D is better than 2D

Basi di ottica

Searching on the internet, I built this regex:

^.*?([A-Z]|[0-9])

It works well, but it deletes the first letter too. How can I do this?
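The following minimal Python sketch shows the problem concretely. It runs the pattern from the question through re.sub on the three sample strings. The list name samples and the empty replacement string are assumptions for illustration, because the question does not show how the regex was applied.

```python
import re

# The three sample inputs from the question.
samples = ["- Presa di coscienza", "-3D is better than 2D", "Basi di ottica"]

# The pattern from the question: the group ([A-Z]|[0-9]) is part of the
# match, so re.sub removes the first uppercase letter or digit as well.
pattern = r"^.*?([A-Z]|[0-9])"

results = [re.sub(pattern, "", s) for s in samples]
print(results)  # ['resa di coscienza', 'D is better than 2D', 'asi di ottica']
```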

Positive lookahead is your answer:

^.*?(?=[A-Z]|[0-9])

The extra ?= makes all the difference:

A positive lookahead asserts that an [A-Z]|[0-9] group follows the main expression (e.g. ^.*?) without including that group in the match. As a result, the letter or digit itself is not deleted.
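Here is a minimal sketch of the lookahead version using Python's re.sub on the same three sample strings. The sample list and the count=1 argument are assumptions added here, not part of the answer.

The count=1 argument matters because of how Python 3.7+ handles empty matches. After an empty match at the start of a string (as with "Basi di ottica"), re.sub may try a second, non-empty match at the same position. Without the limit, a string that already starts with two qualifying characters, such as "3D is better than 2D", would lose its first character.

```python
import re

samples = ["- Presa di coscienza", "-3D is better than 2D", "Basi di ottica"]

# ^.*? lazily matches the prefix; (?=[A-Z]|[0-9]) only checks that an
# uppercase letter or digit comes next, without consuming it.
pattern = r"^.*?(?=[A-Z]|[0-9])"

# count=1 replaces only the first match, so an empty match at position 0
# cannot be followed by a second, non-empty one at the same position.
results = [re.sub(pattern, "", s, count=1) for s in samples]
print(results)  # ['Presa di coscienza', '3D is better than 2D', 'Basi di ottica']

# Input that is already clean stays unchanged.
print(re.sub(pattern, "", "3D is better than 2D", count=1))  # 3D is better than 2D
```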

The pattern that you tried deletes the first letter because of how it matches. It first uses a non-greedy quantifier to match any character 0 or more times, and then it captures either an uppercase char A-Z or a digit 0-9.

That capture is part of the match, so it is deleted as well.
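If you prefer to keep the pattern from the question, one alternative is to write the captured character back with the \1 backreference in the replacement string. This technique is not part of the answers here. Below is a minimal Python sketch, using the same assumed sample list.

```python
import re

samples = ["- Presa di coscienza", "-3D is better than 2D", "Basi di ottica"]

# Same capturing pattern as in the question: group 1 holds the first
# uppercase letter or digit that the match consumes.
pattern = r"^.*?([A-Z]|[0-9])"

# r"\1" puts the captured character back, so only the prefix is removed.
results = [re.sub(pattern, r"\1", s) for s in samples]
print(results)  # ['Presa di coscienza', '3D is better than 2D', 'Basi di ottica']
```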

Instead, you can use a positive lookahead (?=[A-Z0-9]). It asserts that what is directly to the right is either an uppercase char A-Z or a digit, using a single character class.

Instead of using the non-greedy .*?, you can use a negated character class. It matches 0+ times any char except a newline, an uppercase A-Z or a digit, which prevents unnecessary backtracking.

^[^A-Z0-9\r\n]*(?=[A-Z0-9])

Explanation

  • ^ Start of string
  • [^A-Z0-9\r\n]* Negated character class, match 0+ times any char except what is listed
  • (?=[A-Z0-9]) Positive lookahead, assert what is directly to the right is a char A-Z or a digit 0-9
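The negated-class pattern could also be applied to a block of text with one entry per line. That input layout is an assumption, since the question shows the strings separately. The sketch below compiles the pattern with re.MULTILINE so that ^ matches at the start of every line.

Two properties of the character class keep this safe:
- Because it excludes \r and \n, a match never runs into the next line.
- Because it also excludes A-Z and 0-9, it can never consume a leading letter or digit, so no count limit is needed.

```python
import re

# Hypothetical input: the three samples from the question, one per line.
text = "- Presa di coscienza\n-3D is better than 2D\nBasi di ottica"

# re.MULTILINE makes ^ match at the start of each line, not only the string.
pattern = re.compile(r"^[^A-Z0-9\r\n]*(?=[A-Z0-9])", re.MULTILINE)

cleaned = pattern.sub("", text)
print(cleaned)
# Presa di coscienza
# 3D is better than 2D
# Basi di ottica
```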

Regex demo
