Python regex match string of 8 characters that contain both alphabets and numbers

I am trying to match a string of length 8 containing both digits and letters (it cannot be only digits or only letters) using re.findall. The string can start with either a letter or a digit, followed by any combination.

For example:

Input String: The reference number is 896av6uf and not 87987647 or ahduhsjs or hn0.

Output: ['896av6uf','a96bv6u0']

I came up with this regex r'([a-z]+[\d]+[\w]*|[\d]+[a-z]+[\w]*)', however it is giving me strings with fewer than 8 characters as well. I need to modify the regex to return strings of exactly 8 characters that contain both letters and digits.
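To make the problem reproducible, here is a minimal sketch that runs the attempted pattern on the sample input. It assumes the intended pattern is r'([a-z]+[\d]+[\w]*|[\d]+[a-z]+[\w]*)'; the variable names are illustrative only.

```python
import re

# The attempted pattern: letters then digits, or digits then letters,
# followed by any number of word characters. Nothing limits the length.
attempt = re.compile(r'([a-z]+[\d]+[\w]*|[\d]+[a-z]+[\w]*)')

text = "The reference number is 896av6uf and not 87987647 or ahduhsjs or hn0."

matches = attempt.findall(text)
print(matches)  # ['896av6uf', 'hn0']
```

The three-character token hn0 is returned because the pattern has no length constraint.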

You can use

\b(?=[a-zA-Z]*[0-9])(?=[0-9]*[a-zA-Z])[a-zA-Z0-9]{8}\b
\b(?=[^\W\d_]*\d)(?=\d*[^\W\d_])[^\W_]{8}\b

The first one only supports ASCII letters, while the second one supports all Unicode letters and digits, since [^\W\d_] matches any Unicode letter and \d matches any Unicode digit (as the re.UNICODE option is used by default in Python 3.x).

Details:

  • \b - a word boundary
  • (?=[a-zA-Z]*[0-9]) - after any 0+ ASCII letters, there must be a digit
  • (?=[0-9]*[a-zA-Z]) - after any 0+ digits, there must be an ASCII letter
  • [a-zA-Z0-9]{8} - eight ASCII alphanumeric chars
  • \b - a word boundary
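As a quick check, the two patterns can be run on the input string from the question. This is a minimal sketch; the constant and variable names are illustrative and not part of the answer.

```python
import re

# ASCII-only variant: both lookaheads and the body use explicit ASCII ranges.
ASCII_PATTERN = re.compile(r'\b(?=[a-zA-Z]*[0-9])(?=[0-9]*[a-zA-Z])[a-zA-Z0-9]{8}\b')
# Unicode-aware variant: [^\W\d_] is "a word character that is neither a digit nor _".
UNICODE_PATTERN = re.compile(r'\b(?=[^\W\d_]*\d)(?=\d*[^\W\d_])[^\W_]{8}\b')

text = "The reference number is 896av6uf and not 87987647 or ahduhsjs or hn0."

ascii_matches = ASCII_PATTERN.findall(text)
unicode_matches = UNICODE_PATTERN.findall(text)
print(ascii_matches)    # ['896av6uf']
print(unicode_matches)  # ['896av6uf']

# A token with a non-ASCII letter is only found by the Unicode-aware variant.
print(ASCII_PATTERN.findall("ab12cd3é"))    # []
print(UNICODE_PATTERN.findall("ab12cd3é"))  # ['ab12cd3é']
```

Because the patterns contain no capturing groups, re.findall returns the full matches.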

You can use \b\w{8}\b

It does not guarantee that you will have both digits AND letters, but it does guarantee that you will have exactly eight characters, surrounded by word boundaries (e.g. whitespace or the start/end of a line).

You can try it in one of the online playgrounds, such as https://regex101.com/

The core of the match is \w{8}, which means eight word characters (letters of either case, digits, and the underscore). \b means "word boundary".

If you want only digits and lowercase letters, replace this with \b[a-z0-9]{8}\b

You can then further check for the existence of both a digit AND a letter, e.g. by using filter:

list(filter(lambda s: re.search(r'[0-9]', s) and re.search(r'[a-z]', s), result))

result is what you get from re.findall().

So, bottom line, I would use:

list(filter(lambda s: re.search(r'[0-9]', s) and re.search(r'[a-z]', s), re.findall(r'\b[a-z0-9]{8}\b', text)))
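The two-step approach can be sketched as a small self-contained function. The name find_mixed_tokens and the sample string are assumptions made for illustration; a list comprehension is used in place of filter, with the same effect.

```python
import re

def find_mixed_tokens(text):
    """Return 8-character lowercase/digit words that contain both a digit and a letter."""
    # Step 1: collect every whole word made of exactly 8 lowercase letters or digits.
    candidates = re.findall(r'\b[a-z0-9]{8}\b', text)
    # Step 2: keep only the candidates that contain at least one digit and one letter.
    return [s for s in candidates
            if re.search(r'[0-9]', s) and re.search(r'[a-z]', s)]

sample = "The reference number is 896av6uf and not 87987647 or ahduhsjs or hn0."
print(find_mixed_tokens(sample))  # ['896av6uf']
```

Splitting the length check from the content check keeps each regex trivial, at the cost of a second pass over the candidates.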

First, let's write a pattern that finds words made of lowercase letters and digits that are 8 characters long:

\b[a-z\d]{8}\b

The next condition is that the word must contain both letters and digits:

[a-z]\d

Now for the challenging part: combining these into one statement. The easiest way might be to just split them up, but we can use a lookahead to get this to work:

\b(?=.*[a-z]\d)[a-z\d]{8}\b

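I'm sure there is a tidier way of doing this, but this works for words such as 896av6uf. Note that the lookahead only requires a letter immediately followed by a digit somewhere later on the line, so it rejects a word such as 1234abcd when no such pair follows it, and it can accept an all-letter word when a later word contains the pair.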
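The lookahead (?=.*[a-z]\d) is not confined to the current word, because .* can run to the end of the line. The sketch below compares it with a variant whose lookaheads stay inside the word; the variant is an assumption added for illustration, not the answerer's pattern, and the sample text is made up.

```python
import re

# Pattern from the answer: the lookahead may be satisfied by later words.
original = re.compile(r'\b(?=.*[a-z]\d)[a-z\d]{8}\b')
# Hypothetical variant: each lookahead only skips over letters (or digits),
# so it cannot look past the end of the current word.
scoped = re.compile(r'\b(?=[a-z]*\d)(?=\d*[a-z])[a-z\d]{8}\b')

text = "1234abcd abcdefgh 896av6uf x1"

original_matches = original.findall(text)
scoped_matches = scoped.findall(text)
print(original_matches)  # ['1234abcd', 'abcdefgh', '896av6uf']
print(scoped_matches)    # ['1234abcd', '896av6uf']
```

Here the original pattern accepts abcdefgh only because a letter-digit pair appears later in the line, while the scoped variant rejects it.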

A more compact solution than others have suggested is this:

((?![A-Za-z]{8}|[0-9]{8})[0-9A-Za-z]{8})

This guarantees that the matches found are 8 characters long and that they cannot be only digits or only letters.

Breakdown:

  • (?![A-Za-z]{8}|[0-9]{8}) = This is a negative lookahead that means the match can't be a string of 8 digits or 8 letters.
  • [0-9A-Za-z]{8} = Simple regex saying the match needs to be 8 alphanumeric characters.

Test Case:

Input: 12345678 abcdefgh i8D0jT5Yu6Ms1GNmrmaUjicc1s9D93aQBj3WWWjww54gkiKqOd7Ytkl0MliJy9xadAgcev8b2UKdfGRDOpxRPm30dw9GeEz3WPRO 1234567890987654321 qwertyuiopasdfghjklzxcvbnm

import re

pattern = re.compile(r'((?![A-Za-z]{8}|\d{8})[A-Za-z\d]{8})')

test = input()
match = pattern.findall(test)
print(match)

Output: ['i8D0jT5Y', 'u6Ms1GNm', 'maUjicc1', 's9D93aQB', 'j3WWWjww', '54gkiKqO', 'd7Ytkl0M', 'liJy9xad', 'Agcev8b2', 'DOpxRPm3', '0dw9GeEz']
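As the output shows, this pattern has no word boundaries, so it also returns 8-character slices of longer alphanumeric runs. If only whole 8-character words are wanted, one option is to wrap it in \b. The sketch below compares both forms; the anchored variant and the sample text are assumptions added for illustration.

```python
import re

# Pattern from the answer (outer capturing group dropped; findall output is the same).
unanchored = re.compile(r'(?![A-Za-z]{8}|[0-9]{8})[0-9A-Za-z]{8}')
# Hypothetical variant: word boundaries restrict matches to whole words.
anchored = re.compile(r'\b(?![A-Za-z]{8}|[0-9]{8})[0-9A-Za-z]{8}\b')

text = "id 896av6uf, long run i8D0jT5Yu6Ms1GNm, plain 12345678 abcdefgh"

unanchored_matches = unanchored.findall(text)
anchored_matches = anchored.findall(text)
print(unanchored_matches)  # ['896av6uf', 'i8D0jT5Y', 'u6Ms1GNm']
print(anchored_matches)    # ['896av6uf']
```

Which behaviour is correct depends on whether slices of longer tokens should count as matches.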

