
Exclude a pattern using regex in Python

I want to extract the name and the number from a given string and save them into two lists.

    str = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs .'

I want to achieve:

    name = ['Dhoni','Kohli','Rohit','Dhawan']
    values = ['100','150','50','250']

I tried to use a negative lookahead but did not succeed. My approach is to match a word, then a number, then a word again. Maybe this approach is wrong. How can this be achieved?

What I tried:

    pattern = r'^[A-Za-z]+\s(?!)[a-z]'
    print(re.findall(pattern,str))

You might use 2 capturing groups instead:

\b([A-Z][a-z]+)\s+scored\s+(\d+)\b


    import re

    pattern = r"\b([A-Z][a-z]+)\s+scored\s+(\d+)\b"
    # Named text rather than str, so the built-in str type is not shadowed.
    text = "Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs ."

    name = []
    values = []
    # Group 1 is the name, group 2 is the number that follows "scored".
    for match in re.finditer(pattern, text):
        name.append(match.group(1))
        values.append(match.group(2))

    print(name)
    print(values)

Output

    ['Dhoni', 'Kohli', 'Rohit', 'Dhawan']
    ['100', '150', '50', '250']
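
To make the pieces of that pattern easier to follow, it can also be written with `re.VERBOSE`, which permits whitespace and comments inside the expression. This is only an annotated sketch of the same pattern; the names `PATTERN`, `text` and `pairs` are illustrative and not part of the answer above.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part can be commented.
PATTERN = re.compile(r"""
    \b([A-Z][a-z]+)   # group 1: a capitalised word (the name)
    \s+scored\s+      # the literal word "scored" with whitespace on both sides
    (\d+)\b           # group 2: one or more digits (the score)
""", re.VERBOSE)

text = "Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs ."

# With two groups in the pattern, findall returns one (name, score) tuple per match.
pairs = PATTERN.findall(text)
name = [n for n, _ in pairs]
values = [v for _, v in pairs]

print(name)    # ['Dhoni', 'Kohli', 'Rohit', 'Dhawan']
print(values)  # ['100', '150', '50', '250']
```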

The pattern seems to be: name scored value.

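    >>> import re
    >>> s = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs .'
    >>> res = re.findall(r'(\w+)\s*scored\s*(\d+)', s)
    >>> names, values = zip(*res)
    >>> names
    ('Dhoni', 'Kohli', 'Rohit', 'Dhawan')
    >>> values
    ('100', '150', '50', '250')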

This code extracts the **Name** and **Number** from a given string into two lists, then stores them in a dictionary as key-value pairs.

    import re

    x = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs.'

    # Capitalised words are taken as names and digit runs as values. The two lists
    # are paired by position, so the text must not contain other capitalised words.
    names = re.findall(r'[A-Z][a-z]*', x)
    values = re.findall(r'[0-9]+', x)
    dicts = {}
    for i in range(len(names)):
        dicts[names[i]] = values[i]
    print(dicts)  # print once, after the loop has filled the dictionary

    # Input: Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs.
    # Output: {'Dhoni': '100', 'Kohli': '150', 'Rohit': '50', 'Dhawan': '250'}

    # Input: A has 5000 rupees and B has 15000 rupees.C has 85000 rupees and D has 50000 rupees .
    # Output: {'A': '5000', 'B': '15000', 'C': '85000', 'D': '50000'}
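
Because the names and the numbers above are found by two independent searches and then paired by position, any extra capitalised word shifts the pairing. The sketch below shows that effect on a made-up input and one possible way around it: capturing name and number in a single match. The `\w+` standing in for the verb ("scored", "has") is an assumption about the sentence shape, not something the answer specifies.

```python
import re

# Illustrative input: "Today" is capitalised but is not a player's name.
x = 'Today Dhoni scored 100 runs and Kohli scored 150 runs.'

# Two independent searches: "Today" ends up paired with the first number.
names = re.findall(r'[A-Z][a-z]*', x)
values = re.findall(r'[0-9]+', x)
by_position = dict(zip(names, values))
print(by_position)  # {'Today': '100', 'Dhoni': '150'}

# One search with two groups: a capitalised word only counts as a name
# when it is followed by exactly one other word and then a number.
paired = dict(re.findall(r'([A-Z][a-z]*)\s+\w+\s+(\d+)', x))
print(paired)  # {'Dhoni': '100', 'Kohli': '150'}
```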

Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.