
exclude a pattern using regex in python

I want to extract the names and numbers from a given string and save them into two lists.

    str = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs .'

I want to achieve:

    name = ['Dhoni','Kohli','Rohit','Dhawan']
    values = ['100','150','50','250']

I tried to use a negative lookahead but did not succeed. My approach is to match a word, then a number, then another word. Maybe this approach is wrong. How can this be achieved?

What I tried:

    import re

    pattern = r'^[A-Za-z]+\s(?!)[a-z]'
    print(re.findall(pattern, str))
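An empty negative lookahead `(?!)` asserts that the empty string does not match at the current position. The empty string matches everywhere, so this assertion always fails and the attempted pattern can never match. A minimal check, storing the sample sentence in `s` instead of `str` (an assumption, to avoid shadowing the built-in):

```python
import re

s = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs .'

# (?!) is an empty negative lookahead: "not followed by the empty string"
# is never true, so the whole pattern fails at every position.
attempt = re.findall(r'^[A-Za-z]+\s(?!)[a-z]', s)
print(attempt)  # []
```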

You could use two capturing groups instead:

    \b([A-Z][a-z]+)\s+scored\s+(\d+)\b
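To make the pattern easier to read, it can be written with `re.VERBOSE`, which ignores unescaped whitespace and allows `#` comments inside the pattern. A sketch with the same behavior, assuming the sample sentence is stored in `s`:

```python
import re

s = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs .'

# The same pattern as above, annotated piece by piece (re.VERBOSE mode).
pattern = re.compile(r"""
    \b              # start at a word boundary
    ([A-Z][a-z]+)   # group 1: a capitalized name such as Dhoni
    \s+scored\s+    # the literal word scored surrounded by whitespace
    (\d+)           # group 2: the score, one or more digits
    \b              # end at a word boundary
""", re.VERBOSE)

# With two groups, findall returns one (name, score) tuple per match
pairs = pattern.findall(s)
print(pairs)
```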


    import re

    pattern = r"\b([A-Z][a-z]+)\s+scored\s+(\d+)\b"
    # Named s rather than str so the built-in str() is not shadowed
    s = "Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs ."

    name = []
    values = []
    # Each match holds the player name in group 1 and the score in group 2
    for match in re.finditer(pattern, s):
        name.append(match.group(1))
        values.append(match.group(2))

    print(name)
    print(values)

Output

    ['Dhoni', 'Kohli', 'Rohit', 'Dhawan']
    ['100', '150', '50', '250']
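Since the question asked about lookarounds: the same two lists can also be built with a positive lookahead for the names and a positive lookbehind for the numbers. This is a sketch that assumes exactly one space after "scored", because Python's re module only accepts fixed-width lookbehinds (a lookbehind such as `(?<=scored\s+)` is rejected). It also relies on both findall calls matching the same occurrences, so the lists stay aligned only for well-formed input like the sample.

```python
import re

s = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs .'

# Positive lookahead: match a capitalized word only when "scored <digit>" follows,
# without consuming that text.
name = re.findall(r'\b[A-Z][a-z]+(?=\s+scored\s+\d)', s)

# Positive lookbehind: match digits only when "scored " precedes them.
# The lookbehind must be fixed width, so exactly one space is assumed.
values = re.findall(r'(?<=scored )\d+', s)

print(name)
print(values)
```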

The text follows the pattern "name scored value", so both parts can be captured with a single findall:

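    >>> import re
    >>> s = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs .'
    >>> res = re.findall(r'(\w+)\s*scored\s*(\d+)', s)
    >>> names, values = zip(*res)
    >>> names
    ('Dhoni', 'Kohli', 'Rohit', 'Dhawan')
    >>> values
    ('100', '150', '50', '250')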
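Note that `zip(*res)` yields tuples, while the question asked for lists, and the unpacking raises ValueError when findall finds nothing, because `zip(*[])` produces no items to unpack. A small sketch that handles both cases; the helper name `split_pairs` is hypothetical:

```python
import re

def split_pairs(text):
    """Return (names, values) as two lists; both are empty if nothing matches."""
    res = re.findall(r'(\w+)\s*scored\s*(\d+)', text)
    if not res:
        # zip(*[]) yields nothing, so "names, values = zip(*res)" would raise ValueError
        return [], []
    names, values = zip(*res)
    return list(names), list(values)

s = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs .'
names, values = split_pairs(s)
print(names)   # ['Dhoni', 'Kohli', 'Rohit', 'Dhawan']
print(values)  # ['100', '150', '50', '250']
print(split_pairs('no scores here'))  # ([], [])
```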

This code extracts the **Name** and **Number** values from the string into two lists, then stores them in a dictionary as key-value pairs.
    import re

    x = 'Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs.'

    # Every capitalized word is treated as a name, every run of digits as a value
    names = re.findall(r'[A-Z][a-z]*', x)
    values = re.findall(r'[0-9]+', x)

    # Pair the i-th name with the i-th value, then print once after the loop
    dicts = {}
    for name, value in zip(names, values):
        dicts[name] = value
    print(dicts)

    # Input: Dhoni scored 100 runs and Kohli scored 150 runs.Rohit scored 50 runs and Dhawan scored 250 runs.
    # Output: {'Dhoni': '100', 'Kohli': '150', 'Rohit': '50', 'Dhawan': '250'}

    # Input: A has 5000 rupees and B has 15000 rupees.C has 85000 rupees and D has 50000 rupees .
    # Output: {'A': '5000', 'B': '15000', 'C': '85000', 'D': '50000'}
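Because the two findall calls above run independently, any extra capitalized word (for example one at the start of a sentence) shifts every pair. A sketch that captures each name together with its number in a single match, assuming each name is followed by exactly one word (such as "scored" or "has") and then the number; the helper name `name_value_dict` is hypothetical:

```python
import re

def name_value_dict(text):
    # Group 1: a capitalized name; then one word; group 2: the number.
    # dict() keeps the last value if the same name appears more than once.
    return dict(re.findall(r'\b([A-Z][a-z]*)\s+\w+\s+(\d+)', text))

tricky = 'Today Dhoni scored 100 runs and Kohli scored 150 runs.'

# Independent lists: "Today" is taken as a name, so the pairing shifts by one
names = re.findall(r'[A-Z][a-z]*', tricky)
values = re.findall(r'[0-9]+', tricky)
print(dict(zip(names, values)))  # {'Today': '100', 'Dhoni': '150'}

# Paired matches stay aligned
print(name_value_dict(tricky))   # {'Dhoni': '100', 'Kohli': '150'}
print(name_value_dict('A has 5000 rupees and B has 15000 rupees.C has 85000 rupees and D has 50000 rupees .'))
# {'A': '5000', 'B': '15000', 'C': '85000', 'D': '50000'}
```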
