Using a RegEx to find groups of numbers, replace with only the last member of the group

I have a csv file that's formatted like this (only relevant row shown):

Global equity - 45%/45.1%
Private Investments - 25%/21%
Hedge Funds - 17.5%/18.1%
Bonds & cash - 12.5%/15.3%

I wrote a regex to find each group of numbers (i.e. the 45%/45.1%, etc.), and I'm trying to write it so that it keeps just the number after the slash. Here's what I have:

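import csv
import re

# newline='' is the mode the csv module recommends; 'rU' is no longer accepted by recent Python 3 versions
with open('sheet.csv', newline='') as f:
    rdr = csv.DictReader(f, delimiter=',')
    row1 = next(rdr)
    assets = str(row1['Asset Allocation '])
    finnum = re.sub(r'(\/[0-9]+.)','#This is where I want to replace with just the numbers after the slash',assets)
    print(finnum)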

desired output:

Global equity - 45.1%
Private Investments - 21%
etc...

Is this even possible if I don't know the indices of the numbers I want?

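You can use the regexp '\d+(?:\.\d+)?%/' to remove the first number together with its percent sign and the slash. The optional (?:\.\d+)? part is needed so that decimal values such as 17.5% are removed in full; a bare '\d+%/' would strip only the '5%/' and leave '17.' behind.

import re

string = 'Global equity - 45%/45.1%'
re.sub(r'\d+(?:\.\d+)?%/', '', string) # 'Global equity - 45.1%'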

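If you are specifically looking for that pattern, you could use a replacement function that concatenates the groups you want to keep. The number groups allow an optional decimal part, otherwise '17.5%/' would not match at all:

import re
# group(1) is the label with its dash, group(3) is the value after the slash
replace = lambda s: s.group(1) + ' ' + s.group(3)
re.sub(r'(.*) (\d+(?:\.\d+)?%/)(\d+(?:\.\d+)?%)', replace, 'Hedge Funds - 17.5%/18.1%') # 'Hedge Funds - 18.1%'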

Alternatively, simply remove the unwanted first value (again allowing for a decimal part):

val = 'Hedge Funds - 17.5%/18.1%'
re.sub(r'\d+(?:\.\d+)?%/', '', val) # 'Hedge Funds - 18.1%'

Or, if you don't want to use regex:

val = 'Hedge Funds - 17.5%/18.1%'
replaced = val[0:val.find(' - ')] + ' - ' + val[val.find('%/') + 2:]
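
The same non-regex idea can also be written with str.partition and str.rpartition, which avoids the index arithmetic. This is only a minimal sketch: the helper name keep_last is hypothetical, and it assumes every entry looks like "label - first/second".

```python
def keep_last(line):
    """Keep the label and only the value after the last slash."""
    label, sep, values = line.partition(' - ')
    if not sep:
        # No ' - ' separator found: return the line unchanged.
        return line
    # rpartition yields ('', '', values) when there is no slash,
    # so element [2] is always the value we want to keep.
    return label + sep + values.rpartition('/')[2]


print(keep_last('Hedge Funds - 17.5%/18.1%'))
```

Lines without a slash or without the ' - ' separator pass through untouched, so the helper can be applied to every line of the cell.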

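If you do not want to substitute and instead need the values elsewhere in your code, you could capture them:

import re

cleanup = re.compile(r"(^.+?)-\s.+?\/(.+?)$", re.MULTILINE)
# file_name is a placeholder for the path of the file to read
with open(file_name, 'r') as f:
    text = f.read()
for match in cleanup.finditer(text):
    # group(1) is the label, group(2) is the value after the slash
    print(match.group(1), match.group(2))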
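
If the captured values are needed as numbers rather than strings, the matches can be collected into a dict. The following is a rough sketch, not the answerer's code: parse_allocations and NUM are hypothetical names, the pattern is a stricter variant of the one above, and the sample text simply repeats the rows from the question.

```python
import re

# Assumed layout: "<label> - <first>%/<second>%", one entry per line.
NUM = r'\d+(?:\.\d+)?'
PATTERN = re.compile(
    r'^(.+?)\s+-\s+' + NUM + r'%/(' + NUM + r')%\s*$',
    re.MULTILINE,
)


def parse_allocations(text):
    """Map each label to the value after the slash, as a float."""
    return {m.group(1): float(m.group(2)) for m in PATTERN.finditer(text)}


sample = """Global equity - 45%/45.1%
Private Investments - 25%/21%
Hedge Funds - 17.5%/18.1%
Bonds & cash - 12.5%/15.3%"""

allocations = parse_allocations(sample)
print(allocations)
```

Lines that do not fit the assumed layout are skipped silently, which may or may not be what you want for real data.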

You can also capture what comes before the first number and what comes after the / :

import re

s = 'Hedge Funds - 17.5%/18.1%'
# raw strings keep \g<1> and \g<2> from being treated as string escapes
print(re.sub(r'(.*-) .*/(.*)', r'\g<1> \g<2>', s))

Output:

Hedge Funds - 18.1%
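
To tie this back to the CSV in the question, the substitution can be applied to the whole cell in one call, since re.sub replaces every match. The sketch below makes several assumptions: the four entries sit in a single multi-line cell, io.StringIO stands in for sheet.csv, and last_values is a hypothetical helper name. Only the column name 'Asset Allocation ' (with its trailing space) comes from the question.

```python
import csv
import io
import re

# Stand-in for sheet.csv (assumed layout: one quoted, multi-line cell).
SAMPLE = (
    'Asset Allocation \n'
    '"Global equity - 45%/45.1%\n'
    'Private Investments - 25%/21%\n'
    'Hedge Funds - 17.5%/18.1%\n'
    'Bonds & cash - 12.5%/15.3%"\n'
)

# First value, its percent sign and the slash; the decimal part is optional.
FIRST_VALUE = re.compile(r'\d+(?:\.\d+)?%/')


def last_values(cell):
    """Drop every '<number>%/' prefix so only the value after the slash remains."""
    return FIRST_VALUE.sub('', cell)


reader = csv.DictReader(io.StringIO(SAMPLE))
row1 = next(reader)
cleaned = last_values(row1['Asset Allocation '])
print(cleaned)
```

No indices are needed: the pattern describes what to drop, and re.sub finds each occurrence wherever it appears in the cell.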
