File read and processing using regular expressions

I have a massive file that consists of nothing but repeated blocks like this:

//WAYNE ROONEY (wr10)
  90 [label="90"];
  90 -> 11 [weight=25];
  90 -> 21 [weight=23];
  90 -> 31 [weight=17];
  90 -> 41 [weight=12];
  90 -> 51 [weight=1];
  90 -> 62 [weight=50];
  90 -> 72 [weight=7];
  90 -> 82 [weight=27];
  90 -> 92 [weight=9];
  90 -> 102 [weight=43];

I need to convert it into a format that looks like this:

90 11 25

i.e. I just need to remove all the extra stuff and keep the numbers exactly as they are.

I tried using regex, with this line of code:

for line in filein:
    match = re.search('label=" "', line)
    if match:
        print (match.group())

But it just prints all the instances of 'label' in the file. If I try to search for 'label=" "', there is no output. If I could somehow read the labels, then reading the weights would be very similar.

How about this:

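import re

# "file" is the input file name used in this example
with open("file", "r") as infile:
    for line in infile:
        # only the edge lines contain "->"
        if re.search('->', line):
            print(' '.join(re.findall('[0-9]+', line)))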

Output:

90 11 25
90 21 23
90 31 17
90 41 12
90 51 1
90 62 50
90 72 7
90 82 27
90 92 9
90 102 43

Just redirect to save the output: python test.py > newfile
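
If redirecting is inconvenient, the same loop can write the result itself. A minimal Python 3 sketch of that idea; the helper names convert and convert_file and both path arguments are made up for illustration and are not part of the answer above:

```python
import re

def convert(lines):
    """Yield 'src dst weight' for every line that contains an edge ('->')."""
    for line in lines:
        if '->' in line:
            yield ' '.join(re.findall(r'[0-9]+', line))

def convert_file(src_path, dst_path):
    # src_path and dst_path are placeholders; substitute your own paths.
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for row in convert(src):
            dst.write(row + '\n')
```

Because convert is a generator and the file is read line by line, the whole file never has to fit in memory, which matters for a massive input.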

You can match each edge line with a pattern built from these parts:

  1. (\d+) : a number (capture group 1)
  2. \s*->\s* : optional whitespace, the literal ->, optional whitespace
  3. (\d+) : another number (capture group 2)
  4. \s*\[weight= : optional whitespace and the literal [weight=
  5. (\d+) : another number (capture group 3)
  6. \]; : the literal ]; to end the match

Then you have numbered capture groups like this:

  1. The first number
  2. The second number
  3. The third number

Now you can build your string with the pattern you want, for example $1 $2 $3.
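
The steps above can be sketched in Python roughly as follows. This assumes the edge lines look exactly like the sample in the question (no quote after weight=); the names EDGE_RE and convert_line are hypothetical and chosen only for this example:

```python
import re

# Three capture groups: source node, target node, weight.
EDGE_RE = re.compile(r'(\d+)\s*->\s*(\d+)\s*\[weight=(\d+)\];')

def convert_line(line):
    """Return 'src dst weight' for an edge line, or None for any other line."""
    match = EDGE_RE.search(line)
    if match is None:
        return None
    # match.groups() holds the three captured numbers as strings.
    return ' '.join(match.groups())

sample = [
    '//WAYNE ROONEY (wr10)',
    '  90 [label="90"];',
    '  90 -> 11 [weight=25];',
    '  90 -> 102 [weight=43];',
]
for line in sample:
    converted = convert_line(line)
    if converted is not None:
        print(converted)
```

Unlike a bare search for digits, this pattern ignores the comment and label lines, because they contain neither -> nor weight=.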

To get all the numbers from each line, use r'\d+' together with .findall():

import re

# filein is the open file object from the question
for line in filein:
    if 'label' in line:
        print('label:', end=' ')
    print(' '.join(re.findall(r'\d+', line)))

It is not entirely clear what you want to do with the label lines, but for the label and edge lines this very simple loop would print out:

label: 90 90
90 11 25
90 21 23
90 31 17

etc.
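
If the label values themselves are needed, one possible approach is a capture group between the quotes. The pattern from the question, 'label=" "', only matches a literal space between the quotes, which is why it found nothing. A minimal sketch; LABEL_RE and read_label are hypothetical names used only here:

```python
import re

# Capture the digits between the quotes of label="...".
LABEL_RE = re.compile(r'label="(\d+)"')

def read_label(line):
    """Return the label value as a string, or None if the line has no label."""
    match = LABEL_RE.search(line)
    return match.group(1) if match else None

print(read_label('  90 [label="90"];'))       # 90
print(read_label('  90 -> 11 [weight=25];'))  # None
```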
