I have a massive file which is nothing but repeated blocks like this:
//WAYNE ROONEY (wr10)
90 [label="90"];
90 -> 11 [weight=25];
90 -> 21 [weight=23];
90 -> 31 [weight=17];
90 -> 41 [weight=12];
90 -> 51 [weight=1];
90 -> 62 [weight=50];
90 -> 72 [weight=7];
90 -> 82 [weight=27];
90 -> 92 [weight=9];
90 -> 102 [weight=43];
I need to convert it into a format that looks like this:
90 11 25
i.e. I just need to remove all the extra stuff and keep the numbers exactly the way they are.
I tried using regex, with this line of code:
for line in filein:
    match = re.search('label=" "', line)
    if match:
        print(match.group())
But it just prints all the instances of 'label' in the file. If I try to search for 'label=" "', there is no output. If I could somehow read the labels, then reading the weights would be pretty similar.
How about this:
import re

with open("file", "r") as infile:
    for line in infile:
        # Only the edge lines contain "->"
        if re.search('->', line):
            print(' '.join(re.findall('[0-9]+', line)))
Output:
90 11 25
90 21 23
90 31 17
90 41 12
90 51 1
90 62 50
90 72 7
90 82 27
90 92 9
90 102 43
Just redirect to save the output: python test.py > newfile
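If redirecting is not convenient, the same loop can write the result to a file itself. This is only a sketch: the function names `extract_edges` and `convert_file` and the path arguments are made up for illustration, and it assumes every edge line contains "->" as in the sample.

```python
import re

def extract_edges(lines):
    """Return 'source target weight' strings for every edge line in `lines`."""
    rows = []
    for line in lines:
        if '->' in line:  # only edge lines contain an arrow
            rows.append(' '.join(re.findall(r'[0-9]+', line)))
    return rows

def convert_file(src_path, dst_path):
    """Read src_path and write the converted rows to dst_path (hypothetical paths)."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for row in extract_edges(src):
            dst.write(row + '\n')
```

Calling `convert_file("file", "newfile")` would then do the same job as the redirect above.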
You can match all the edge lines with a pattern built from these pieces:
(\d+)            a number (capture group 1)
\s*->\s*         optional space, the arrow, optional space
(\d+)            another number (capture group 2)
\s*\[weight=     optional space and the literal [weight=
(\d+)            another number (capture group 3)
\];              the literal ]; to end the match
That gives you three numbered groups, so you can build the output string in whatever pattern you want, e.g. $1 $2 $3.
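In Python, the capture-group approach could be sketched roughly like this. The names `EDGE` and `edge_to_row` are illustrative only, and the pattern assumes the weight is written without quotes, as in the sample block from the question.

```python
import re

# number, arrow, number, then the weight inside [weight=...];
EDGE = re.compile(r'(\d+)\s*->\s*(\d+)\s*\[weight=(\d+)\];')

def edge_to_row(line):
    """Return 'source target weight' for an edge line, or None for any other line."""
    m = EDGE.search(line)
    if m is None:
        return None
    # m.groups() holds the three captured numbers in order
    return ' '.join(m.groups())
```

Unlike pulling every number out of the line, this only accepts lines that have the full edge shape, so the comment and label lines are skipped automatically.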
To get all the numbers from each line, use r'\d+' together with .findall():
for line in filein:
    # Prefix the label lines so they can be told apart from the edges
    prefix = 'label: ' if 'label' in line else ''
    print(prefix + ' '.join(re.findall(r'\d+', line)))
It is not entirely clear what you want to do with the label lines, but this very simple loop would print out:
label: 90 90
90 11 25
90 21 23
90 31 17
etc.
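As for why the search for 'label=" "' in the question finds nothing: that pattern asks for a literal space between the quotes, while the file has digits there. A small sketch that captures the digits instead (the names `LABEL` and `read_label` are made up for this example):

```python
import re

# capture the digits between the quotes after label=
LABEL = re.compile(r'label="(\d+)"')

def read_label(line):
    """Return the label value as a string, or None if the line has no label."""
    m = LABEL.search(line)
    return m.group(1) if m else None
```

The weights can be read the same way by swapping in a pattern such as r'weight=(\d+)'.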