
Extract data from a file using Python

Input File:

["abc","on time","date","<a href='link'>11111</a>","time","2","2"],

["abc","on time","date","<a href='link'>11111</a>","time","2","6"],

["abc","on time","date","<a href='link'>11111</a>","time","2","9"],

["abc","on time","date","<a href='link'>11111</a>","time","2","0"],

["abc","on time","date","<a href='link'>11111</a>","time","2","5"]

Desired output:

abc,on time,date,<a href='link'>11111</a>,time,2,2

abc,on time,date,<a href='link'>11111</a>,time,2,6

abc,on time,date,<a href='link'>11111</a>,time,2,9

abc,on time,date,<a href='link'>11111</a>,time,2,0

abc,on time,date,<a href='link'>11111</a>,time,2,5

Code tried:

import sys
import re

Lines = [Line.strip() for Line in open (sys.argv[1],'r').readlines()]



for EachLine in Lines:
    Parts = EachLine.split(",")
    for EachPart in Parts:

        EachPart = re.sub(r'[', '', EachPart)
        EachPart = re.sub(r']', '', EachPart)
print ' '.join(Parts)

Can anyone help me with this? I am not getting the output I want. Thanks in advance.

I modified your initial solution to:

import sys
import re

Lines = [Line.strip() for Line in open (sys.argv[1],'r').readlines()]

for EachLine in Lines:
    matches = re.findall(r'\"(.+?)\"',EachLine)
    print ','.join(matches)

My approach is to use a regex to extract every string enclosed in double quotes.
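
For reference, the code in the question fails for three reasons: re.sub(r'[', ...) raises an error because an unescaped [ opens a character set in a regex, reassigning EachPart inside the loop does not change Parts, and the print sits outside the loop. Below is a minimal Python 3 sketch of the regex approach. The function name extract_fields is made up for illustration, and it uses .*? rather than .+? so that empty fields ("") are kept, which is an assumption about what is wanted.

```python
import re

# Non-greedy match: captures one double-quoted field at a time.
FIELD = re.compile(r'"(.*?)"')

def extract_fields(line):
    """Return the double-quoted fields found in one line of input (hypothetical helper)."""
    return FIELD.findall(line)

sample = '''["abc","on time","date","<a href='link'>11111</a>","time","2","2"],'''
print(','.join(extract_fields(sample)))
# prints: abc,on time,date,<a href='link'>11111</a>,time,2,2
```

Note that this relies on the fields never containing an escaped double quote; if they can, a real parser is the safer choice.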

As already mentioned, you can use eval().

with open('a.txt') as f:
    for line in f:
        line = line.replace(',\n', '\n').strip() # remove the trailing `,` if present
        if line:                                 # skip empty lines
            print(','.join(eval(line)))

Input:

["abc","on time","date","<a href='link'>11111</a>","time","2","2"],

["abc","on time","date","<a href='link'>11111</a>","time","2","6"],

["abc","on time","date","<a href='link'>11111</a>","time","2","9"],

["abc","on time","date","<a href='link'>11111</a>","time","2","0"],

["abc","on time","date","<a href='link'>11111</a>","time","2","5"]

Output:

abc,on time,date,<a href='link'>11111</a>,time,2,2
abc,on time,date,<a href='link'>11111</a>,time,2,6
abc,on time,date,<a href='link'>11111</a>,time,2,9
abc,on time,date,<a href='link'>11111</a>,time,2,0
abc,on time,date,<a href='link'>11111</a>,time,2,5
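
One caveat: eval() executes whatever expression the file contains, so it is risky on input you do not control. A hedged alternative is ast.literal_eval from the standard library, which only accepts Python literals. The sketch below assumes the same one-list-per-line layout; parse_row and the inline rows list are hypothetical names used for illustration.

```python
import ast

def parse_row(line):
    """Parse one '["a","b"],' style line into a list, or return None if it is blank."""
    line = line.strip().rstrip(',')  # drop whitespace and the trailing comma
    if not line:
        return None
    return ast.literal_eval(line)    # accepts literals only, unlike eval()

# Stand-in for the lines of the input file.
rows = [
    '["abc","on time","date","<a href=\'link\'>11111</a>","time","2","2"],\n',
    '\n',
    '["abc","on time","date","<a href=\'link\'>11111</a>","time","2","5"]',
]
for raw in rows:
    row = parse_row(raw)
    if row is not None:
        print(','.join(row))
```

With a real file, the same loop would iterate over the open file object instead of the rows list.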

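Another option without using regex, assuming lines holds the raw text lines of the file, is:

for line in lines:
    # drop the trailing comma and the brackets, then remove the quotes
    formatted = line.strip().rstrip(',').strip('[]').replace('"', '')
    if formatted:  # skip empty lines
        print(formatted)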
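
Because every row is a list of double-quoted strings and the rows are separated by commas, the whole file also looks like the inside of a JSON array. Assuming that holds for the real data (in particular, no single-quoted strings as list items), a sketch that wraps the text in brackets and parses it in one call might look like this; rows_from_text is a hypothetical name.

```python
import json

def rows_from_text(text):
    """Wrap the comma-separated rows in [ ] and parse them as one JSON array."""
    # rstrip(',') tolerates a stray comma after the last row.
    return json.loads('[' + text.strip().rstrip(',') + ']')

# Stand-in for the contents of the input file.
text = '''["abc","on time","date","<a href='link'>11111</a>","time","2","2"],

["abc","on time","date","<a href='link'>11111</a>","time","2","6"]
'''
for row in rows_from_text(text):
    print(','.join(row))
```

Unlike plain string stripping, this keeps working when a field itself contains a comma or a bracket, although the joined output line would then be ambiguous.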
