
matching and splitting

I have a file, file1, with the following contents:

abc_1 (qst_0) bndk
cgn32 (mn_r_1) mncp
 dmj_2 (yst) pr1f

I want to match and split the file line by line, for which I use the following code:

path = sys.argv[1]
 with open(path) as f:
  data = f.read()
 unit = re.split(r"(.+\(.*\).+)", data)
 print(*unit)

It is able to split the first 2 lines, but on the 3rd line it gives an error saying IndentationError: Unexpected Indent at line 3 of file1. Could someone help me out?

You can try this:

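import sys

path = sys.argv[1]
with open(path) as f:
    data = f.read()
# split() with no argument ignores leading/trailing whitespace; blank lines are skipped
unit = [line.split() for line in data.splitlines() if line.strip()]
print(unit)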

Output:

[['abc_1', '(qst_0)', 'bndk'],
 ['cgn32', '(mn_r_1)', 'mncp'],
 ['dmj_2', '(yst)', 'pr1f']]

What is an indentation error?

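  • An IndentationError in Python means that a line's leading whitespace does not match the block structure the interpreter expects.

Here in line 3, data = f.read(), the indentation is not consistent with the lines around it. Python checks indentation when it compiles the script, before running any of it, so your code has not read even a single line of your input file.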
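As a minimal sketch of that point, the snippet below compiles a hypothetical two-line script (not your actual code) whose second line is indented for no reason. The helper name check and the sample source are made up for illustration. compile() only parses the source, so the error appears without anything being executed.

```python
# Hypothetical two-line script: the second line is indented for no reason.
source = "x = 1\n    y = 2\n"


def check(src):
    """Return a description of the indentation problem, or None if src compiles."""
    try:
        # compile() parses the source only; none of it is executed.
        compile(src, "<example>", "exec")
    except IndentationError as exc:
        return f"{type(exc).__name__}: {exc.msg} (line {exc.lineno})"
    return None


result = check(source)
print(result)
```

The line number in the message refers to the script being compiled, not to any data file the script would have opened.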

Make sure that every line inside a block is indented consistently; 4 spaces per level is the usual convention in Python. The following should work.

import re
import sys

path = sys.argv[1]

with open(path) as fp:
    for line in fp:
        print(re.split(r"(.+\(.*\).+)", line))

(or)

import re
import sys

path = sys.argv[1]

with open(path) as fp:
    split_lines = [re.split(r"(.+\(.*\).+)", line) for line in fp]

print(split_lines)    

NOTE:

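  • You haven't mentioned on what basis you want to split the lines. Is it by spaces, "_" or ")"?
  • Your current regex doesn't do it. The pattern matches the whole line and is wrapped in a capturing group, so re.split returns an empty string, the unsplit line, and the trailing newline.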
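To make the note concrete, here is a rough sketch run against the three sample lines from the question. The second pattern, field_re, is only an assumption about what you might want (three fields, with the parentheses dropped from the middle one); adjust it once the actual splitting rule is clear.

```python
import re

# The three sample lines from the question, as they would be read from the file.
lines = ["abc_1 (qst_0) bndk\n", "cgn32 (mn_r_1) mncp\n", " dmj_2 (yst) pr1f\n"]

# The regex from the question: it matches the entire line, so nothing is split.
pattern = re.compile(r"(.+\(.*\).+)")
whole = [pattern.split(line) for line in lines]
print(whole[0])  # ['', 'abc_1 (qst_0) bndk', '\n']

# Assumed goal: name, text inside the parentheses, trailing token.
field_re = re.compile(r"\s*(\S+)\s+\((.*?)\)\s+(\S+)")
fields = [field_re.match(line).groups() for line in lines]
print(fields)
```

If the parentheses should be kept, a plain line.split() on whitespace is simpler than a regex.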

