
How to read between 2 specific lines in Python

I have a variable whose contents look roughly like this:

**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;5
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****

I want to read the lines starting from Main_data1, take only the last column of each, and store the values in a list. Please note that this data is held in a variable, not in a file.

My Desired Output:

Some_list=[1,2,3,4,5]

I thought of using something like this.

for line in var_a.splitlines():
    if "Main_data1" in line:
        print(line)

But there are more than 200 lines from which I need to read the last column. What would be an efficient way of doing this?

Check whether the line starts with "Main_data", then split it on the semicolon ; and take the last element with index -1:

some_list = []
for line in var_a.split("\n"):
    if line.startswith("Main_data"):
        some_list.append(int(line.split(";")[-1]))
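
As a self-contained illustration of the loop above, here is a minimal sketch that can be run as-is. The sample text is a shortened copy of the question's data, and the helper name last_column_ints and its prefix parameter are made up for this example. It uses rsplit with a split limit of 1, which is an assumption about style rather than a requirement; split(";")[-1] gives the same result.

```python
# Shortened copy of the question's data; var_a is the variable name used in the question.
var_a = """**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;5
**** SOME JUNK DATA ****"""


def last_column_ints(text, prefix="Main_data"):
    """Return the last ';'-separated field, as int, of each line starting with prefix.

    Hypothetical helper written for this example only.
    """
    values = []
    for line in text.splitlines():
        if line.startswith(prefix):
            # rsplit with a limit of 1 cuts only at the last ';'
            values.append(int(line.rsplit(";", 1)[-1]))
    return values


some_list = last_column_ints(var_a)
print(some_list)  # [1, 2, 3, 4, 5]
```

This is a single pass over the lines, so a few hundred lines are not a performance concern.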

You can use a list comprehension to collect the numbers (this one keeps only lines that start with Main_data5):

my_list = [int(line.strip().split(';')[-1]) for line in my_var.split('\n') if line.startswith('Main_data5')]

Also note that it is more Pythonic to use the str.startswith() method than the in operator, because Main_data5 might also appear in the middle of a line.

If there are two possible line prefixes, you can combine two startswith conditions with the or operator:

my_list = [int(line.strip().split(';')[-1]) for line in my_var.split('\n') if line.startswith('Main_data5') or line.startswith('Main_data1')]
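
As an alternative to chaining or, str.startswith() also accepts a tuple of prefixes. The sketch below assumes a shortened sample for my_var, and the name prefixes is introduced only for this example.

```python
# Shortened sample; my_var is the variable name used in the answer above.
my_var = """junk
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data5;aaaaa;bbbbb;cccc;d;e;5
junk"""

# startswith returns True if the line begins with any prefix in the tuple
prefixes = ("Main_data1", "Main_data5")

my_list = [
    int(line.split(";")[-1])
    for line in my_var.splitlines()
    if line.startswith(prefixes)
]
print(my_list)  # [1, 5]
```

Note that the prefix "Main_data1" would also match a line starting with Main_data10; including the separator, as in "Main_data1;", avoids that.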

But if you have more keywords, you can use a regex. For example, to match all lines that start with Main_data followed by a digit, you can use re.match():

import re
my_list = [int(line.strip().split(';')[-1]) for line in my_var.split('\n') if re.match(r'Main_data\d.*',line)]
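
A variation on the same idea, sketched here under the assumption that the text uses plain "\n" line endings: with re.MULTILINE the pattern can be applied to the whole string at once, so no explicit split is needed. The name LAST_FIELD and the shortened sample are made up for this example.

```python
import re

# Shortened sample; my_var is the variable name used in the answer above.
my_var = """**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;5
**** SOME JUNK DATA ****"""

# With re.MULTILINE, ^ and $ match at the start and end of every line.
# The single capture group holds the digits after the last ';'.
LAST_FIELD = re.compile(r"^Main_data\d+;.*;(\d+)$", re.MULTILINE)

# findall returns the captured group of every matching line, as strings
my_list = [int(n) for n in LAST_FIELD.findall(my_var)]
print(my_list)  # [1, 2, 3, 4, 5]
```

Lines whose last field is not purely digits are skipped rather than raising an error, which may or may not be what you want.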

my_list = []
for line in my_var.strip().split('\n'):
    if "Main_data1" in line:
        my_list.append(int(line.split(";")[-1]))

Or you can use the startswith('match') method, as someone else mentioned.

My approach uses a regex, since it gives more control over the pattern.

File content

**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ******** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ******** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****

Code

import re

# Capture the digits that follow a ';' on a line containing Main_data;
# the greedy .* makes this the last such run on the line.
pattern = re.compile(r'Main_data.*(?<=;)([0-9]+)', re.IGNORECASE)

data = []
with open(r"C:\text.txt") as fl:  # text mode, so each line is a str
    for line in fl:
        match = pattern.search(line)
        # keep the captured digits, or 'N' as a placeholder for a non-matching line
        data.append(match.group(1) if match else 'N')

strng = ','.join(data)  # e.g. 'N,N,N,1,2,3,4,523233,N,...'

# pull out each comma-separated run of digits lying between 'N' placeholders
lsts = re.findall(r'(?<=,)[0-9,]+(?=,)', strng)

outpt = [i.split(',') for i in lsts]  # one list per block of Main_data lines

print(outpt)

Output

[['1', '2', '3', '4', '523233'], ['1', '2', '3', '4', '523233'], ['1', '2', '3', '4', '523233']]
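
The same grouped result can also be produced without joining the values into a string and re-parsing it. The sketch below swaps in itertools.groupby, which groups consecutive lines by whether they start with Main_data; it is an alternative technique, not the code above. It works on an in-memory string, as in the question, and the names text and is_data and the shortened sample are made up for this example.

```python
from itertools import groupby

# Shortened sample with two blocks of data lines separated by junk.
text = """junk
junk
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
junk
Main_data1;a;b;c;dss;e;1
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
junk"""


def is_data(line):
    """True for lines that belong to a data block (hypothetical helper)."""
    return line.startswith("Main_data")


# groupby yields one (key, block) pair per run of consecutive lines
# sharing the same key; only the runs where the key is True are kept.
groups = [
    [line.rsplit(";", 1)[-1] for line in block]
    for matched, block in groupby(text.splitlines(), key=is_data)
    if matched
]
print(groups)  # [['1', '2'], ['1', '523233']]
```

Unlike the join-and-findall version, this does not depend on junk lines surrounding every block, so a block at the very start or end of the text is still picked up.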


 