
Python regex findall to read line in .csv file

I have a .csv file (or could happily be a .txt file) with some records in it:

JB74XYZ Kerry   Katona  44  Mansion_House   LV10YFB
WL67IAM William Iam 34  The_Voice_Street    LN44HJU

etc etc

I have used python to open and read the file, then regex findall (and attempted a similar regex rule) to identify a match:

import re

reg = "JB74XYZ"

with open("RegDD.txt", "r") as file:
    data = file.read()
    search = re.findall(reg, data)

print(search)

which gives the resulting output:

['JB74XYZ']

I have tested this out, and it seems I have the regex findall working, in that it is correctly identifying a 'match' and returning it.

  1. My question is, how do I get the remaining content of the 'matched' lines to be returned as well? (eventually I will get this written into a new file, but for now I just want to have the matching line printed).

I have explored python dictionaries as one way of indexing things, but I hit a wall and got no further than the regex returning a positive result.

  2. I guess from this a second question might be: am I choosing the wrong approach altogether?

I hope I have been specific enough; this is my first question here. I have spent hours (not minutes) looking for specific solutions and trying out a few ideas. I'm guessing that this is not an especially tricky concept, but I could do with a few hints if possible.

A better way to handle this would be to use Python's csv module. From the looks of your CSV, I'm guessing it's tab-delimited so I'm running off of that assumption.

import csv

match = "JB74XYZ"

matched_row = None
with open("RegDD.txt", "r") as file:
    # Read file as a CSV delimited by tabs.
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        # Check the first (0-th) column.
        if row[0] == match:
            # Found the row we were looking for.
            matched_row = row
            break

print(matched_row)

This should then output the following from matched_row:

['JB74XYZ', 'Kerry', 'Katona', '44', 'Mansion_House', 'LV10YFB']

I'd use the csv module, read in the file with the tab as delimiter, and then compare line by line. If there is a match in that line, append it to a results list.
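A minimal sketch of that idea, collecting every matching row rather than stopping at the first one. The helper name find_rows and the inline sample data are assumptions made for illustration, and the file is assumed to be tab-delimited as guessed above.

```python
import csv
import io


def find_rows(lines, wanted, column=0):
    """Return every row whose value in `column` equals `wanted`."""
    results = []
    for row in csv.reader(lines, delimiter="\t"):
        # csv.reader yields an empty list for a blank line, so check the length.
        if len(row) > column and row[column] == wanted:
            results.append(row)
    return results


# Sample data standing in for the contents of RegDD.txt (assumed tab-delimited).
sample = io.StringIO(
    "JB74XYZ\tKerry\tKatona\t44\tMansion_House\tLV10YFB\n"
    "WL67IAM\tWilliam\tIam\t34\tThe_Voice_Street\tLN44HJU\n"
)

for row in find_rows(sample, "JB74XYZ"):
    print(row)
```

With a real file, pass the open file object instead of the sample, for example by calling find_rows(file, "JB74XYZ") inside a with open("RegDD.txt", newline="") as file: block.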

If you want to read all the values in the .csv file into a dictionary, with the first field (for example JB74XYZ) as the key and the rest of that record as the value, read the file line by line and call split() on each line to get a list of fields. Calling split() with no argument splits on any run of whitespace, which is safer than split(" ") when the columns are separated by tabs or by several spaces. Remove the first element of the list to use as the key, and store the remaining list as the value. If you want to use regular expressions instead, see https://docs.python.org/3/library/re.html for how to extract the details from your file and save them in tuples.
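The dictionary approach might look roughly like this. The function name build_index is made up for the example, and it assumes that no field contains whitespace (the sample records use underscores instead) and that a repeated key should simply overwrite the earlier record.

```python
def build_index(lines):
    """Map the first field of each line to the list of remaining fields."""
    index = {}
    for line in lines:
        # split() with no argument splits on any run of spaces or tabs.
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key, *details = fields
        index[key] = details
    return index


# Sample lines standing in for the file; an open file object works the same way.
sample_lines = [
    "JB74XYZ Kerry   Katona  44  Mansion_House   LV10YFB\n",
    "WL67IAM William Iam 34  The_Voice_Street    LN44HJU\n",
]

records = build_index(sample_lines)
print(records["JB74XYZ"])
```

A lookup such as records.get("JB74XYZ") then returns the rest of the record, or None when the key is not in the file.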

You could try re.search or, if the match has to be at the start of the line, re.match. Both return a match object (or None if nothing matched) with information about the operation, including access to the original string. For example, to get the rest of the line after the match:

import re

reg = "(JB74XYZ)"

with open("RegDD.txt", "r") as file:
    for line in file:
        line = line.strip()
        match = re.match(reg, line)
        if match:
            # Print everything that follows the matched text on this line.
            print(line[match.end():])

Note that I wrapped the regex in a group, but that is optional here: match.end() with no argument returns the end position of the whole match, so the plain pattern "JB74XYZ" gives the same result.
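If you would rather keep calling findall on the whole file contents, as in the question, one option is to make the pattern cover the entire line. This is only a sketch: the helper name matching_lines is invented for the example, and it assumes the registration is the first thing on its line.

```python
import re


def matching_lines(text, registration):
    """Return each full line of `text` that starts with `registration`."""
    # re.escape guards against regex metacharacters in the search term.
    # With re.MULTILINE, ^ and $ match at the start and end of every line.
    # \b stops "JB74XYZ" from also matching a longer token such as "JB74XYZ9".
    pattern = rf"^{re.escape(registration)}\b.*$"
    return re.findall(pattern, text, flags=re.MULTILINE)


# Sample text standing in for the result of file.read().
data = (
    "JB74XYZ Kerry   Katona  44  Mansion_House   LV10YFB\n"
    "WL67IAM William Iam 34  The_Voice_Street    LN44HJU\n"
)

print(matching_lines(data, "JB74XYZ"))
```

Because the pattern contains no capturing group, findall returns the whole matched line as a string rather than just the registration.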

So, after looking at all the excellent replies, I ended up focusing (as advised by a few here) on looking at the csv module in a bit more detail. With some digging around I've ended up with this (and, tbh at this stage, I'm not sure how I did it exactly...):

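import csv

reg = "TS74UIO"

# newline="" is what the csv docs recommend when opening a file for csv.reader.
with open("T3.csv", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        # Skip blank rows, then compare the first column with the registration.
        if row and row[0] == reg:
            print(row)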

and this resulted in an output that I think I'll be able to write to another file:

['TS74UIO', 'Kerry', 'Katona', '44', 'Mansion_House', 'LV10YFB']
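For the follow-up step of writing the matches to another file, something along these lines should work. It is a sketch only: the function name copy_matching_rows and the output file name matches.csv are placeholders, and the input is assumed to be comma-delimited like T3.csv above.

```python
import csv


def copy_matching_rows(src_path, dst_path, wanted):
    """Copy rows whose first column equals `wanted`; return how many were written."""
    written = 0
    # newline="" on both files lets the csv module handle line endings itself.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Blank lines come back as empty lists, so guard before indexing.
            if row and row[0] == wanted:
                writer.writerow(row)
                written += 1
    return written


# Example call (file names are placeholders):
# copy_matching_rows("T3.csv", "matches.csv", "TS74UIO")
```

Returning the count makes it easy to tell whether anything was found without reopening the output file.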

