
Python regex findall to read line in .csv file

I have a .csv file (it could just as well be a .txt file) with some records in it:

JB74XYZ Kerry   Katona  44  Mansion_House   LV10YFB
WL67IAM William Iam 34  The_Voice_Street    LN44HJU

etc.

I have used Python to open and read the file, then regex findall (I also tried a similar regex rule) to identify a match:

from re import findall

reg = "JB74XYZ"

with open("RegDD.txt", "r") as file:
    data = file.read()
    search = findall(reg, data)

print(search)

which gives the resulting output:

['JB74XYZ']

I have tested this, and the regex findall seems to be working, in that it correctly identifies a 'match' and returns it.

  1. My question is, how do I get the remaining content of the 'matched' lines to be returned as well? (Eventually I will write this into a new file, but for now I just want to have the matching line printed.)

I have explored Python dictionaries as one way of indexing things, but I hit a wall and got no further than the regex returning a positive result.

  2. I guess from this a second question might be: am I choosing the wrong approach altogether?

I hope I have been specific enough. This is my first question here, and I have spent hours (not minutes) looking for specific solutions and trying out a few ideas. I'm guessing that this is not an especially tricky concept, but I could do with a few hints if possible.

A better way to handle this would be to use Python's csv module. From the looks of your CSV, I'm guessing it's tab-delimited, so I'm working from that assumption.

import csv

match = "JB74XYZ"

matched_row = None
with open("RegDD.txt", "r") as file:
    # Read file as a CSV delimited by tabs.
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        # Check the first (0-th) column.
        if row[0] == match:
            # Found the row we were looking for.
            matched_row = row
            break

print(matched_row)

This should then print the following for matched_row:

['JB74XYZ', 'Kerry', 'Katona', '44', 'Mansion_House', 'LV10YFB']

I'd use the csv module, read in the file with the tab as delimiter, and then compare line by line. If there is a match in a line, append it to a results list.
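A minimal sketch of that idea follows. It assumes the records are tab-delimited and that the code to look up sits in the first column; `find_rows` is a hypothetical helper name, and the `io.StringIO` object only stands in for the real file so the example runs on its own.

```python
import csv
import io

# Stand-in for open("RegDD.txt", newline=""); assumes tab-delimited records.
sample = io.StringIO(
    "JB74XYZ\tKerry\tKatona\t44\tMansion_House\tLV10YFB\n"
    "WL67IAM\tWilliam\tIam\t34\tThe_Voice_Street\tLN44HJU\n"
)


def find_rows(file, wanted):
    """Return every row whose first column equals `wanted`."""
    results = []
    for row in csv.reader(file, delimiter="\t"):
        # Skip blank lines, then compare the first column only.
        if row and row[0] == wanted:
            results.append(row)
    return results


results = find_rows(sample, "JB74XYZ")
print(results)
```

Unlike stopping at the first hit, collecting into a list keeps every matching row, which may matter if the same code can appear more than once.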

If you want to read all the values in the .csv file into a dictionary, with a code such as JB74XYZ as the key and the related details as the value, you can read the file line by line and use split(" ") to get a list of fields (or split("\t") if the file turns out to be tab-delimited). You can then build the dictionary by taking the first element of the list as the key and storing the remaining elements as the value. If you want to use regular expressions instead, refer to https://docs.python.org/3/library/re.html for how to extract the details from your file and save them in tuples.
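The dictionary approach could look roughly like this. It is a sketch under two assumptions: fields never contain whitespace themselves (the sample uses underscores), so a bare `split()` is safe whether the separator is a tab or spaces, and the first field is unique. `build_index` and `sample_lines` are hypothetical names; in practice the lines would come from iterating over the open file.

```python
# Stand-in for the lines of RegDD.txt (assumed layout, taken from the question).
sample_lines = [
    "JB74XYZ\tKerry\tKatona\t44\tMansion_House\tLV10YFB\n",
    "WL67IAM\tWilliam\tIam\t34\tThe_Voice_Street\tLN44HJU\n",
]


def build_index(lines):
    """Map the first field of each line to the list of remaining fields."""
    records = {}
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        if not fields:
            continue  # ignore blank lines
        records[fields[0]] = fields[1:]
    return records


records = build_index(sample_lines)
print(records["JB74XYZ"])
```

Once the dictionary is built, each lookup is a single key access rather than a scan of the file, which is convenient if several codes need to be checked.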

You could try re.search or, if you require the pattern to be at the start of the string, re.match. Both return a match object (or None if nothing matches) with information about the operation, including access to the original string. For example, to get the remaining string:

import re

reg = "(JB74XYZ)"

with open("RegDD.txt", "r") as file:
    for line in file:
        # Drop the trailing newline before matching.
        line = line.strip()
        match = re.match(reg, line)
        if match:
            # Print everything after the matched text.
            print(line[match.end():])

Note that I changed the regex to a group. Strictly speaking this is optional here: match.end() with no argument gives the end position of the whole match, so the same slice works without the parentheses.
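For comparison, findall itself can return whole lines if the pattern is extended to the end of the line. The sketch below is one possible way, not the approach from the answer above: it assumes the code sits at the start of a line, uses `re.MULTILINE` so that `^` and `$` apply per line, and uses an inline string as a stand-in for the contents of the file.

```python
import re

# Stand-in for file.read() on RegDD.txt (assumed tab-delimited).
data = (
    "JB74XYZ\tKerry\tKatona\t44\tMansion_House\tLV10YFB\n"
    "WL67IAM\tWilliam\tIam\t34\tThe_Voice_Street\tLN44HJU\n"
)

reg = "JB74XYZ"

# ^ and $ match at line boundaries under re.MULTILINE; "." does not match
# a newline by default, so ".*" stops at the end of the matching line.
pattern = re.compile(r"^" + re.escape(reg) + r"\b.*$", re.MULTILINE)

lines = pattern.findall(data)
print(lines)
```

`re.escape` is there in case the search text ever contains characters that are special in a regex.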

So, after looking at all the excellent replies, I ended up focusing (as advised by a few here) on the csv module in a bit more detail. With some digging around I've ended up with this (and, tbh, at this stage I'm not sure exactly how I did it...):

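import csv

reg = "TS74UIO"

# newline="" is what the csv docs recommend when opening a file for csv.reader.
with open("T3.csv", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        # Compare the first column against the code we are looking for.
        if row[0] == reg:
            print(row)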

and this resulted in an output that I think I'll be able to write to another file:

['TS74UIO', 'Kerry', 'Katona', '44', 'Mansion_House', 'LV10YFB']
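Writing the matched row out could be sketched with csv.writer as below. The comma-separated layout of T3.csv is an assumption (the code above uses the default delimiter), and the two `io.StringIO` objects are stand-ins for the input and output files so the example runs without touching disk; with real files, both would be opened with `newline=""`.

```python
import csv
import io

# Stand-in for open("T3.csv", newline=""); assumed comma-separated contents.
source = io.StringIO(
    "TS74UIO,Kerry,Katona,44,Mansion_House,LV10YFB\n"
    "WL67IAM,William,Iam,34,The_Voice_Street,LN44HJU\n"
)
# Stand-in for the output file, e.g. open("matches.csv", "w", newline="").
target = io.StringIO()

reg = "TS74UIO"

writer = csv.writer(target)
for row in csv.reader(source):
    # Copy only the rows whose first column matches.
    if row and row[0] == reg:
        writer.writerow(row)

print(target.getvalue())
```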
