
Python Regex: Loop through first line of each file in directory

I want to loop through .txt files and use the date (e.g. April 1, 1993) from the first line of each file.

This code works, but it matches against the entire file and not just the first line (note: the code I'm showing below contains more than just the date-matching loop):

The script below is updated and works:

import glob
import re
from shutil import move  # move() is presumably shutil.move

from dateutil import parser

# NOTE: `source` (used in the new file name) is not defined in this excerpt;
# it is assumed to be set elsewhere in the full script.
articles = glob.glob("*.txt")
y = 1

for f in articles:
    wordcount = "x"  # sentinel: no LENGTH line found yet
    with open(f, "r") as content:
        for line in content:
            if line[0:7] == "LENGTH:":
                # Strip punctuation, then keep the digits that precede "words".
                lineclean = re.sub('[#%&<>*?:/{}$@+|=]', '', line)
                wordcount = lineclean[7:13]
                if wordcount[5] == "w":
                    wordcount = wordcount[0:4]
                elif wordcount[4] == "w":
                    wordcount = wordcount[0:3]
                elif wordcount[3] == "w":
                    wordcount = wordcount[0:2]
                elif wordcount[2] == "w":
                    wordcount = wordcount[0:1]
    with open(f, "r") as content:
        first_line = next(content, "")  # read only the first line
    match = re.search(r'(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}', first_line)
    # Rename only when both a date and a word count were found for this file,
    # so a value left over from a previous file is never reused.
    if match and wordcount != "x":
        parsed_pubdate = parser.parse(match.group()).strftime('%Y-%m-%d')
        try:
            move(f, "{parsed_pubdate}_{wordcount}_{source}.txt".format(**locals()))
        except OSError:
            pass
    y += 1

In order to match dates only in the first line of the file, I added ^\s and flags=re.MULTILINE, so I get:

match = re.search(r'^\s(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}', line, flags=re.MULTILINE).group()
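
For context, re.MULTILINE makes ^ match at the start of every line, not only the first, so on its own it does not limit the search to line one. The sketch below illustrates this on a made-up two-line sample and contrasts it with searching only the first line. The names DATE_PATTERN, date_in_first_line and sample are hypothetical and not part of the original script.

```python
import re

# Same date shape as in the question: month name, day, comma, four-digit year.
DATE_PATTERN = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?"
    r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
    r"\s+\d{1,2},\s+\d{4}"
)


def date_in_first_line(text):
    """Return the date found in the first line of `text`, or None (hypothetical helper)."""
    first_line = text.split("\n", 1)[0]
    hit = re.search(DATE_PATTERN, first_line)
    return hit.group() if hit else None


# Made-up sample: no date in the first line, one at the start of the second.
sample = "Headline without a date\n May 3, 1994 follow-up story\n"

# With re.MULTILINE, ^ matches at the start of every line, so line 2 still matches.
multiline_hit = re.search(r"^\s" + DATE_PATTERN, sample, flags=re.MULTILINE)

print(multiline_hit is not None)
print(date_in_first_line(sample))
```

Restricting the input (only the first line is handed to re.search) is what limits the match here; the anchor and the flag are not needed for that.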

However, now the program only uses one date (the date of the last file in the folder) and applies it to every file, so every file gets the same date even though the dates differ in the original .txt files.

I included the entire step this loop is part of, but my problem only applies to the regex date-matching loop. Thanks in advance for your help!
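
One way a date can leak from one file to another, offered only as an assumption since the exact cause is not confirmed here, is a result variable that is assigned inside a try block whose failure is silently ignored: when the search fails, the variable keeps its earlier value. The sketch below reproduces that effect on made-up first lines and shows a version that resets the result for each file. All names in it are hypothetical.

```python
import re

DATE_RE = re.compile(
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?"
    r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
    r"\s+\d{1,2},\s+\d{4}"
)


def dates_with_stale_value(first_lines):
    """Mimic a swallowed failure: `found` keeps the previous file's value."""
    found = None
    results = []
    for line in first_lines:
        try:
            found = DATE_RE.search(line).group()
        except AttributeError:
            pass  # search() returned None; `found` is left unchanged
        results.append(found)
    return results


def dates_reset_per_file(first_lines):
    """Compute the result fresh for every file; None means no date was found."""
    results = []
    for line in first_lines:
        hit = DATE_RE.search(line)
        results.append(hit.group() if hit else None)
    return results


# Made-up first lines standing in for three files.
first_lines = ["April 1, 1993 Headline", "No date here", "May 3, 1994 Other"]
print(dates_with_stale_value(first_lines))
print(dates_reset_per_file(first_lines))
```

In the first function the second entry silently repeats the first file's date; in the second it is None, which the caller can check before renaming.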

