Python Regex: Loop through first line of each file in directory
I want to loop through .txt files and use the date (e.g. April 1, 1993) from the first line of each file.
This code works, but it matches against the entire file rather than just the first line (note: the code I'm showing below includes more than just the date-matching loop):
The script below has been updated, and it now works:
import glob
import re
from shutil import move

from dateutil import parser  # third-party package: python-dateutil

# Month name (full or abbreviated) followed by "D, YYYY", e.g. "April 1, 1993"
DATE_RE = re.compile(r'(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}')

source = "source"  # hypothetical placeholder; the real value is set elsewhere in the full script

articles = glob.glob("*.txt")
for f in articles:
    # First pass: take the word count from the "LENGTH:" line (e.g. "LENGTH: 123 words")
    wordcount = "x"
    with open(f, "r") as content:
        for line in content:
            if line[0:7] == "LENGTH:":
                lineclean = re.sub(r'[#%&\<>*?:/{}$@+|=]', '', line)
                wordcount = lineclean[7:13]
                if wordcount[5] == "w":
                    wordcount = wordcount[0:4]
                elif wordcount[4] == "w":
                    wordcount = wordcount[0:3]
                elif wordcount[3] == "w":
                    wordcount = wordcount[0:2]
                elif wordcount[2] == "w":
                    wordcount = wordcount[0:1]

    # Second pass: read only the first line and look for the date there
    with open(f, "r") as content:
        first_line = next(content, "")
    match = DATE_RE.search(first_line)
    if match is None:
        continue  # no date on the first line: skip instead of reusing the previous file's date
    parsed_pubdate = parser.parse(match.group()).strftime('%Y-%m-%d')

    if wordcount != "x":
        try:
            move(f, "{}_{}_{}.txt".format(parsed_pubdate, wordcount, source))
        except OSError:
            pass
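For comparison, here is a minimal sketch of just the date step, assuming the date appears on the first line in a form like "April 1, 1993" or "Apr 1, 1993". It reads a single line with readline(), so the rest of the file is never scanned. It swaps in the standard library's datetime.strptime in place of dateutil.parser, and the helper name first_line_date is hypothetical.

```python
import re
from datetime import datetime

# Same month alternation as in the question, written as a raw string.
DATE_RE = re.compile(
    r"(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?"
    r"|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}"
)


def first_line_date(path):
    """Hypothetical helper: return the first line's date as 'YYYY-MM-DD', or None."""
    with open(path, "r") as fh:
        first_line = fh.readline()  # reads one line only
    m = DATE_RE.search(first_line)
    if m is None:
        return None  # no date found: nothing carries over from another file
    # Collapse runs of whitespace so strptime sees "Month D, YYYY".
    text = " ".join(m.group().split())
    # Try the full month name first ("April"), then the abbreviation ("Apr").
    for fmt in ("%B %d, %Y", "%b %d, %Y"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None
```

Usage would be a loop such as `for path in glob.glob("*.txt"): print(path, first_line_date(path))`. Returning None for a file without a date, rather than leaving an old value in place, keeps one file's date from leaking into the next.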
In order to match dates only in the first line of the file, I added ^\s and flags=re.MULTILINE, so I get:
match = re.search('^\s(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}', line, flags=re.MULTILINE).group()
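A small illustration of what re.MULTILINE does to ^ may help here. Without the flag, ^ matches only at the very start of the string; with it, ^ also matches right after every newline, so on a multi-line string the pattern can still find a date on any line. Note also that ^\s requires a whitespace character before the month, so a line that begins directly with "April" would not match. The sample text and pattern below are made up for the demonstration.

```python
import re

text = "Header line\nApril 1, 1993\nMay 5, 2001\n"
pattern = r"^(?:April|May) \d{1,2}, \d{4}"

# Without MULTILINE, ^ anchors only at the start of the whole string.
no_flag = re.findall(pattern, text)

# With MULTILINE, ^ also matches right after every newline.
multi = re.findall(pattern, text, flags=re.MULTILINE)

# Restricting the input to the first line is what limits the search.
first_line = text.split("\n", 1)[0]
first_only = re.findall(pattern, first_line)

print(no_flag)     # []
print(multi)       # ['April 1, 1993', 'May 5, 2001']
print(first_only)  # []
```

In other words, the flag widens where ^ can match; limiting the search to the first line comes from what string is passed to re.search.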
However, now the program uses only one date (the date of the last file in the folder) and applies it to every file, so every file gets the same date even though the dates in the original .txt files differ.
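One plausible cause of the repeated date is sketched below with made-up strings. When re.search finds nothing it returns None, so .group() raises AttributeError; an except clause that just passes swallows that error, and match silently keeps whatever value it held from an earlier file. (If the search runs after the for line in lines loop, line also holds the file's last line rather than its first.) The function names buggy and fixed are hypothetical.

```python
import re

DATE_RE = re.compile(
    r"(January|February|March|April|May|June|July|August"
    r"|September|October|November|December)\s+\d{1,2},\s+\d{4}"
)

first_lines = ["April 1, 1993", "no date on this line", "May 5, 2001"]


def buggy(lines):
    """Mimics the question's pattern: a swallowed error keeps the old match."""
    results = []
    match = None
    for text in lines:
        try:
            match = DATE_RE.search(text).group()  # AttributeError when search returns None
        except AttributeError:  # the question uses a bare except:, which behaves the same here
            pass
        results.append(match)
    return results


def fixed(lines):
    """Computes a fresh result for each input and tests for None explicitly."""
    results = []
    for text in lines:
        m = DATE_RE.search(text)
        results.append(m.group() if m else None)
    return results


print(buggy(first_lines))  # ['April 1, 1993', 'April 1, 1993', 'May 5, 2001']
print(fixed(first_lines))  # ['April 1, 1993', None, 'May 5, 2001']
```

The second input has no date, yet buggy reports the first input's date for it; fixed reports None, which the calling loop can then skip.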
I included the entire step this loop is part of, but my problem only applies to the regex date-matching loop. Thanks in advance for your help!