I'm trying to extract the first number that appears in the first line of each of my text files. I'm a noob, so I'm playing around with regex. The issue is that nothing is printing, so I'm not sure whether it's my code or something else.
I've tried printing my file names too, and nothing happens either, so I'm not sure what's going on.
import glob
import os
import re

work_dir = "User/...my folder of 9 text files"
for path in glob.glob(os.path.join(work_dir, "*.txt")):
    with open(path, mode="r", encoding="utf-8") as file:
        first_line = file.readline()  # read only the header line
    # Check for the LOCUS keyword, then search after column 34 for the length
    if "LOCUS" in first_line[0:34]:
        match = re.search(r"(\d+)", first_line[34:])
        if match:
            print(int(match.group(0)))
    name = os.path.basename(path).replace(".gbff", "")
    print(name)
Here's the head of an example of the files I'm working with. It's a text file, even though it looks like a table here.
LOCUS AE017334 *5227419* bp DNA circular BCT 03-DEC-2015
DEFINITION Bacillus anthracis str. 'Ames Ancestor', complete genome.
ACCESSION AE017334
VERSION AE017334.2
DBLINK BioProject: PRJNA10784
BioSample: SAMN02603433
I need the number I've put asterisks (**) around.
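Because the marked number sits just before the bp unit, one option is to anchor the regex on that unit rather than on a fixed column such as 34. A fixed offset only works if the LOCUS line's space padding is exactly what the slice expects. This is a minimal sketch, assuming the asterisks above are only highlighting and are not in the file; the name locus_length is hypothetical.

```python
import re

# Hypothetical pattern: LOCUS keyword, a locus name, then the length before "bp".
# \s+ tolerates any amount of padding between the fields.
LOCUS_LENGTH = re.compile(r"^LOCUS\s+\S+\s+(\d+)\s+bp\b")


def locus_length(first_line):
    """Return the sequence length from a LOCUS header line, or None."""
    match = LOCUS_LENGTH.match(first_line)
    return int(match.group(1)) if match else None


# Single spaces and wider, column-aligned padding both work.
print(locus_length("LOCUS AE017334 5227419 bp DNA circular BCT 03-DEC-2015"))  # 5227419
print(locus_length("LOCUS       AE017334             5227419 bp    DNA"))  # 5227419
print(locus_length("DEFINITION Bacillus anthracis str. 'Ames Ancestor'"))  # None
```

Anchoring on bp also avoids picking up the digits inside the accession (AE017334), which a plain search for the first number on the whole line would return.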
I actually got output using your regex and text format, and it works fine with the slicing and the other things you mentioned, so the problem isn't the regex or the for loop. Since you say nothing is printing, and I assume no errors are printed either, I think the problem is with your path or how the directory is being read.
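One way to test the path theory: glob.glob returns an empty list rather than raising an error when the folder does not exist or nothing matches, so the loop body (including the file-name print) silently never runs. A relative path such as "User/..." is resolved against the current working directory, not your home folder. Note also that the code strips ".gbff" from file names while globbing for "*.txt"; if the files actually end in .gbff, that pattern matches nothing. The helper below is a hypothetical sketch that reports what it is looking at.

```python
import glob
import os


def find_files(work_dir, pattern="*.txt"):
    """Hypothetical helper: return matching files and explain an empty result."""
    # Expand "~" and make the path absolute so you can see where it really points.
    full_dir = os.path.abspath(os.path.expanduser(work_dir))
    if not os.path.isdir(full_dir):
        print(f"Not a directory: {full_dir}")
        return []
    paths = sorted(glob.glob(os.path.join(full_dir, pattern)))
    if not paths:
        # List what is there so a wrong extension is easy to spot.
        print(f"No {pattern!r} files in {full_dir}; contents: {os.listdir(full_dir)}")
    return paths


print(os.getcwd())  # relative paths are resolved from here
print(find_files("User/...my folder of 9 text files"))
```

If this prints an empty list, fix the folder path (or the pattern) before looking at the regex.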
Anyway, here's your regex part:
import re

first_line='LOCUS AE017334 *5227419* bp DNA circular BCT 03-DEC-2015'
matches = int(re.search(r"(\d+)", first_line[34:]).group(0))
print(matches)
output:
5227419
I'm posting this so others trying to answer can skip these steps and look at the other parts of your code.