
I have a text file with multiple lines. How can I extract a portion from each line using regex in Python?

Each input line looks like this:

-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf

The required output is:

25399 Nov  2 21:25 exception_hierarchy.pdf

These are the size, month, day, hour, minute, and filename, respectively.

The task is to return a list of tuples (size, month, day, hour, minute, filename) using regular expressions (the match, search, findall, or finditer method).

The code I tried is:

# f is an already-open file object
for line in range(1):
    line = f.readline()
    x = re.findall(r'[^-]\d+\w+:\w+.*\w+_*', line)
    print(x)

My output - [' 21:25 add_colab_link.py']

Please read the following example of how to ask a good question: How to make a great R reproducible example

I'm answering your question because not long ago I made the same mistakes and was glad when someone still answered.

import re  # regular expression library

# Assume three such lines are stored in a list:
my_list = [
    "-rw-r--r-- 1 jttoivon hyad-all 12345 Nov 2 21:25 exception_hierarchy.pdf",
    "-rw-r--r-- 1 jttoivon hyad-all 25399 Nov 2 21:25 exception_hierarchy.pdf",
    "-rw-r--r-- 1 jttoivon hyad-all 98765 Nov 2 21:25 exception_hierarchy.pdf",
]

# List that collects the resulting tuples
new_list = []

for x in my_list:
    # Capture everything from a five-digit size through the "pdf" extension.
    # Note: this only matches five-digit sizes and filenames ending in "pdf".
    y = re.findall(r"([0-9]{5}.+pdf)", x)
    for thing in y:
        # split() with no argument also copes with runs of spaces ("Nov  2")
        a, b, c, d, e = thing.split()
        g, h = d.split(":")  # "21:25" -> hour, minute
        z = (a, b, c, g, h, e)
        new_list.append(z)

print(new_list)

Here's a working example using regular expressions via the re package:

>>> import re
>>> line = "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf"
>>> pattern = r"(\d+)\s+([A-Za-z]+)\s+(\d{1,2})\s+(\d{1,2}):(\d{1,2})\s+(.+)$"
>>> output_tuple = re.findall(pattern, line)[0]
>>> print(output_tuple)
('25399', 'Nov', '2', '21', '25', 'exception_hierarchy.pdf')
>>> size, month, day, hour, minute, filename = output_tuple

Most of the logic is encoded in the raw-string pattern variable. It is easy to follow if you read it piece by piece, one element per line:

(\d+)         # one or more digits (size)
\s+           # one or more whitespace characters
([A-Za-z]+)   # one or more letters (month)
\s+           # one or more whitespace characters
(\d{1,2})     # one or two digits (day)
\s+           # one or more whitespace characters
(\d{1,2})     # one or two digits (hour)
:             # a literal ':'
(\d{1,2})     # one or two digits (minute)
\s+           # one or more whitespace characters
(.+)          # one or more of any character (filename)
$             # end of the string
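The piece-by-piece breakdown can also live in the code itself: with the re.VERBOSE flag, whitespace and comments inside the pattern are ignored. The following is a minimal sketch rather than part of the original answer. The names LINE_RE and parse_listing and the sample lines are made up for illustration, and it assumes every file entry follows the layout shown in the question.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so that whitespace and
# "#" comments inside the pattern are ignored by the regex engine.
LINE_RE = re.compile(
    r"""
    (\d+)        \s+             # size
    ([A-Za-z]+)  \s+             # month
    (\d{1,2})    \s+             # day
    (\d{1,2}) : (\d{1,2})  \s+   # hour and minute
    (.+) $                       # filename, up to the end of the line
    """,
    re.VERBOSE,
)


def parse_listing(lines):
    """Return a list of (size, month, day, hour, minute, filename) tuples."""
    result = []
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if match:  # skip lines that are not file entries, e.g. "total 48"
            result.append(match.groups())
    return result


# Hypothetical sample input; an open file object can be passed the same way.
sample = [
    "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf\n",
    "total 48\n",
]
print(parse_listing(sample))
```

Because parse_listing only iterates over its argument, passing an open file (for example inside a with open(...) block) processes the file line by line. Entries that show a year instead of hour:minute, as ls -l does for older files, would be skipped by this pattern.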

By the way, here's a working example that doesn't use re (regular expressions aren't strictly necessary for this):

>>> line = "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf"
>>> size, month, day, hour_minute, filename = line.split("hyad-all")[1].strip().split()
>>> hour, minute = hour_minute.split(":")
>>> print(size, month, day, hour, minute, filename)
25399 Nov 2 21 25 exception_hierarchy.pdf
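The split above relies on the group name hyad-all appearing in every line. A variant that avoids this dependency is sketched below, under the assumption that each line has the same nine whitespace-separated fields as the sample. It splits with maxsplit=8 so that a filename containing spaces stays in one piece. The helper name split_entry is hypothetical.

```python
def split_entry(line):
    """Split one `ls -l`-style line into (size, month, day, hour, minute, filename)."""
    # Assumed fields: permissions, link count, owner, group,
    # size, month, day, time, filename.
    fields = line.split(maxsplit=8)
    if len(fields) != 9:
        raise ValueError(f"unexpected line format: {line!r}")
    size, month, day, hour_minute, filename = fields[4:]
    hour, minute = hour_minute.split(":")
    # The last field keeps any trailing newline, so strip it here.
    return size, month, day, hour, minute, filename.rstrip("\n")


line = "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf"
print(split_entry(line))
```

Raising ValueError on short lines is one possible design choice; skipping such lines instead would work equally well if the input contains headers like "total 48".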
