Need to extract specific pattern from a line in Python

Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!:Member First Name:CONDA!:Member Last Name:LAVE!

The sample above is the first line of a file; there can be n lines, each with different attribute values. I need to extract the file name from "Filename:Source_Folder\Sample_File.pdf" in each line and check whether that file (Sample_File.pdf) is present in another folder. This has to be done for every line in the file, and the order of the attributes may change.

I am a beginner in Python, so any help would be appreciated. Thanks in advance.

You can use regular expressions like this:

import re

# Raw string, so the backslash in the path is not read as an escape sequence
line = r'Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!:Member First Name:CONDA!:Member Last Name:LAVE!'

re.search('Filename:(.*?)!', line).group(1).split('\\')[1]

'Sample_File.pdf'
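Note that re.search returns None when a line has no Filename field, so calling .group(1) directly would raise an AttributeError on such a line. A minimal sketch of a guarded version follows; the helper name extract_file_name is hypothetical, and it assumes the path always uses backslashes as in the sample:

```python
import re

FILENAME_RE = re.compile(r'Filename:(.*?)!')

def extract_file_name(line):
    """Return the bare file name from a record line, or None if there is no Filename field."""
    match = FILENAME_RE.search(line)
    if match is None:
        return None
    # Keep only the part after the last backslash (assumes Windows-style separators)
    return match.group(1).rsplit('\\', 1)[-1]

sample = r'Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!'
print(extract_file_name(sample))                                    # Sample_File.pdf
print(extract_file_name('MemberID:00000000!:Language:English!'))    # None
```

Because the pattern searches for the Filename key anywhere in the line, it keeps working if the order of the attributes changes.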

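Here is a solution for a single line. You just need to put it in a loop to handle multiple lines.

# Raw string, so the backslash in the path is not read as an escape sequence
string = r"Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!:Member First Name:CONDA!:Member Last Name:LAVE!"

str_arr = string.split(':')

path = None  # stays None if no element contains "Filename"
for i in range(len(str_arr) - 1):
    if "Filename" in str_arr[i]:
        path = str_arr[i + 1]
        break

# Drop the trailing '!' delimiter if it is present
if path is not None and path.endswith("!"):
    path = path[:-1]

print(path)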

Explanation: First, the string is split on : into a list of strings. The resulting str_arr is ['Member DOB', '2012-04-18!', 'MemberID', '00000000!Filename', 'Source_Folder\\Sample_File.pdf!', 'Language', 'English!', 'Member First Name', 'CONDA!', 'Member Last Name', 'LAVE!']

After that, I iterate over the list to find the element that contains the Filename keyword; the path is always the next element. As the list above shows, the path still ends with a !, so a final if condition checks for it and removes it.

Further, if you want to check whether the file is a PDF, you can add a condition such as if path.lower().endswith(".pdf"), which is stricter than if "pdf" in path because it only matches the extension.
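The steps above could be wrapped in a function and applied to every line. The following is only a rough sketch: the name path_from_line is hypothetical, and it assumes the path itself never contains a colon (a path such as C:\... would break the split on :).

```python
def path_from_line(line):
    """Return the value that follows the 'Filename' key, or None if the key is missing."""
    parts = line.strip().split(':')
    # Stop one element early so parts[i + 1] always exists
    for i, part in enumerate(parts[:-1]):
        if part.endswith('Filename'):
            return parts[i + 1].rstrip('!')
    return None

lines = [
    r'Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!',
    r'MemberID:00000001!:Language:English!',
]
for line in lines:
    path = path_from_line(line)
    # Skip lines without a Filename field and files that are not PDFs
    if path is not None and path.lower().endswith('.pdf'):
        print(path)
```

Only the first line prints a path here, since the second one has no Filename field.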

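Using just string methods:

# Raw string, so the backslash in the path is not read as an escape sequence
spam = r"Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!:Member First Name:CONDA!:Member Last Name:LAVE!"
# Split into fields on '!', drop a leading ':' where present, then split each field into key and value
member = dict(item[1:].split(':') if item.startswith(':') else item.split(':') for item in spam.split('!') if item)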
print(member)
file_name = member['Filename']
print(file_name)

Output:

{'Member DOB': '2012-04-18', 'MemberID': '00000000', 'Filename': 'Source_Folder\\Sample_File.pdf', 'Language': 'English', 'Member First Name': 'CONDA', 'Member Last Name': 'LAVE'}
Source_Folder\Sample_File.pdf

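There are some inconsistencies in how colons and exclamation marks delimit the fields of the input line (for example, MemberID:00000000!Filename has no colon after the !). If they were fixed, the generator expression passed to dict could be simplified.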
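To illustrate that remark, here is a sketch that assumes a cleaned-up input in which every field is followed by the same "!:" delimiter. The shortened sample line below is a hypothetical variant, not the original data:

```python
# Hypothetical consistent input: fields are "key:value" pairs joined by "!:"
spam = r"Member DOB:2012-04-18!:MemberID:00000000!:Filename:Source_Folder\Sample_File.pdf!:Language:English!"

# Drop the final '!', split into fields, then split each field on its first colon only
member = dict(item.split(':', 1) for item in spam.rstrip('!').split('!:'))
print(member['Filename'])  # Source_Folder\Sample_File.pdf
```

Using split(':', 1) also means a value may contain colons of its own without breaking the key/value pairing.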

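import re
from pathlib import Path

RE_FILENAME = re.compile(r'Filename:(.*?)!')
TARGET_DIR = Path('path', 'to', 'target', 'dir')  # folder in which to look for the files
myfile = 'input.txt'  # placeholder: path of the file that holds the lines

with open(myfile) as fd:
    for line in fd:
        # The := operator requires Python 3.8 or later
        if match := RE_FILENAME.search(line):
            # Split on the backslash explicitly so this also works outside Windows
            file_name = match.group(1).split('\\')[-1]
            target_path = TARGET_DIR / file_name
            if target_path.is_file():
                print(f"File {target_path} exists!")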
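On a non-Windows machine, Path does not treat the backslash as a separator, which is why the code above splits on it by hand. As an alternative sketch, PureWindowsPath from the standard library parses Windows-style paths on any OS. The function name find_existing and the temporary directory used for the demonstration are illustrative assumptions:

```python
import re
import tempfile
from pathlib import Path, PureWindowsPath

RE_FILENAME = re.compile(r'Filename:(.*?)!')

def find_existing(lines, target_dir):
    """Yield the file names referenced in lines that exist as files in target_dir."""
    for line in lines:
        match = RE_FILENAME.search(line)
        if match:
            # .name is the last path component, parsed with Windows rules
            name = PureWindowsPath(match.group(1)).name
            if (Path(target_dir) / name).is_file():
                yield name

# Demonstration against a throwaway directory containing one of the two files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'Sample_File.pdf').touch()
    lines = [
        r'MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!',
        r'MemberID:00000001!Filename:Source_Folder\Missing_File.pdf!:Language:English!',
    ]
    found = list(find_existing(lines, tmp))

print(found)  # ['Sample_File.pdf']
```

In real use, the list of lines would be replaced by the open file object and the temporary directory by the folder to check.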
