Need to extract specific pattern from a line in Python

Question

Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!:Member First Name:CONDA!:Member Last Name:LAVE!

The sample above is the first line of a file; there can be n lines with different attributes. I need to get the file name from "Filename:Source_Folder\Sample_File.pdf" in the sample line and check whether that file (Sample_File.pdf) is present in another folder, and this has to happen for every line in the file. The order of the attributes can change.

I am a beginner in Python; any help would be appreciated. Thanks in advance.

Answer 1

You can use regular expressions like this:

import re

# Raw string, so the backslash in the path is not treated as an escape character
line = r'Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!:Member First Name:CONDA!:Member Last Name:LAVE!'

re.search('Filename:(.*?)!', line).group(1).split('\\')[1]

'Sample_File.pdf'
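
The one-liner above raises AttributeError on a line that has no Filename field, because re.search returns None in that case. A minimal sketch of a more defensive variant follows. The helper name extract_file_name is hypothetical, and the sketch assumes the stored path always uses Windows-style backslashes.

```python
import re
from pathlib import PureWindowsPath

FILENAME_RE = re.compile(r'Filename:(.*?)!')

def extract_file_name(line):
    """Return the bare file name from a line, or None if there is no Filename field."""
    match = FILENAME_RE.search(line)
    if match is None:
        return None
    # PureWindowsPath splits on backslashes regardless of the OS running the script
    return PureWindowsPath(match.group(1)).name

line = r'Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!'
print(extract_file_name(line))
print(extract_file_name('MemberID:00000000!'))
```

Returning None instead of raising lets the caller decide how to handle lines without a file name.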

Answer 2

Here is the solution for a single line. You just need to add this in a loop for multiple lines.

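# Raw string, so the backslash in the path is kept as-is
string = r"Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!:Member First Name:CONDA!:Member Last Name:LAVE!"

str_arr = string.split(':')

path = None
for i, part in enumerate(str_arr[:-1]):
    if "Filename" in part:
        path = str_arr[i + 1]  # the value follows the key
        break

# Strip the trailing "!" delimiter, if present
if path is not None and path.endswith("!"):
    path = path[:-1]

print(path)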

Explanation: First I split the string on : to get a list of strings. The resulting str_arr is ['Member DOB', '2012-04-18!', 'MemberID', '00000000!Filename', 'Source_Folder\\Sample_File.pdf!', 'Language', 'English!', 'Member First Name', 'CONDA!', 'Member Last Name', 'LAVE!']. Note that the Filename key is glued to the previous value ('00000000!Filename'), which is why the loop tests whether "Filename" is contained in an element rather than equal to it.

Then I iterate over the list to find the element containing the Filename keyword; the path is the element right after it. As the list above shows, the path ends with a !, so an if condition checks for it and removes it.

Further, if you want to check whether the file is a PDF, you can add a condition such as if path.endswith(".pdf"), which is stricter than if "pdf" in path.
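
To run this over every line, the same steps can be wrapped in a function and called in a loop. The sketch below is one way to do that; the function name path_from_line and the second sample line are made up for illustration.

```python
def path_from_line(line):
    """Return the value after the Filename key, or None if the key is missing."""
    parts = line.strip().split(':')
    for i, part in enumerate(parts[:-1]):
        # The key may be glued to the previous value, e.g. '00000000!Filename'
        if part.endswith("Filename"):
            return parts[i + 1].rstrip("!")
    return None

lines = [
    r"Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!",
    r"Filename:Source_Folder\Other_File.pdf!:MemberID:00000001!",
]
for line in lines:
    print(path_from_line(line))
```

Because the line is split on :, a path that itself contains a colon (for example a drive letter such as C:\) would be cut short by this approach.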

Answer 3

Using just string methods:

# Raw string, so the backslash in the path is kept as-is
spam = r"Member DOB:2012-04-18!:MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!:Member First Name:CONDA!:Member Last Name:LAVE!"
member = dict(item[1:].split(':') if item.startswith(':') else item.split(':') for item in spam.split('!') if item)
print(member)
file_name = member['Filename']
print(file_name)

Output:

{'Member DOB': '2012-04-18', 'MemberID': '00000000', 'Filename': 'Source_Folder\\Sample_File.pdf', 'Language': 'English', 'Member First Name': 'CONDA', 'Member Last Name': 'LAVE'}
Source_Folder\Sample_File.pdf

There are some inconsistencies in the input line: most fields are separated by !: but Filename follows a bare !. If they were fixed, the generator expression could be simplified.
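
As a rough illustration of that simplification, the sketch below assumes a hypothetical cleaned-up input in which every field is written as key:value and terminated by a single !. The variable clean is invented for this example and is not the original data.

```python
# Hypothetical cleaned-up line: every field is "key:value" and ends with "!"
clean = r"Member DOB:2012-04-18!MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!Language:English!"

# maxsplit=1 keeps any further colons inside the value
member = dict(item.split(':', 1) for item in clean.split('!') if item)
print(member['Filename'])
```

With consistent delimiters the startswith(':') branch is no longer needed.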

Answer 4

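import re
from pathlib import Path

RE_FILENAME = re.compile(r'Filename:(.*?)!')
TARGET_DIR = Path('path', 'to', 'target', 'dir')  # folder to look in
myfile = 'input.txt'  # placeholder: path to the file containing the lines

with open(myfile) as fd:
    for line in fd:
        # The := operator requires Python 3.8 or later
        if match := RE_FILENAME.search(line):
            # Split the Windows-style path on backslashes so it works on any OS
            origin_path = Path(*match.group(1).split('\\'))
            # Keep only the file name and look for it in the target folder
            target_path = TARGET_DIR / origin_path.name
            if target_path.is_file():
                print(f"File {target_path} exists!")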
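
If the goal is to report files that are missing, one possible variation is to list the target folder once and compare names against that set. This is only a sketch under stated assumptions: the function name missing_files, the sample lines, and the temporary folder used for the demo are all hypothetical.

```python
import re
import tempfile
from pathlib import Path, PureWindowsPath

RE_FILENAME = re.compile(r'Filename:(.*?)!')

def missing_files(lines, target_dir):
    """Return the file names mentioned in lines that are absent from target_dir."""
    present = {p.name for p in Path(target_dir).iterdir() if p.is_file()}
    missing = []
    for line in lines:
        match = RE_FILENAME.search(line)
        if match is None:
            continue  # line has no Filename field
        name = PureWindowsPath(match.group(1)).name
        if name not in present:
            missing.append(name)
    return missing

# Demo with a throwaway folder containing only Sample_File.pdf
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'Sample_File.pdf').touch()
    sample_lines = [
        r'MemberID:00000000!Filename:Source_Folder\Sample_File.pdf!:Language:English!',
        r'MemberID:00000001!Filename:Source_Folder\Other_File.pdf!:Language:English!',
    ]
    result = missing_files(sample_lines, tmp)
    print(result)
```

The name comparison here is case-sensitive, which may differ from how the file system itself treats names on Windows.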

4 answers

Answer 1: score 2, 2021-02-25 06:55:08
Answer 2: score 0, 2021-02-25 07:01:19
Answer 3: score 0, 2021-02-25 07:13:13
Answer 4: score -1, 2021-02-25 07:09:56