Search using substring in Python

I have a txt file with two columns, as shown below:

LocationIndex   ID
P-1-A100A100    X000PY66QL
P-1-A100A100    X000RE0RRD
P-1-A100A101    X000R39WBL
P-1-A100A103    X000LJ7MX1
P-1-A100A104    X000S5QZMH
P-1-A100A105    X000MUMNOR
P-1-A100A105    X000S5R571
P-1-A100B100    X000MXVHFZ
P-1-A100B100    X000Q18233
P-1-A100B100    X000S6RSZJ
P-1-A100B101    X000K7C4HN
P-1-A100B102    X000RN9U59
P-1-A100B103    X000R4MZE1
P-1-A100B104    X000K9HSKT
P-1-A100C101    X000MCB5DZ
P-1-A100C101    X000O0T0RX
P-1-A100C102    X000RULTGZ
P-1-A100C104    X000O5NXKN
P-1-A100C104    X000RN3G9F
P-1-A100C105    X000D4P1P5
P-1-A100C105    X000QNBKDF
P-1-A100D100    X000FADDHP
P-1-A100D100    X000KR34DB
P-1-A100D100    X000MPCZ1X
P-1-A100D100    X000S6TO0B
P-1-A100D101    B00PANFBJ2
P-1-A100D101    X000Q1IYQD
P-1-A100D101    X000QEMDV7
P-1-A100D101    X000QHRKM1
P-1-A100D101    X000RUGIKR
P-1-A100D102    X000FF656L
P-1-A100D102    X000S13C5J

Taking the LocationIndex as the search index, I need to find which adjacent locations have the same ID.

Defining the adjacent locations:

The left and right locations for a particular LocationIndex are given by changing its last character, e.g. for P-1-A100B103, left is P-1-A100B102 and right is P-1-A100B104 (the last digit is in the range 0-5).

The top and bottom locations for a particular LocationIndex are given by changing its fourth-last character, e.g. for P-1-A100B103, top is P-1-A100C103 and bottom is P-1-A100A103 (the fourth-last character is in the range A-E).

I need to find out whether the ID of a given LocationIndex (here, for example, P-1-A100B103) matches the ID of any of its left, right, top or bottom locations.
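The adjacency rule above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `neighbours` and its parameters `rows` and `max_col` are hypothetical, the rows are assumed to be the letters A to E, the columns the digits 0 to 5, and "bottom" is taken to be the previous row letter.

```python
def neighbours(loc, rows="ABCDE", max_col=5):
    """Return the left/right/top/bottom neighbours of loc that stay in range."""
    # Split e.g. "P-1-A100B103" into "P-1-A100", "B", "10" and 3.
    prefix, row, mid, col = loc[:-4], loc[-4], loc[-3:-1], int(loc[-1])
    result = {}
    if col > 0:
        result["left"] = f"{prefix}{row}{mid}{col - 1}"
    if col < max_col:
        result["right"] = f"{prefix}{row}{mid}{col + 1}"
    r = rows.index(row)  # position of the row letter within A-E
    if r + 1 < len(rows):
        result["top"] = f"{prefix}{rows[r + 1]}{mid}{col}"
    if r > 0:
        result["bottom"] = f"{prefix}{rows[r - 1]}{mid}{col}"
    return result


print(neighbours("P-1-A100B103"))
```

For P-1-A100B103 this yields P-1-A100B102 (left), P-1-A100B104 (right), P-1-A100C103 (top) and P-1-A100A103 (bottom). A corner location such as P-1-A100A100 simply gets fewer entries.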

I tried the following:

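with open('Test.txt', 'r') as f:
    next(f)  # skip the header row
    for line in f:
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        x = fields[0]  # LocationIndex, e.g. P-1-A100B103
        y = fields[1]  # ID
        # eliminating corner cases: skip locations on the edge of the grid
        if 0 < int(x[-1]) < 5 and x[-4] not in ('A', 'E'):
            right = x[:-1] + str(int(x[-1]) + 1)
            left = x[:-1] + str(int(x[-1]) - 1)
            top = x[:-4] + chr(ord(x[-4]) + 1) + x[-3:]
            bottom = x[:-4] + chr(ord(x[-4]) - 1) + x[-3:]
            # how to search ID for individual right, left, top and bottom?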

I can do this in shell, but I need to have it done in Python. Any hint or help would be appreciated.

A bit long and not the most efficient, but it gets the job done:

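FILE_PATH = 'Test.txt'  # path to the two-column input file


def getData():
    # Map each LocationIndex to the set of IDs stored at that location.
    loc_keys = {}
    with open(FILE_PATH, 'r') as f:
        next(f)  # skip the header row
        for line in f:
            line = line.split()
            if len(line) < 2:
                continue  # ignore blank or malformed lines
            loc, key = line[0], line[1]
            loc_keys.setdefault(loc, set()).add(key)

    return loc_keys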


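def is_adjacent(loc1, loc2):
    # Everything except the row letter (fourth-last character) and the
    # column digit (last character) must be identical.
    if loc1[:-4] != loc2[:-4] or loc1[-3:-1] != loc2[-3:-1]:
        return False
    row_diff = abs(ord(loc1[-4]) - ord(loc2[-4]))
    col_diff = abs(int(loc1[-1]) - int(loc2[-1]))
    # Exactly one step left/right or up/down, never diagonal.
    return row_diff + col_diff == 1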


def find_matches(loc, loc_keys):
    if loc not in loc_keys:
        return None

    keys = loc_keys[loc]  # Set of keys for the input location
    matches = set()
    for other, other_keys in loc_keys.items():
        # Keep adjacent locations that share at least one key with loc
        if is_adjacent(loc, other) and other_keys & keys:
            matches.add(other)

    return matches


# Call find_matches( <some LocationIndex>, getData() )
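Since the answer describes itself as not the most efficient (it scans every location for each query), here is a hedged alternative sketch that builds the four candidate neighbour strings and looks them up directly in the dictionary. The names `load_locations`, `candidate_neighbours` and `matching_neighbours` are hypothetical, and the sample rows are invented for illustration: the IDs in the question's data are all distinct, so that data would produce no matches.

```python
from collections import defaultdict


def load_locations(lines):
    """Map each LocationIndex to the set of IDs listed for it."""
    loc_ids = defaultdict(set)
    for line in lines:
        fields = line.split()
        # Skip the header row and any blank or malformed lines.
        if len(fields) == 2 and fields[0] != "LocationIndex":
            loc_ids[fields[0]].add(fields[1])
    return dict(loc_ids)


def candidate_neighbours(loc):
    """Yield the four location strings one step away from loc."""
    head, row, mid, col = loc[:-4], loc[-4], loc[-3:-1], int(loc[-1])
    for step in (-1, 1):
        yield f"{head}{row}{mid}{col + step}"             # left / right
        yield f"{head}{chr(ord(row) + step)}{mid}{col}"   # bottom / top


def matching_neighbours(loc, loc_ids):
    """Return the neighbours of loc that share at least one ID with it."""
    ids = loc_ids.get(loc, set())
    # Out-of-range candidates are never keys of loc_ids, so they drop out.
    return {n for n in candidate_neighbours(loc) if ids & loc_ids.get(n, set())}


# Invented sample data (not from the question) containing shared IDs.
sample = """LocationIndex   ID
P-1-A100B102    X000AAAAAA
P-1-A100B103    X000AAAAAA
P-1-A100B103    X000BBBBBB
P-1-A100C103    X000BBBBBB
P-1-A100D102    X000AAAAAA
""".splitlines()

loc_ids = load_locations(sample)
print(sorted(matching_neighbours("P-1-A100B103", loc_ids)))
# -> ['P-1-A100B102', 'P-1-A100C103']
```

Each query then costs four dictionary lookups regardless of file size. P-1-A100D102 shares an ID with P-1-A100B103 but is not reported, because it is not one step away.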
