
How to check if part of a string in a list is contained in another list in Python

I have been looking through numerous questions that seem to hit the nail on the head, but they end up confusing me further and don't help at all. So I hope no one closes this question and refers me to other questions, and that someone actually helps me, because I have spent hours trying to figure it out. I cannot provide the actual text for security reasons, so I will make up similar-looking lists. There are thousands of strings in these lists, but I'll just give an example of 3, purposely including strings that I want to match up.

list= ['93900 2016-01-11.50 10.17', '93030 2014-04-16.50 18.83', '29322 2009-05-21.50 17.81']

list1= ['33492 2017-02-14.50 11.17', '93900 2016-02-11.00 11.15', '93900 2015-12-14.00 15.66']

  1. The strings in list have varying amounts of spacing between the fields.
  2. I need to take, for example, 93900 2016-01-11.50 10.17 from "list", compare it to the strings in list1, and check whether 93900 appears there together with a date within ±1 month of 2016-01-11.50. Ideally it would return '93900 2016-02-11.00 11.15' and '93900 2015-12-14.00 15.66' from list1. I only know how to test whether two strings are exactly the same, and here that comparison would clearly return an empty list, because none of the strings match exactly. I need smarter code that looks inside each string and finds values close to the one I'm searching for. After the comparison I also need to put the full string, not just the matching part, into a new list.
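Since the spacing varies, the fields are easier to pick out with `str.split()` than with fixed character positions. A minimal sketch, assuming every record has exactly three whitespace-separated fields (ID, date, value); `split_record` is a hypothetical helper name:

```python
def split_record(line):
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading/trailing whitespace, so uneven spacing doesn't matter
    record_id, date_part, value = line.split()
    return record_id, date_part, value

# split(' ') would leave empty strings wherever there are repeated spaces
print('93900  2016-01-11.50'.split(' '))              # ['93900', '', '2016-01-11.50']
print(split_record('93900  2016-01-11.50   10.17'))   # ('93900', '2016-01-11.50', '10.17')
```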

I hope this makes sense and that someone can help.

All I have is a nested loop that does not work because I cannot figure out how to compare partial strings.

new_list = []
for line in list:
    for line1 in list1:
        if line[0:5] == line1[0:5]:  # only compares the 5-character ID
            new_list.append(line1)  # keep the full matching string from list1

Yeah, this clearly does not work, but it's a way to check one string against each element in the other list; I just don't know how to compare only certain parts of the strings.

If the buffer is always 1 month and the data format is always the same, this code should work for you:

def comp(s, l):  # s: string to search for, l: list of strings to search in
    head, month = s.split('-')[0:2]  # e.g. with s = '93900 2016-01-11.50 10.17': head = '93900 2016', month = '01'
    head, year = head.split()  # head = '93900', year = '2016'; split() also tolerates repeated spaces
    year = int(year)
    month = int(month)

    # managing edge cases where month is January or December
    if month == 1: 
        y1 = year - 1 
        m1 = 12
    else:
        y1 = year
        m1 = month - 1

    if month == 12:
        y2 = year + 1
        m2 = 1
    else:
        y2 = year
        m2 = month + 1

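    # building the prefixes to search for: previous, same and next month
    s0 = head + ' ' + str(year) + '-' + str(month).zfill(2)
    s1 = head + ' ' + str(y1) + '-' + str(m1).zfill(2)
    s2 = head + ' ' + str(y2) + '-' + str(m2).zfill(2)

    out = []
    for item in l:
        if s0 in item or s1 in item or s2 in item:
            out.append(item)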

    return out

test_s = '93900 2016-01-11.50 10.17'
test_l = ['33492 2017-02-14.50 11.17', '93900 2016-02-11.00 11.15', '93900 2015-12-14.00 15.66']

print(comp(test_s, test_l))
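The January/December special cases in `comp` can also be handled arithmetically: convert (year, month) into a single month count, shift it, and split it back with `divmod`. A minimal sketch; `shift_month` and `month_prefixes` are hypothetical helpers, and the prefixes assume the same single-space 'ID YYYY-MM' layout that `comp` searches for:

```python
def shift_month(year, month, delta):
    # Convert to a zero-based month count, shift, then split back into (year, month)
    new_year, zero_based_month = divmod(year * 12 + (month - 1) + delta, 12)
    return new_year, zero_based_month + 1

def month_prefixes(record_id, year, month):
    # Prefixes for the previous, same and next month, e.g. '93900 2015-12'
    return [f"{record_id} {y}-{m:02d}"
            for y, m in (shift_month(year, month, d) for d in (-1, 0, 1))]

print(shift_month(2016, 1, -1))          # (2015, 12)
print(shift_month(2016, 12, 1))          # (2017, 1)
print(month_prefixes('93900', 2016, 1))  # ['93900 2015-12', '93900 2016-01', '93900 2016-02']
```

With this approach a wider buffer only means changing the `(-1, 0, 1)` offsets.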

You need to extract the date part and convert it to a date type; then you can do date comparisons.
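A minimal sketch of that idea using only the standard library, assuming the second whitespace-separated field is the date and that the `.50`/`.00` suffix can be dropped (the question doesn't say what it means). The helper names `extract_date` and `within_days` are illustrative, and the 31-day window is an assumption standing in for "one month":

```python
from datetime import datetime, timedelta

def extract_date(line):
    # '93900 2016-01-11.50 10.17' -> '2016-01-11.50' -> '2016-01-11' -> datetime(2016, 1, 11)
    date_part = line.split()[1].split('.')[0]
    return datetime.strptime(date_part, '%Y-%m-%d')

def within_days(a, b, days=31):
    # Subtracting two datetimes gives a timedelta; abs() makes the window symmetric (+/-)
    return abs(extract_date(a) - extract_date(b)) <= timedelta(days=days)

# A fixed day count only approximates a month, because month lengths differ
print((datetime(2016, 2, 11) - datetime(2016, 1, 11)).days)  # 31
print((datetime(2016, 3, 11) - datetime(2016, 2, 11)).days)  # 29 (2016 is a leap year)

print(within_days('93900 2016-01-11.50 10.17', '93900 2015-12-14.00 15.66'))  # True
print(within_days('93900 2016-01-11.50 10.17', '33492 2017-02-14.50 11.17'))  # False
```

Because the same calendar-month step can be 29 or 31 days, a day-count window and a calendar-month rule can disagree near the edges.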

As pointed out in the comments, timedelta can't compare months, because a month is not a uniform length of time. I found another answer that uses a third-party library to compare months; if you use that, you could piece together logic like the following.

Warning: pseudocode below

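import datetime as dt

def extract_id(txt):
    return txt.split()[0]  # e.g. '93900'

def extract_date(txt):
    # '2016-01-11.50' -> '2016-01-11' -> datetime(2016, 1, 11)
    return dt.datetime.strptime(txt.split()[1].split('.')[0], '%Y-%m-%d')

def month_diff(a, b):
    # calendar months between two dates, ignoring the day (stdlib stand-in for the third-party library)
    return abs((a.year - b.year) * 12 + (a.month - b.month))

for i in list0:
    item_id, date = extract_id(i), extract_date(i)
    matches = [j for j in list1 if extract_id(j) == item_id and month_diff(date, extract_date(j)) <= 1]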
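Because the real lists hold thousands of strings, comparing every pair can get slow. One option is to index list1 by ID first, so each record from the first list is only compared with candidates that share its ID. A minimal sketch using only the standard library instead of the third-party library mentioned above; it assumes whitespace-separated 'ID YYYY-MM-DD.xx value' records, ignores the day and the '.xx' suffix, and `parse` and `match_within_month` are hypothetical names. The sample data is the question's, with the December record dated 2015-12-14 as in the expected output:

```python
from collections import defaultdict
from datetime import datetime

def parse(line):
    # Assumes 'ID YYYY-MM-DD.xx value' separated by any whitespace; the '.xx' suffix is dropped
    record_id, date_part = line.split()[:2]
    d = datetime.strptime(date_part.split('.')[0], '%Y-%m-%d')
    # Month index: consecutive calendar months differ by 1, even across a year boundary
    return record_id, d.year * 12 + d.month

def match_within_month(queries, candidates, buffer_months=1):
    # Index the candidates by ID once, so each query only looks at records with the same ID
    by_id = defaultdict(list)
    for line in candidates:
        record_id, month_index = parse(line)
        by_id[record_id].append((month_index, line))

    new_list = []
    for query in queries:
        record_id, month_index = parse(query)
        for cand_month, cand_line in by_id.get(record_id, []):
            if abs(cand_month - month_index) <= buffer_months:
                new_list.append(cand_line)  # keep the full string, not just the matched part
    return new_list

list0 = ['93900 2016-01-11.50 10.17', '93030 2014-04-16.50 18.83', '29322 2009-05-21.50 17.81']
list1 = ['33492 2017-02-14.50 11.17', '93900 2016-02-11.00 11.15', '93900 2015-12-14.00 15.66']

print(match_within_month(list0, list1))
# ['93900 2016-02-11.00 11.15', '93900 2015-12-14.00 15.66']
```

A list1 record that falls within the window of several queries is appended once per match; `list(dict.fromkeys(result))` would drop such duplicates while keeping the original order.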
