
Determining the position of sub-string in list of strings

I have a list of words (strings), say:

word_lst = ['This','is','a','great','programming','language']

And a second list with sub-strings, say:

subs_lst= ['This is', 'language', 'a great']

Suppose each sub-string in subs_lst appears exactly once in word_lst (sub-strings can be of any length).

I want an easy way to find the position of each sub-string in word_lst. That is, I want to order subs_lst according to the order in which its entries appear in word_lst. In the previous example, the output would be:

out = ['This is', 'a great', 'language']

Does anyone know an easy way to do this?

There's probably a faster way to do this, but this works, at least:

word_lst = ['This', 'is', 'a', 'great', 'programming', 'language']
subs_lst = ['This is', 'language', 'a great']
# Every contiguous run of words, grouped by its starting word
substr_lst = [' '.join(word_lst[i:j]) for i in range(len(word_lst)) for j in range(i+1, len(word_lst)+1)]
# Order the sub-strings by where each one first appears in substr_lst
sorted_subs_list = sorted(subs_lst, key=lambda x: substr_lst.index(x))
print(sorted_subs_list)

Output:

['This is', 'a great', 'language']

The idea is to build a list of every contiguous run of words in word_lst, ordered so that all the entries that start with "This" come first, followed by all the entries starting with "is", etc. We store that in substr_lst.

>>> print(substr_lst)
['This', 'This is', 'This is a', 'This is a great', 'This is a great programming', 'This is a great programming language', 'is', 'is a', 'is a great', 'is a great programming', 'is a great programming language', 'a', 'a great', 'a great programming', 'a great programming language', 'great', 'great programming', 'great programming language', 'programming', 'programming language', 'language']

Once we have that list, we sort subs_lst, using the index of each entry in substr_lst as the sort key:

>>> substr_lst.index("This is")
1
>>> substr_lst.index("language")
20
>>> substr_lst.index("a great")
12
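Each call to substr_lst.index() scans the list from the start, so one possible refinement is to record the starting word index of every run in a dictionary once and sort by that. This is only a sketch of the same idea, not a different algorithm; the names start_of and sorted_subs_lst are made up for illustration.

```python
word_lst = ['This', 'is', 'a', 'great', 'programming', 'language']
subs_lst = ['This is', 'language', 'a great']

# Map each contiguous run of words to the index of its first word.
# setdefault keeps the earliest start if a run occurs more than once.
start_of = {}
for i in range(len(word_lst)):
    for j in range(i + 1, len(word_lst) + 1):
        start_of.setdefault(' '.join(word_lst[i:j]), i)

# Sort by the position of each sub-string's first word in word_lst.
sorted_subs_lst = sorted(subs_lst, key=lambda s: start_of[s])
print(sorted_subs_lst)
```

As in the version above, a sub-string that is not a run of words from word_lst raises an error (KeyError here, ValueError with index()).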

The intermediate step seems unneeded to me. Why not just make the word list a single string and find the substrings in that?

 sorted(subs_lst, key = lambda x : ' '.join(word_lst).index(x))
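One caveat with searching the joined string: index() matches characters, not words, so a sub-string like 'is' is found inside 'This' first. With the question's guarantee this may never matter, but if it does, a possible workaround is to pad both the text and the sub-string with spaces so that matches only happen at word boundaries. A minimal sketch, where sort_by_position is a hypothetical helper name and the second word list is invented to show the difference (it also joins the words once instead of once per key call):

```python
def sort_by_position(subs_lst, word_lst):
    # Pad with spaces so a sub-string can only match whole words.
    text = ' ' + ' '.join(word_lst) + ' '
    return sorted(subs_lst, key=lambda s: text.index(' ' + s + ' '))

# The example from the question.
word_lst = ['This', 'is', 'a', 'great', 'programming', 'language']
subs_lst = ['This is', 'language', 'a great']
print(sort_by_position(subs_lst, word_lst))

# An invented case where plain index() on the joined string goes wrong:
# 'is' is found inside 'This', so it would sort before 'great'.
words = ['This', 'great', 'language', 'is', 'fun']
subs = ['great', 'is']
naive = sorted(subs, key=lambda s: ' '.join(words).index(s))
padded = sort_by_position(subs, words)
print(naive)
print(padded)
```

Here naive comes out as ['is', 'great'], while padded gives ['great', 'is'], which matches the word order.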
