A faster way to find out if any word in a list is a substring in a longer word

Question

What I have

A string s of length m (where m > 3)
A huge list L of words (length > 2)

What I want to know

Is any word in L a substring of s

Currently i have L split up into files, one file each for each word-length 16.txt , 15.txt ... 04.txt
I then iterate over these files from n.txt --> 04.txt and basically do

cat n.txt | while read w; do if [[ $s =~ $w ]] ; then echo $w; fi; done

It's painfully slow there has to be a better way to do this.

Additional info:

L contains about 200k words and is fairly static so I don't mind a complicated time consuming setup if it means greater speed
There are several different such lists L but only one of them needs to be searched at a time.

I'm pretty language agnostic (se tags) but pseudo code is also fine

Answer 1

You can use grep without splitting the list:

grep -oFf list.txt <<< "$s"

-o will only output the matching substrings
-F will interpret the lines in list.txt as fixed strings, not regular expressions
-f will tell grep what file to use as the source of the patterns to match
<<< in bash takes the following word and makes it input to the preceding command

A faster way to find out if any word in a list is a substring in a longer word

Question

1 answers

solution1
6 ACCPTED 2021-02-22 22:47:27

A faster way to find out if any word in a list is a substring in a longer word

Question

1 answers

solution1 6 ACCPTED 2021-02-22 22:47:27

solution1
6 ACCPTED 2021-02-22 22:47:27