What I have
s
of length m
(where m > 3)L
of words
(length > 2)What I want to know
word
in L
a substring of s
Currently i have L
split up into files, one file each for each word-length 16.txt , 15.txt ... 04.txt
I then iterate over these files from n.txt --> 04.txt and basically do
cat n.txt | while read w; do if [[ $s =~ $w ]] ; then echo $w; fi; done
It's painfully slow there has to be a better way to do this.
Additional info:
L
contains about 200k words and is fairly static so I don't mind a complicated time consuming setup if it means greater speed L
but only one of them needs to be searched at a time.I'm pretty language agnostic (se tags) but pseudo code is also fine
You can use grep
without splitting the list:
grep -oFf list.txt <<< "$s"
-o
will only output the matching substrings -F
will interpret the lines in list.txt as fixed strings, not regular expressions -f
will tell grep what file to use as the source of the patterns to match <<<
in bash takes the following word and makes it input to the preceding command
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.