
suffix trees: locating a substring if a certain number of mistakes are allowed

Original 2012-02-16 12:00:39 6 1 python/ algorithm/ suffix-tree

According to the Wikipedia article on suffix trees, suffix trees can be used to locate substrings of a string if a certain number of mistakes are allowed.

Given the suffix tree of a string, how can I find all the instances of a given substring in it, allowing at most one mistake per instance?

(By "mistake", I mean the substitution of one character.)

1 answer

That would be just a more convoluted graph search (think of finding a path through a dungeon where some of the doors are stuck and need to be kicked open, and you want to spare your feet).

The details depend greatly on what you mean by "mistake". I take it that a "mistake" is the substitution of one character, which is the easiest case.

In the algorithm, you search the tree from the root, comparing and advancing through your pattern as if you were searching for an exact match. Whenever there is a character on an edge that you can't follow, you save the state of your algorithm for later (the state being [tree position, pattern position]). This applies even when you can follow one link from a node but not another: you follow the matching link and save the others.

Then you return to the saved positions and emulate the substitution: advance one position in the tree (to each of the nonmatching possibilities) and one position in the pattern. Then continue your search as normal (you have used up your allowance of one error, so you are searching for an exact match now).

Whenever you reach the end of the pattern, report a successful match (i.e. all leaves below the current node in the tree).
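
The steps above could be sketched roughly as follows. This is only an illustration under some loud assumptions: it uses a naive, uncompressed suffix trie (nested dicts, quadratic memory) as a stand-in for a real suffix tree, and the names build_suffix_trie, leaves_below and find_with_one_substitution are made up for this example. In a compressed suffix tree the "tree position" would have to be a node plus an offset along an edge, which is not shown here.

```python
def build_suffix_trie(text):
    """Naive suffix trie (assumption: stand-in for a real suffix tree).
    Each node is a dict mapping a character to a child node; the key None
    marks the end of a suffix and stores where that suffix starts."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node[None] = start
    return root


def leaves_below(node):
    """Collect the start positions of all suffixes passing through node."""
    starts, stack = [], [node]
    while stack:
        cur = stack.pop()
        for key, child in cur.items():
            if key is None:
                starts.append(child)   # child is a start position here
            else:
                stack.append(child)
    return starts


def find_with_one_substitution(text, pattern):
    """Start positions where pattern occurs with at most one substitution."""
    root = build_suffix_trie(text)
    hits = set()

    # Pass 1: exact walk, saving [tree position, pattern position] at every
    # step so the branches we did not follow can be revisited later.
    saved = []
    node = root
    for i, ch in enumerate(pattern):
        saved.append((node, i))
        node = node.get(ch)
        if node is None:
            break
    else:
        hits.update(leaves_below(node))        # exact match

    # Pass 2: at each saved state, spend the one allowed substitution on a
    # nonmatching child, then require an exact match for the rest.
    for node, i in saved:
        for key, child in node.items():
            if key is None or key == pattern[i]:
                continue
            cur = child
            for ch in pattern[i + 1:]:
                cur = cur.get(ch)
                if cur is None:
                    break
            else:
                hits.update(leaves_below(cur))
    return sorted(hits)


print(find_with_one_substitution("banana", "bana"))  # [0, 2]
```

Here position 0 is the exact match "bana" and position 2 is "nana", which differs from the pattern in its first character only.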

Working with suffix trees in python

Regex to match a number without a certain suffix

How to extract specific number of characters from a substring in python with same suffix

Find Substring of A in B with n mistakes

How to capture a certain number of characters after a substring?

Python Running out of Memory (Using Suffix Trees)

Locating and extracting a substring to a new column in python

Locating a substring in Field Calculator using ArcGIS Pro

Defining a function to count the number of lines in a file, containing a certain substring

Python Substring - Splitting nth number of Characters to the left of a certain string




solution1 4 ACCEPTED 2012-02-16 12:23:28
