I am trying to extract parts of a MySQL query to get the information I want.
I used this code / regex in Python:
import re
query = "SELECT `asd`.`ssss` as `column1`, `ss`.`wwwwwww` from `table`"
table_and_columns = re.findall('\`.*?`[.]\`.*?`',query)
My expected output:
['`asd`.`ssss`', `ss`.`wwwwwww`']
My real output:
['`asd`.`ssss`', '`column1`, `ss`.`wwwwwww`']
Can anybody help me and explain me where I went wrong? The regex should only find the ones that have two strings like asd
and a dot in the middle.
PS: I know that this is not a valid query.
The dot .
can also match a backtick, so the pattern starts by matching a backtick and is able to match all chars until it reaches the literal dot in [.]
There is no need to use non greedy quantifiers, you can use a negated character class only prevent crossing the backtick boundary.
`[^`]*`\.`[^`]*`
The asterix *
matches 0 or more times. If there has to be at least a single char, and newlines and spaces are unwanted, you could add \\s
to prevent matching whitespace chars and use +
to match 1 or more times.
`[^`\s]+`\.`[^`\s]+`
For example
import re
query = "SELECT `asd`.`ssss` as `column1`, `ss`.`wwwwwww` from `table`"
table_and_columns = re.findall('`[^`\s]+`\.`[^`\s]+`',query)
print(table_and_columns)
Output
['`asd`.`ssss`', '`ss`.`wwwwwww`']
Please try below regex. Greedy nature of .*
from left to right is what caused issue.
Instead you should search for [^`]*
`[^`]*?`\.`[^`]*?`
The thing is that .*?
matches any character (except for line terminators) even whitespaces. Also as you're already using *
which means either 0 or unlimited occurrences,not sure you need to use ?
. So this seems to work:
\`\S+\`[.]\`\S+\`
where \\S
is any non-whitespace character. You always can check you regexes using https://regex101.com
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.