Regex to capture all import statements

I want to create a script that looks inside a Python file and finds all import statements. Possible variations of those are the following:

import os
import numpy as np
from itertools import accumulate
from collections import Counter as C
from pandas import *

By looking at these, one could argue that the logic should be:

Get me every <foo> from "from <foo>" statements, and every <bar> from "import <bar>" statements that are not preceded by "from <foo>".

To translate the above in regex, I wrote:

from (\w+)|(?<!from \w+)import (\w+)

The problem seems to be the variable width of the negative lookbehind (Python's re module only accepts fixed-width lookbehinds), but I have not been able to fix it.
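
For reference, a minimal sketch of how the failure shows up when the pattern above is compiled with the standard re module. The helper name compile_error is made up for this illustration.

```python
import re

# The attempted pattern: the lookbehind contains \w+, which has no fixed width.
attempt = r'from (\w+)|(?<!from \w+)import (\w+)'

def compile_error(pattern):
    """Return the error message from re.compile, or None if the pattern compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# Prints the message raised by the re module for the variable-width lookbehind.
print(compile_error(attempt))
```

The same helper returns None for a pattern whose lookbehind has a fixed width, such as r'(?<!from )import (\w+)'.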

EDIT:

As a bonus, it would also be nice to capture multiple imports on one line, as in:

import sys, glob

It seems you only want to extract the matches from the start of a line, taking into account the leading whitespace.

You may consider using

^\s*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)

See the regex demo.

Details

  • ^ - start of string (use re.M to also match at the start of each line)
  • \s* - 0+ whitespaces (use [^\S\r\n]* to only match horizontal whitespace)
  • (?:from|import) - either of the two words
  • \s+ - 1+ whitespaces
  • (\w+(?:\s*,\s*\w+)*) - Group 1: 1 or more word chars, followed by 0+ occurrences of 0+ whitespaces, a comma, 0+ whitespaces and then 1+ word chars.
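
The two parenthetical notes in the list (re.M and horizontal-only whitespace) can be sketched as follows. The sample text and the variable names are assumptions made for this illustration only.

```python
import re

# Sample text: an import, a blank line, then an indented import.
text = "import os\n\n    import sys, glob\n"

pattern = r'^\s*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)'
# Variant whose leading whitespace cannot cross a line break.
horizontal = r'^[^\S\r\n]*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)'

# Without re.M, ^ anchors only at the very start of the string.
without_flag = re.findall(pattern, text)
# With re.M, ^ also anchors at the start of every line.
with_flag = re.findall(pattern, text, re.M)

# With \s*, a match may begin on a preceding blank line; the variant may not.
loose = [m.group(0) for m in re.finditer(pattern, text, re.M)]
strict = [m.group(0) for m in re.finditer(horizontal, text, re.M)]

print(without_flag)
print(with_flag)
print(loose[1].startswith("\n"), strict[1].startswith("\n"))
```

Both variants capture the same Group 1 values here; they differ only in where the overall match is allowed to start.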

In Python, you may later split the Group 1 value with re.split(r'\s*,\s*', group_1_value) to get the individual comma-separated module names.
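
Putting the pattern and the split step together, a minimal sketch might look like the following. The function name find_imported_modules and the sample source are assumptions for illustration; the pattern is the one given above.

```python
import re

# Pattern from the answer; re.M lets ^ match at the start of every line.
IMPORT_RE = re.compile(r'^\s*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)', re.M)

def find_imported_modules(source):
    """Return the module names found by the pattern, in order of appearance."""
    modules = []
    for match in IMPORT_RE.finditer(source):
        # Group 1 may hold several comma-separated names, e.g. "sys, glob".
        modules.extend(re.split(r'\s*,\s*', match.group(1)))
    return modules

sample = """\
import os
import numpy as np
from itertools import accumulate
from collections import Counter as C
from pandas import *
import sys, glob
"""

print(find_imported_modules(sample))
```

Because \w does not match a dot, a dotted import such as "import os.path" yields only "os" with this pattern, and text inside strings or comments that happens to start a line with "import" is matched as well.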
