
Regex split string to isolate substrings enclosed in square brackets

Here is an example substring from the text I'm trying to parse and a couple of the raw strings I'm trying to split this text with:

>>> test_string = "[shelter and transitional housing during shelter crisis - selection of sites;\nwaiver of certain requirements regarding contracting]\n\nsponsors: acting mayor breed; kim, ronen, sheehy and cohen\nordinance authorizing public works, the department of homelessness and supportive\nhousing, and the department of public health to enter into contracts without adhering to the\nadministrative code or environment code provisions regarding competitive bidding and other\nrequirements for construction work, procurement, and personal services relating to identified\nshelter crisis sites (1601 quesada avenue; 149-6th street; 125 bayshore boulevard; 13th\nstreet and south van ness avenue, southwest corner; 5th street and bryant street, northwest\ncorner; caltrans emergency shelter properties; and existing city navigation centers and\nshelters) that will provide emergency shelter or transitional housing to persons experiencing\nhomelessness; authorizing the director of property to enter into and amend leases or licenses\nfor the shelter crisis sites without adherence to certain provisions of the administrative code;\nauthorizing the director of public works to add sites to the list of shelter crisis sites subject to\nexpedited processing, procurement, and leasing upon written notice to the board of\nsupervisors, and compliance with conditions relating to environmental review and\nneighborhood notice; affirming the planning department’s determination under the californinenvironmental quality act; and making findings of consistency with the general plan, and the eight priority policies of planning code, section 101.1.  assigned under 30 day rule to\nrules committee.\n[memorandum of understanding - service employees international union, local\n1021]\n\nsponsor: acting mayor breed"
>>> title = re.compile(r"\[([\s\S]*)\]")
>>> title = re.compile(r"\[.*\]")

What I want is to get a list of all strings enclosed in square brackets: []

>>> title.split(test_string)
['shelter and transitional housing during shelter crisis - selection of sites; waiver of certain requirements regarding contracting', 'memorandum of understanding - service employees international union, local 1021']

However, neither of these patterns splits properly. It seems to me that re is including the closing ] as part of the non-whitespace character set, when it should be the character that the string is split on.

I tried changing the pattern to this:

title = re.compile(r"\\[([\\s\\S^\\]]*)\\]")

but that doesn't work either. I read this last pattern as: split on substrings that start with [, followed by any number of characters except ], and end with ].

How am I misunderstanding this?

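[\\s\\S^\\]] means: whitespace or non-whitespace or a caret ^ or a backslash or ]. A caret negates a character class only when it is the first character inside the brackets; anywhere else it is a literal ^, so you cannot mix negated and regular classes this way. It is enough to use the class "anything but a closing ]": [^]]. See the example below.

You can also use findall instead of split. Note that the trailing [0] in the example returns only the first match; drop it to get the list of all bracketed strings.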
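To make the caret rule concrete, here is a small sketch. The `sample` string is a made-up stand-in for the question's text, not the original data. It compares the greedy "match anything" group from the question with a negated class.

```python
import re

# Hypothetical short stand-in for the question's text (two bracketed titles).
sample = "[first title]\nbody text\n[second\ntitle]\ntrailing"

# A caret that is NOT the first character of a class is just a literal '^'.
caret_is_literal = re.fullmatch(r"[a^]", "^") is not None

# A caret that IS the first character negates the class: [^]] means "not ]".
negated_rejects_bracket = re.fullmatch(r"[^]]", "]") is None

# [\s\S]* matches every character, including ']', so the greedy group
# runs from the first '[' all the way to the last ']'.
greedy = re.findall(r"\[([\s\S]*)\]", sample)

# [^]]* stops at the first ']', so each bracketed span is captured separately.
negated = re.findall(r"\[([^]]*)\]", sample)

print(greedy)
print(negated)
```

The greedy version returns a single match that swallows everything between the first `[` and the last `]`, which is probably what the question was running into.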

re.findall(r'\[([^]]*)\]', test_string)[0]
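For the full list the question asks for, a minimal sketch might look like the following. The helper name `bracketed_titles` is hypothetical, and `sample` is an abbreviated copy of the question's `test_string`. Collapsing whitespace is an assumption based on the desired output shown in the question, where the embedded newlines appear as single spaces.

```python
import re

def bracketed_titles(text):
    """Return the contents of every [...] span, with whitespace runs collapsed."""
    raw = re.findall(r"\[([^]]*)\]", text)  # one captured group per bracketed span
    # str.split() with no arguments splits on any whitespace, including "\n".
    return [" ".join(title.split()) for title in raw]

# Abbreviated version of the question's test_string (body text shortened).
sample = (
    "[shelter and transitional housing during shelter crisis - selection of sites;\n"
    "waiver of certain requirements regarding contracting]\n\n"
    "sponsors: acting mayor breed; kim, ronen, sheehy and cohen\n"
    "[memorandum of understanding - service employees international union, local\n"
    "1021]\n\nsponsor: acting mayor breed"
)

titles = bracketed_titles(sample)
for t in titles:
    print(t)
```

Using findall here avoids the leftover non-title pieces that split would return between the matches.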
