Using regex to remove unwanted end of a string

Question

I'm struggling a little with some regex execution to remove trailing extraneous characters. I've tried a few ideas that I found here, but none are quite what I'm looking for.

Data looks like this (only one column of data):

City1[edit]

City2 (University Name)

City with a Space (University Name)

Etc.

Basically, the trouble that I run into here is I can't necessarily remove everything after a space because sometimes a city name includes a space ("New York City").

However, what I think I could do is a three step approach:

Replace anything between [], (), {} sets of characters (this will remove the "edit" and the "University Name" in the sample data).
Remove the [], (), {} characters themselves, since those are now extra characters.
Trim any trailing spaces (which will leave the spaces inside city names such as St. Paul).
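
A minimal sketch of those three steps, written against plain strings with Python's standard re module rather than a DataFrame column. The function name clean_three_steps is made up for illustration, and the sketch assumes the brackets are never nested.

```python
import re

def clean_three_steps(text):
    # Step 1: empty out each bracketed group, leaving the bare brackets behind.
    text = re.sub(r'\[[^\]]*\]', '[]', text)
    text = re.sub(r'\([^)]*\)', '()', text)
    text = re.sub(r'\{[^}]*\}', '{}', text)
    # Step 2: remove the bracket characters themselves.
    text = re.sub(r'[\[\](){}]', '', text)
    # Step 3: trim trailing whitespace only; spaces inside the name are kept.
    return text.rstrip()

for raw in ['City1[edit]', 'City2 (University Name)', 'New York City']:
    print(clean_three_steps(raw))
```

Assuming the column holds strings, something like DF[0].map(clean_three_steps) should apply the same function to every value.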

I have two main questions: 1. Is there a way to do this in one command, or will it have to be three separate commands? 2. How do you remove characters in between specific characters using regex?

Code that I have attempted:

DF[0].replace(r'[^0-9a-zA-Z*]$', "", regex=True, inplace=True)  # however, this only removed the last of the special characters
DF[0].replace(r'[\\W+$|^0-9a-zA-Z*]', "", regex=True, inplace=True)  # unfortunately, this replaced everything, leaving all my data blank

Answer 1

If you always know the bracket characters that will come first you can do:

Create data

import pandas as pd

df = pd.DataFrame({'names': ['City1[edit]',
                             'City2 (University Name)',
                             'City with a Space {University Name}']})

Then replace everything after first bracket.

df.names.str.replace(r'\[.*|\(.*|\{.*', '', regex=True).str.strip()

Output

0                City1
1                City2
2    City with a Space
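
As a rough sketch of what that pattern does, here is a close equivalent on plain strings with the standard re module. The three alternatives are folded into one character class, and a leading \s* absorbs the space before the bracket so no separate strip is needed. The function name strip_from_first_bracket is hypothetical.

```python
import re

def strip_from_first_bracket(text):
    # Remove optional whitespace, the first [, ( or {, and everything after it.
    return re.sub(r'\s*[\[({].*', '', text)

print(strip_from_first_bracket('City1[edit]'))                    # City1
print(strip_from_first_bracket('City2 (University Name)'))        # City2
print(strip_from_first_bracket('Springfield (Illinois) Campus'))  # Springfield
```

Note the last line: anything that follows the first bracket is dropped too, which is fine for the sample data but would lose text if a bracket ever appeared in the middle of a name.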

Answer 2

A regexp would be a relatively easy way to do this.

import re

# Raw string, so the backslashes reach the regex engine unchanged.
p = re.compile(r'(\(|\[|\{)[A-Za-z ].+(\)|\]|\})')
dirty = 'City with a Space (University Name)'
cleaned = p.sub('', dirty).strip()
print(cleaned)
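
One thing to be aware of: .+ is greedy, so when a string contains more than one bracketed group the match runs from the first opening bracket to the last closing one. The sketch below contrasts that with a variant that stops at the nearest closing bracket. The names greedy and nearest and the sample string are made up for illustration, and nested brackets are assumed not to occur.

```python
import re

# Same pattern as above: greedy, spans from first opener to last closer.
greedy = re.compile(r'(\(|\[|\{)[A-Za-z ].+(\)|\]|\})')
# Variant: each group ends at the nearest closing bracket.
nearest = re.compile(r'\s*[(\[{][^)\]}]*[)\]}]')

sample = 'Ames (Iowa State) Main [edit]'
print(greedy.sub('', sample).strip())   # Ames
print(nearest.sub('', sample).strip())  # Ames Main
```

For data where the bracketed part is always at the end, both behave the same.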

Answer 3

Another option uses split:
look for zero or more whitespace characters followed by a [ , ( , or {
split at that point and take the first part

df.names.str.split(r'\s*[\[\{\(]').str[0]

0                City1
1                City2
2    City with a Space
Name: names, dtype: object
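
For a single string outside pandas, the same idea can be sketched with re.split from the standard library. The function name before_first_bracket is hypothetical, and maxsplit=1 is added here only so the string is cut once.

```python
import re

def before_first_bracket(text):
    # Split once at optional whitespace followed by [, { or (, keep the left part.
    return re.split(r'\s*[\[\{\(]', text, maxsplit=1)[0]

print(before_first_bracket('City with a Space {University Name}'))  # City with a Space
print(before_first_bracket('New York City'))                        # New York City
```

A string with no bracket is returned unchanged, because the split produces a single piece.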

solution1
3 ACCEPTED 2016-12-20 21:05:09

solution2
0 2016-12-20 21:09:25

solution3
0 2016-12-20 21:16:52
