Python - Splitting and replacing parts of a list with multiple replacements in a pandas dataframe

Question

In Python, I have a pandas DataFrame with a column of place names, and I want to shorten each string so it matches the format used in a larger list, with the goal of merging the two.

Ultimately, I want this column to match the format of the other DataFrame so that, when I merge, only rows whose "stop_name" values match are merged.

For example, out of the list below, I want to remove " STATION", so that "BOONTON STATION" becomes just "BOONTON".

However, I also want "BUTLER STATION (NEW JERSEY)" to become just "BUTLER", removing " STATION (NEW JERSEY)".

Lastly, for a two-word station name I want to keep both words, so that "MORRIS PLAINS STATION" becomes just "MORRIS PLAINS".

Basically, on every row of the "stop_name" column, I want to remove the space before the word "STATION", the word itself, and everything after it.

I've tried various string splits and replacements, but I either get errors or the replacement isn't applied to every row.

Any direction to a viable solution would be appreciated.

stop_name
0   BOONTON STATION
1   BUTLER STATION (NEW JERSEY)
2   CONVENT STATION (NJ TRANSIT)
3   DOVER STATION (NJ TRANSIT)
4   LAKE HOPATCONG STATION
5   MADISON STATION (NJ TRANSIT)
6   MILLINGTON STATION
7   MORRIS PLAINS STATION
8   MORRISTOWN STATION
9   MOUNT ARLINGTON STATION
10  MOUNT TABOR STATION
12  POMPTON PLAINS STATION
13  TOWACO STATION
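To try the answers below without parsing the listing, the sample data can be rebuilt directly. This is a minimal sketch that assumes the 13 names above are the whole column; it keeps the posted index, which skips 11.

```python
import pandas as pd

# Station names exactly as listed in the question.
names = [
    "BOONTON STATION",
    "BUTLER STATION (NEW JERSEY)",
    "CONVENT STATION (NJ TRANSIT)",
    "DOVER STATION (NJ TRANSIT)",
    "LAKE HOPATCONG STATION",
    "MADISON STATION (NJ TRANSIT)",
    "MILLINGTON STATION",
    "MORRIS PLAINS STATION",
    "MORRISTOWN STATION",
    "MOUNT ARLINGTON STATION",
    "MOUNT TABOR STATION",
    "POMPTON PLAINS STATION",
    "TOWACO STATION",
]

# The posted index runs 0..10, then jumps to 12 and 13.
index = list(range(11)) + [12, 13]
df = pd.DataFrame({"stop_name": names}, index=index)
print(df)
```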

Answer 1

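It seems you just want to replace the regular-expression pattern " STATION.*" (a space, the word STATION, and everything after it) with an empty string. Note that since pandas 2.0, str.replace treats the pattern as a literal string unless you pass regex=True:

df.stop_name.str.replace(' STATION.*', '', regex=True)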

0             BOONTON
1              BUTLER
2             CONVENT
3               DOVER
4      LAKE HOPATCONG
5             MADISON
6          MILLINGTON
7       MORRIS PLAINS
8          MORRISTOWN
9     MOUNT ARLINGTON
10        MOUNT TABOR
12     POMPTON PLAINS
13             TOWACO
Name: stop_name, dtype: object
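Since the end goal in the question is a merge, here is a minimal sketch of the whole round trip. The second DataFrame (other, with a placeholder column other_col) is hypothetical, because the question does not show the larger list. An inner merge keeps only rows whose cleaned stop_name appears in both frames.

```python
import pandas as pd

# A few rows from the question's data.
df = pd.DataFrame({"stop_name": [
    "BOONTON STATION",
    "BUTLER STATION (NEW JERSEY)",
    "MORRIS PLAINS STATION",
]})

# Hypothetical second DataFrame that already uses the short names;
# "other_col" is a placeholder for whatever columns it really has.
other = pd.DataFrame({
    "stop_name": ["BOONTON", "MORRIS PLAINS", "SOME OTHER STOP"],
    "other_col": ["x", "y", "z"],
})

# Strip " STATION" and everything after it (regex=True needed on pandas >= 2.0).
df["stop_name"] = df["stop_name"].str.replace(r" STATION.*", "", regex=True)

# how="inner" keeps only stop_name values present in both frames.
merged = df.merge(other, on="stop_name", how="inner")
print(merged)
```

BUTLER has no partner in other, so it is dropped; use how="left" instead if unmatched rows should be kept.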

Answer 2

A regular expression extract() is straightforward.

import io
import pandas as pd

df = pd.read_csv(io.StringIO("""stop_name
0   BOONTON STATION
1   BUTLER STATION (NEW JERSEY)
2   CONVENT STATION (NJ TRANSIT)
3   DOVER STATION (NJ TRANSIT)
4   LAKE HOPATCONG STATION
5   MADISON STATION (NJ TRANSIT)
6   MILLINGTON STATION
7   MORRIS PLAINS STATION
8   MORRISTOWN STATION
9   MOUNT ARLINGTON STATION
10  MOUNT TABOR STATION
12  POMPTON PLAINS STATION
13  TOWACO STATION"""), sep=r"\s\s+", engine="python")

df["stop_name"] = df["stop_name"].str.extract(r"(^.*) STATION.*$", expand=False)

	stop_name
0	BOONTON
1	BUTLER
2	CONVENT
3	DOVER
4	LAKE HOPATCONG
5	MADISON
6	MILLINGTON
7	MORRIS PLAINS
8	MORRISTOWN
9	MOUNT ARLINGTON
10	MOUNT TABOR
12	POMPTON PLAINS
13	TOWACO
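One difference from the replace approach is worth knowing: extract() returns NaN for any row the pattern does not match, while str.replace leaves such rows unchanged. A small sketch, using a hypothetical name ("SOME OTHER STOP") that is not in the question's data:

```python
import pandas as pd

s = pd.Series([
    "BOONTON STATION",
    "DOVER STATION (NJ TRANSIT)",
    "SOME OTHER STOP",  # hypothetical name without " STATION"
])

# expand=False returns a Series (not a one-column DataFrame) for a single group.
short = s.str.extract(r"(^.*) STATION.*$", expand=False)
print(short.isna().tolist())  # [False, False, True]

# Fall back to the original value wherever the pattern did not match.
short = short.fillna(s)
print(short.tolist())
```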

Answer 3

An alternative without a regular expression:

>>> df["stop_name"].str.split("STATION").str[0].str.strip()
0             BOONTON
1              BUTLER
2             CONVENT
3               DOVER
4      LAKE HOPATCONG
5             MADISON
6          MILLINGTON
7       MORRIS PLAINS
8          MORRISTOWN
9     MOUNT ARLINGTON
10        MOUNT TABOR
12     POMPTON PLAINS
13             TOWACO
Name: stop_name, dtype: object
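Another regex-free option is Series.str.partition, which splits each string at the first occurrence of a literal separator into three columns (before, separator, after). Partitioning on " STATION", with the leading space, drops that space in the same step, so no strip() is needed. A minimal sketch, again with a hypothetical non-matching name:

```python
import pandas as pd

s = pd.Series([
    "BOONTON STATION",
    "BUTLER STATION (NEW JERSEY)",
    "MORRIS PLAINS STATION",
    "SOME OTHER STOP",  # hypothetical name without " STATION"
])

# Column 0 holds the text before the first " STATION"; strings without
# the separator come back unchanged in column 0 (like Python's str.partition).
short = s.str.partition(" STATION")[0]
print(short.tolist())
```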


3 answers

solution1: 1 ACCEPTED 2021-02-21 21:34:33
solution2: 0 2021-02-21 21:34:48
solution3: 0 2021-02-21 21:50:55
