Python - Splitting and replacing parts of a list using multiple replacements in a pandas dataframe

Question

In Python, I have a list of places in a pandas dataframe, and I want to reduce each string to match the format of a larger list, with the goal of merging the two.

Ultimately, I want to make this list match the format of the other dataframe so that when I merge, I'm only merging rows where the "stop_name" column matches.

For example, in the list below, I want to remove " STATION", so that "BOONTON STATION" becomes just "BOONTON".

However, I also want "BUTLER STATION (NEW JERSEY)" to become just "BUTLER", removing " STATION (NEW JERSEY)".

Lastly, for a two-word station name I want to keep the second word, so that "MORRIS PLAINS STATION" becomes just "MORRIS PLAINS".

Basically, on every row in the "stop_name" column, I want to remove everything from the space before the word "STATION" through the end of the string.

I've tried various splits and string replacements, and I'm either getting errors or the replacement isn't applied on every row.

Any direction toward a viable solution would be appreciated.

stop_name
0   BOONTON STATION
1   BUTLER STATION (NEW JERSEY)
2   CONVENT STATION (NJ TRANSIT)
3   DOVER STATION (NJ TRANSIT)
4   LAKE HOPATCONG STATION
5   MADISON STATION (NJ TRANSIT)
6   MILLINGTON STATION
7   MORRIS PLAINS STATION
8   MORRISTOWN STATION
9   MOUNT ARLINGTON STATION
10  MOUNT TABOR STATION
12  POMPTON PLAINS STATION
13  TOWACO STATION

Answer 1

It seems you just want to replace the pattern " STATION.*" with an empty string:

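# regex=True is required on pandas 2.0+, where str.replace treats the pattern as a literal by default
df.stop_name.str.replace(r' STATION.*', '', regex=True)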

0             BOONTON
1              BUTLER
2             CONVENT
3               DOVER
4      LAKE HOPATCONG
5             MADISON
6          MILLINGTON
7       MORRIS PLAINS
8          MORRISTOWN
9     MOUNT ARLINGTON
10        MOUNT TABOR
12     POMPTON PLAINS
13             TOWACO
Name: stop_name, dtype: object
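
To tie this back to the merge you mentioned, here is a minimal sketch. It assumes a second dataframe, called `other` here, whose `stop_name` values are already in the short form; the name `other` and its `zone` column are made up for illustration and are not from your data.

```python
import pandas as pd

# A few rows from the question's data.
stops = pd.DataFrame({
    "stop_name": [
        "BOONTON STATION",
        "BUTLER STATION (NEW JERSEY)",
        "MORRIS PLAINS STATION",
    ]
})

# Hypothetical larger list; "zone" is an invented column for illustration.
other = pd.DataFrame({
    "stop_name": ["BOONTON", "BUTLER", "DOVER"],
    "zone": [1, 2, 3],
})

# Drop " STATION" and everything after it, then merge on the cleaned key.
stops["stop_name"] = stops["stop_name"].str.replace(r" STATION.*", "", regex=True)
merged = stops.merge(other, on="stop_name", how="inner")
print(merged)
```

With `how="inner"`, only rows whose cleaned `stop_name` appears in both dataframes survive, so "MORRIS PLAINS" and "DOVER" are dropped in this toy example.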

Answer 2

A regular expression with extract() is straightforward.

import io

import pandas as pd

# Rebuild the sample dataframe from the question.
df = pd.read_csv(io.StringIO("""stop_name
0   BOONTON STATION
1   BUTLER STATION (NEW JERSEY)
2   CONVENT STATION (NJ TRANSIT)
3   DOVER STATION (NJ TRANSIT)
4   LAKE HOPATCONG STATION
5   MADISON STATION (NJ TRANSIT)
6   MILLINGTON STATION
7   MORRIS PLAINS STATION
8   MORRISTOWN STATION
9   MOUNT ARLINGTON STATION
10  MOUNT TABOR STATION
12  POMPTON PLAINS STATION
13  TOWACO STATION"""), sep=r"\s\s+", engine="python")

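# Capture everything before " STATION"; expand=False returns a Series rather than a one-column DataFrame.
df.stop_name = df.stop_name.str.extract(r"(^.*) STATION.*$", expand=False)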

          stop_name
0           BOONTON
1            BUTLER
2           CONVENT
3             DOVER
4    LAKE HOPATCONG
5           MADISON
6        MILLINGTON
7     MORRIS PLAINS
8        MORRISTOWN
9   MOUNT ARLINGTON
10      MOUNT TABOR
12   POMPTON PLAINS
13           TOWACO
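
One thing to watch for: extract() gives NaN for any row that does not match the pattern. Every row in your sample contains " STATION", but if the real data has exceptions, something like the following sketch could keep the original value for those rows. The "NEWARK PENN" entry is a made-up example of a non-matching row, not from your data.

```python
import pandas as pd

names = pd.Series(
    ["BOONTON STATION", "DOVER STATION (NJ TRANSIT)", "NEWARK PENN"],  # last value is hypothetical
    name="stop_name",
)

# Rows without " STATION" do not match and come back as NaN.
extracted = names.str.extract(r"^(.*?) STATION", expand=False)

# Fall back to the original string wherever nothing was extracted.
cleaned = extracted.fillna(names)
print(cleaned.tolist())
```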

Answer 3

Alternative without a regular expression:

>>> df["stop_name"].str.split("STATION").str[0].str.strip()
0             BOONTON
1              BUTLER
2             CONVENT
3               DOVER
4      LAKE HOPATCONG
5             MADISON
6          MILLINGTON
7       MORRIS PLAINS
8          MORRISTOWN
9     MOUNT ARLINGTON
10        MOUNT TABOR
12     POMPTON PLAINS
13             TOWACO
Name: stop_name, dtype: object
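
Another regex-free option, offered as a sketch rather than a drop-in replacement, is str.partition(), which cuts at the first occurrence of a literal separator. Using " STATION" with the leading space means no strip() is needed afterwards, and rows without the separator are returned unchanged. The bare "TOWACO" row below is an assumed variation, added only to show that behaviour.

```python
import pandas as pd

names = pd.Series(
    ["MOUNT TABOR STATION", "CONVENT STATION (NJ TRANSIT)", "TOWACO"],  # last value is hypothetical
    name="stop_name",
)

# partition() returns three columns: text before, the separator, text after.
# Column 0 holds the part before the first " STATION".
before = names.str.partition(" STATION")[0]
print(before.tolist())
```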

Answer 1
1, accepted, 2021-02-21 21:34:33

Answer 2
0, 2021-02-21 21:34:48

Answer 3
0, 2021-02-21 21:50:55
