How do I extract only the part of a string between parentheses?

Question

I am looking for an efficient way to remove specific elements from a column of data.

I have data like this:

year
1 (1991)
10 (1991-2001)
8 (1991-1998)
2 (2000-2002)

I want it to look like this:

year
1991
1991 - 2001
1991 - 1998
2000 - 2002

I want to remove the parentheses and everything outside them.

Answer 1

Use a regular expression:

Use pandas.Series.str.extract
- Regular expression: \((.*)\)
- Extracts the content between ( and )

import pandas as pd

df = pd.DataFrame({'year': ['1 (1991)', '10 (1991-2001)', '8 (1991-1998)', '2 (2000-2002)']})

           year
       1 (1991)
 10 (1991-2001)
  8 (1991-1998)
  2 (2000-2002)

df['year'] = df['year'].str.extract(r'\((.*)\)')

      year
      1991
 1991-2001
 1991-1998
 2000-2002
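
The target output in the question has spaces around the hyphen, which the extraction alone does not produce. Here is a minimal sketch, assuming pandas is installed. The stricter pattern `\(([^)]*)\)`, the `expand=False` argument, and the follow-up `str.replace` are additions for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'year': ['1 (1991)', '10 (1991-2001)', '8 (1991-1998)', '2 (2000-2002)']})

# [^)]* stops at the first closing parenthesis; expand=False returns a Series
years = df['year'].str.extract(r'\(([^)]*)\)', expand=False)

# Assumption: the hyphen should be padded with spaces, as in the question's target output
df['year'] = years.str.replace('-', ' - ', regex=False)

print(df['year'].tolist())
# ['1991', '1991 - 2001', '1991 - 1998', '2000 - 2002']
```

Rows that contain no parentheses would come back as NaN from `str.extract`, so it may be worth checking for missing values afterwards.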

Answer 2

You can use the following code:

df['year'] = df['year'].str.split('(').str[1].str.strip(')')

Output:

    year
0   1991
1   1991-2001
2   1991-1998
3   2000-2002
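
One thing to keep in mind with the split approach is how it treats rows without an opening parenthesis. The sketch below assumes pandas is installed; the row 'unknown' is made up for illustration and does not appear in the question's data.

```python
import pandas as pd

# 'unknown' is a hypothetical row with no parentheses
s = pd.Series(['1 (1991)', '10 (1991-2001)', 'unknown'])

# split('(') yields a one-element list for 'unknown', so .str[1] has nothing to return
result = s.str.split('(').str[1].str.strip(')')

print(result.tolist()[:2])     # ['1991', '1991-2001']
print(pd.isna(result.iloc[2])) # True: the row without '(' becomes a missing value
```

If such rows can occur, `result.fillna(s)` is one way to fall back to the original text.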

Answer 3

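How about this, assuming the year column holds only the parenthesized value, such as '(1991)', and the leading number is the index (as the output below suggests):

df['year'] = df['year'].str[1:-1]

If your data does not always start and end with parentheses, this is safer:

# str.strip takes a set of characters to remove, not a regex
df['year'] = df['year'].str.strip('()')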

Output:

1          1991
10    1991-2001
8     1991-1998
2     2000-2002
Name: year, dtype: object
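
To make the behaviour of `str.strip` concrete: its argument is a set of characters removed from both ends, not a pattern. A small sketch, assuming pandas is installed; the third row with pipe characters is contrived to show the difference.

```python
import pandas as pd

# The last value is a contrived example containing literal '|' characters
s = pd.Series(['(1991)', '(1991-2001)', '|(2000-2002)|'])

# Only '(' and ')' are stripped, so the leading '|' blocks any stripping on row 3
only_parens = s.str.strip('()').tolist()

# '|' is treated as one more literal character to strip, not as regex alternation
with_pipe = s.str.strip('(|)').tolist()

print(only_parens)  # ['1991', '1991-2001', '|(2000-2002)|']
print(with_pipe)    # ['1991', '1991-2001', '2000-2002']
```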

Answer 4

lines = [
  "year",
  "1 (1991)",
  "10 (1991-2001)",
  "8 (1991-1998)",
  "2 (2000-2002)"
]
formatted_lines = []
for line in lines:
  if '(' not in line:
    formatted_lines.append(line)  # keep rows without parentheses, such as the header
    continue
  updated_line = line.split('(')[1]  # take the part after '(': "1 (1991)" -> "1991)"
  updated_line = updated_line.replace(')', '')  # remove the closing parenthesis
  formatted_lines.append(updated_line)
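
For plain Python lists, the same extraction can also be sketched with the standard `re` module. This is an alternative for illustration rather than the answer's own method; the name `pattern` is introduced here.

```python
import re

lines = [
    "year",
    "1 (1991)",
    "10 (1991-2001)",
    "8 (1991-1998)",
    "2 (2000-2002)",
]

# Capture everything between the first '(' and the next ')'
pattern = re.compile(r'\(([^)]*)\)')

formatted_lines = []
for line in lines:
    match = pattern.search(line)
    # Lines without parentheses (the header) are kept unchanged
    formatted_lines.append(match.group(1) if match else line)

print(formatted_lines)
# ['year', '1991', '1991-2001', '1991-1998', '2000-2002']
```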

4 answers

Answer 1
2 2019-10-13 03:29:34

Answer 2
1 2019-10-13 02:50:51

Answer 3
0 2019-10-13 03:11:43

Answer 4
-2 2019-10-13 02:40:30
