
How to extract only the string component between parentheses?

I am looking for an efficient way to remove specific parts of the values in a column of data.

I have data like this:

year
1 (1991)
10 (1991-2001)
8 (1991-1998)
2 (2000-2002)

and I want it to look like this:

year
1991
1991 - 2001
1991 - 1998
2000 - 2002

I want to remove the parentheses and everything before and after them.
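The target output above writes ranges with spaces around the hyphen ('1991 - 2001'), while the answers below keep '1991-2001'. A minimal plain-Python sketch, assuming every value has the form 'N (YYYY)' or 'N (YYYY-YYYY)'; the helper name reformat_year is hypothetical. It captures the text between the parentheses and then normalizes the hyphen.

```python
import re

# Capture everything between the first "(" and the last ")"
PAREN_RE = re.compile(r"\((.*)\)")


def reformat_year(value):
    """Hypothetical helper: return the parenthesized part with ' - ' around hyphens.

    Returns None when the value contains no parentheses (e.g. a header line).
    """
    match = PAREN_RE.search(value)
    if match is None:
        return None
    inner = match.group(1)
    # Replace each hyphen (and any spaces around it) with exactly " - "
    return re.sub(r"\s*-\s*", " - ", inner)


for raw in ["1 (1991)", "10 (1991-2001)", "8 (1991-1998)", "2 (2000-2002)"]:
    print(reformat_year(raw))
```

In pandas, a comparable chain would likely be df['year'].str.extract(r'\((.*)\)', expand=False).str.replace('-', ' - '); treat that as a suggestion to verify against your pandas version.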

With a regular expression:

import pandas as pd

df = pd.DataFrame({'year': ['1 (1991)', '10 (1991-2001)', '8 (1991-1998)', '2 (2000-2002)']})

           year
       1 (1991)
 10 (1991-2001)
  8 (1991-1998)
  2 (2000-2002)

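# Capture everything between "(" and ")"; expand=False returns a Series instead of a one-column DataFrame
df['year'] = df['year'].str.extract(r'\((.*)\)', expand=False)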

      year
      1991
 1991-2001
 1991-1998
 2000-2002
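In the pattern r'\((.*)\)', \( and \) match literal parentheses and (.*) is the capture group that str.extract returns. Because .* is greedy, a value containing two parenthesized groups would be captured from the first "(" to the last ")". A small check with Python's re module, using a made-up value that is not in the question's data:

```python
import re

text = "3 (1991-1995) (2000)"  # hypothetical value with two parenthesized groups

# Greedy: runs from the first "(" to the LAST ")"
greedy = re.search(r"\((.*)\)", text).group(1)
# Non-greedy: stops at the first ")" after the opening "("
lazy = re.search(r"\((.*?)\)", text).group(1)
# Every parenthesized group
all_groups = re.findall(r"\((.*?)\)", text)

print(greedy)
print(lazy)
print(all_groups)
```

If such values can occur, the non-greedy r'\((.*?)\)' is the safer pattern for Series.str.extract (first match only); Series.str.extractall returns every match.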

You can use the code below:

# Split on "(", keep the part after it, then drop the trailing ")"
df['year'] = df['year'].str.split('(').str[1].str.strip(')')

Output:

    year
0   1991
1   1991-2001
2   1991-1998
3   2000-2002
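The same split-based idea on plain Python strings, as a sketch: the helper name between_parens is hypothetical. It uses str.partition, which always returns a 3-tuple, so a value without "(" does not raise IndexError the way split('(')[1] would; it returns None instead, loosely mirroring the NaN that pandas' .str[1] produces.

```python
def between_parens(value):
    """Hypothetical helper: text after the first "(", minus any trailing ")".

    Returns None when the value contains no "(" (e.g. a header such as "year").
    """
    _, sep, rest = value.partition("(")
    if not sep:  # partition found no "(" at all
        return None
    return rest.rstrip(")")


for raw in ["year", "1 (1991)", "10 (1991-2001)"]:
    print(repr(between_parens(raw)))
```

Note that partition splits only at the first "(", whereas split('(') splits at every one; for the single-group values in the question both give the same result.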

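How about the following, assuming the leading numbers are the index and each value is only the parenthesized part, e.g. '(1991)':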

df['year'] = df['year'].str[1:-1]

Or, more safely, if your values don't always start and end with parentheses:

# str.strip does not take a regex: it removes any of the listed characters from both ends
df['year'] = df['year'].str.strip('()')

Output:

1          1991
10    1991-2001
8     1991-1998
2     2000-2002
Name: year, dtype: object
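str.strip (Python's built-in, which pandas' Series.str.strip applies element-wise) treats its argument as a set of characters rather than a pattern, so '(|)' strips any '(', '|' or ')' from both ends. A quick plain-Python check; the sample strings are made up for illustration:

```python
# The argument is a set of characters, not a regex or a literal prefix/suffix
chars_as_set = "(1991)".strip("(|)")      # '(' and ')' removed
plain_parens = "(1991-2001)".strip("()")  # "()" alone is enough
pipes_too = "|(1991)|".strip("(|)")       # '|' is in the set, so it is stripped as well
leading_text = "1 (1991)".strip("()")     # leading "1 " is not in the set, so "(" stays

print(chars_as_set)
print(plain_parens)
print(pipes_too)
print(leading_text)
```

The last case shows why the strip and slicing approaches only work when the leading numbers are not part of the values (for example, when they are the index).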
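
Without pandas, you can process the lines as plain strings:

lines = [
  "year",
  "1 (1991)",
  "10 (1991-2001)",
  "8 (1991-1998)",
  "2 (2000-2002)"
]
formatted_lines = []
for line in lines:
  if '(' not in line:  # keep lines without parentheses (the "year" header) unchanged
    formatted_lines.append(line)
    continue
  updated_line = line.split('(')[1]  # "1 (1991)" -> ["1 ", "1991)"]; keep "1991)"
  updated_line = updated_line.replace(')', '')  # remove the closing parenthesis
  formatted_lines.append(updated_line)

print(formatted_lines)  # ['year', '1991', '1991-2001', '1991-1998', '2000-2002']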

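Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.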

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM