How to extract only the string component between parentheses?
I am looking for an efficient way to remove specific elements from a column of data.
I have data like this:
year
1 (1991)
10 (1991-2001)
8 (1991-1998)
2 (2000-2002)
and I want it to look like this:
year
1991
1991 - 2001
1991 - 1998
2000 - 2002
I want to remove the parentheses and everything before and after them, keeping only what is inside.
Use pandas.Series.str.extract with the regular expression \((.*)\) to extract the content between ( and ):
import pandas as pd
df = pd.DataFrame({'year': ['1 (1991)', '10 (1991-2001)', '8 (1991-1998)', '2 (2000-2002)']})
year
1 (1991)
10 (1991-2001)
8 (1991-1998)
2 (2000-2002)
df['year'] = df['year'].str.extract(r'\((.*)\)')
year
1991
1991-2001
1991-1998
2000-2002
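The desired output in the question has spaces around the hyphen (1991 - 2001), which the extract call alone does not produce. A minimal sketch of one way to add them, assuming a plain literal replacement is acceptable as a final step (the variable names `years` and `result` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'year': ['1 (1991)', '10 (1991-2001)', '8 (1991-1998)', '2 (2000-2002)']})

# expand=False returns a Series rather than a one-column DataFrame
years = df['year'].str.extract(r'\((.*)\)', expand=False)

# regex=False treats '-' as a literal character, not a pattern
df['year'] = years.str.replace('-', ' - ', regex=False)

result = df['year'].tolist()
print(result)
```

Rows that contain no parentheses would come back as NaN from str.extract, so they may need separate handling.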
You can use the code below:
df['year'] = df['year'].str.split('(').str[1].str.strip(')')
Output:
year
0 1991
1 1991-2001
2 1991-1998
3 2000-2002
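One thing worth checking with the split approach is what happens to rows that have no opening parenthesis. The sketch below uses a made-up value ('unknown') that is not in the original data to show that such rows end up as missing values rather than raising an error:

```python
import pandas as pd

# 'unknown' is a hypothetical row without parentheses, added for illustration
s = pd.Series(['1 (1991)', '10 (1991-2001)', 'unknown'])

# split('(') yields a one-element list for 'unknown', so .str[1] has nothing to pick
result = s.str.split('(').str[1].str.strip(')')

print(result)
```

If missing values are not wanted, something like result.fillna(s) would keep the original text for those rows.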
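How about this, assuming the leading numbers are the index and each value is only the parenthesized part, such as '(1991)':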
df['year'] = df['year'].str[1:-1]
Or, safer if your data doesn't always start and end with parentheses:
# str.strip removes any of the given characters from both ends (a character set, not a regex)
df['year'] = df['year'].str.strip('()')
Output:
1 1991
10 1991-2001
8 1991-1998
2 2000-2002
Name: year, dtype: object
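To see how str.strip treats its argument, the sketch below compares two character sets on some made-up values (the pipe characters and doubled parentheses are not from the original data). The argument is a set of characters removed from both ends, so '|' in '(|)' is just one more character to strip, not regex alternation:

```python
import pandas as pd

# Hypothetical values chosen to expose the difference between the two arguments
s = pd.Series(['(1991)', '|(1991-2001)|', '((2000-2002))'])

# Strips only '(' and ')' from both ends; the pipes block stripping on row 1
only_parens = s.str.strip('()').tolist()

# Also strips '|' because it is part of the character set
with_pipe = s.str.strip('(|)').tolist()

print(only_parens)
print(with_pipe)
```

Because stripping only touches the ends of each string, this approach does not remove text before or after the parentheses.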
lines = [
    "year",
    "1 (1991)",
    "10 (1991-2001)",
    "8 (1991-1998)",
    "2 (2000-2002)"
]
formatted_lines = []
for line in lines:
    # Split at '(' and keep the last piece: "1 (1991)" -> ["1 ", "1991)"] -> "1991)"
    updated_line = line.split('(')[-1]
    # Remove the closing parenthesis
    updated_line = updated_line.replace(')', '')
    formatted_lines.append(updated_line)
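The same plain-Python loop can also be written with the standard re module, which makes the "text between parentheses" intent explicit. This is only a sketch; the names `pattern` and `formatted` are illustrative, and lines without parentheses (such as the header) are assumed to pass through unchanged:

```python
import re

lines = [
    "year",
    "1 (1991)",
    "10 (1991-2001)",
    "8 (1991-1998)",
    "2 (2000-2002)",
]

# Capture everything between the first '(' and the next ')'
pattern = re.compile(r'\(([^)]*)\)')

formatted = []
for line in lines:
    match = pattern.search(line)
    # Keep the line as-is when it has no parenthesized part
    formatted.append(match.group(1) if match else line)

print(formatted)
```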