
Extract text between characters in a list in Python

After iterating over a list with a for loop, in order to extract only a few values, I get this:

['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6']
['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5']

What I need to do is extract the information between the parentheses in each line and put it into another list, but I'm struggling to find the right code.

I have tried the method described in "How do I find the string between two special characters?", but I get an error because the string is in a list.

I have also looked at the documentation for the re module, but I'm not sure how to apply it in this case.

Considering that this is a standard structure, you can avoid the regex part entirely, and simply do something like this:

Let us assume you have already extracted the string you want to work on:

s = 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'

You can split on ( and take the part that follows it, then use slicing to drop the trailing ):

>>> s.split('(')[1][:-1]
'3.73 GHz, Pentium Exteme Edition 965'

The above depends on the string always containing a (. If it does not, the list returned by split has only one element and indexing [1] raises an IndexError. To avoid that, you can use partition, which simply gives an empty string when the separator is missing:

s.partition('(')[2][:-1]

As provided in the comments by @JonClements.
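To apply this to every row rather than to a single pre-extracted string, something like the following should work. It is only a sketch: the names rows and extract_parenthesized are made up for illustration, and it assumes the description is the first element of each row and that the closing parenthesis is the last character of the string.

```python
rows = [
    ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
    ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5'],
]

def extract_parenthesized(s):
    # partition returns (before, separator, after);
    # 'after' is an empty string when '(' does not occur in s.
    # The [:-1] slice drops the final character, assumed to be ')'.
    return s.partition('(')[2][:-1]

# Hypothetical helper applied to the first element of each row
details = [extract_parenthesized(row[0]) for row in rows]
print(details)
```

If the closing parenthesis is not guaranteed to be the last character, the slice would cut off a real character instead, so a find-based or regex-based approach is safer in that case.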

a = ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6']
b = a[0] # Get 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'
c = b[b.find('(') + 1: b.find(')')] # Get '3.73 GHz, Pentium Exteme Edition 965'
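One caveat with find is that it returns -1 when the character is missing, which silently produces a wrong slice instead of an error. A small guarded version might look like this (between_parens is a hypothetical helper name, not part of the original answer):

```python
def between_parens(text):
    # Locate the first '(' and the first ')' that comes after it.
    start = text.find('(')
    end = text.find(')', start + 1)
    # str.find returns -1 when nothing is found; report that as None.
    if start == -1 or end == -1:
        return None
    return text[start + 1:end]

a = ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6']
print(between_parens(a[0]))
print(between_parens(a[1]))  # '11.6' has no parentheses
```

Returning None for strings without parentheses is a design choice made here for illustration; raising an exception or returning an empty string would work just as well depending on what the caller expects.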

A more powerful way to achieve this is to use a regex, like this:

import re
regex = re.compile(r"\((.*)\)")
# origin_list is assumed to be the list of strings to search
details = [regex.findall(text)[0] for text in origin_list if regex.search(text)]
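Keep in mind that .* is greedy: if a string contains more than one pair of parentheses, it captures everything from the first ( to the last ). The sketch below contrasts it with a narrower character class. The sample string is invented for illustration and does not come from the question's data.

```python
import re

greedy = re.compile(r"\((.*)\)")       # everything from first '(' to last ')'
narrow = re.compile(r"\(([^()]*)\)")   # stops at the nearest ')'

sample = "Model X (2.0 GHz) rev (B)"   # hypothetical input with two pairs

greedy_hit = greedy.search(sample).group(1)
narrow_hit = narrow.search(sample).group(1)
all_hits = narrow.findall(sample)

print(greedy_hit)  # 2.0 GHz) rev (B
print(narrow_hit)  # 2.0 GHz
print(all_hits)    # ['2.0 GHz', 'B']
```

For the rows in the question, which contain exactly one pair of parentheses each, both patterns give the same result.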

You can use r'\((.*)\)' to get the data inside the parentheses. This is simple.

import re
data = [['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
        ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5']]
# re.search scans the whole string; re.match only tries at position 0
# and would return None here because the string starts with "Dell".
result = [re.search(r'\((.*)\)', x[0]).group(1) for x in data]
print(result)
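A note on which function to call with a pattern like this one: re.match only attempts a match at the beginning of the string, while re.search looks for the first match anywhere in it. Since the strings here start with "Dell" rather than with a parenthesis, the difference matters. A quick check, using one of the strings from the question:

```python
import re

text = 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'
pattern = r'\((.*)\)'

at_start = re.match(pattern, text)    # anchored at position 0
anywhere = re.search(pattern, text)   # scans through the string

print(at_start)            # None, so calling .group(1) on it would fail
print(anywhere.group(1))   # 3.73 GHz, Pentium Exteme Edition 965
```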

But a bare wildcard may sometimes yield junk results, so it is better to apply more restrictions to get an exact match. For example, if you use \w.*\((\d+\.\d+\s\w.*,.*\d+)\) as your match pattern, only strings shaped like the ones above will match. In that case the same code becomes:

import re
data = [['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
        ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5']]
# This pattern starts at the beginning of the string, so re.match works here.
# Note that re.match returns None for any row that does not fit the pattern.
result = [re.match(r'\w.*\((\d+\.\d+\s\w.*,.*\d+)\)', x[0]).group(1) for x in data]
print(result)
