I have the following strings:
'10000 ABC = 1 DEF'
'1 AM = 0,30$'
'3500 ABC = 1 GTY'
'1000 HUYT=1ABC'
'1 MONET Data = 1 ABC'
I want to find a flexible way to extract the numeric and string values from the left and right sides of '='. I do not know all possible string values, so I cannot pre-define them. The only thing I know is that the left and right sides are separated by '='.
The goal is to get this result for the above-given example:
String-pairs:
ABC-DEF
AM-$
ABC-GTY
HUYT-ABC
MONET Data-ABC
Numeric-pairs:
10000-1
1-0.30
3500-1
1000-1
1-1
I tried using .lstrip('...') and .rstrip('...'), but that does not give me the expected result.
Remove the unwanted characters and replace the '=' with a '-'.
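import re

# NB: the list name `str` shadows the built-in str type within this script.
str = ['10000 ABC = 1 DEF',
       '1 AM = 0,30$',
       '3500 ABC = 1 GTY',
       '1000 HUYT=1ABC',
       '1 MONET Data = 1 ABC']

String_pairs = []
Numeric_pairs = []
for s in str:
    # Strip the numbers (with an optional decimal comma), then turn '=' into '-'.
    String_pairs.append(re.sub(r'\s*=\s*', '-', re.sub(r'\s*\d+(,\d+)?\s*', '', s)))
    # Strip everything except digits, commas and '=', then turn '=' into '-'.
    Numeric_pairs.append(re.sub(r'\s*=\s*', '-', re.sub(r'\s*[^\d,=]+\s*', '', s)))

print(String_pairs)
print(Numeric_pairs)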
Result:
['ABC-DEF', 'AM-$', 'ABC-GTY', 'HUYT-ABC', 'MONET Data-ABC']
['10000-1', '1-0,30', '3500-1', '1000-1', '1-1']
Or, more concisely, as list comprehensions (with the same result):
String_pairs = [re.sub(r'\s*=\s*','-', re.sub(r'\s*\d+(,\d+)?\s*','', s)) for s in str]
Numeric_pairs = [re.sub(r'\s*=\s*','-', re.sub(r'\s*[^\d,=]+\s*','', s)) for s in str]
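Note that the result above keeps the decimal comma (0,30), while the expected output in the question uses a point (0.30). A minimal sketch of one way to normalise it follows. It assumes that a comma in these strings is always a decimal separator and never a thousands separator. The names strings and numeric_pair are illustrative and not part of the answer above.

```python
import re

strings = ['10000 ABC = 1 DEF',
           '1 AM = 0,30$',
           '3500 ABC = 1 GTY',
           '1000 HUYT=1ABC',
           '1 MONET Data = 1 ABC']

def numeric_pair(s):
    # Drop everything that is not a digit, a comma or '='.
    digits_only = re.sub(r'[^\d,=]+', '', s)
    # Assumption: ',' is a decimal separator, so it becomes '.'.
    return digits_only.replace(',', '.').replace('=', '-')

numeric_pairs = [numeric_pair(s) for s in strings]
print(numeric_pairs)
```

If the input may also contain thousands separators (for example 1,000), this replacement would be wrong and the two cases would need to be told apart first.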
As an alternative to regex, you could loop through each string and pick out the relevant characters. It could look something along the lines of the following.
def extract_string_pairs(source_string):
    # Keeps letters and '$' only, so spaces are dropped as well:
    # 'MONET Data' comes out as 'MONETData'.
    string_pair = ''
    for c in source_string:
        if c.isalpha() or c == '$':
            string_pair += c
        elif c == '=':
            string_pair += '-'
    return string_pair


def extract_numeric_pairs(source_string):
    string_pair = ''
    for c in source_string:
        if c.isdigit():
            string_pair += c
        elif c in ',.':
            # The sample data uses a decimal comma; emit a point instead.
            string_pair += '.'
        elif c == '=':
            string_pair += '-'
    return string_pair
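A character loop like the one above discards spaces, so a multi-word label such as "MONET Data" loses its inner space. The following is a rough sketch of a variant that keeps it by handling each side of '=' separately. The function name extract_string_pair_keep_spaces is hypothetical, and the sketch assumes labels consist only of letters, '$' and spaces.

```python
def extract_string_pair_keep_spaces(source_string):
    # Process the text on each side of '=' on its own.
    labels = []
    for side in source_string.split('='):
        # Keep letters, '$' and spaces; drop digits and punctuation.
        kept = ''.join(c for c in side if c.isalpha() or c in '$ ')
        # Collapse runs of spaces and trim both ends.
        labels.append(' '.join(kept.split()))
    return '-'.join(labels)


print(extract_string_pair_keep_spaces('1 MONET Data = 1 ABC'))  # MONET Data-ABC
print(extract_string_pair_keep_spaces('1000 HUYT=1ABC'))        # HUYT-ABC
```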
import re

str = ['10000 ABC = 1 DEF',
       '1 AM = 0,30$',
       '3500 ABC = 1 GTY',
       '1000 HUYT=1ABC',
       '1 MONET Data = 1 ABC']


def getThePat(pat):
    for i in str:
        # Split into the left and right sides of '='.
        i = i.split("=")
        x = re.findall(pat, i[0])
        y = re.findall(pat, i[1])
        print(" ".join(x), "-", " ".join(y))


# Raw strings avoid invalid-escape warnings for \$ and \d.
pat1 = r"\$+|[a-z]+|[A-Z][a-z]+|[A-Z]+"  # '$' signs and words
pat2 = r"\d+|,+"                          # digit runs and commas

getThePat(pat1)
getThePat(pat2)
Output:
ABC - DEF
AM - $
ABC - GTY
HUYT - ABC
MONET Data - ABC
10000 - 1
1 - 0 , 30
3500 - 1
1000 - 1
1 - 1
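If the pairs are needed as data rather than as printed text, one possible approach is to parse each side of '=' with a single pattern and return tuples. This is only a sketch under the assumption that every side has the form "number, then label", as in the sample strings. The names SIDE, parse_side and parse_line are made up for illustration.

```python
import re

# Assumed layout of one side: a number (optional decimal comma or point),
# followed by a label that may contain spaces.
SIDE = re.compile(r'\s*(\d+(?:[.,]\d+)?)\s*(.*?)\s*$')


def parse_side(side):
    m = SIDE.match(side)
    if m is None:
        raise ValueError('unexpected format: %r' % side)
    number, label = m.groups()
    # Assumption: ',' is a decimal separator.
    return number.replace(',', '.'), label


def parse_line(line):
    # Unpacking raises ValueError if the line contains no '='.
    left, right = line.split('=', 1)
    (n1, s1), (n2, s2) = parse_side(left), parse_side(right)
    return (s1, s2), (n1, n2)


print(parse_line('1 AM = 0,30$'))          # (('AM', '$'), ('1', '0.30'))
print(parse_line('1 MONET Data = 1 ABC'))  # (('MONET Data', 'ABC'), ('1', '1'))
```

Returning tuples leaves the choice of separator and output format to the caller.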