
Python findall(): match text that starts with a digit and ends with a word

I have this string

procesor = "2x2.73 GHz Mongoose M5 & 2x2.50 GHz Cortex-A76 & 4x2.0 GHz Cortex-A55"

and I need to extract this CPU core list with re.findall():

Out:['2x2.73 GHz', '2x2.50 GHz', '4x2.0 GHz']

Please help me. I'm stuck here:

re.findall(r'(\d+[A-Za-z])', procesor)
Out[1]: ['2x', '2x', '4x']
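The attempt returns only '2x' because \d+[A-Za-z] stops after one run of digits plus a single letter. One way to see what each piece contributes is to grow the pattern step by step. The intermediate patterns below are illustrative and not from the original post:

```python
import re

procesor = "2x2.73 GHz Mongoose M5 & 2x2.50 GHz Cortex-A76 & 4x2.0 GHz Cortex-A55"

# Each step extends the previous pattern by one piece.
steps = [
    r'\d+[A-Za-z]',              # original attempt: digits + one letter -> '2x'
    r'\d+x\d+(?:\.\d+)?',        # add the clock speed, with optional decimals
    r'\d+x\d+(?:\.\d+)?\s*GHz',  # add the unit
]

for pattern in steps:
    print(pattern, re.findall(pattern, procesor))
```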

Use

re.findall(r'\d+x\d+(?:\.\d+)?\s*GHz', procesor)

See regex proof.
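A self-contained version of this answer, with the import and the sample string from the question. The non-capturing group (?:...) matters here: re.findall returns whole matches only when the pattern has no capturing groups, as the second call shows:

```python
import re

procesor = "2x2.73 GHz Mongoose M5 & 2x2.50 GHz Cortex-A76 & 4x2.0 GHz Cortex-A55"

# A raw string (r'...') keeps the backslashes in \d, \. and \s intact for the regex engine.
cores = re.findall(r'\d+x\d+(?:\.\d+)?\s*GHz', procesor)
print(cores)  # ['2x2.73 GHz', '2x2.50 GHz', '4x2.0 GHz']

# With a capturing group instead, findall would return only the group's text.
print(re.findall(r'\d+x\d+(\.\d+)?\s*GHz', procesor))  # ['.73', '.50', '.0']
```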

Explanation

--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  x                        'x'
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  GHz                      'GHz'
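The explanation above maps one-to-one onto a re.VERBOSE pattern, which keeps the comments inside the regex itself. A sketch (the name CORE_PATTERN is just illustrative):

```python
import re

procesor = "2x2.73 GHz Mongoose M5 & 2x2.50 GHz Cortex-A76 & 4x2.0 GHz Cortex-A55"

# Same pattern, written with re.VERBOSE so each piece carries its own comment.
# In verbose mode, unescaped whitespace in the pattern is ignored and '#' starts a comment.
CORE_PATTERN = re.compile(r"""
    \d+          # core count, e.g. '2'
    x            # literal 'x'
    \d+          # integer part of the clock speed
    (?:\.\d+)?   # optional decimal part, non-capturing
    \s*          # optional whitespace before the unit
    GHz          # literal unit
""", re.VERBOSE)

print(CORE_PATTERN.findall(procesor))
```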

If you need the match to be case-insensitive (so that, for example, X and ghz also match):

re.findall(r'\d+x\d+(?:\.\d+)?\s*GHz', procesor, re.I)
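To see what re.I changes, here is the same pattern on a mixed-case string. The input is hypothetical and not from the question:

```python
import re

# Hypothetical input with upper-case 'X' and lower-case 'ghz', not from the question.
mixed = "2X2.73 ghz Mongoose M5 & 4x2.0 GHZ Cortex-A55"

pattern = r'\d+x\d+(?:\.\d+)?\s*GHz'
print(re.findall(pattern, mixed))        # [] : no case-sensitive match
print(re.findall(pattern, mixed, re.I))  # ['2X2.73 ghz', '4x2.0 GHZ']
```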

For a more human-readable pattern, you can write [0-9] (exactly one ASCII digit) instead of \d:

import re

processor = "2x2.73 GHz Mongoose M5 & 2x2.50 GHz Cortex-A76 & 4x2.0 GHz Cortex-A55"
re.findall(r'[0-9]+x[0-9]+\.[0-9]* GHz', processor)

Returns:

['2x2.73 GHz', '2x2.50 GHz', '4x2.0 GHz']
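In a regex, an unescaped . matches any character, while \. matches only a literal dot. The sample string doesn't reveal the difference, but a hypothetical comma-decimal input does:

```python
import re

# Hypothetical input using a comma as decimal separator, not from the question.
text = "2x2,73 GHz Mongoose M5"

unescaped = r'[0-9]+x[0-9]+.[0-9]* GHz'  # '.' matches any character
escaped = r'[0-9]+x[0-9]+\.[0-9]* GHz'   # '\.' matches only a literal dot

print(re.findall(unescaped, text))  # ['2x2,73 GHz']
print(re.findall(escaped, text))    # []
```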

This regex pattern can help you:

([\d.]+)\s?[xX]\s?([\d.]+)\s?GHz

or, as a case-insensitive version:

(?i)([\d.]+)\s?x\s?([\d.]+)\s?GHz

See the sample on regex101.

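Add this to your Python source (because the pattern has two capturing groups, re.findall returns a list of tuples rather than whole strings):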

import re

processor = "2x2.73 GHz Mongoose M5 & 2x2.50 GHz Cortex-A76 & 4x2.0 GHz Cortex-A55"
CPU_Cores = re.findall(r"([\d.]+)\s?[xX]\s?([\d.]+)\s?GHz", processor)
print(CPU_Cores)

Output

[('2', '2.73'), ('2', '2.50'), ('4', '2.0')]
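If you still need the strings from the question, or numbers to compute with, the tuples can be post-processed. A sketch; the names as_strings and total_cores are hypothetical, and the int() conversion assumes the core count never contains a dot (which [\d.]+ would otherwise allow):

```python
import re

processor = "2x2.73 GHz Mongoose M5 & 2x2.50 GHz Cortex-A76 & 4x2.0 GHz Cortex-A55"
CPU_Cores = re.findall(r"([\d.]+)\s?[xX]\s?([\d.]+)\s?GHz", processor)

# Rebuild the strings the question asked for from the (count, speed) tuples.
as_strings = [f"{count}x{speed} GHz" for count, speed in CPU_Cores]

# Or convert to numbers, e.g. to total the cores (assumes the count is an integer).
total_cores = sum(int(count) for count, _ in CPU_Cores)

print(as_strings)   # ['2x2.73 GHz', '2x2.50 GHz', '4x2.0 GHz']
print(total_cores)  # 8
```

Note that rebuilding the strings normalizes spacing, so an input like "2 x 2.73GHz" would come back as "2x2.73 GHz".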

Explanation

([\d.]+)\s?[xX]\s?([\d.]+)\s?GHz

  • The first group ([\d.]+) matches the first number (the core count), made of digits and dots.
  • \s?[xX]\s? matches x or X, optionally preceded and/or followed by a single whitespace character.
  • The second group ([\d.]+) matches the second number (the clock speed).
  • \s? optionally matches a single whitespace character.
  • GHz matches the literal text GHz.
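If you want both the full match and the two numbers, re.finditer with named groups avoids relying on tuple positions. A sketch; the group names count and speed are illustrative, not from the original answer:

```python
import re

processor = "2x2.73 GHz Mongoose M5 & 2x2.50 GHz Cortex-A76 & 4x2.0 GHz Cortex-A55"

# The same pattern with named groups; (?i) makes the whole pattern case-insensitive.
pattern = re.compile(r"(?i)(?P<count>[\d.]+)\s?x\s?(?P<speed>[\d.]+)\s?GHz")

for m in pattern.finditer(processor):
    # m.group(0) is the full match; the named groups hold the two numbers.
    print(m.group(0), m.group("count"), m.group("speed"))
```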
