How do I get values from similar strings in Python?

Question

Suppose I have a file that contains similar strings, like the following:

Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|
Canillo|ad|Canillo|3292|42.57|1.6|
Encamp|ad|Encamp|11224|42.54|1.57|
La Massana|ad|La Massana|7211|42.55|1.51|
...

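How can I use a regular expression to print the first number (that is, the fourth field of each string)? Also, if that fourth field is greater than 10000, how can I print the first 4 fields of that particular line (for example "Andorra la Vella" "ad" "Andorra la Vella" 20430)?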

Answer 1

I think using the csv module would be easier in this case:

import csv
with open(filename, newline='') as f:  # Python 3: csv expects a text-mode file opened with newline=''
    for row in csv.reader(f, delimiter='|'):
        num = float(row[3])
        if num > 10000:
            print(row[:4])
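
To try this approach without a file on disk, here is a minimal sketch that feeds the sample rows from the question through csv.reader using io.StringIO. The helper name rows_above and its threshold parameter are illustrative assumptions, not part of the original answer.

```python
import csv
import io

# Sample rows copied from the question; each line ends with a trailing '|'.
SAMPLE = """\
Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|
Canillo|ad|Canillo|3292|42.57|1.6|
Encamp|ad|Encamp|11224|42.54|1.57|
La Massana|ad|La Massana|7211|42.55|1.51|
"""


def rows_above(lines, threshold=10000):
    """Yield the first four fields of each row whose fourth field exceeds threshold."""
    for row in csv.reader(lines, delimiter='|'):
        if len(row) < 4:
            continue  # skip blank or short lines
        if float(row[3]) > threshold:
            yield row[:4]


for fields in rows_above(io.StringIO(SAMPLE)):
    print(fields)
```

Because csv.reader accepts any iterable of lines, the same function should work unchanged on a real file object opened with newline=''.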

Answer 2

You don't need a regular expression for this.

s = """
Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|
Canillo|ad|Canillo|3292|42.57|1.6|
Encamp|ad|Encamp|11224|42.54|1.57|
La Massana|ad|La Massana|7211|42.55|1.51|
"""

for line in s.splitlines():  # pretend we are reading from a file
    if not line:
        continue # skip empty lines

    groups = line.split('|')  # splits each line into its segments
    if int(groups[3]) > 10000:  # checks if the 4th value is above 10000
        print(groups[:4])  # prints the first 4 values
    else:
        print(groups[3])  # prints the 4th value

>>> 
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430']
3292
['Encamp', 'ad', 'Encamp', '11224']
7211

Answer 3

Using a regular expression:

import re
# keep the '|' delimiters outside the groups so that int(result[3]) works
results = [re.match(r'(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|', line).groups() for line in open('file.txt')]
# filter just the rows with fourth column > 10000
results = [result for result in results if int(result[3]) > 10000]

Using split:

results = [line.split('|')[0:-1] for line in open('file.txt')]
# filter just the rows with fourth column > 10000
results = [result for result in results if int(result[3]) > 10000]
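
The question also asks how to print just the first number with a regular expression. A minimal sketch, assuming the number of interest is the first field that consists only of digits and sits between two pipes (true for the sample rows, not guaranteed for other data). The names FIRST_INT and first_number are illustrative.

```python
import re

# The first pipe-delimited field made up entirely of digits.
# Decimal fields such as 42.51 do not match because of the '.'.
FIRST_INT = re.compile(r'\|(\d+)\|')


def first_number(line):
    """Return the first all-digit field as an int, or None if there is none."""
    match = FIRST_INT.search(line)
    return int(match.group(1)) if match else None


for line in [
    'Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|',
    'Canillo|ad|Canillo|3292|42.57|1.6|',
]:
    print(first_number(line))
```

Note that the pattern requires a pipe before the digits, so a purely numeric first field would be skipped. If the position is what matters rather than the content, splitting on '|' and taking index 3 is the more direct choice.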

Answer 4

You don't need a regular expression; you can use str.split and str.rstrip:

>>> s = 'Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|'
>>> spl = s.rstrip('|\n').split('|')
>>> spl
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430', '42.51', '1.51']
>>> if int(spl[3]) > 10000:
...     print(spl[:4])
...
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430']

Demo:

with open('filename') as f:
    for line in f:
        data = line.rstrip('|\n').split('|')
        if int(data[3]) > 10000:
            print(data[:4])

Output:

['Andorra la Vella', 'ad', 'Andorra la Vella', '20430']
['Encamp', 'ad', 'Encamp', '11224']
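
Real files often contain blank lines or rows where the fourth field is not an integer (the "..." in the question hints at more data). The following is a sketch of the same split-based approach that skips such lines instead of raising. The function names parse_fields and large_rows are hypothetical.

```python
def parse_fields(line):
    """Split a pipe-delimited line into fields, dropping the trailing '|' and newline."""
    return line.rstrip('|\n').split('|')


def large_rows(lines, threshold=10000):
    """Yield the first four fields of every line whose fourth field is an int above threshold."""
    for line in lines:
        fields = parse_fields(line)
        try:
            value = int(fields[3])
        except (IndexError, ValueError):
            continue  # blank, too short, or non-numeric fourth field
        if value > threshold:
            yield fields[:4]


sample = [
    'Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|\n',
    '\n',
    'Canillo|ad|Canillo|3292|42.57|1.6|\n',
    'Encamp|ad|Encamp|11224|42.54|1.57|\n',
    '...\n',
]
for fields in large_rows(sample):
    print(fields)
```

One caveat: rstrip('|\n') removes every trailing pipe, so rows that end in several empty fields lose them. That does not affect the first four fields used here.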

4 answers

Answer 1
5 2013-11-17 16:07:04

Answer 2
2 accepted 2013-11-17 16:08:00

Answer 3
1 2013-11-17 16:19:05

Answer 4
0 2013-11-17 16:07:13
