How to get values from similar strings in Python?
Suppose I have the following strings, taken from a file that contains many similar lines:
Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|
Canillo|ad|Canillo|3292|42.57|1.6|
Encamp|ad|Encamp|11224|42.54|1.57|
La Massana|ad|La Massana|7211|42.55|1.51|
...
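How can I use a regular expression to print the first number (that is, the fourth field of each string)? And, if the fourth field is greater than 10000, how can I print the first 4 fields of that particular line (e.g. "Andorra la Vella" "ad" "Andorra la Vella" 20430)?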
I think the csv module would be easier to use in this case:
import csv
with open(filename, newline='') as f:  # text mode, as the csv module expects in Python 3
    for row in csv.reader(f, delimiter='|'):
        num = float(row[3])
        if num > 10000:
            print(row[:4])
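As a self-contained variant of the csv approach, the sketch below reads the sample lines from an in-memory stream instead of a file, so it can be run as-is. The names `SAMPLE` and `read_rows` are hypothetical, and the handling of the empty trailing field (caused by the final `|` on each line) is an assumption about the data shown in the question.

```python
import csv
import io

# Sample data copied from the question; each line ends with a trailing '|'.
SAMPLE = """\
Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|
Canillo|ad|Canillo|3292|42.57|1.6|
Encamp|ad|Encamp|11224|42.54|1.57|
La Massana|ad|La Massana|7211|42.55|1.51|
"""

def read_rows(text_stream):
    """Yield each non-empty row as a list of fields (hypothetical helper)."""
    for row in csv.reader(text_stream, delimiter='|'):
        if not row:
            continue  # csv.reader returns [] for blank lines
        # The trailing '|' produces an empty last field; drop it if present.
        yield row[:-1] if row[-1] == '' else row

rows = list(read_rows(io.StringIO(SAMPLE)))

# Fourth field of every row, converted to int.
fourth_values = [int(row[3]) for row in rows]

# First four fields of the rows whose fourth field exceeds 10000.
large = [row[:4] for row in rows if int(row[3]) > 10000]

print(fourth_values)
print(large)
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by a file object opened with `newline=''`.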
You don't need a regular expression.
s = """
Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|
Canillo|ad|Canillo|3292|42.57|1.6|
Encamp|ad|Encamp|11224|42.54|1.57|
La Massana|ad|La Massana|7211|42.55|1.51|
"""
for line in s.splitlines():  # pretend we are reading from a file
    if not line:
        continue  # skip empty lines
    groups = line.split('|')  # splits each line into its segments
    if int(groups[3]) > 10000:  # checks if the 4th value is above 10000
        print(groups[:4])  # prints the first 4 values
    else:
        print(groups[3])  # prints the 4th value
>>>
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430']
3292
['Encamp', 'ad', 'Encamp', '11224']
7211
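Using a regular expression:
import re
# Each group captures one field; the '|' stays outside the group so int() can parse it.
pattern = re.compile(r'(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|')
with open('file.txt') as f:
    results = [m.groups() for m in map(pattern.match, f) if m]
# filter just the rows with fourth column > 10000
results = [result for result in results if int(result[3]) > 10000]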
Using split:
results = [line.split('|')[0:-1] for line in open('file.txt')]
# filter just the rows with fourth column > 10000
results = [result for result in results if int(result[3]) > 10000]
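If a regular expression is wanted only for the number the question asks about, a narrower pattern can skip the first three fields and capture just the fourth. This is a minimal sketch, assuming the fourth field is always a run of digits; `FOURTH_FIELD` and `fourth_field` are hypothetical names, not part of the original answers.

```python
import re

# Skip three '|'-terminated fields, then capture the digits of the fourth one.
FOURTH_FIELD = re.compile(r'^(?:[^|]*\|){3}(\d+)\|')

def fourth_field(line):
    """Return the fourth field as an int, or None if the line does not match."""
    match = FOURTH_FIELD.match(line)
    return int(match.group(1)) if match else None

lines = [
    'Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|',
    'Canillo|ad|Canillo|3292|42.57|1.6|',
    '...',  # a line that does not follow the format
]
values = [fourth_field(line) for line in lines]
print(values)
```

Returning `None` for non-matching lines is a design choice made here so that the caller decides whether to skip or report them.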
You don't need a regular expression; you can use str.split and str.strip:
>>> s = 'Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|'
>>> spl = s.rstrip('|\n').split('|')
>>> spl
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430', '42.51', '1.51']
>>> if int(spl[3]) > 10000:
...     print(spl[:4])
...
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430']
Demo:
with open('filename') as f:
    for line in f:
        data = line.rstrip('|\n').split('|')
        if int(data[3]) > 10000:
            print(data[:4])
Output:
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430']
['Encamp', 'ad', 'Encamp', '11224']
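The split-based loop raises an exception on blank lines or on lines with fewer than four fields. One possible way to tolerate such lines is sketched below; the function name `first_four_if_large` and the decision to skip bad lines silently are assumptions, not part of the original answer.

```python
def first_four_if_large(lines, threshold=10000):
    """Yield the first four fields of each line whose fourth field exceeds threshold."""
    for line in lines:
        fields = line.rstrip('|\n').split('|')
        try:
            value = int(fields[3])
        except (IndexError, ValueError):
            continue  # blank line, too few fields, or non-numeric fourth field
        if value > threshold:
            yield fields[:4]

sample = [
    'Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|\n',
    'Canillo|ad|Canillo|3292|42.57|1.6|\n',
    '\n',      # blank line
    '...\n',   # line without the expected fields
    'Encamp|ad|Encamp|11224|42.54|1.57|\n',
]
selected = list(first_four_if_large(sample))
print(selected)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample`.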