Is there a faster way to check for a value in a CSV file using Python?
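I have some code that looks up a school in a CSV file and, if it does not exist there, retrieves it from Google Maps. I have more than 100,000 records and it takes about 2 hours. Any ideas on how to speed this up? Thanks!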
from csv import DictReader
import codecs

def find_school(high_school, city, state):
    types_of_encoding = ["utf8"]
    for encoding_type in types_of_encoding:
        with codecs.open('C:/high_schools.csv', encoding=encoding_type, errors='replace') as csvfile:
            reader = DictReader(csvfile)
            for row in reader:
                # checks the csv file and sees if the high school already exists
                if (row['high_school'] == high_school.upper() and
                        row['city'] == city.upper() and
                        row['state'] == state.upper()):
                    return row['zipcode'], row['latitude'], row['longitude'], row['place_id']
            else:
                # no row matched: hits Google Maps api (the call itself is omitted in the post)
                pass

# executes
df['zip'], df['latitude'], df['longitude'], df['place_id'] = zip(*df.apply(lambda row: find_school(row['high_school'].strip(), row['City'].strip(), row['State'].strip()), axis=1))
Snippet of the CSV file:
high_school,city,state,address,zipcode,latitude,longitude,place_id,country,location_type
GEORGIA MILITARY COLLEGE,MILLEDGEVILLE,GA,"201 E GREENE ST, MILLEDGEVILLE, GA 31061, USA",31061,33.0789184,-83.2235169,ChIJv0wUz97H9ogRwuKm_HC-lu8,USA,UNIVERSITY
BOWIE,BOWIE,MD,"15200 ANNAPOLIS RD, BOWIE, MD 20715, USA",20715,38.9780387,-76.7435378,ChIJRWh2C1fpt4kR6XFWnAm5yAE,USA,SCHOOL
EVERGLADES,MIRAMAR,FL,"17100 SW 48TH CT, MIRAMAR, FL 33027, USA",33027,25.9696495,-80.3737813,ChIJQfmM_I6j2YgR1Hdq0CC4apo,USA,SCHOOL
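There is no need to read the file every time you perform a check. Load the file into memory once, then build a new dictionary keyed by a tuple of the fields you are interested in.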
import csv

lookup_dict = {}
with open('C:/Users/Josh/Desktop/test.csv') as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        # key each row by a (high_school, city, state) tuple, lowercased
        lookup_dict[(row['high_school'].lower(), row['city'].lower(),
                     row['state'].lower())] = row
Now you only need to check whether the values you want to test are already a key in lookup_dict. If they are not, query Google Maps.
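As a minimal sketch of that membership check, the snippet below builds the dictionary from two rows copied from the question's CSV snippet (held in an in-memory string so it runs without the real file) and tests two keys. The helper `make_key` is a hypothetical name introduced here for illustration; it is not part of the original answer.

```python
import csv
import io

# Two rows copied from the CSV snippet in the question, kept in memory
# so the sketch runs without the real file on disk.
SAMPLE_CSV = '''high_school,city,state,address,zipcode,latitude,longitude,place_id,country,location_type
BOWIE,BOWIE,MD,"15200 ANNAPOLIS RD, BOWIE, MD 20715, USA",20715,38.9780387,-76.7435378,ChIJRWh2C1fpt4kR6XFWnAm5yAE,USA,SCHOOL
EVERGLADES,MIRAMAR,FL,"17100 SW 48TH CT, MIRAMAR, FL 33027, USA",33027,25.9696495,-80.3737813,ChIJQfmM_I6j2YgR1Hdq0CC4apo,USA,SCHOOL
'''

# Build the lookup once: one dictionary entry per CSV row.
lookup_dict = {}
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    key = (row['high_school'].lower(), row['city'].lower(), row['state'].lower())
    lookup_dict[key] = row


def make_key(high_school, city, state):
    """Hypothetical helper: normalize inputs the same way the keys were built."""
    return (high_school.strip().lower(), city.strip().lower(), state.strip().lower())


known = make_key(' Bowie ', 'Bowie', 'MD')
unknown = make_key('Some Other School', 'Nowhere', 'ZZ')

print(known in lookup_dict)    # True: already in the CSV, no API call needed
print(unknown in lookup_dict)  # False: this one would go to Google Maps
```

Each `in` test is a hash lookup rather than a scan of the whole file, which is where the speedup over re-reading the CSV for every record comes from.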
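Since your edit shows that you are using this with apply on a DataFrame, you should compute lookup_dict outside the function and pass it in as an argument. That way the file is still read only once.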
import csv

lookup_dict = {}
with open('C:/Users/Josh/Desktop/test.csv') as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        lookup_dict[(row['high_school'].lower(), row['city'].lower(),
                     row['state'].lower())] = row

def find_school(high_school, city, state, lookup_dict):
    # a single dictionary lookup replaces the scan over the whole file
    result = lookup_dict.get((high_school.lower(), city.lower(), state.lower()))
    if result:
        return result
    else:
        # Google query
        pass

a = find_school('georgia military college', 'milledgeville', 'ga', lookup_dict)
#df['zip'],df['latitude'], df['longitude'], df['place_id'] = zip(*df.apply(lambda row: find_school(row['high_school'].strip(), row['City'].strip(), row['State'].strip()), axis=1))
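The function above returns the whole row, while the question unpacks four fields per record. The following is one possible sketch of how the two could be reconciled, not the answerer's code. It assumes a hypothetical stand-in `fetch_from_google` for the Google Maps request (the post never shows that call) and it caches each API result in the dictionary, an addition not in the original answer, so that repeated unknown schools trigger only one request. Plain Python lists stand in for the DataFrame so that the sketch has no third-party dependency.

```python
import csv
import io

# One row copied from the question's CSV snippet, kept in memory.
SAMPLE_CSV = '''high_school,city,state,address,zipcode,latitude,longitude,place_id,country,location_type
GEORGIA MILITARY COLLEGE,MILLEDGEVILLE,GA,"201 E GREENE ST, MILLEDGEVILLE, GA 31061, USA",31061,33.0789184,-83.2235169,ChIJv0wUz97H9ogRwuKm_HC-lu8,USA,UNIVERSITY
'''

FIELDS = ('zipcode', 'latitude', 'longitude', 'place_id')


def load_lookup(csvfile):
    """Read the CSV once and key each row by (high_school, city, state)."""
    lookup = {}
    for row in csv.DictReader(csvfile):
        key = (row['high_school'].lower(), row['city'].lower(), row['state'].lower())
        lookup[key] = row
    return lookup


def fetch_from_google(high_school, city, state):
    """Hypothetical stand-in for the Google Maps request, which the post omits."""
    fetch_from_google.calls += 1
    return (None, None, None, None)


fetch_from_google.calls = 0  # call counter, used only to show the caching effect


def find_school(high_school, city, state, lookup_dict):
    key = (high_school.strip().lower(), city.strip().lower(), state.strip().lower())
    row = lookup_dict.get(key)
    if row is None:
        # Not in the CSV: ask the (stubbed) API and remember the answer.
        result = fetch_from_google(high_school, city, state)
        row = dict(zip(FIELDS, result))
        lookup_dict[key] = row
    return tuple(row[field] for field in FIELDS)


lookup_dict = load_lookup(io.StringIO(SAMPLE_CSV))
records = [
    ('Georgia Military College', 'Milledgeville', 'GA'),
    ('Unknown High', 'Nowhere', 'ZZ'),
    ('Unknown High', 'Nowhere', 'ZZ'),  # repeat: served from the cache
]
results = [find_school(hs, city, state, lookup_dict) for hs, city, state in records]
zips, latitudes, longitudes, place_ids = zip(*results)
print(zips)
print(fetch_from_google.calls)
```

With a DataFrame, the same function could be passed to apply with lookup_dict supplied inside the lambda, as in the question's df.apply(..., axis=1) line.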