How can I look up data from one CSV in another CSV?

Question

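In the crq_data file, I have cities and states from a *.csv file uploaded by the user.
In the cityCoordinates.csv file, I have a library of US cities and states along with their coordinates. I would like it to act as a kind of "lookup tool": compare it against the uploaded .csv file to find the coordinates to map in Folium.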

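Right now it reads line by line, so it adds the coordinates one at a time (n seconds). I would like it to run much faster, so that if there are 6000 rows, the user does not have to wait 6000 seconds.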

Here is part of my code:

crq_file = askopenfilename(filetypes=[('CSV Files', '*csv')])
crq_data = pd.read_csv(crq_file, encoding="utf8")
coords = pd.read_csv("cityCoordinates.csv")

for crq in range(len(crq_data)):
    task_city = crq_data.iloc[crq]["TaskCity"]
    task_state = crq_data.iloc[crq]["TaskState"]

    for coordinates in range(len(coords)):
        cityCoord = coords.iloc[coordinates]["City"]
        stateCoord = coords.iloc[coordinates]["State"]
        latCoord = coords.iloc[coordinates]["Latitude"]
        lngCoord = coords.iloc[coordinates]["Longitude"]

        if task_city == cityCoord and task_state == stateCoord:
            crq_data["CRQ Latitude"] = latCoord
            crq_data["CRQ Longitude"] = lngCoord
                
            print(cityCoord, stateCoord, latCoord, lngCoord)

[Image: example of the current terminal output]

[Image: example of the uploaded .csv file]

Answer 1

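I don't think this is a matter of optimizing Pandas so much as finding a good data structure for fast lookups, and a good data structure for fast lookups is the dict. A dict takes memory, though; you will need to evaluate that cost for yourself.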
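One way to evaluate that memory cost is to build a lookup of roughly the expected size and measure it with the standard library's `tracemalloc`. This is only a rough sketch: the row count of 30,000 and the synthetic keys and values are assumptions, not figures from the real cityCoordinates.csv.

```python
import tracemalloc


def build_lookup(n_rows: int) -> dict:
    """Build a synthetic lookup of n_rows 'city--state' keys (made-up data)."""
    return {
        f"city{i}--st": (f"{i % 90}.0", f"-{i % 180}.0")
        for i in range(n_rows)
    }


tracemalloc.start()
lookup = build_lookup(30_000)  # assumed size of the coordinates file
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"{len(lookup)} entries use about {current_bytes / 1_000_000:.1f} MB")
```

Swapping the synthetic builder for the real loading code would give a figure for the actual file.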

I mocked up what your cityCoordinates CSV might look like:

| City     | State | Latitude   | Longitude   |
|----------|-------|------------|-------------|
| Portland | OR    | 45°31′12″N | 122°40′55″W |
| Dallas   | TX    | 32°46′45″N | 96°48′32″W  |
| Portland | ME    | 43°39′36″N | 70°15′18″W  |

import csv
import pprint


def cs_key(city_name: str, state_name: str) -> str:
    """Make a normalized City-State key."""
    return city_name.strip().lower() + "--" + state_name.strip().lower()


# A dict of { "city_name--state_name": (latitude, longitude), ... }, keyed by cs_key()
coords_lookup = {}

with open("cityCoordinates.csv", newline="") as f:
    reader = csv.DictReader(f)  # your coords file appears to have a header
    for row in reader:
        city = row["City"]
        state = row["State"]
        lat = row["Latitude"]
        lon = row["Longitude"]

        key = cs_key(city, state)
        coords_lookup[key] = (lat, lon)


pprint.pprint(coords_lookup, sort_dicts=False)

When I run that, I get:

{'portland--or': ('45°31′12″N', '122°40′55″W'),
 'dallas--tx':   ('32°46′45″N', '96°48′32″W'),
 'portland--me': ('43°39′36″N', '70°15′18″W')}

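Now, iterating over the task data looks almost the same: we take a City and State pair, make a normalized key from it, then try to look up that key to get the known coordinates.

I mocked up some task data: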

| TaskCity   | TaskState |
|------------|-----------|
| Portland   | OR        |
| Fort Worth | TX        |
| Dallas     | TX        |
| Boston     | MA        |
| Portland   | ME        |

When I run this:

with open("crq_data.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        city = row["TaskCity"]
        state = row["TaskState"]

        key = cs_key(city, state)
        coords = coords_lookup.get(key, (None, None))
        if coords != (None, None):
            print(city, state, coords[0], coords[1])

I get:

Portland OR 45°31′12″N 122°40′55″W
Dallas TX 32°46′45″N 96°48′32″W
Portland ME 43°39′36″N 70°15′18″W
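The loop above silently skips cities that are not in the lookup (Fort Worth and Boston in the mock data). As a possible extension, the sketch below keeps every task row, adds the coordinates under the column names used in the question ("CRQ Latitude", "CRQ Longitude"), and collects the unmatched pairs. The `annotate` function and the inlined CSV strings are hypothetical and only there to make the example self-contained.

```python
import csv
import io

# Mock data from this answer, inlined so the sketch runs on its own.
COORDS_CSV = """City,State,Latitude,Longitude
Portland,OR,45°31′12″N,122°40′55″W
Dallas,TX,32°46′45″N,96°48′32″W
Portland,ME,43°39′36″N,70°15′18″W
"""
TASKS_CSV = """TaskCity,TaskState
Portland,OR
Fort Worth,TX
Dallas,TX
Boston,MA
Portland,ME
"""


def cs_key(city_name: str, state_name: str) -> str:
    """Make a normalized City-State key (same helper as above)."""
    return city_name.strip().lower() + "--" + state_name.strip().lower()


def annotate(tasks_file, coords_file):
    """Return (task rows with coordinates added, unmatched (city, state) pairs)."""
    lookup = {
        cs_key(r["City"], r["State"]): (r["Latitude"], r["Longitude"])
        for r in csv.DictReader(coords_file)
    }
    rows, missing = [], []
    for row in csv.DictReader(tasks_file):
        key = cs_key(row["TaskCity"], row["TaskState"])
        lat, lon = lookup.get(key, (None, None))
        if lat is None:
            missing.append((row["TaskCity"], row["TaskState"]))
        # Column names taken from the question's code.
        row["CRQ Latitude"], row["CRQ Longitude"] = lat, lon
        rows.append(row)
    return rows, missing


rows, missing = annotate(io.StringIO(TASKS_CSV), io.StringIO(COORDS_CSV))
print(missing)  # [('Fort Worth', 'TX'), ('Boston', 'MA')]
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(..., newline="")`, as in the code above.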

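In principle, this solution should be much faster because you are not doing a cityCoordinates-ROWS x taskData-ROWS quadratic loop. And in practice, Pandas suffers when doing row iteration^1. I'm not sure whether the same holds for indexing (iloc), but in general Pandas is meant for manipulating columns of data, and I would say it is not suited to row-oriented problems/solutions.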
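To make the quadratic-versus-dict difference concrete, here is a small timing sketch in plain Python. The data is synthetic and the sizes (2,000 coordinate rows, 1,000 tasks) are assumptions, so the absolute timings say nothing about the real files; only the relative gap is of interest.

```python
import time


def nested_scan(tasks, coords):
    """Quadratic: scan the coords rows again for every task."""
    out = []
    for city, state in tasks:
        for c_city, c_state, lat, lon in coords:
            if city == c_city and state == c_state:
                out.append((lat, lon))
                break
    return out


def dict_lookup(tasks, coords):
    """Linear: build the dict once, then do one lookup per task."""
    lookup = {(c, s): (lat, lon) for c, s, lat, lon in coords}
    return [lookup[t] for t in tasks if t in lookup]


# Synthetic data; the sizes are assumptions, not measurements of the real files.
coords = [(f"City{i}", "ST", float(i), float(-i)) for i in range(2_000)]
tasks = [(f"City{i % 2_000}", "ST") for i in range(1_000)]

t0 = time.perf_counter()
slow = nested_scan(tasks, coords)
t1 = time.perf_counter()
fast = dict_lookup(tasks, coords)
t2 = time.perf_counter()

assert slow == fast  # both approaches find the same coordinates
print(f"nested: {t1 - t0:.3f}s  dict: {t2 - t1:.3f}s")
```

The nested version's cost grows with the product of the two row counts, while the dict version grows with their sum, so the gap widens as the files get larger.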

Solution 1
0 Accepted 2022-07-05 23:24:20
