
How can I see a list of the variables in a CSV column?

I have a CSV file with more than 5,000,000 rows of data that looks like this (except that it is in Persian):

Contract Code,Contract Type,State,City,Property Type,Region,Usage Type,Area,Percentage,Price,Price per m2,Age,Frame Type,Contract Date,Postal Code
765720,Mobayee,East Azar,Kish,Apartment,,Residential,96,100,570000,5937.5,36,Metal,13890107,5169614658
766134,Mobayee,East Azar,Qeshm,Apartment,,Residential,144.5,100,1070000,7404.84,5,Concrete,13890108,5166884645
766140,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,1050000,7266.44,5,Concrete,13890108,5166884645
766146,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,700000,4844.29,5,Concrete,13890108,5166884645
766147,Mobayee,East Azar,Kish,Apartment,,Residential,144.5,100,1625000,11245.67,5,Concrete,13890108,5166884645
770822,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,50,500000,1730.1,5,Concrete,13890114,5166884645

I want code that lists the distinct values in a specific column. For example, I want it to return {Kish, Qeshm, Tabriz} for the 'City' column.

First import the csv module, then read the file row by row and save the value you need from each row in a list, like this:

import csv

cities = []
with open("yourfile.csv", "r", newline="") as file:
    # DictReader uses the first line as the header, so that line is not returned as a data row
    reader = csv.DictReader(file)
    for row in reader:
        city = row["City"]
        cities.append(city)

This gives you cities = ['Kish', 'Qeshm', 'Tabriz', ...], with one entry per row, so duplicates are still included.
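If only the distinct values matter, the list can be skipped and the values collected straight into a set while reading. A minimal sketch, assuming a hypothetical helper name `unique_column_values` and using a small in-memory sample in place of the real file:

```python
import csv
import io


def unique_column_values(lines, column):
    """Return the set of distinct values found in `column`.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    (Hypothetical helper, not part of the csv module.)
    """
    reader = csv.DictReader(lines)
    # A set keeps one copy of each value, so memory grows with the
    # number of distinct cities, not with the number of rows.
    return {row[column] for row in reader}


# Small in-memory sample standing in for the real file
sample = io.StringIO(
    "Contract Code,City\n"
    "765720,Kish\n"
    "766134,Qeshm\n"
    "766140,Tabriz\n"
    "766146,Tabriz\n"
)
unique_cities = unique_column_values(sample, "City")
print(sorted(unique_cities))
```

For the real data, pass an open file instead of the sample, for example `open("yourfile.csv", newline="")`.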

It looks like you also want to remove duplicates; to do that, just convert the finished list to a set. Here is how to do it with pandas:

import pandas as pd
cities = pd.read_csv('yourfile.csv', usecols=['City'])['City']

# cast to list if you want a plain list instead of a pandas Series
cities_list = list(cities)

# use set to remove the duplicates
unique_cities = set(cities)

If you need to preserve the order in which the values first appear, you can use an ordered dictionary with keys only.
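The keys-only dictionary idea can be sketched with `dict.fromkeys`. This assumes Python 3.7 or later, where a plain `dict` keeps insertion order; the short `cities` list below is sample data standing in for the full column:

```python
# Sample values standing in for the full 'City' column
cities = ["Kish", "Qeshm", "Tabriz", "Tabriz", "Kish", "Tabriz"]

# dict.fromkeys keeps the first occurrence of each key, in order;
# the values (all None) are ignored.
ordered_unique = list(dict.fromkeys(cities))
print(ordered_unique)
```

On older Python versions, `collections.OrderedDict.fromkeys(cities)` gives the same ordering.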

Also, if you run out of memory trying to read 5M rows in one go, you can read them in chunks:

import pandas as pd
# read the 'City' column 1000 rows at a time
cities_chunks_list = [chunk['City'] for chunk in pd.read_csv('yourfile.csv', usecols=['City'], chunksize=1000)]

# flatten the list of chunks into a single list of cities
cities_list = [city for cities_chunk in cities_chunks_list for city in cities_chunk]
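Note that flattening every chunk into one list still ends up holding all 5M values in memory. If only the distinct values are needed, each chunk can be folded into a running set instead. A rough sketch, assuming a hypothetical helper name `unique_values_chunked`, an arbitrary default chunk size, and an in-memory sample in place of the real file:

```python
import io

import pandas as pd


def unique_values_chunked(source, column, chunksize=100_000):
    """Collect the distinct values of `column`, reading `source` in chunks.

    (Hypothetical helper; `chunksize` default is an arbitrary choice.)
    """
    seen = set()
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        # Drop missing values, then add this chunk's distinct values;
        # the chunk itself is discarded on the next iteration.
        seen.update(chunk[column].dropna().unique())
    return seen


# Small in-memory sample standing in for the real file
sample = io.StringIO(
    "Contract Code,City\n"
    "765720,Kish\n"
    "766134,Qeshm\n"
    "766140,Tabriz\n"
    "766146,Tabriz\n"
)
unique_cities = unique_values_chunked(sample, "City", chunksize=2)
print(sorted(unique_cities))
```

For the real data, pass the file path (for example `'yourfile.csv'`) as `source`.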

Hope this helps.

