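How can I get a count of occurrences in CSV columns and save it as a new CSV containing the counts in Python?
I'm new to Python and would really appreciate your help. I've been trying all day. I have a CSV file with 10 columns. I'm only interested in 3 of them: state, county, and zip code. I've been trying and trying to count the occurrences in each column, for example CA 20000, TX 14000, and to save the counts to a CSV file that can easily be imported into Excel and later merged with geospatial files.
I managed to select the 3 columns I need: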
import numpy as np
from tabulate import tabulate
import pandas as pd
# Replace with the path and name of the file on your computer
filename = "10column.csv"
# Put the column indices you want to display in usecols, e.g. usecols=[4,5,6]
table = np.genfromtxt(filename,delimiter=',',skip_header=0,dtype='U',usecols=[4,5,6])
print(tabulate(table))
# Replace with the path and name of the output file
pd.DataFrame(table).to_csv("3column.csv")
Then I tried to count the occurrences, but the output is in the wrong format and I can't save it as a CSV.
import csv
from collections import Counter
import numpy as np
my_reader = csv.reader(open("3column.csv"))
# Replace the 2 in rec[2] with the index of the column you want to count
column = [rec[2] for rec in my_reader]
np.array([Counter(column)])
print(np.array([Counter(column)]))
The result is:
[Counter({'22209': 10, '20007': 5, …'})]
I can't save this as a CSV, and I would like a table format like:
zip, count
22209, 10, 20007, 10
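I would sincerely appreciate your help.
Another approach is to use value_counts() from Pandas. From the Pandas documentation:
Return a Series containing counts of unique values.
Sample data file 7column.csv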
id,state,city,zip,ip_address,latitude,longitude
1,NY,New York City,10005,246.78.179.157,40.6964,-74.0253
2,WA,Yakima,98907,79.61.127.155,46.6288,-120.574
3,OK,Oklahoma City,73109,129.226.225.133,35.4259,-97.5261
4,FL,Orlando,32859,104.196.5.159,28.4429,-81.4026
5,NY,New York City,10004,246.78.180.157,40.6964,-74.0253
6,FL,Orlando,32860,104.196.5.159,29.4429,-81.4026
7,IL,Chicago,60641,19.226.187.13,41.9453,-87.7474
8,NC,Fayetteville,28314,45.109.1.38,35.0583,-79.008
9,IL,Chicago,60642,19.226.187.14,41.9453,-87.7474
10,WA,Yakima,98907,79.61.127.156,46.6288,-120.574
11,IL,Chicago,60643,19.226.187.15,41.9453,-87.7474
12,CA,Sacramento,94237,77.208.31.167,38.3774,-121.4444
import pandas as pd
df = pd.read_csv("7column.csv")
zipcode = df["zip"].value_counts()
state = df["state"].value_counts()
city = df["city"].value_counts()
zipcode.to_csv('zipcode_count.csv')
state.to_csv('state_count.csv')
city.to_csv('city_count.csv')
CSV output files
state_count.csv | city_count.csv | zipcode_count.csv
,state | ,city | ,zip
IL,3 | Chicago,3 | 98907,2
NY,2 | Orlando,2 | 32859,1
FL,2 | New York City,2 | 94237,1
WA,2 | Yakima,2 | 32860,1
NC,1 | Sacramento,1 | 28314,1
OK,1 | Fayetteville,1 | 10005,1
CA,1 | Oklahoma City,1 | 10004,1
| | 60643,1
| | 60642,1
| | 60641,1
| | 73109,1
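You can read the file you wrote to CSV as a DataFrame and use the count methods that Pandas provides.
# `table` is the array of the three selected columns from the question's first snippet
states_3 = pd.DataFrame(table)
# Note: count(axis='columns') returns the number of non-null cells in each row
state_count = states_3.count(axis='columns')
out_name = 'statecount.xlsx'
with pd.ExcelWriter(out_name) as writer:
    state_count.to_excel(writer, sheet_name='counts')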
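The Counter from the question can also be written out directly with the standard csv module, without Pandas. This is a minimal sketch, not the method of either answer: the helper name count_values is hypothetical, the SAMPLE rows are copied from 7column.csv, and an in-memory buffer stands in for the output file so the example runs on its own.

```python
import csv
import io
from collections import Counter

# A few rows copied from 7column.csv above, inlined so the sketch runs on its own
SAMPLE = """id,state,city,zip
1,NY,New York City,10005
2,WA,Yakima,98907
7,IL,Chicago,60641
10,WA,Yakima,98907
"""

def count_values(rows, column):
    """Hypothetical helper: count how often each value appears in one column."""
    return Counter(row[column] for row in rows)

# DictReader uses the header row, so columns are addressed by name
reader = csv.DictReader(io.StringIO(SAMPLE))
counts = count_values(reader, "zip")

# Write a header and one "value,count" row per distinct value.
# For a real file, use open("zip_count.csv", "w", newline="") instead of StringIO.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["zip", "count"])
writer.writerows(counts.most_common())  # most frequent values first

print(out.getvalue())
```

Counter.most_common() returns (value, count) pairs, which is exactly the row shape csv.writer expects, so no conversion to a NumPy array is needed.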