
How can I get a count of occurrences in CSV columns and save it as a new CSV containing the counts in Python?

I am new to Python and would really appreciate some help; I have been trying all day. I have a CSV file containing 10 columns, and I am only interested in three of them: state, county and zip code. I am trying to get a count of the occurrences in each column (for instance CA 20000, TX 14000) and to save the result in a CSV file that can easily be imported into Excel and later merged with geospatial files.

I managed to select the 3 columns that I need:


import numpy as np
import pandas as pd
from tabulate import tabulate

# Replace with the path and name of the file on your computer
filename = "10column.csv"

# Enter the zero-based column numbers to keep, e.g. usecols=[3,4,5]
table = np.genfromtxt(filename, delimiter=',', skip_header=0, dtype='U', usecols=[4, 5, 6])

print(tabulate(table))

# Insert the path and name of the output file
pd.DataFrame(table).to_csv("3column.csv")

Then I tried to count the occurrences, but the output is in the wrong format and I cannot save it as CSV.


import csv
from collections import Counter

import numpy as np

my_reader = csv.reader(open("3column.csv"))

# Insert the column number in place of the 2 in "rec[2]"
column = [rec[2] for rec in my_reader]

print(np.array([Counter(column)]))

The result is:

[Counter({'22209': 10, '20007': 5, …'})]

I cannot save it as CSV, and I would like to have it in a tabulated format:

zip, count
22209, 10, 20007, 10

I would really appreciate your help.
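The Counter from the attempt above already holds the numbers; what is missing is writing one value and its count per row. A minimal sketch using only the standard library follows. The helper names `count_column` and `write_counts` are made up for illustration. Note that the 3column.csv written earlier by `DataFrame.to_csv` also contains an index column and a header row, so column positions in that file are shifted by one.

```python
import csv
from collections import Counter


def count_column(in_path, column_index, has_header=True):
    """Count how often each value occurs in one column of a CSV file."""
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        # Ignore completely empty lines
        return Counter(row[column_index] for row in reader if row)


def write_counts(counts, out_path, value_name):
    """Write one (value, count) pair per row, most common first."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([value_name, "count"])
        writer.writerows(counts.most_common())


# Hypothetical usage; adjust the file name and column index to your data:
# zip_counts = count_column("3column.csv", 3)
# write_counts(zip_counts, "zip_count.csv", "zip")
```

The resulting file has a `zip,count` header followed by one row per distinct value, which Excel opens directly.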

A different approach would be to use value_counts() from pandas. The documentation describes it as follows:

Return a Series containing counts of unique values.

Example data file 7column.csv:

id,state,city,zip,ip_address,latitude,longitude
1,NY,New York City,10005,246.78.179.157,40.6964,-74.0253
2,WA,Yakima,98907,79.61.127.155,46.6288,-120.574
3,OK,Oklahoma City,73109,129.226.225.133,35.4259,-97.5261
4,FL,Orlando,32859,104.196.5.159,28.4429,-81.4026
5,NY,New York City,10004,246.78.180.157,40.6964,-74.0253
6,FL,Orlando,32860,104.196.5.159,29.4429,-81.4026
7,IL,Chicago,60641,19.226.187.13,41.9453,-87.7474
8,NC,Fayetteville,28314,45.109.1.38,35.0583,-79.008
9,IL,Chicago,60642,19.226.187.14,41.9453,-87.7474
10,WA,Yakima,98907,79.61.127.156,46.6288,-120.574
11,IL,Chicago,60643,19.226.187.15,41.9453,-87.7474
12,CA,Sacramento,94237,77.208.31.167,38.3774,-121.4444

import pandas as pd

df = pd.read_csv("7column.csv")

zipcode = df["zip"].value_counts()
state = df["state"].value_counts()
city = df["city"].value_counts()

zipcode.to_csv('zipcode_count.csv')
state.to_csv('state_count.csv')
city.to_csv('city_count.csv')

CSV output files:

state_count.csv   |   city_count.csv      |  zipcode_count.csv
,state            |   ,city               |  ,zip
IL,3              |   Chicago,3           |  98907,2
NY,2              |   Orlando,2           |  32859,1
FL,2              |   New York City,2     |  94237,1
WA,2              |   Yakima,2            |  32860,1
NC,1              |   Sacramento,1        |  28314,1
OK,1              |   Fayetteville,1      |  10005,1
CA,1              |   Oklahoma City,1     |  10004,1
                  |                       |  60643,1
                  |                       |  60642,1
                  |                       |  60641,1
                  |                       |  73109,1

You could read the file you wrote out to CSV back in as a DataFrame and use the count method that pandas provides.

states_3 = pd.DataFrame(table)

state_count = states_3.count(axis='columns')

out_name = 'statecount.xlsx'
with pd.ExcelWriter(out_name) as writer:
    state_count.to_excel(writer, sheet_name='counts')
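One caveat: `DataFrame.count` returns the number of non-missing cells per row or column, not how often each distinct value occurs. For per-value counts, `value_counts` can be applied to each column instead. The following is only a sketch: the small array stands in for the `table` from the question, and the column names `state`, `city` and `zip` are assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in for the `table` array from the question (illustrative rows only).
table = np.array([
    ["IL", "Chicago", "60641"],
    ["IL", "Chicago", "60642"],
    ["WA", "Yakima", "98907"],
    ["WA", "Yakima", "98907"],
])

# Column names are an assumption; the array itself carries no names.
df = pd.DataFrame(table, columns=["state", "city", "zip"])

# count() reports non-missing cells per column, the same number for each here.
non_missing = df.count()

# value_counts() reports how often each distinct value occurs.
frames = []
for col in df.columns:
    counts = df[col].value_counts().rename_axis("value").reset_index(name="count")
    counts.insert(0, "column", col)  # remember which column the value came from
    frames.append(counts)

# One long table with the columns: column, value, count
all_counts = pd.concat(frames, ignore_index=True)
print(all_counts)

# Hypothetical output file name:
# all_counts.to_csv("all_counts.csv", index=False)
```

Keeping everything in one long table makes it easy to filter by the `column` field later, for example before merging the zip counts with geospatial data.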
