
Using a dataset to create a dictionary that counts the occurrences of each value in a specific column, with no repeated keys

I'm trying to create a dictionary from this CSV dataset: https://data.cityofnewyork.us/Social-Services/311-Noise-Complaints/p5f6-bkga. I want to count the number of complaints made for each recorded zip code, so that I can then plot the counts on a choropleth map.

This is my code so far:

import folium
import pandas as pd

data311 = pd.read_csv('311_Noise_Complaints.csv')

zips = list(data311["Incident Zip"])
zipsND = pd.Series(zips).drop_duplicates().to_list()

newData = {}
occ = 0

for zp in zipsND:
    for data in data311["Incident Zip"]:
        if zp == data:
            newData[zp].append(occ)
    occ += 1

print(newData)

But I get this error:

Warning (from warnings module):
  File "<string>", line 1
DtypeWarning: Columns (15,17,18,20) have mixed types.Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "/Users/kenia/Desktop/CSCI233/PRAC.py", line 15, in <module>
    newData[zp].append(occ)
KeyError: 11215.0

I'm not exactly sure what the error is telling me. Is the problem the way I've set up the dictionary, so that it can't record the first entry of the dataset? I also tried changing the indentation of the occ += 1 and print lines in the for loop, but I get the same error.


[EDIT]

So I ran the code again, this time using defaultdict():

import pandas as pd
from collections import defaultdict

data311 = pd.read_csv('311_Noise_Complaints.csv')

zips = list(data311["Incident Zip"])
zipsND = pd.Series(zips).drop_duplicates().to_list()

new = defaultdict(list)
count = 0

for (zp, count) in zipsND:
    if zp in zips:
        count += 1
        new[zp].append(count)

print(new)

But now I get this error:

line 13, in <module>
    for (zp, count) in zipsND:
TypeError: cannot unpack non-iterable float object

[EDIT 2]

zips = list(data311["Incident Zip"])
zipsND = pd.Series(zips).drop_duplicates().to_list()

new = defaultdict(list)
count = 0

for zp in zipsND:
    if zp in zips:
        new[zp] = [count]
    count += 1
print(new)

I get results with this code, but the values are not correct. It just numbers the zip codes in order: the first zip code has a value of 0, the second 1, and so on. I need each zip code to have the total number of complaints made within that zip code.

Initially, there is nothing in newData, so the first time you try to add to newData[zp], there is no list there to append to, and Python raises a KeyError for that zip code (11215.0 in your traceback).
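A minimal sketch of the usual fixes, using a small hypothetical list in place of data311["Incident Zip"]: either give each key a starting value before adding to it, or let the standard library do that bookkeeping. Note that each count is an integer that gets incremented, not a list that gets appended to.

```python
from collections import Counter, defaultdict

# Hypothetical sample standing in for data311["Incident Zip"]
zips = [11215.0, 10001.0, 11215.0, 10452.0, 10001.0, 11215.0]

# Option 1: plain dict; .get() supplies 0 for a zip we have not seen yet
counts = {}
for zp in zips:
    counts[zp] = counts.get(zp, 0) + 1

# Option 2: defaultdict(int) creates a missing key with value 0 automatically
counts_dd = defaultdict(int)
for zp in zips:
    counts_dd[zp] += 1

# Option 3: Counter tallies the whole list in one call
counts_c = Counter(zips)

print(counts)  # {11215.0: 3, 10001.0: 2, 10452.0: 1}
```

All three produce the same totals; Counter is the shortest, while the explicit loops show what the original code was missing.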

Is this what you are looking for?

import pandas as pd
data311 = pd.read_csv('311_Noise_Complaints.csv')
f = data311.groupby(['Incident Zip']).agg({'Incident Zip':'count'}).to_dict()
print (f)

The output of this will be (I used the first 160 rows as a sample):

{'Incident Zip': {10001: 1, 10002: 1, 10009: 1, 10010: 1, 10011: 1, 10012: 1, 10019: 3, 10023: 1, 10024: 2, 10025: 2, 10026: 1, 10027: 2, 10031: 2, 10032: 9, 10033: 3, 10035: 1, 10036: 1, 10037: 1, 10040: 5, 10302: 1, 10303: 1, 10306: 2, 10310: 3, 10314: 1, 10451: 4, 10452: 2, 10453: 4, 10454: 3, 10455: 1, 10456: 6, 10457: 3, 10458: 3, 10459: 1, 10460: 6, 10461: 2, 10462: 2, 10466: 2, 10467: 5, 10468: 3, 10469: 1, 10472: 1, 10473: 1, 11101: 1, 11102: 1, 11105: 2, 11106: 2, 11201: 1, 11203: 1, 11204: 1, 11206: 3, 11208: 1, 11210: 1, 11211: 1, 11212: 1, 11213: 2, 11215: 1, 11217: 1, 11218: 1, 11220: 1, 11221: 2, 11225: 1, 11226: 5, 11229: 4, 11233: 3, 11235: 1, 11237: 1, 11238: 2, 11354: 1, 11356: 2, 11365: 2, 11368: 2, 11372: 1, 11373: 1, 11374: 1, 11377: 2, 11378: 1, 11385: 2, 11414: 1, 11416: 1, 11418: 1, 11419: 2, 11420: 1, 11435: 1}}
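To see what the groupby step does without the real file, here is a sketch on a tiny made-up DataFrame (the rows are hypothetical). It counts a non-grouping column ("Unique Key") per zip code rather than the grouping column itself:

```python
import pandas as pd

# Hypothetical rows standing in for the 311 data; None marks a missing zip
df = pd.DataFrame({
    "Unique Key": [1, 2, 3, 4, 5],
    "Incident Zip": [11215.0, 10001.0, 11215.0, None, 10001.0],
})

# Group rows by zip and count the non-null "Unique Key" values in each group.
# groupby drops rows whose key is NaN by default, so the missing zip is skipped.
counts = df.groupby("Incident Zip")["Unique Key"].count().to_dict()
print(counts)  # {10001.0: 2, 11215.0: 2}
```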

EDIT 1:

Here is an updated version of the code.

import pandas as pd
data311 = pd.read_csv('311_Noise_Complaints.csv')
# Count the rows (non-null Unique Key values) for each zip code
f = data311[['Unique Key','Incident Zip']].groupby(['Incident Zip']).agg('count').to_dict()
print(f)

I ran it against the full 311 file. The output is shown below.

{'Unique Key': {0.0: 1, 83.0: 337, 10000.0: 723, 10001.0: 26972, 10002.0: 56853, 10003.0: 44626, 10004.0: 3578, 10005.0: 6361, 10006.0: 3769, 10007.0: 7187, 10009.0: 55194, 10010.0: 19225, 10011.0: 34350, 10012.0: 26946, 10013.0: 25343, 10014.0: 27155, 10016.0: 36739, 10017.0: 10850, 10018.0: 9491, 10019.0: 33394, 10020.0: 237, 10021.0: 12556, 10022.0: 16395, 10023.0: 24528, 10024.0: 28261, 10025.0: 50183, 10026.0: 42993, 10027.0: 50355, 10028.0: 19946, 10029.0: 52997, 10030.0: 29588, 10031.0: 76699, 10032.0: 71851, 10033.0: 55417, 10034.0: 66645, 10035.0: 25192, 10036.0: 24200, 10037.0: 13804, 10038.0: 15350, 10039.0: 26660, 10040.0: 66489, 10041.0: 57, 10044.0: 649, 10045.0: 12, 10048.0: 33, 10065.0: 11494, 10069.0: 847, 10075.0: 9035, 10103.0: 10, 10105.0: 37, 10106.0: 49, 10107.0: 20, 10110.0: 4, 10111.0: 17, 10112.0: 37, 10115.0: 4, 10118.0: 44, 10119.0: 36, 10120.0: 31, 10121.0: 46, 10122.0: 6, 10123.0: 22, 10128.0: 23133, 10129.0: 9, 10152.0: 9, 10153.0: 18, 10154.0: 5, 10155.0: 1, 10158.0: 56, 10162.0: 139, 10165.0: 10, 10167.0: 2, 10168.0: 3, 10169.0: 9, 10170.0: 30, 10171.0: 3, 10172.0: 5, 10173.0: 1, 10174.0: 19, 10176.0: 1, 10177.0: 10, 10178.0: 7, 10271.0: 16, 10275.0: 11, 10278.0: 31, 10279.0: 27, 10280.0: 1679, 10281.0: 346, 10282.0: 676, 10301.0: 16528, 10302.0: 6258, 10303.0: 7632, 10304.0: 16740, 10305.0: 7416, 10306.0: 10304, 10307.0: 2852, 10308.0: 4210, 10309.0: 3885, 10310.0: 11082, 10312.0: 7394, 10314.0: 13363, 10451.0: 31149, 10452.0: 62874, 10453.0: 59162, 10454.0: 20059, 10455.0: 24716, 10456.0: 57692, 10457.0: 58746, 10458.0: 54311, 10459.0: 27283, 10460.0: 38090, 10461.0: 18868, 10462.0: 32137, 10463.0: 41064, 10464.0: 2538, 10465.0: 14298, 10466.0: 95481, 10467.0: 66896, 10468.0: 66413, 10469.0: 21987, 10470.0: 7362, 10471.0: 7065, 10472.0: 43586, 10473.0: 23173, 10474.0: 8207, 10475.0: 3168, 10583.0: 1, 10803.0: 2, 11001.0: 574, 11004.0: 2100, 11005.0: 40, 11040.0: 368, 11096.0: 1, 11101.0: 23341, 11102.0: 18084, 11103.0: 20622, 
11104.0: 13449, 11105.0: 14990, 11106.0: 21565, 11109.0: 1441, 11201.0: 38457, 11203.0: 28045, 11204.0: 14174, 11205.0: 32537, 11206.0: 56656, 11207.0: 43003, 11208.0: 41012, 11209.0: 23689, 11210.0: 18774, 11211.0: 56416, 11212.0: 33420, 11213.0: 39695, 11214.0: 16671, 11215.0: 26197, 11216.0: 49235, 11217.0: 32102, 11218.0: 21723, 11219.0: 12134, 11220.0: 29694, 11221.0: 60046, 11222.0: 26228, 11223.0: 17772, 11224.0: 13046, 11225.0: 47689, 11226.0: 77819, 11228.0: 6429, 11229.0: 20586, 11230.0: 24152, 11231.0: 15842, 11232.0: 13360, 11233.0: 34900, 11234.0: 22271, 11235.0: 21506, 11236.0: 25438, 11237.0: 49388, 11238.0: 51740, 11239.0: 1375, 11241.0: 4, 11242.0: 10, 11243.0: 4, 11249.0: 23555, 11251.0: 1, 11354.0: 11975, 11355.0: 14236, 11356.0: 10640, 11357.0: 6729, 11358.0: 7297, 11359.0: 12, 11360.0: 2173, 11361.0: 4664, 11362.0: 1723, 11363.0: 1064, 11364.0: 3842, 11365.0: 10384, 11366.0: 27548, 11367.0: 8992, 11368.0: 36096, 11369.0: 13150, 11370.0: 6422, 11371.0: 4, 11372.0: 28212, 11373.0: 25907, 11374.0: 12069, 11375.0: 18285, 11377.0: 27329, 11378.0: 8568, 11379.0: 6748, 11385.0: 45105, 11411.0: 5367, 11412.0: 8934, 11413.0: 7301, 11414.0: 5785, 11415.0: 9684, 11416.0: 11158, 11417.0: 11546, 11418.0: 14633, 11419.0: 24669, 11420.0: 20867, 11421.0: 18786, 11422.0: 5997, 11423.0: 9406, 11426.0: 2480, 11427.0: 4642, 11428.0: 5620, 11429.0: 5097, 11430.0: 38, 11432.0: 25472, 11433.0: 13105, 11434.0: 14533, 11435.0: 18474, 11436.0: 7503, 11691.0: 16486, 11692.0: 5150, 11693.0: 5436, 11694.0: 6333, 11695.0: 1, 11697.0: 214, 12345.0: 41}}
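The full-file output contains keys such as 0.0 and 83.0 that cannot be zip codes, and every key is a float. A possible clean-up before mapping, assuming the boundary data you join against keys zip codes as strings like "11215" (an assumption, not something stated above), is sketched here on hypothetical values. It only drops values that are not five digits, so placeholders such as 12345.0 would still pass.

```python
import pandas as pd

# Hypothetical sample: 83.0 and 0.0 mimic the malformed keys in the full-file output
zips = pd.Series([11215.0, 10001.0, 11215.0, None, 83.0, 0.0], name="Incident Zip")

counts = zips.value_counts()  # NaN is dropped by default

# Keep only five-digit values (assumption: anything else is a data-entry error)
counts = counts[(counts.index >= 10000) & (counts.index <= 99999)]

# Turn float keys such as 11215.0 into strings such as "11215"
zip_counts = {str(int(z)): int(n) for z, n in counts.items()}
print(zip_counts)
```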

EDIT 2:

You can also use the pandas value_counts() method. Passing sort=False skips sorting the result by frequency, which makes it faster.

f = data311['Incident Zip'].value_counts(sort=False).to_dict()
print (f)

The output of this will be:

{0.0: 1, 10000.0: 723, 10001.0: 26972, 10002.0: 56853, 10003.0: 44626, 10004.0: 3578, 10005.0: 6361, 10006.0: 3769, 10007.0: 7187, 10009.0: 55194, 10010.0: 19225, 10011.0: 34350, 10012.0: 26946, 10013.0: 25343, 10014.0: 27155, 10016.0: 36739, 10017.0: 10850, 10018.0: 9491, 10019.0: 33394, 10020.0: 237, 10021.0: 12556, 10022.0: 16395, 10023.0: 24528, 10024.0: 28261, 10025.0: 50183, 10026.0: 42993, 10027.0: 50355, 10028.0: 19946, 10029.0: 52997, 10030.0: 29588, 10031.0: 76699, 10032.0: 71851, 10033.0: 55417, 10034.0: 66645, 10035.0: 25192, 10036.0: 24200, 10037.0: 13804, 10038.0: 15350, 10039.0: 26660, 10040.0: 66489, 10041.0: 57, 10044.0: 649, 10045.0: 12, 10048.0: 33, 10065.0: 11494, 10069.0: 847, 10075.0: 9035, 10103.0: 10, 10105.0: 37, 10106.0: 49, 10107.0: 20, 10110.0: 4, 10111.0: 17, 10112.0: 37, 10115.0: 4, 10118.0: 44, 10119.0: 36, 10120.0: 31, 10121.0: 46, 10122.0: 6, 10123.0: 22, 10128.0: 23133, 10129.0: 9, 10152.0: 9, 10153.0: 18, 10154.0: 5, 10155.0: 1, 10158.0: 56, 10162.0: 139, 10165.0: 10, 10167.0: 2, 10168.0: 3, 10169.0: 9, 10170.0: 30, 10171.0: 3, 10172.0: 5, 10173.0: 1, 10174.0: 19, 10176.0: 1, 10177.0: 10, 10178.0: 7, 10271.0: 16, 10275.0: 11, 10278.0: 31, 10279.0: 27, 10280.0: 1679, 10281.0: 346, 10282.0: 676, 10301.0: 16528, 10302.0: 6258, 10303.0: 7632, 10304.0: 16740, 10305.0: 7416, 10306.0: 10304, 10307.0: 2852, 10308.0: 4210, 10309.0: 3885, 10310.0: 11082, 10312.0: 7394, 10314.0: 13363, 10451.0: 31149, 10452.0: 62874, 10453.0: 59162, 10454.0: 20059, 10455.0: 24716, 10456.0: 57692, 10457.0: 58746, 10458.0: 54311, 10459.0: 27283, 10460.0: 38090, 10461.0: 18868, 10462.0: 32137, 10463.0: 41064, 10464.0: 2538, 10465.0: 14298, 10466.0: 95481, 10467.0: 66896, 10468.0: 66413, 10469.0: 21987, 10470.0: 7362, 10471.0: 7065, 10472.0: 43586, 10473.0: 23173, 10474.0: 8207, 10475.0: 3168, 10583.0: 1, 10803.0: 2, 11001.0: 574, 11004.0: 2100, 11005.0: 40, 11040.0: 368, 11096.0: 1, 11101.0: 23341, 11102.0: 18084, 11103.0: 20622, 11104.0: 13449, 11105.0: 14990, 
11106.0: 21565, 11109.0: 1441, 11201.0: 38457, 11203.0: 28045, 11204.0: 14174, 11205.0: 32537, 11206.0: 56656, 11207.0: 43003, 11208.0: 41012, 11209.0: 23689, 11210.0: 18774, 11211.0: 56416, 11212.0: 33420, 11213.0: 39695, 11214.0: 16671, 11215.0: 26197, 11216.0: 49235, 11217.0: 32102, 11218.0: 21723, 11219.0: 12134, 11220.0: 29694, 11221.0: 60046, 11222.0: 26228, 11223.0: 17772, 11224.0: 13046, 11225.0: 47689, 11226.0: 77819, 11228.0: 6429, 11229.0: 20586, 11230.0: 24152, 11231.0: 15842, 11232.0: 13360, 11233.0: 34900, 11234.0: 22271, 11235.0: 21506, 11236.0: 25438, 11237.0: 49388, 11238.0: 51740, 11239.0: 1375, 11241.0: 4, 11242.0: 10, 11243.0: 4, 11249.0: 23555, 11251.0: 1, 11354.0: 11975, 11355.0: 14236, 11356.0: 10640, 11357.0: 6729, 11358.0: 7297, 11359.0: 12, 11360.0: 2173, 11361.0: 4664, 11362.0: 1723, 11363.0: 1064, 11364.0: 3842, 11365.0: 10384, 11366.0: 27548, 11367.0: 8992, 11368.0: 36096, 11369.0: 13150, 11370.0: 6422, 11371.0: 4, 11372.0: 28212, 11373.0: 25907, 11374.0: 12069, 11375.0: 18285, 11377.0: 27329, 11378.0: 8568, 11379.0: 6748, 11385.0: 45105, 11411.0: 5367, 11412.0: 8934, 11413.0: 7301, 11414.0: 5785, 11415.0: 9684, 11416.0: 11158, 11417.0: 11546, 11418.0: 14633, 11419.0: 24669, 11420.0: 20867, 11421.0: 18786, 11422.0: 5997, 11423.0: 9406, 11426.0: 2480, 11427.0: 4642, 11428.0: 5620, 11429.0: 5097, 11430.0: 38, 11432.0: 25472, 11433.0: 13105, 11434.0: 14533, 11435.0: 18474, 11436.0: 7503, 11691.0: 16486, 11692.0: 5150, 11693.0: 5436, 11694.0: 6333, 11695.0: 1, 11697.0: 214, 12345.0: 41, 83.0: 337}
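The DtypeWarning in the question recommends specifying dtype on import. One option, sketched with an in-memory CSV of made-up rows in place of the real file, is to read only the zip column and read it as text, so the dictionary keys come out as "11215" rather than 11215.0. The column names match the dataset; the rows are hypothetical.

```python
import io

import pandas as pd

# Made-up rows standing in for 311_Noise_Complaints.csv
csv_text = """Unique Key,Complaint Type,Incident Zip
1,Noise - Residential,11215
2,Noise - Street/Sidewalk,10001
3,Noise - Residential,11215
4,Noise - Residential,
5,Noise - Commercial,10001
"""

data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Incident Zip"],     # parse only the column we need
    dtype={"Incident Zip": str},  # keep zips as text, e.g. "11215"
)

# Empty fields still become NaN, and value_counts() drops NaN by default
zip_counts = data["Incident Zip"].value_counts().to_dict()
print(zip_counts)
```

Because usecols skips the other columns entirely, they are never parsed, which also saves memory on the full file.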
