I have a huge JSON file. It has keys for the call type (the type of crime committed), the date and time (when the crime was committed), and the location (address or latitude and longitude), among other keys. I'm mostly interested in counting which days had the most crimes, which call types show up the most, and which location shows up the most. The location can be measured by the street address or by pairing the latitude and longitude together. Python would probably be best. There are over 350 call types in a JSON file with over 350K rows, so every time a new call type is seen, it should create something like a new variable for it and keep track of it.
I tried iterating through it like a list but am having issues. How can I attach the file when it's 62 MB? Should I link to it?
This is an example of the data:
[{"A": "incident_num", "B": "date_time", "C": "day", "D": "stno", "E": "stdir1", "F": "StreetName", "G": "streettype", "H": "FullAddress", "I": "call_type", "J": "disposition", "K": "beat", "L": "priority", "M": "lat", "N": "long"},
{"A": "P17060024503", "B": "6/14/2017 21:54", "C": "4", "D": "10", "E": "", "F": "14TH", "G": "ST", "H": "10 14TH ST, San Diego, CA", "I": "1151", "J": "O", "K": "521", "L": "2", "M": "32.7054489", "N": "-117.1518696"},
{"A": "P17030051227", "B": "3/29/2017 22:24", "C": "4", "D": "10", "E": "", "F": "14TH", "G": "ST", "H": "10 14TH ST, San Diego, CA", "I": "1016", "J": "A", "K": "521", "L": "2", "M": "32.7054489", "N": "-117.1518696"},
{"A": "P17060004814", "B": "6/3/2017 18:04", "C": "7", "D": "10", "E": "", "F": "14TH", "G": "ST", "H": "10 14TH ST, San Diego, CA", "I": "1016", "J": "A", "K": "521", "L": "2", "M": "32.7054489", "N": "-117.1518696"},
{"A": "P17030029336", "B": "3/17/2017 10:57", "C": "6", "D": "10", "E": "", "F": "14TH", "G": "ST", "H": "10 14TH ST, San Diego, CA", "I": "1151", "J": "OT", "K": "521", "L": "2", "M": "32.7054489", "N": "-117.1518696"},
{"A": "P17030005412", "B": "3/3/2017 23:45", "C": "6", "D": "10", "E": "", "F": "15TH", "G": "ST", "H": "10 15TH ST, San Diego, CA", "I": "911P", "J": "CAN", "K": "521", "L": "2", "M": "32.7057215", "N": "-117.1503498"},
{"A": "P17020016091", "B": "2/10/2017 8:23", "C": "6", "D": "10", "E": "", "F": "15TH", "G": "ST", "H": "10 15TH ST, San Diego, CA", "I": "AU2", "J": "W", "K": "521", "L": "2", "M": "32.7057215", "N": "-117.1503498"},
{"A": "P17040017368", "B": "4/11/2017 4:57", "C": "3", "D": "10", "E": "", "F": "15TH", "G": "ST", "H": "10 15TH ST, San Diego, CA", "I": "5150", "J": "CAN", "K": "521", "L": "2", "M": "32.7057215", "N": "-117.1503498"},
{"A": "P17030048050", "B": "3/28/2017 6:30", "C": "3", "D": "10", "E": "", "F": "15TH", "G": "ST", "H": "10 15TH ST, San Diego, CA", "I": "1146", "J": "K", "K": "521", "L": "", "M": "32.7057215", "N": "-117.1503498"},
{"A": "P17060037341", "B": "6/22/2017 10:19", "C": "5", "D": "10", "E": "", "F": "15TH", "G": "ST", "H": "10 15TH ST, San Diego, CA", "I": "242", "J": "K", "K": "521", "L": "1", "M": "32.7057215", "N": "-117.1503498"},
{"A": "P17060008467", "B": "6/5/2017 19:27", "C": "2", "D": "10", "E": "", "F": "15TH", "G": "ST", "H": "10 15TH ST, San Diego, CA", "I": "5150", "J": "K", "K": "521", "L": "2", "M": "32.7057215", "N": "-117.1503498"},
I just want stats such as how many times each call type was made, which location has the most crimes, which date had the most crimes, and so on.
Use pandas:
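import json
import pandas as pd
with open("data.json") as f:  # placeholder file name
    data = json.load(f)  # list of dicts
raw_df = pd.DataFrame(data)
# The first record holds the real column names; promote it to the header.
df = raw_df.rename(columns=raw_df.iloc[0]).drop(0)
df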
Output:
    incident_num        date_time  day  stno  stdir1  StreetName  ...  call_type  disposition  beat  priority         lat          long
 1  P17060024503  6/14/2017 21:54    4    10                14TH  ...       1151            O   521         2  32.7054489  -117.1518696
 2  P17030051227  3/29/2017 22:24    4    10                14TH  ...       1016            A   521         2  32.7054489  -117.1518696
 3  P17060004814   6/3/2017 18:04    7    10                14TH  ...       1016            A   521         2  32.7054489  -117.1518696
 4  P17030029336  3/17/2017 10:57    6    10                14TH  ...       1151           OT   521         2  32.7054489  -117.1518696
 5  P17030005412   3/3/2017 23:45    6    10                15TH  ...       911P          CAN   521         2  32.7057215  -117.1503498
 6  P17020016091   2/10/2017 8:23    6    10                15TH  ...        AU2            W   521         2  32.7057215  -117.1503498
 7  P17040017368   4/11/2017 4:57    3    10                15TH  ...       5150          CAN   521         2  32.7057215  -117.1503498
 8  P17030048050   3/28/2017 6:30    3    10                15TH  ...       1146            K   521            32.7057215  -117.1503498
 9  P17060037341  6/22/2017 10:19    5    10                15TH  ...        242            K   521         1  32.7057215  -117.1503498
10  P17060008467   6/5/2017 19:27    2    10                15TH  ...       5150            K   521         2  32.7057215  -117.1503498
Example of queries you can run:
>>> df['call_type'].value_counts()
5150 2
1016 2
1151 2
242 1
911P 1
AU2 1
1146 1
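The same `value_counts()` pattern should extend to the other two questions (busiest days and busiest locations). This is a minimal sketch, assuming the DataFrame has `date_time`, `FullAddress`, `lat` and `long` columns as in the sample; the small inline frame reuses four sample rows only so the snippet runs on its own, and the date format string is an assumption based on those rows.

```python
import pandas as pd

# Stand-in for the real DataFrame, built from four of the sample rows.
df = pd.DataFrame({
    "date_time": ["6/14/2017 21:54", "3/29/2017 22:24",
                  "6/3/2017 18:04", "3/3/2017 23:45"],
    "FullAddress": ["10 14TH ST, San Diego, CA"] * 3
                   + ["10 15TH ST, San Diego, CA"],
    "lat": ["32.7054489"] * 3 + ["32.7057215"],
    "long": ["-117.1518696"] * 3 + ["-117.1503498"],
})

# Parse the timestamp strings, then count incidents per calendar day.
parsed = pd.to_datetime(df["date_time"], format="%m/%d/%Y %H:%M")
per_day = parsed.dt.date.value_counts()

# Count incidents per street address.
per_address = df["FullAddress"].value_counts()

# Count incidents per (lat, long) pair, largest group first.
per_coord = df.groupby(["lat", "long"]).size().sort_values(ascending=False)

print(per_day.head(10))
print(per_address.head(10))
print(per_coord.head(10))
```

Grouping on the coordinate pair treats two rows as the same place only when both strings match exactly, so the address column may be the more forgiving key if the coordinates vary slightly.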
Iterate over the JSON file and store the required fields in an associative array (a dict in Python). You can then perform operations on it.
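One way to sketch this with only the standard library is `collections.Counter`, which creates an entry the first time a key is seen, so there is no need for one variable per call type. The inline JSON string below stands in for the real file and reuses four sample rows trimmed to the relevant keys; the `key` lookup table is a helper introduced for this example.

```python
import json
from collections import Counter
from datetime import datetime

# Stand-in for json.load(open(...)): header record plus four sample rows.
raw = """[
{"B": "date_time", "H": "FullAddress", "I": "call_type", "M": "lat", "N": "long"},
{"B": "6/14/2017 21:54", "H": "10 14TH ST, San Diego, CA", "I": "1151", "M": "32.7054489", "N": "-117.1518696"},
{"B": "3/29/2017 22:24", "H": "10 14TH ST, San Diego, CA", "I": "1016", "M": "32.7054489", "N": "-117.1518696"},
{"B": "6/3/2017 18:04", "H": "10 14TH ST, San Diego, CA", "I": "1016", "M": "32.7054489", "N": "-117.1518696"},
{"B": "3/3/2017 23:45", "H": "10 15TH ST, San Diego, CA", "I": "911P", "M": "32.7057215", "N": "-117.1503498"}
]"""
records = json.loads(raw)

# The first record maps letter keys to column names; invert it so that
# key["call_type"] gives "I", and so on.
header, rows = records[0], records[1:]
key = {name: letter for letter, name in header.items()}

call_types = Counter()
days = Counter()
locations = Counter()

for row in rows:
    call_types[row[key["call_type"]]] += 1
    # Keep only the calendar date so calls on the same day are grouped.
    day = datetime.strptime(row[key["date_time"]], "%m/%d/%Y %H:%M").date()
    days[day] += 1
    # Pair latitude and longitude into one hashable location key.
    locations[(row[key["lat"]], row[key["long"]])] += 1

print(call_types.most_common(5))
print(days.most_common(5))
print(locations.most_common(5))
```

`most_common(n)` returns the top `n` entries sorted by count, which covers the "which shows up the most" questions directly.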
If the data has fixed columns and structure, you can store it in a database such as MySQL and perform the required operations easily with simple queries.
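As a rough illustration of the database route, the sketch below uses SQLite through Python's built-in `sqlite3` module rather than MySQL, simply so it runs without a server. The table name `calls`, the column subset, and the in-memory database are assumptions for the example; the `GROUP BY` queries themselves are plain SQL and should carry over to MySQL with little or no change.

```python
import sqlite3

# Four sample rows: incident number, timestamp, address, call type.
rows = [
    ("P17060024503", "6/14/2017 21:54", "10 14TH ST, San Diego, CA", "1151"),
    ("P17030051227", "3/29/2017 22:24", "10 14TH ST, San Diego, CA", "1016"),
    ("P17060004814", "6/3/2017 18:04", "10 14TH ST, San Diego, CA", "1016"),
    ("P17030005412", "3/3/2017 23:45", "10 15TH ST, San Diego, CA", "911P"),
]

# In-memory database; a file path would be used for the full data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls ("
    "incident_num TEXT, date_time TEXT, full_address TEXT, call_type TEXT)"
)
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", rows)

# Most frequent call types (ties broken alphabetically by call type).
top_types = conn.execute(
    "SELECT call_type, COUNT(*) AS n FROM calls "
    "GROUP BY call_type ORDER BY n DESC, call_type LIMIT 5"
).fetchall()

# Most frequent addresses.
top_addresses = conn.execute(
    "SELECT full_address, COUNT(*) AS n FROM calls "
    "GROUP BY full_address ORDER BY n DESC, full_address LIMIT 5"
).fetchall()

print(top_types)
print(top_addresses)
conn.close()
```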