python - read in csv to extract corresponding values

I have a GeoJSON file that was converted from a CAD drawing. The features are now in the following format:

{
"type": "FeatureCollection",
"name": "entities",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature","id":"1","properties": { "Layer": "Ref02-Boundary-River Lagan", "SubClasses": "AcDbEntity:AcDbPolyline", "EntityHandle": "18D49" }, "geometry": { "type": "LineString", "coordinates": [ [ -5.912099758277697, 54.609878675075841, -5.5 ], [ -5.912101882049217, 54.609877129917869, -5.5 ], [ -5.912101882049217, 54.609877129917869, -5.5 ], [ -5.912385911796179, 54.609732322025998, -5.5 ], [ -5.912460771867132, 54.609694155565506, -5.5 ], [ -5.912509312980717, 54.609669407428946, -5.5 ], [ -5.912558290440344, 54.609644436775582, -5.5 ], [ -5.912593219883747, 54.609626628327483, -5.5 ], [ -5.912842999552883, 54.609451465147721, -5.5 ], [ -5.913303982987227, 54.609185055049799, -5.5 ], [ -5.913357621258934, 54.609154056244115, -5.5 ], [ -5.913475791249335, 54.609085762803012, -5.5 ], [ -5.91331045908047, 54.609008818748194, -5.5 ], [ -5.913567798505827, 54.608822686626787, -5.5 ], [ -5.913930556058254, 54.608559661895065, -5.5 ], [ -5.914450423684049, 54.608183772015252, -5.5 ], [ -5.914890158601113, 54.607859968684259, -5.5 ], [ -5.915538373511547, 54.607376600730291, -5.5 ], [ -5.916032865254835, 54.607009313468311, -5.5 ], [ -5.916697729452425, 54.606515616313651, -5.5 ], [ -5.917178407683183, 54.606158685978556, -5.5 ], [ -5.917766043503596, 54.605716418896513, -5.5 ], [ -5.918086236941109, 54.605476215232756, -5.5 ], [ -5.918640259978573, 54.605054489501633, -5.5 ], [ -5.918999711737682, 54.604774215636091, -5.5 ], [ -5.919357947073775, 54.604466516913604, -5.5 ], [ -5.919119477531201, 54.604364287495677, -5.5 ], [ -5.91911946365491, 54.604364281547021, -5.5 ], [ -5.919431266971372, 54.604107513356603, -5.5 ], [ -5.919746627591454, 54.603850932389285, -5.5 ], [ -5.920139467068778, 54.603531307996782, -5.5 ], [ -5.920248667309735, 54.603576372107611, -5.5 ], [ -5.920347471042307, 54.603606382368625, -5.5 ], [ -5.920402664320942, 54.603563838568519, -5.5 ], [ -5.920401738885977, 54.603563453056566, -5.5 ], [ -5.920344866066254, 54.60357812489795, -5.5 
], [ -5.920313678683745, 54.603565255922042, -5.5 ], [ -5.9203068464555, 54.603559486492991, -5.5 ], [ -5.920303789925165, 54.603552837726028, -5.5 ], [ -5.920305749751396, 54.603546999298253, -5.5 ], [ -5.920307430030534, 54.603541993654865, -5.5 ], [ -5.920331543924449, 54.603513953336815, -5.5 ], [ -5.920385677849353, 54.603467389662065, -5.5 ], [ -5.920482867206563, 54.603388806421506, -5.5 ], [ -5.919817541044252, 54.603159111715968, -5.5 ], [ -5.919360395554573, 54.60300128492522, -5.5 ], [ -5.91891893118377, 54.602848869166237, -5.5 ], [ -5.918698193844773, 54.603161730656176, -5.5 ], [ -5.918539363659911, 54.60337070372406, -5.5 ], [ -5.918499839799537, 54.603421756702978, -5.5 ], [ -5.918464898376052, 54.603466890427526, -5.5 ], [ -5.918225682242552, 54.603751339695293, -5.5 ], [ -5.918143444222999, 54.60384144161879, -5.5 ], [ -5.917978369447476, 54.604018308886261, -5.5 ], [ -5.917778937474949, 54.604228055853667, -5.5 ], [ -5.917517847307555, 54.604488421902253, -5.5 ], [ -5.917379353138514, 54.604618038030523, -5.5 ], [ -5.917365475232636, 54.604630431120313, -5.5 ], [ -5.917040197149845, 54.604908732611968, -5.5 ], [ -5.916978245474892, 54.60495804969085, -5.5 ], [ -5.916687343768886, 54.605183975261362, -5.5 ], [ -5.916486539190938, 54.605326465106707, -5.5 ], [ -5.916278267253799, 54.605462208806927, -5.5 ], [ -5.916206535865816, 54.605505889774236, -5.5 ], [ -5.916089228598353, 54.605430749771216, -5.5 ], [ -5.915062917485617, 54.606128298865329, -5.5 ], [ -5.915760690540097, 54.606488653530612, -5.5 ], [ -5.914815684195395, 54.607200189687191, -5.5 ], [ -5.914583029770388, 54.607363894821489, -5.5 ], [ -5.914122726280319, 54.607716003482558, -5.5 ], [ -5.91387658435291, 54.607889193166862, -5.5 ], [ -5.912160615340791, 54.6091048406185, -5.5 ], [ -5.911469071222665, 54.609592077745347, -5.5 ], [ -5.911469071222665, 54.609592077745347, -5.5 ], [ -5.911469071222665, 54.609592077745347, -5.5 ], [ -5.912099758063087, 54.609878675231968, -5.5 ], [ 
-5.911469071222665, 54.609592077745347, -5.5 ] ] } },
{ "type": "Feature","id":"2","properties": { "Layer": "Ref34-Herdman Channel", "SubClasses": "AcDbEntity:AcDbPolyline", "EntityHandle": "18D4A" }, "geometry": { "type": "LineString", "coordinates": [ [ -5.897789395249773, 54.623858842627953, -7.3 ], [ -5.902905537709284, 54.621963513299349, -7.3 ], [ -5.902980447128448, 54.621954733854786, -7.3 ], [ -5.906383586733336, 54.62068210571401, -7.3 ], [ -5.907394611057439, 54.620271860296093, -7.3 ], [ -5.908271700524285, 54.620438570481646, -7.3 ], [ -5.909237173007087, 54.619740109032655, -7.3 ], [ -5.909244065491171, 54.619670532212957, -7.3 ], [ -5.913579743708052, 54.616533508712052, -7.3 ], [ -5.912536954077766, 54.616201946358402, -7.3 ], [ -5.908610159222599, 54.619008315699411, -7.3 ], [ -5.908177623209888, 54.618817165973439, -7.3 ], [ -5.906296536124115, 54.619602975490572, -7.3 ], [ -5.906417907684791, 54.619696755525432, -7.3 ], [ -5.904505962012193, 54.620518375213322, -7.3 ], [ -5.904379303360807, 54.62042000830553, -7.3 ], [ -5.904178353077652, 54.620627706470792, -7.3 ], [ -5.903970927937185, 54.620842094302013, -7.3 ], [ -5.90145584424333, 54.621753037021421, -7.3 ], [ -5.900146085296913, 54.622228665744018, -7.3 ], [ -5.897978434406959, 54.623016291711536, -7.3 ], [ -5.896884714885545, 54.623410922201039, -7.3 ], [ -5.896807416440097, 54.623435588515115, -7.3 ], [ -5.896729976036048, 54.62345325358843, -7.3 ], [ -5.89661111577345, 54.623472342952041, -7.3 ], [ -5.896522858967185, 54.623480690555247, -7.3 ], [ -5.896428106251355, 54.623484457554135, -7.3 ], [ -5.896303166824568, 54.623481292811128, -7.3 ], [ -5.896197472248414, 54.623471252944306, -7.3 ], [ -5.896137759498506, 54.623462493994786, -7.3 ], [ -5.896052388054945, 54.623445836672367, -7.3 ], [ -5.895978571253911, 54.623427154229439, -7.3 ], [ -5.89590524094214, 54.623404208241219, -7.3 ], [ -5.895800416253622, 54.623362445424725, -7.3 ], [ -5.895757569679375, 54.62334176698927, -7.3 ], [ -5.895647923528012, 54.623292641829437, -7.3 ], [ 
-5.895577692064672, 54.623239431950907, -7.3 ], [ -5.895514910451845, 54.623170119742099, -7.3 ], [ -5.89538163536643, 54.623022980393102, -7.3 ], [ -5.895329415138476, 54.622808933902263, -7.3 ], [ -5.895351007802742, 54.622572974852275, -7.3 ], [ -5.894146142060189, 54.623536453956078, -7.3 ], [ -5.89412889154843, 54.623528837668523, -7.3 ], [ -5.893690605257398, 54.623885226178857, -7.3 ], [ -5.893623854708871, 54.625016224210214, -7.3 ], [ -5.893578643913801, 54.625782216489412, -7.3 ], [ -5.89347404142008, 54.625948034414961, -7.3 ], [ -5.893943848714391, 54.625901816579493, -7.3 ], [ -5.894893061951917, 54.625808430731169, -7.3 ], [ -5.894584457557488, 54.625586465495758, -7.3 ], [ -5.897239997520025, 54.624337565861637, -7.3 ], [ -5.897464132809517, 54.624498603197388, -7.3 ], [ -5.897738236705834, 54.623857397898696, -7.3 ], [ -5.897789395249773, 54.623858842627953, -7.3 ] ] } },
]
}

The GeoJSON file contains over 45,000 different features, but I only need 60 of them.

I also have a CSV that contains reference numbers. For example, reference 2 in the CSV is River Lagan, but in the GeoJSON it appears as "Layer": "Ref02-Boundary-River Lagan".

Is there any way of writing a Python script to import the CSV, read out the references, and extract the corresponding features from the GeoJSON?

So far I have only been able to check that the CSV is readable, with this code:

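import csv

# The with block closes the file automatically; newline='' is what the csv docs recommend
with open("C:/test.csv", newline='') as file:
    csvreader = csv.reader(file)
    header = next(csvreader)  # first row holds the column names
    rows = list(csvreader)    # all remaining rows

print(header)
print(rows)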

To read the GeoJSON and filter it down to only contain features that have the appropriate reference in the Layer property:

import json

with open('geo.json') as f:
    geo_json = json.load(f)

reference = 2
geo_json['features'] = [
    feature for feature in geo_json['features']
    if int(feature['properties']['Layer'][3:5]) == reference
]

print(geo_json)

This just loads the GeoJSON as regular JSON, using the standard json module. It only selects a single 'reference', 2 in this case, but of course you could easily extend this to include any reference from your source data.

Filtering to all the features matching any reference in the csv (some guesswork required here, because you're not sharing what your csv looks like, but you can work from this):

import json
import csv

with open('references.csv', newline='') as f:
    cr = csv.reader(f)
    header = next(cr)
    ref_column = header.index('reference')  # or whatever the column is called
    references = {int(row[ref_column]) for row in cr}  # a set keeps the 'in' test below fast

with open('geo.json') as f:
    geo_json = json.load(f)

geo_json['features'] = [
    feature for feature in geo_json['features']
    if int(feature['properties']['Layer'][3:5]) in references
]

print(geo_json)

This assumes that every row has a value in the reference column and that the CSV has a header row, but both are easy to change.
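
If some rows turn out to have an empty or non-numeric reference cell, a more tolerant reader could look like the sketch below. The column name 'reference', the helper name read_references and the sample rows are all assumptions made for illustration; io.StringIO stands in for the real file so the example runs on its own.

```python
import csv
import io

# Hypothetical CSV content; in practice, pass the opened file instead.
sample_csv = io.StringIO(
    "reference,name\n"
    "2,River Lagan\n"
    ",missing reference\n"
    "34,Herdman Channel\n"
    "n/a,not a number\n"
)

def read_references(f, column='reference'):
    """Collect integer references from a CSV file object, skipping unusable cells."""
    references = set()
    for row in csv.DictReader(f):  # DictReader uses the header row as keys
        value = row.get(column) or ''
        try:
            references.add(int(value))
        except ValueError:
            continue  # blank or non-numeric cell: ignore this row
    return references

references = read_references(sample_csv)
print(sorted(references))
```

With a real file, the call would be something like read_references(open('references.csv', newline='')), ideally inside a with block.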

It works by replacing the contents of geo_json['features'] with a filtered list of features: those for which the part of the Layer property right after 'Ref' (the two characters at indices 3 and 4, since the slice [3:5] stops before index 5), converted to an integer, equals one of the values you're looking for.
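
To make the slicing concrete, here is the same extraction applied by hand to the Layer value of the first sample feature in the question. The variable names are only for illustration.

```python
layer = "Ref02-Boundary-River Lagan"  # Layer value from the first sample feature

prefix = layer[:3]    # characters at indices 0, 1 and 2
digits = layer[3:5]   # characters at indices 3 and 4
number = int(digits)  # int() drops the leading zero

print(prefix, digits, number)
```

Because of the int() conversion, 'Ref02' in the GeoJSON lines up with a plain 2 in the CSV.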

This does assume that every Layer property has the form Ref<nn>whatever. If that's not the case, int() raises a ValueError whenever those two characters don't form a number, so you'll need to write a safer check.
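
One possible safer check, as a sketch rather than a definitive solution, is to match the Layer value against a regular expression and skip anything that doesn't fit. The helper names layer_reference and filter_features are made up for this example, and the pattern assumes the reference is whatever run of digits directly follows 'Ref' (so it also accepts more than two digits). The small in-memory collection stands in for the real file.

```python
import re

# 'Ref' at the start of the string, followed by one or more digits
LAYER_REF = re.compile(r'^Ref(\d+)')

def layer_reference(feature):
    """Return the integer reference in a feature's Layer property, or None."""
    layer = (feature.get('properties') or {}).get('Layer')
    if not isinstance(layer, str):
        return None  # Layer missing or not a string
    match = LAYER_REF.match(layer)
    return int(match.group(1)) if match else None

def filter_features(geo_json, references):
    """Return a copy of the collection keeping only the wanted references."""
    wanted = set(references)
    kept = [f for f in geo_json.get('features', [])
            if layer_reference(f) in wanted]
    return {**geo_json, 'features': kept}

# Stand-in for json.load(f); geometry omitted for brevity
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"Layer": "Ref02-Boundary-River Lagan"}},
        {"type": "Feature", "properties": {"Layer": "Ref34-Herdman Channel"}},
        {"type": "Feature", "properties": {"Layer": "0"}},
        {"type": "Feature", "properties": {}},
    ],
}

filtered = filter_features(sample, [2])
print([f['properties']['Layer'] for f in filtered['features']])
```

Features whose Layer doesn't match the pattern are silently dropped here. If you would rather know about them, collect those for which layer_reference returns None and inspect them separately.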
