
python point inside polygon (point cloud data)

Number of points: 100,000,000 (about 4 GB)

I am reading a CSV file of points and saving the rows to separate CSV files.

I'm using csv.reader, which works fine, but I noticed that this code takes too much time.

How can I improve the performance of my task?

Please provide me with alternative options.

Performance is the main concern here.

from shapely.geometry import Point, Polygon
import csv
import os

req1 = input("path of the CSV file: ")

file_name = os.path.splitext(req1)
file_name = os.path.split(file_name[0])
path = file_name[0]
file_name = file_name[1]

with open(req1, "r") as f:  
    reader = csv.reader(f)
    next(reader) # skip header

    os.makedirs(path + "/" + file_name + "_output", exist_ok=True)
    outpath = path + "/" + file_name + "_output" + "/"

    coords = [[19.803499,15.2265],[-35.293499,33.7495],
            [-49.6675,33.726501],[-48.022499,20.4715],
            [-36.336498,-4.925],[-32.6105,-45.494499],
            [-10.5275,-38.3815],[-11.93835,-20.8235],
            [26.939501,-18.095501],[19.803499,15.2265]]

    poly = Polygon(coords)
    for row in reader:
        geom = Point(float(row[0]), float(row[1]))  # point built from the first two columns (x, y)
    
        x = float(row[0])
        y = float(row[1])
        z = float(row[2])
        r = int(row[3])
        g = int(row[4])
        b = int(row[5])
        i = int(row[6])
    
        result = geom.within(poly)
    
        if str(result) == 'True':
          with open(outpath + file_name + "_TRUE.csv", "a", newline = "") as file:
            writeData = ([str(x),',',str(y),',',str(z),',',str(r),',',str(g),',',str(b),',',str(i),('\n')])
            file.writelines(writeData)
            print('True', str(x),str(y),str(z))
        else:
          with open(outpath + file_name + "_FALSE.csv", "a", newline = "") as file:
            writeData = ([str(x),',',str(y),',',str(z),',',str(r),',',str(g),',',str(b),',',str(i),('\n')])
            file.writelines(writeData)
            #print('False', str(x),str(y),str(z))
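
As far as I can tell, the two output files are reopened for every single row. Here is a minimal sketch of the same loop with the files opened once outside the loop and csv.writer doing the formatting; it reuses req1, outpath, file_name and poly from the code above and assumes the same column order:

# Sketch only: same logic, but the output files are opened once,
# and csv.writer builds each line (column order assumed as above).
import csv
from shapely.geometry import Point

with open(req1, "r") as f, \
     open(outpath + file_name + "_TRUE.csv", "w", newline="") as f_true, \
     open(outpath + file_name + "_FALSE.csv", "w", newline="") as f_false:
    reader = csv.reader(f)
    next(reader)  # skip header
    writer_true = csv.writer(f_true)
    writer_false = csv.writer(f_false)
    for row in reader:
        point = Point(float(row[0]), float(row[1]))
        if point.within(poly):
            writer_true.writerow(row[:7])
        else:
            writer_false.writerow(row[:7])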

I used pd.read_csv instead of csv.reader.

This improved the performance somewhat.

I also tried Python multiprocessing, but I don't understand it well.

Processing time: 1234 sec -> 31 sec

import pandas as pd
from shapely.geometry import Point, Polygon

data = pd.read_csv("/sample.csv")
poly = Polygon([(-0.7655,-22.758499), (17.0525,-21.657499), (16.5735,-26.269501), (0.4755,-28.6635)])
cord = data.values.tolist()

# Test every point against the polygon
for i in cord:
    print(poly.intersects(Point(i[0], i[1])), i)
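
What I would like to end up with is the same TRUE/FALSE split as before, but done on whole columns instead of building one Point object per row. A rough, untested sketch of what I mean; it assumes Shapely 2.x with shapely.contains_xy, that the first two CSV columns are x and y, and made-up output file names:

# Untested sketch: vectorized point-in-polygon test with Shapely 2.x.
# shapely.contains_xy, the column positions and the output names are assumptions.
import pandas as pd
import shapely
from shapely.geometry import Polygon

data = pd.read_csv("/sample.csv")
poly = Polygon([(-0.7655,-22.758499), (17.0525,-21.657499), (16.5735,-26.269501), (0.4755,-28.6635)])

# Test all points in one call instead of looping row by row
inside = shapely.contains_xy(poly, data.iloc[:, 0].values, data.iloc[:, 1].values)

data[inside].to_csv("sample_TRUE.csv", index=False)
data[~inside].to_csv("sample_FALSE.csv", index=False)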

For example, here is some sample code for Python multiprocessing pools:

import time
from multiprocessing import Pool

def f(x):
    time.sleep(2)  # wait 2 seconds to simulate work
    print(x * x)

p = Pool(8)             # pool of 8 worker processes
p.map(f, [1, 2, 3, 4])  # apply f to each item in parallel
p.close()
p.join()

How should I apply this?
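
Below is a rough, untested sketch of how I imagine combining Pool.map with chunked reading; the chunk size, the worker count, the output file names and the column positions are all guesses on my part:

# Rough, untested sketch: classify the points chunk by chunk in a Pool.
# chunksize, Pool(8), column positions and output names are assumptions.
import pandas as pd
from multiprocessing import Pool
from shapely.geometry import Point, Polygon

poly = Polygon([(-0.7655,-22.758499), (17.0525,-21.657499), (16.5735,-26.269501), (0.4755,-28.6635)])

def classify(chunk):
    # Add a boolean column saying whether each point falls inside the polygon
    chunk = chunk.copy()
    chunk["inside"] = [poly.intersects(Point(x, y))
                       for x, y in zip(chunk.iloc[:, 0], chunk.iloc[:, 1])]
    return chunk

if __name__ == "__main__":
    # Read the big CSV in pieces so each worker handles an independent chunk
    chunks = pd.read_csv("/sample.csv", chunksize=1_000_000)
    with Pool(8) as p:
        results = p.map(classify, chunks)
    out = pd.concat(results)
    out[out["inside"]].drop(columns="inside").to_csv("sample_TRUE.csv", index=False)
    out[~out["inside"]].drop(columns="inside").to_csv("sample_FALSE.csv", index=False)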
