
Python: How is ~ used to exclude data?

I know that the code below returns all records that are outside of the buffer, but I'm confused about the mechanics of how that happens.

I see that a "~" (aka a bitwise NOT) is being used. From some googling, my understanding is that ~ returns the inverse of each bit in the input it is passed, e.g. if the bit is a 0 it returns a 1. Is this correct? If not, could someone please ELI5?

Could someone please explain the actual mechanics of how the code below returns records that are outside of the "my_union" buffer?

NOTE: hospitals and collisions are just GeoDataFrames.

coverage = gpd.GeoDataFrame(geometry=hospitals.geometry).buffer(10000) 
my_union = coverage.geometry.unary_union 
outside_range = collisions.loc[~collisions["geometry"].apply(lambda x: my_union.contains(x))]

I'm not sure exactly what you mean by the actual mechanics, and it's hard to know for sure without seeing input and output, but I had a go at explaining it below in case it is helpful:

All rows of the collisions dataframe whose geometry is contained in my_union will be excluded from the newly created outside_range dataframe.

~ does indeed perform a bitwise NOT in Python. Here, however, it is applied to a pandas Series of booleans, where it acts as an element-wise logical NOT. See this answer for an example.
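To make the difference concrete, here is a minimal sketch comparing ~ on a plain integer with ~ on a boolean pandas Series. The variable names mask and inverted are only illustrative and do not come from the question.

```python
import pandas as pd

# On a plain Python int, ~ flips every bit, which works out to ~x == -x - 1.
print(~5)   # -6
# True is the int 1 in Python, so bitwise NOT on it would not give False:
print(~1)   # -2

# On a pandas Series of dtype bool, ~ is applied element-wise as a logical NOT.
mask = pd.Series([True, False, True])
inverted = ~mask
print(inverted.tolist())   # [False, True, False]
```

This is why the pattern df.loc[~mask] reads as "all rows where mask is False".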

Let's assume the collisions GeoDataFrame contains points; it works similarly for other types of geometries. Let me also split the last line of the code into two steps:

coverage = gpd.GeoDataFrame(geometry=hospitals.geometry).buffer(10000) 
my_union = coverage.geometry.unary_union
within_my_union = collisions["geometry"].apply(lambda x: my_union.contains(x))
outside_range = collisions.loc[~within_my_union]

Then:

  1. my_union is a single (Multi)Polygon.

  2. my_union.contains(x) returns a boolean indicating whether the point x is within the my_union (Multi)Polygon.

  3. collisions["geometry"] is a pandas Series containing the points.

  4. collisions["geometry"].apply(lambda x: my_union.contains(x)) runs my_union.contains(x) on each of these points. The result is another pandas Series of booleans, indicating whether each point is within my_union.

  5. ~ then negates these booleans, so that the Series now indicates whether each point is not within my_union.

  6. collisions.loc[~within_my_union] then selects all the rows of collisions where the entry in ~within_my_union is True, i.e. all the points that don't lie within my_union.
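The steps above can be tried out on a small made-up dataset. The sketch below is an assumption-laden stand-in for the real data: it uses a plain pandas DataFrame holding shapely points instead of GeoDataFrames, shapely.ops.unary_union instead of the GeoSeries attribute, and invented coordinates with a buffer radius of 3 in arbitrary planar units.

```python
import pandas as pd
from shapely.geometry import Point
from shapely.ops import unary_union

# Hypothetical data: two "hospitals" and three "collisions".
hospitals = [Point(0, 0), Point(10, 0)]
collisions = pd.DataFrame({
    "name": ["a", "b", "c"],
    "geometry": [Point(1, 1), Point(5, 0), Point(10, 2)],
})

# Step 1: buffer each hospital and merge the buffers into one geometry.
my_union = unary_union([p.buffer(3) for p in hospitals])

# Steps 2-4: one boolean per collision, True if it lies inside the union.
within_my_union = collisions["geometry"].apply(lambda x: my_union.contains(x))
print(within_my_union.tolist())        # [True, False, True]

# Steps 5-6: negate the mask and keep only the rows where it is True.
outside_range = collisions.loc[~within_my_union]
print(outside_range["name"].tolist())  # ['b']
```

Only collision "b" is further than 3 units from both hospitals, so it is the only row that survives the negated mask.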

