
Replace only specific values in df column based on specific value in another column

I have the following dataframe:

>>> name   ID     geom                                                geometry_error
0  Lily   1234  POLYGON ((5.351418786 7.471461148, 5.352018786...     overlap
1  Pil    3248  POLYGON ((7.351657486 9.341445548, 1.346718786...     overlap
2  Poli   9734  -                                                     -
0  Lily   1234  POLYGON ((5.351265486 2.471876538, 6.33355018786...   overlap

I want to edit the geometry_error column so that, wherever the geom value is '-', the geometry_error value becomes "no geometry", e.g.:

>>> name   ID     geom                                                geometry_error
0  Lily   1234  POLYGON ((5.351418786 7.471461148, 5.352018786...     overlap
1  Pil    3248  POLYGON ((7.351657486 9.341445548, 1.346718786...     overlap
2  Poli   9734  -                                                     no geometry
0  Lily   1234  POLYGON ((5.351265486 2.471876538, 6.33355018786...   overlap

I have tried to do it with this:

def gg(row):
    if row['geom'] == '-':
        val = 'no geometry generated'   
    return val

df['geometry errors'] = df.apply(gg, axis=1)

>>>UnboundLocalError: local variable 'val' referenced before assignment

I don't understand why I get this error, because I have used the same variable name val in a different function in the same script. Why do I get it now? And is there maybe a better way to do this?

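Use np.where; it's nice and simple. np.where evaluates the condition for every row at once, takes 'no geometry generated' where the condition is True, and keeps the existing geometry_error value where it is False.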

Code:

import numpy as np

# ...

df['geometry_error'] = np.where(df['geom'] == '-', 
                                'no geometry generated', 
                                df['geometry_error'])

Output:

   name    ID                                               geom  \
0  Lily  1234   POLYGON ((5.351418786 7.471461148, 5.352018786))   
1   Pil  3248   POLYGON ((7.351657486 9.341445548, 1.346718786))   
2  Poli  9734                                                  -   
3  Lily  1234  POLYGON ((5.351265486 2.471876538, 6.333550187...   

          geometry_error  
0                overlap  
1                overlap  
2  no geometry generated  
3                overlap
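For reference, here is a self-contained run of the same np.where call. The small DataFrame is made up to mirror the question's columns, and the shortened 'POLYGON ((...))' strings are placeholders rather than real geometries:

```python
import numpy as np
import pandas as pd

# Made-up frame mirroring the question's columns; geometry strings are placeholders
df = pd.DataFrame({
    "name": ["Lily", "Pil", "Poli", "Lily"],
    "ID": [1234, 3248, 9734, 1234],
    "geom": ["POLYGON ((...))", "POLYGON ((...))", "-", "POLYGON ((...))"],
    "geometry_error": ["overlap", "overlap", "-", "overlap"],
})

# np.where(cond, a, b): take a where cond is True, b (the current value) elsewhere
df["geometry_error"] = np.where(df["geom"] == "-",
                                "no geometry generated",
                                df["geometry_error"])

print(df["geometry_error"].tolist())
# ['overlap', 'overlap', 'no geometry generated', 'overlap']
```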
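Avoid the chained form df[df['geom'] == '-']['geometry_error'] = 'no geometry generated'. Boolean indexing with df[...] returns a copy, so this assignment changes that copy and not df itself (pandas warns about this pattern). Select the rows and the column in a single .loc call instead:
df.loc[df['geom'] == '-', 'geometry_error'] = 'no geometry generated'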
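A minimal sketch contrasting the chained assignment with .loc, assuming a made-up two-row frame; the warnings filter only silences the warning pandas issues for the chained form so the demo prints cleanly:

```python
import warnings

import pandas as pd

# Made-up two-row frame; the geometry string is a placeholder
df = pd.DataFrame({
    "geom": ["POLYGON ((...))", "-"],
    "geometry_error": ["overlap", "-"],
})
mask = df["geom"] == "-"

# Chained indexing: df[mask] builds a new, copied frame, so this
# assignment modifies that temporary object and df is left unchanged.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas normally warns about this pattern
    df[mask]["geometry_error"] = "no geometry generated"
after_chained = df["geometry_error"].tolist()
print(after_chained)  # ['overlap', '-']

# .loc selects the rows and the column in one call and writes into df itself
df.loc[mask, "geometry_error"] = "no geometry generated"
print(df["geometry_error"].tolist())  # ['overlap', 'no geometry generated']
```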

A couple of approaches:

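  1. Replace all missing (NaN/None) values of geometry_error with 'no geometry'. This only helps if the missing entries are real NaN values; a '-' string, as in the sample data, is not treated as missing.
df['geometry_error'] = df['geometry_error'].fillna('no geometry')
  2. Find all rows where geom == '-' and set their geometry_error to 'no geometry':
df.loc[df['geom'] == '-', 'geometry_error'] = 'no geometry'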
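A small illustration of how the two approaches differ, using a made-up frame in which the missing geometry is marked with the string '-' (as in the question) and one error value is a real NaN:

```python
import numpy as np
import pandas as pd

# Made-up frame: row 1 uses the '-' placeholder, row 2 has a genuine NaN error
df = pd.DataFrame({
    "geom": ["POLYGON ((...))", "-", "POLYGON ((...))"],
    "geometry_error": ["overlap", "-", np.nan],
})

# Approach 1: fillna only touches real missing values, not the '-' string
filled = df["geometry_error"].fillna("no geometry")
print(filled.tolist())  # ['overlap', '-', 'no geometry']

# Approach 2: select on the geom column, so the '-' row is updated
# and rows with any other geom keep whatever they had (including NaN)
df.loc[df["geom"] == "-", "geometry_error"] = "no geometry"
print(df["geometry_error"].tolist())
```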

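Your function fails because val is only assigned when the condition is true. For every row where geom is not '-', the function reaches return val without val ever having been bound in that call, so Python raises UnboundLocalError. Having used the name val in another function doesn't matter: each function call gets its own local variables. Moving the return statement inside the if block avoids the error, but the function would then return None for every other row and overwrite their existing values. Return the current value on the other path instead:

def gg(row):
    if row['geom'] == '-':
        return 'no geometry generated'
    # rows that have a geometry keep their existing error text
    return row['geometry_error']

df['geometry_error'] = df.apply(gg, axis=1)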
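A self-contained check of both behaviours, assuming a made-up two-row frame; gg_fixed is a hypothetical name used here so the question's original gg and a corrected version can sit side by side:

```python
import pandas as pd

def gg(row):
    # The question's version: val is only bound when the condition holds
    if row['geom'] == '-':
        val = 'no geometry generated'
    return val

def gg_fixed(row):
    # Hypothetical corrected version: every path returns a value
    if row['geom'] == '-':
        return 'no geometry generated'
    return row['geometry_error']

# Made-up frame; the geometry string is a placeholder
df = pd.DataFrame({
    "geom": ["POLYGON ((...))", "-"],
    "geometry_error": ["overlap", "-"],
})

raised = False
try:
    df.apply(gg, axis=1)
except UnboundLocalError as exc:
    # Raised on row 0, where geom != '-' so val was never assigned
    raised = True
    print("UnboundLocalError:", exc)

df["geometry_error"] = df.apply(gg_fixed, axis=1)
print(df["geometry_error"].tolist())  # ['overlap', 'no geometry generated']
```

On Python 3.11 and later the error message reads "cannot access local variable 'val' where it is not associated with a value"; older versions print the "referenced before assignment" wording shown in the question.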
