
Replace string values in a dataframe based on regex match

I have a pandas DataFrame with a column called "accredited". This column should contain either the accreditation date, e.g. "10/10/2011", or the text "Not accredited". But in most cases where the business isn't accredited, the column contains some longer text, like "This business is not accredited.....". I want to replace the whole text with just "Not accredited".

Now, I wrote a function:

def notAcredited(string):
    if ('Not' in string or 'not' in string):
        return  'Not Accredited'

I'm applying the function in a loop. Is it possible to do this with the ".apply" method instead?

for i in range(len(df_1000_1500)):
    accreditacion = notAcredited(df_1000_1500['BBBAccreditation'][i])
    if accreditacion == 'Not Accredited':
        df_1000_1500['BBBAccreditation'][i] = accreditacion

You could use the vectorized string method Series.str.replace :

In [72]: df = pd.DataFrame({'accredited': ['10/10/2011', 'is not accredited']})

In [73]: df
Out[73]: 
          accredited
0         10/10/2011
1  is not accredited

In [74]: df['accredited'] = df['accredited'].str.replace(r'(?i).*not.*', 'not accredited', regex=True)

In [75]: df
Out[75]: 
       accredited
0      10/10/2011
1  not accredited

The first argument passed to replace, e.g. r'(?i).*not.*', can be any regex pattern. The second can be any regex replacement value, the same kind of string as would be accepted by re.sub. The (?i) in the regex pattern makes the pattern case-insensitive, so not, Not, NOt, NoT, etc. would all match. Note that since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed.
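As a quick check of the pattern described above, here is a minimal sketch that runs the same regex over a few made-up strings and compares the result with calling re.sub on each value. The sample values and the capitalised replacement text are assumptions chosen for illustration, not data from the question.

```python
import re

import pandas as pd

# Hypothetical sample values, chosen only to exercise the pattern.
s = pd.Series([
    '10/10/2011',
    'This business is NOT accredited',
    'Not yet',
    'nothing',
])

pattern = r'(?i).*not.*'

# regex=True is passed explicitly so the pattern is not taken as a literal string.
result = s.str.replace(pattern, 'Not accredited', regex=True)

# The same substitution done one string at a time with re.sub.
expected = [re.sub(pattern, 'Not accredited', value) for value in s]

print(result.tolist())
```

Because the pattern looks for "not" anywhere in the string, a value such as 'nothing' is replaced as well. If that is too broad for your data, a word-boundary pattern such as r'(?i).*\bnot\b.*' is one way to narrow it.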

Series.str.replace Cythonizes the calls to re.sub, which makes it faster than what you could achieve using apply, since apply uses a Python loop.
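For comparison, the function from the question can be used with apply if it returns the original value when nothing matches. The sketch below is an illustration under assumptions: the helper name not_accredited and the sample frame are hypothetical, and the check is case-insensitive, which is slightly broader than the original 'Not'/'not' test. It also shows a mask-based variant built on Series.str.contains.

```python
import pandas as pd

# Hypothetical sample frame using the column name from the question.
df = pd.DataFrame({
    'BBBAccreditation': ['10/10/2011', 'This business is not accredited'],
})


def not_accredited(value):
    # Return the replacement label on a match; otherwise keep the value as-is.
    # lower() makes the check case-insensitive (an assumption, broader than
    # the original 'Not'/'not' test).
    if 'not' in value.lower():
        return 'Not Accredited'
    return value


# apply calls the Python function once per element.
via_apply = df['BBBAccreditation'].apply(not_accredited)

# Alternative: build a boolean mask, then overwrite only the matching rows.
is_not = df['BBBAccreditation'].str.contains('not', case=False)
via_mask = df['BBBAccreditation'].mask(is_not, 'Not Accredited')

print(via_apply.tolist())
print(via_mask.tolist())
```

Both variants leave the date rows untouched. Assign either result back to df['BBBAccreditation'] to update the column.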

