
Pandas Dataframe - Replace all cell value subject to regex condition

I am working on a problem where a column contains a few values that are just repetitions of ".", e.g. "....." or ".............".

I want to use .loc to replace all such values with np.NaN, and a regex to identify any cell whose value is one or more repetitions of ".".

So I used the code below in Python:

energy.loc[bool(re.match('.+', energy['Energy Supply'])),'Energy Supply']=np.NaN

Please help

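You need to escape the dot as shown below, because an unescaped dot stands for any character and the plus sign means one or more. Give it a try :)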

re.match('\\.+', energy['Energy Supply'])
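
To illustrate what escaping changes, here is a small sketch using only the standard re module on plain strings. The sample values are made up for illustration. Keep in mind that re.match expects a single string rather than a whole pandas column, so on a DataFrame it would have to be applied cell by cell.

```python
import re

samples = ["....", "test", "test.", ".5"]  # hypothetical sample values

# Unescaped: '.' matches any character, so every non-empty string matches.
unescaped = [bool(re.match('.+', s)) for s in samples]

# Escaped: '\.' matches a literal dot; re.match only anchors at the start.
escaped = [bool(re.match(r'\.+', s)) for s in samples]

# re.fullmatch requires the whole string to consist of dots.
only_dots = [bool(re.fullmatch(r'\.+', s)) for s in samples]

print(unescaped)
print(escaped)
print(only_dots)
```

With the escaped pattern, re.match still accepts ".5" because it only checks the start of the string, which is why re.fullmatch is the stricter choice when the whole value must be dots.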

You could make use of str.contains to check for a dot, and escape it to match it literally.

You don't need the + quantifier: it means one or more, and str.contains already succeeds as soon as a single dot is found anywhere in the value.

import pandas as pd
import numpy as np

data = [
    "test",
    "test.",
    "..."
]
energy = pd.DataFrame(data, columns=["Energy Supply"])
energy.loc[energy['Energy Supply'].str.contains(r'\.'), 'Energy Supply'] = np.nan
print(energy)

Output

  Energy Supply
0          test
1           NaN
2           NaN
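
Note that str.contains also replaces "test.", because that value has a dot somewhere in it. If only cells made up entirely of dots should become NaN, as the question describes, one possible sketch uses Series.str.fullmatch (available in pandas 1.1 and later). The sample data extends the made-up list above with one more all-dots value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
energy = pd.DataFrame(["test", "test.", "...", "....."], columns=["Energy Supply"])

# True only where the entire cell is one or more literal dots.
# na=False treats missing values as non-matches.
only_dots = energy["Energy Supply"].str.fullmatch(r"\.+", na=False)

energy.loc[only_dots, "Energy Supply"] = np.nan
print(energy)
```

Here the + quantifier does matter, since the pattern has to cover the whole value rather than just find one dot.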
