
Filter pandas dataframe by row with regex

I'm sure there is a simple solution, but I'm quite new to Python. I have a pandas DataFrame containing strings and NaN values. I want to search this DataFrame for particular substrings. The search should be done row by row, and the matches should be written to a list with the same number of rows as the DataFrame (that is, if the substring I'm looking for is not found in a row, the list entry for that row should be 'none').

I tried: result.loc[result[0].str.contains("hello", na=False)] but this only returns the rows where the first column contains the word hello...

I was thinking about a for loop searching with regular expressions in every row:

row = df.iloc[0:100]
for item in row:
    row_dict={}
    hello = re.search(r"hello.*", item)
    if hello is None:
        hello = "NaN"

Is there perhaps a simpler way? Thank you!

For testing purposes, I defined the source DataFrame as:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[
    ['Halo Mike', 'How are you?', np.nan],
    ['Hello John', 'Good morning', 'What a nice day'],
    ['Ello Jack', 'Xyz hello abc', np.nan]])

As you can see, there are 2 elements containing hello (ignoring case) and 2 NaN elements. Column names are not essential here, so I didn't define them.

The first step is to convert this DataFrame into a Series, with NaN values filtered out:

ser = pd.Series(data=df.values.flatten()).dropna()

df.values gets the underlying NumPy array, flatten reshapes it to a 1-D array (row by row), and dropna removes the NaN values.

Then, to get elements of this Series with hello inside (case insensitive), run:

ser[ser.str.contains('hello', case=False)].tolist()

For our test data, the result is:

['Hello John', 'Xyz hello abc']
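
The steps above can be put together in one self-contained sketch. It uses only the test DataFrame defined earlier; the helper name find_matches is hypothetical and introduced here just for illustration.

```python
import numpy as np
import pandas as pd


def find_matches(df, pattern):
    """Return all non-NaN cells of df that contain pattern (case-insensitive).

    Hypothetical helper; not part of pandas.
    """
    # Flatten the 2-D array of cell values to 1-D and drop the NaN entries.
    ser = pd.Series(df.values.flatten()).dropna()
    # str.contains treats the pattern as a regular expression by default.
    return ser[ser.str.contains(pattern, case=False)].tolist()


df = pd.DataFrame(data=[
    ['Halo Mike', 'How are you?', np.nan],
    ['Hello John', 'Good morning', 'What a nice day'],
    ['Ello Jack', 'Xyz hello abc', np.nan]])

matches = find_matches(df, 'hello')
print(matches)
```

Note that the result is one flat list of matching cells, so it no longer tells you which row each match came from.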

I think this is just what you described in your comment.

For real input data (longer than my example), if you want to limit the search to just the first 100 rows, change df.values to df.head(100).values.
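
If you do need one entry per row, as in the original question (with 'none' where nothing matched), a possible variant is sketched below. It makes several assumptions: it keeps only the first match in each row, it searches case-insensitively with the pattern hello.* from the question, and the function name first_match is hypothetical.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[
    ['Halo Mike', 'How are you?', np.nan],
    ['Hello John', 'Good morning', 'What a nice day'],
    ['Ello Jack', 'Xyz hello abc', np.nan]])

# Assumption: case-insensitive search, to mirror case=False used above.
pattern = re.compile(r'hello.*', flags=re.IGNORECASE)


def first_match(row):
    """Return the first regex match found in a row, or 'none' (hypothetical helper)."""
    # Skip NaN cells so only real strings are searched.
    for cell in row.dropna():
        found = pattern.search(str(cell))
        if found:
            return found.group(0)
    return 'none'


# head(100) limits the search to the first 100 rows; axis=1 applies per row.
per_row = df.head(100).apply(first_match, axis=1).tolist()
print(per_row)
```

The resulting list has exactly one entry per searched row, so it can be assigned back as a new column if that is more convenient.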

