简体   繁体   中英

Select rows following specific patterns in a pandas dataframe

I have a csv file that I read into a pandas dataframe. There are two specific columns, 'Notes' and 'ActivityType', that I want to use as criteria. If the 'Notes' column contains a string value of 'Morning exercise' or ' Morning workout' and/or the 'ActivityType' column contains any string value (majority of cells are Null and I don't want Null values counted) then make a new column 'MorningExercise' and insert a 1 if either conditions are met or a 0 if neither are.

I have been using the code below to create a new column and insert a 1 or 0 if the text conditions are met in the 'Notes' column, but I have not figured out how to include a 1 if the 'ActivityType' column contains any string value.

JoinedTables['MorningExercise'] = JoinedTables['Notes'].str.contains(('Morning workout' or 'Morning exercise'), case=False, na=False).astype(int)

For the 'ActivityType' column, I would think to use the pd.notnull() function as the critieria.

I really just need a way in python to see if either criteria are met in a row and if so, input a 1 or 0 in a new column.

You'll need to fashion a regex pattern to use with str.contains :

regex = r'Morning\s*(?:workout|exercise)'
JoinedTables['MorningExercise'] = \
       JoinedTables['Notes'].str.contains(regex, case=False, na=False).astype(int)

Details

Morning       # match "Morning"
\s*           # 0 or more whitespace chars
(?:           # open non-capturing group
workout       # match "workout" 
|             # OR operator
exercise      # match "exercise"
)

The pattern will look for Morning followed by either workout or exercise .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM