
Python pandas: filtering a DataFrame column by a list of conditions

What is the best way to filter multiple columns in a dataframe?

For example, I have this sample of my data:

Index   Tag_Number
666052  A1-1100-XY-001
666382  B1-001-XX-X-XX
666385  **FROM** C1-0001-XXX-100
666620  D1-001-XX-X-HP some text

The "Tag_Number" column contains tags, but I need to get rid of the text before or after each tag. The common delimiter is a space. My idea was to split the column into multiple columns and filter each of them for values that start with one of the prefixes "A1-", "B1-", "C1-" or "D1-": if a cell does not start with one of them it is False, otherwise True. I would then apply this mask to the table so that True values remain as they are and False values become empty. Finally, once the tags are cleaned up, I would combine them back into a single column. I know this might be complicated, and I'm open to any suggestions.
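A minimal sketch of that split, filter, recombine plan, run on the four sample rows above. It assumes a pandas version where Series.str.startswith accepts a tuple of prefixes (the question's own code relies on this too); parts, mask, kept and Clean_Tag are illustrative names, not from the original.

```python
import pandas as pd

# Sample rows from the question (the Index values are used as the DataFrame index).
df = pd.DataFrame(
    {"Tag_Number": [
        "A1-1100-XY-001",
        "B1-001-XX-X-XX",
        "**FROM** C1-0001-XXX-100",
        "D1-001-XX-X-HP some text",
    ]},
    index=[666052, 666382, 666385, 666620],
)
asset = ("A1-", "B1-", "C1-", "D1-")

# 1. Split on spaces into one column per token (shorter rows are padded with None).
parts = df["Tag_Number"].str.split(" ", expand=True)

# 2. Boolean mask: True where a token starts with one of the prefixes.
mask = parts.apply(lambda col: col.str.startswith(asset, na=False))

# 3. Keep matching tokens; every other cell becomes NaN.
kept = parts.where(mask)

# 4. Recombine the surviving tokens of each row into a single string.
df["Clean_Tag"] = kept.apply(lambda row: " ".join(row.dropna()), axis=1)
print(df)
```

Step 4 joins with a space, so a row that happened to contain two matching tags would keep both.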

What I have already tried:

Splitted = df.Tag_Number.str.split(" ",expand=True)
Splitted.columns = Splitted.columns.astype(str)
Splitted = Splitted.rename(columns=lambda s: "Tag"+s)
col_names = list(Splitted.columns)
Splitted

This splits the Tag_Number column into 30 columns, but now I'm struggling to filter each of them. I have created a tuple of prefixes to filter each column by:

asset = ('A1-','B1-','C1-','D1-')

Yet this did not help: I only got a result for the last column instead of all of them, which I guess is expected.

for col in col_names:
    Splitted_filter =  Splitted[col].str.startswith(asset, na = False)
Splitted_filter
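The loop returns only the last column's result because Splitted_filter is reassigned on every iteration, so each pass discards the previous one. One way to keep all of them is to store each column's mask under its column name and build a DataFrame from the collected masks. This is a sketch: masks is a hypothetical name, and the sample rows are rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the split frame from the question's sample rows.
df = pd.DataFrame({"Tag_Number": [
    "A1-1100-XY-001",
    "B1-001-XX-X-XX",
    "**FROM** C1-0001-XXX-100",
    "D1-001-XX-X-HP some text",
]})
Splitted = df.Tag_Number.str.split(" ", expand=True)
Splitted.columns = Splitted.columns.astype(str)
Splitted = Splitted.rename(columns=lambda s: "Tag" + s)
col_names = list(Splitted.columns)

asset = ("A1-", "B1-", "C1-", "D1-")

# Store one boolean Series per column instead of overwriting a single variable.
masks = {}
for col in col_names:
    masks[col] = Splitted[col].str.startswith(asset, na=False)

# Combine the masks into a boolean DataFrame with the same shape as Splitted.
Splitted_filter = pd.DataFrame(masks)
print(Splitted_filter)
```

Splitted.where(Splitted_filter) would then blank out every cell that does not start with one of the prefixes.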

Is there a way to filter each column by this 'asset' filter?

Many thanks

If you want to clean out the text that does not match the asset prefixes, then I think this would work.

import pandas as pd
from io import StringIO

sample = pd.read_csv(StringIO("""Index,Tag_Number
666052,A1-1100-XY-001
666382,B1-001-XX-X-XX
666385,**FROM** C1-0001-XXX-100
666620,D1-001-XX-X-HP some text"""))

asset = ('A1-','B1-','C1-','D1-')
def asset_filter(tag_n):
    tags = tag_n.split() # common delimiter is a space
    tags = [t for t in tags if t.startswith(asset)] # str.startswith accepts a tuple of prefixes
    return tags # can " ".join(tags) if str type is desired

sample['Filtered_Tag_Number'] = sample.Tag_Number.astype(str).apply(asset_filter)

As shown, you can define a custom function, asset_filter, and then apply it to the column you wish to transform.

The result is:

    Index   Tag_Number  Filtered_Tag_Number
0   666052  A1-1100-XY-001  [A1-1100-XY-001]
1   666382  B1-001-XX-X-XX  [B1-001-XX-X-XX]
2   666385  **FROM** C1-0001-XXX-100    [C1-0001-XXX-100]
3   666620  D1-001-XX-X-HP some text    [D1-001-XX-X-HP]
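If a vectorized alternative to apply is preferred, the same tokens can be pulled out with a regular expression through Series.str.findall. This swaps the split-and-test function for a regex, so treat it as a sketch: the pattern is built from the asset tuple and assumes tags are whitespace-delimited tokens, just as asset_filter does.

```python
import re
import pandas as pd

s = pd.Series([
    "A1-1100-XY-001",
    "B1-001-XX-X-XX",
    "**FROM** C1-0001-XXX-100",
    "D1-001-XX-X-HP some text",
])
asset = ("A1-", "B1-", "C1-", "D1-")

# A token that starts with one of the prefixes: not preceded by a non-space
# character (so it begins a token), then the rest of the token up to whitespace.
pattern = r"(?<!\S)(?:" + "|".join(map(re.escape, asset)) + r")\S*"

# findall returns a list of matching tokens per row, like asset_filter.
tags = s.str.findall(pattern)

# Join each list into a plain string if a str column is wanted.
tags_str = tags.str.join(" ")
print(tags_str)
```

re.escape keeps the prefixes literal, so adding a new prefix to asset needs no change to the pattern.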

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
