What is the best way to filter multiple columns in a dataframe?
For example, I have this sample from my data:
Index Tag_Number
666052 A1-1100-XY-001
666382 B1-001-XX-X-XX
666385 **FROM** C1-0001-XXX-100
666620 D1-001-XX-X-HP some text
The "Tag_Number" column contains tags, but I need to get rid of any text before or after the tag. The common delimiter is a space. My idea was to split the column into multiple columns and check whether each cell starts with one of "A1-", "B1-", "C1-", "D1-": if a cell does not start with one of these prefixes it is False, otherwise True. I would then apply that to the table, so that True values stay as they are and False values become empty. Finally, once the tags are cleaned up, I would combine them back into a single column. I know this might be complicated, and I'm really open to any suggestions.
What I have already tried:
Splitted = df.Tag_Number.str.split(" ",expand=True)
Splitted.columns = Splitted.columns.astype(str)
Splitted = Splitted.rename(columns=lambda s: "Tag"+s)
col_names = list(Splitted.columns)
Splitted
I got the Tag_Number column split into 30 columns, but now I'm struggling to filter each column. I created a condition to filter each column by:
asset = ('A1-','B1-','C1-','D1-')
yet this did not help: I only got an array for the last column instead of all of them, which is expected I guess.
for col in col_names:
    Splitted_filter = Splitted[col].str.startswith(asset, na = False)
Splitted_filter
Is there a way to filter each column by this 'asset' filter?
Many Thanks
If you want to clean out the text that does not match the asset prefixes, then I think this would work.
from io import StringIO

import pandas as pd

sample = pd.read_csv(StringIO("""Index,Tag_Number
666052,A1-1100-XY-001
666382,B1-001-XX-X-XX
666385,**FROM** C1-0001-XXX-100
666620,D1-001-XX-X-HP some text"""))
asset = ('A1-', 'B1-', 'C1-', 'D1-')

def asset_filter(tag_n):
    tags = tag_n.split()  # the common delimiter is a space
    # keep only the tokens that start with one of the asset prefixes
    tags = [t for t in tags if t.startswith(asset)]
    return tags  # use " ".join(tags) if a str is desired

sample['Filtered_Tag_Number'] = sample.Tag_Number.astype(str).apply(asset_filter)
Note that you can define a custom function, asset_filter, and then apply it to the column you wish to transform.
Result is this:
    Index                Tag_Number Filtered_Tag_Number
0  666052            A1-1100-XY-001    [A1-1100-XY-001]
1  666382            B1-001-XX-X-XX    [B1-001-XX-X-XX]
2  666385  **FROM** C1-0001-XXX-100   [C1-0001-XXX-100]
3  666620  D1-001-XX-X-HP some text    [D1-001-XX-X-HP]
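If you would rather stay close to the split-into-columns idea from the question, something along these lines might also work. This is only a sketch: the names parts, mask, kept and Clean_Tag are made up for illustration, and it assumes a pandas version in which Series.str.startswith accepts a tuple of prefixes, which the question's own code already relies on. The mask is built for every column at once with DataFrame.apply, so nothing gets overwritten the way Splitted_filter was in the loop.

```python
from io import StringIO

import pandas as pd

sample = pd.read_csv(StringIO("""Index,Tag_Number
666052,A1-1100-XY-001
666382,B1-001-XX-X-XX
666385,**FROM** C1-0001-XXX-100
666620,D1-001-XX-X-HP some text"""))

asset = ('A1-', 'B1-', 'C1-', 'D1-')

# Split on whitespace into one column per token; shorter rows are padded with missing values.
parts = sample['Tag_Number'].astype(str).str.split(expand=True)

# Boolean mask for every column; na=False counts missing tokens as non-matches.
mask = parts.apply(lambda col: col.str.startswith(asset, na=False))

# Keep the matching tokens, blank out the rest, then join what is left in each row.
kept = parts.where(mask)
sample['Clean_Tag'] = kept.apply(lambda row: ' '.join(row.dropna()), axis=1)

print(sample[['Index', 'Clean_Tag']])
```

For this sample, each row ends up with a single cleaned tag as a plain string. If a row contained several matching tokens, they would be joined with a space.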