
How to strip a value from a delimited string

I have a list which I have joined using the following code:

patternCore = '|'.join(list(Broker['prime_broker_id']))

patternCore
'CITI|CS|DB|JPM|ML'

Not sure why I did it that way, but I used patternCore to filter multiple strings at the same time. Please note that Broker is a DataFrame:

Broker['prime_broker_id']
29    CITI
30      CS
31      DB
32     JPM
33      ML
Name: prime_broker_id, dtype: object

Now I am looking to strip one string. Say I would like to strip 'DB'. How can I do that please?

I tried this

patternCore.strip('DB')
'CITI|CS|DB|JPM|ML'

but nothing is stripped

Thank you

Since Broker is a Pandas DataFrame, you can use loc with Boolean indexing, then use pd.Series.tolist:

mask = Broker['prime_broker_id'] != 'DB'
patternCore = '|'.join(Broker.loc[mask, 'prime_broker_id'].tolist())
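
A self-contained sketch of the loc approach, rebuilding Broker from the five rows shown in the question (index 29 to 33; any other columns of the real DataFrame are assumed irrelevant and omitted). It also shows, as a variant, negating Series.isin to drop several ids at once:

```python
import pandas as pd

# Rebuild only the relevant column of the question's DataFrame (assumed shape).
Broker = pd.DataFrame(
    {'prime_broker_id': ['CITI', 'CS', 'DB', 'JPM', 'ML']},
    index=[29, 30, 31, 32, 33],
)

# Boolean mask: True for every row whose id is not 'DB'.
mask = Broker['prime_broker_id'] != 'DB'

# Pass the column *label* as the second loc argument, then join what is left.
patternCore = '|'.join(Broker.loc[mask, 'prime_broker_id'].tolist())
print(patternCore)  # CITI|CS|JPM|ML

# Variant for several ids: ~isin keeps rows whose id is NOT in the list.
mask_many = ~Broker['prime_broker_id'].isin(['DB', 'ML'])
print('|'.join(Broker.loc[mask_many, 'prime_broker_id'].tolist()))  # CITI|CS|JPM
```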

A more generic solution, which works with objects other than Pandas dataframes, is to use a list comprehension with an if condition:

patternCore = '|'.join([x for x in Broker['prime_broker_id'] if x != 'DB'])
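
The comprehension only needs an iterable of strings, so it works just as well on a plain list. A small sketch of that idea; the helper name join_without and its parameters are hypothetical, chosen here for illustration:

```python
def join_without(items, unwanted, sep='|'):
    """Join the strings in `items` with `sep`, skipping any value in `unwanted`."""
    unwanted = set(unwanted)  # set membership checks are O(1) on average
    return sep.join(x for x in items if x not in unwanted)


brokers = ['CITI', 'CS', 'DB', 'JPM', 'ML']  # plain list, no pandas needed
print(join_without(brokers, ['DB']))        # CITI|CS|JPM|ML
print(join_without(brokers, ['DB', 'CS']))  # CITI|JPM|ML
```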

Without returning to your input series, using the same idea you can split and re-join:

patternCore = 'CITI|CS|DB|JPM|ML'
patternCore = '|'.join([x for x in patternCore.split('|') if x != 'DB'])
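
To illustrate why comparing whole tokens after split() is safer than editing the string directly, the sketch below also uses a made-up string 'CITI|DBX|DB' (hypothetical, not from the question) and contrasts the result with a plain str.replace:

```python
patternCore = 'CITI|CS|DB|JPM|ML'

# split('|') yields whole tokens, so only an exact 'DB' entry is dropped.
result = '|'.join(x for x in patternCore.split('|') if x != 'DB')
print(result)  # CITI|CS|JPM|ML

# A token that merely contains 'DB' survives the exact comparison.
other = 'CITI|DBX|DB'  # hypothetical input for illustration
kept = '|'.join(x for x in other.split('|') if x != 'DB')
print(kept)  # CITI|DBX

# For contrast, substring replacement damages 'DBX' and leaves a dangling pipe.
naive = other.replace('DB', '')
print(naive)  # CITI|X|
```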

You should expect the last option to be comparatively expensive, since split() has to read every character of your input string.

I would like to mention some points which have not been touched upon so far.

I tried this

patternCore.strip('DB')

'CITI|CS|DB|JPM|ML'

but nothing is stripped

It didn't work because strip() returns a copy of the string with only the leading and trailing characters removed. Note:

  1. Characters occurring somewhere in the middle of the string are not removed.
  2. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
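
A quick check of both points in plain Python; the short test strings other than the question's own are invented for illustration:

```python
s = 'CITI|CS|DB|JPM|ML'

# 'DB' sits in the middle, so strip('DB') returns the string unchanged.
unchanged = s.strip('DB')
print(unchanged)  # CITI|CS|DB|JPM|ML

# The argument is a set of characters: any run of 'D'/'B' at either end is
# removed, in any order, not only the exact sequence 'DB'.
both_ends = 'BDDxBD'.strip('DB')
print(both_ends)  # x

# Stripping stops at the first character outside the set, so the inner 'DB'
# and the trailing 'y' both survive.
stops = 'DBxDBy'.strip('DB')
print(stops)  # xDBy
```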

Here you have specified the argument characters as 'DB'. So had your string been something like 'CITI|CS|JPM|ML|DB', your code would have worked partially (the pipe at the end would remain).

Either way, this is not good practice, because it would also turn something like 'DCITI|CS|JPM|MLB' into 'CITI|CS|JPM|ML', or 'CITI|CS|JPM|ML|BD' into 'CITI|CS|JPM|ML|'.
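
The three cases above can be confirmed directly. As a contrast, and assuming Python 3.9 or later, str.removesuffix matches one exact suffix instead of a character set; it still only handles the end of the string, so filtering after split() remains the general approach:

```python
# strip() treats 'DB' as a set of characters, not as a suffix.
print('CITI|CS|JPM|ML|DB'.strip('DB'))  # CITI|CS|JPM|ML|  (trailing pipe remains)
print('DCITI|CS|JPM|MLB'.strip('DB'))   # CITI|CS|JPM|ML   (unrelated D and B removed)
print('CITI|CS|JPM|ML|BD'.strip('DB'))  # CITI|CS|JPM|ML|  (reversed 'BD' removed too)

# Python 3.9+: removesuffix removes '|DB' only when it is the exact ending.
print('CITI|CS|JPM|ML|DB'.removesuffix('|DB'))  # CITI|CS|JPM|ML
print('CITI|CS|JPM|ML|BD'.removesuffix('|DB'))  # CITI|CS|JPM|ML|BD  (no exact match)
```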

I would like to strip 'DB'.

For this part, @jpp has already given a fine answer.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM