Remove string from all items of dataframe with Pandas

I have a dataframe df like so:

import pandas as pd

dic = {'A':['pap','cdf\nsdc','ert','dgx','kll\nsrw','sdq'],
      'B':[1,4,6,2,5,6],
      'C':['123\n12','34','55','321\n88','09','45']}
df = pd.DataFrame(dic)

My goal is to remove, in every column, the newline \n and whatever precedes it: abc\ndef ---> def

I was able to achieve my goal by using the following lines of code:

for index, row in df.iterrows():
    # df.at avoids chained assignment (df['A'][index] = ...), which can silently fail
    df.at[index, 'A'] = row['A'].split('\n')[-1]
    df.at[index, 'C'] = row['C'].split('\n')[-1]

However, I would like a smarter and more compact way to achieve this result. Can you suggest something more elegant than mine (a one-liner, maybe)?

Note: column B is numeric (float), not a string!

You can use the vectorised str.split on the columns in question. In a more complicated case, you would need to select the columns of interest based on their dtype:

In [135]:
df['A'] = df['A'].str.split('\n').str[-1]
df['C'] = df['C'].str.split('\n').str[-1]
df

Out[135]:
     A  B   C
0  pap  1  12
1  sdc  4  34
2  ert  6  55
3  dgx  2  88
4  srw  5  09
5  sdq  6  45
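
To see why this works, here is a minimal sketch on a standalone Series (the name `s` and its values are made up for illustration). `str.split('\n')` turns each value into a list of pieces and `.str[-1]` picks the last piece, so values without a newline pass through unchanged and values with several newlines keep only the final segment:

```python
import pandas as pd

# Hypothetical sample values: no newline, one newline, two newlines
s = pd.Series(['pap', 'cdf\nsdc', 'a\nb\nc'])

parts = s.str.split('\n')   # each element becomes a list of pieces
last = parts.str[-1]        # keep only the last piece of each list

print(last.tolist())        # ['pap', 'sdc', 'c']
```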

A dynamic method:

In [142]:
str_cols = df.select_dtypes(include=['object']).columns  # np.object was removed from NumPy
str_cols

Out[142]:
Index(['A', 'C'], dtype='object')

In [143]:
for col in str_cols:
    df[col] = df[col].str.split('\n').str[-1]

df

Out[143]:
     A  B   C
0  pap  1  12
1  sdc  4  34
2  ert  6  55
3  dgx  2  88
4  srw  5  09
5  sdq  6  45
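
If a single statement is preferred, the loop can probably be folded into one `apply` call over the selected columns. The sketch below is one possible variant, not the only way to do it. It selects the columns with `exclude='number'` instead of by object dtype, which assumes that every non-numeric column holds strings:

```python
import pandas as pd

df = pd.DataFrame({'A': ['pap', 'cdf\nsdc', 'ert', 'dgx', 'kll\nsrw', 'sdq'],
                   'B': [1, 4, 6, 2, 5, 6],
                   'C': ['123\n12', '34', '55', '321\n88', '09', '45']})

# Assumption: all non-numeric columns contain strings
str_cols = df.select_dtypes(exclude='number').columns

# Apply the same split-and-take-last step to every selected column at once
df[str_cols] = df[str_cols].apply(lambda s: s.str.split('\n').str[-1])

print(df)
```

Column B is left untouched because it is never passed to the string accessor.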
