Replacing dataframe values after removing/replacing character in rows using Pandas

I have a dataframe df_in like so:

import pandas as pd
import numpy as np
dic_in = {'A':['aa','bb','cc','dd','ee','ff','gg','uu','xx','yy','zz'],
       'B':['200','200','AA200','AA040',np.nan,'500',np.nan,'0700','900','UKK','200'],
       'C':['UNN','400',np.nan,'AA080','AA800','B',np.nan,'400',np.nan,'500','UKK']}
df_in = pd.DataFrame(dic_in)

My goal is to process columns B and C so that:

  • If an item contains the characters 'AA', that part of the string must be removed, leaving only the numeric part ( AA123 ---> 123 ). Any zeros before the first non-zero digit must also be removed ( AA001234 ---> 1234 ).
  • If the value is not a number, it must be set to 0.0 ( NaN ---> 0.0 , UNN ---> 0.0 , UKK ---> 0.0 and so on).
  • If an item has leading zeros, they must be removed ( 0700 ---> 700 , 00007000 ---> 7000 ).
  • If an item had its 'AA' part removed and is non-zero, it must then be multiplied by 100 ( AA040 ---> 4000 ).
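As a sketch of how the four rules combine on a single value, assuming (from the expected-output table) that only values that contained 'AA' get multiplied by 100, a plain-Python helper could look like this. `clean_value` is a hypothetical name used only for illustration:

```python
import math
import re


def clean_value(value):
    """Apply the question's rules to one cell (hypothetical helper).

    Assumption taken from the expected-output table: only values that
    contained 'AA' are multiplied by 100; stripping leading zeros alone
    (0700 -> 700) does not trigger the multiplication.
    """
    # Missing values (None / NaN) count as "not a number"
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return '0.0'
    text = str(value)
    has_aa = 'AA' in text
    digits = text.replace('AA', '')
    # Anything that is not purely ASCII digits after removing 'AA' becomes '0.0'
    if not re.fullmatch(r'[0-9]+', digits):
        return '0.0'
    number = int(digits)  # int() drops leading zeros: '0700' -> 700
    if has_aa and number != 0:
        number *= 100
    return str(number)


for raw in ['AA123', 'AA001234', '0700', 'UNN', float('nan'), '200']:
    print(repr(raw), '->', clean_value(raw))
```

Applied together, the rules turn 'AA123' into '12300' (strip 'AA', then multiply by 100), which matches how AA200 becomes 20000 in the table.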

The final result should look like this:

   # BEFORE #                     # AFTER #
     A      B      C               A      B      C
0   aa    200    UNN          0   aa    200    0.0
1   bb    200    400          1   bb    200    400
2   cc  AA200    NaN          2   cc  20000    0.0
3   dd  AA040  AA080          3   dd   4000   8000
4   ee    NaN  AA800          4   ee    0.0  80000
5   ff    500      B          5   ff    500    0.0
6   gg    NaN    NaN          6   gg    0.0    0.0
7   uu   0700    400          7   uu    700    400
8   xx    900    NaN          8   xx    900    0.0
9   yy    UKK    500          9   yy    0.0    500
10  zz    200    UKK          10  zz    200    0.0

Do you know a smart and efficient way to achieve this?

Note: all the numbers are actually strings, and they should stay that way.

You can use to_numeric with errors='coerce' to replace non-numeric values with NaN.
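For illustration, this is what errors='coerce' does to column B on its own (a small standalone sketch using the question's data):

```python
import numpy as np
import pandas as pd

# Column B from the question
col_b = pd.Series(['200', '200', 'AA200', 'AA040', np.nan, '500',
                   np.nan, '0700', '900', 'UKK', '200'])

# errors='coerce' turns anything that cannot be parsed as a number into NaN;
# '0700' parses fine (to 700.0), while 'AA200', 'UKK' and NaN become NaN
numeric_b = pd.to_numeric(col_b, errors='coerce')
print(numeric_b.tolist())
print(numeric_b.dtype)
```

Because the column contains NaN, the result is float64, so the surviving numbers come back as floats (200.0, 700.0, ...).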

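Then extract the digits from each string with str.extract, strip the leading zeros with str.lstrip, and append '00' (which multiplies the value by 100 while keeping it a string).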
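A standalone sketch of this extraction step on a few of the question's values. Note that it also turns plain numbers such as '200' into '20000'; that is harmless only because the final step prefers the to_numeric result wherever one exists:

```python
import numpy as np
import pandas as pd

col_b = pd.Series(['200', 'AA200', 'AA040', np.nan, '0700', 'UKK'])

# First run of digits in each string; NaN when there is none (NaN, 'UKK')
digits = col_b.str.extract(r'(\d+)', expand=False)
# Drop leading zeros, then append '00' (multiply by 100 as text);
# NaN entries stay NaN through both operations
scaled = digits.str.lstrip('0') + '00'
print(scaled.tolist())
```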

Finally, use combine_first to prefer the numeric values, fill the remaining NaNs with fillna, and assign the results back to the columns:

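# non-numeric strings (and NaN) -> NaN
b = pd.to_numeric(df_in.B, errors='coerce')
c = pd.to_numeric(df_in.C, errors='coerce')

# first run of digits, leading zeros stripped, '00' appended (x100)
b1 = df_in.B.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
c1 = df_in.C.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'

# prefer the numeric value, fall back to the digits, fill the rest with 0
df_in.B = b.combine_first(b1).fillna(0)
df_in.C = c.combine_first(c1).fillna(0)
print(df_in)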
     A      B      C
0   aa    200      0
1   bb    200    400
2   cc  20000      0
3   dd   4000   8000
4   ee      0  80000
5   ff    500      0
6   gg      0      0
7   uu    700    400
8   xx    900      0
9   yy      0    500
10  zz    200      0
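To see why combine_first is used here, and why the resulting column mixes floats and strings, here is a minimal sketch with hand-made intermediate Series (illustrative values shaped like b and b1 above):

```python
import numpy as np
import pandas as pd

numeric = pd.Series([200.0, np.nan, 700.0, np.nan])         # like b
from_digits = pd.Series(['20000', '4000', '70000', np.nan])  # like b1

# Keep the numeric value where it exists, otherwise fall back to the
# digits-based string; positions missing in both become 0
combined = numeric.combine_first(from_digits).fillna(0)
print(combined.tolist())
print(combined.dtype)
```

The result is an object column holding floats (200.0), strings ('4000') and the integer 0, which is the mix the modified solution below converts to strings.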

A slightly modified solution (starting again from the original df_in): fill the remaining NaNs with the string '0.0' and convert all values to strings, so the columns do not end up with a mix of strings and numeric values:

b = pd.to_numeric(df_in.B, errors='coerce')
c = pd.to_numeric(df_in.C, errors='coerce')

b1 = df_in.B.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
c1 = df_in.C.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'

df_in.B = b.combine_first(b1)
df_in.C = c.combine_first(c1)

# fill the remaining NaNs with the string '0.0' and cast every value to str
df_in = df_in.fillna('0.0').astype(str)
print(df_in)
     A      B      C
0   aa  200.0    0.0
1   bb  200.0  400.0
2   cc  20000    0.0
3   dd   4000   8000
4   ee    0.0  80000
5   ff  500.0    0.0
6   gg    0.0    0.0
7   uu  700.0  400.0
8   xx  900.0    0.0
9   yy    0.0  500.0
10  zz  200.0    0.0
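If the exact strings from the expected output matter (200 rather than 200.0, and 0.0 for non-numbers), one option, not taken from the answers here, is to stay with string operations throughout. This sketch assumes pandas 1.1 or newer (for Series.str.fullmatch) and treats a value as numeric only if it is all digits once 'AA' is removed; `clean_column` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def clean_column(col):
    """Apply the question's rules to a whole column, keeping strings throughout.

    Hypothetical helper; assumes pandas >= 1.1 for Series.str.fullmatch.
    """
    text = col.fillna('')                         # NaN behaves like any non-numeric text
    has_aa = text.str.contains('AA', regex=False)
    digits = text.str.replace('AA', '', regex=False)
    is_num = digits.str.fullmatch(r'[0-9]+')      # True only for pure digit strings
    # Strip leading zeros; an all-zero value would become '', so map it back to '0'
    stripped = digits.str.lstrip('0').replace('', '0')
    # Append '00' (x100) only to non-zero values that contained 'AA'
    needs_scaling = has_aa & (stripped != '0')
    scaled = stripped.where(~needs_scaling, stripped + '00')
    # Everything that was not a number becomes '0.0'
    return scaled.where(is_num, '0.0')


dic_in = {'A': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'uu', 'xx', 'yy', 'zz'],
          'B': ['200', '200', 'AA200', 'AA040', np.nan, '500', np.nan, '0700', '900', 'UKK', '200'],
          'C': ['UNN', '400', np.nan, 'AA080', 'AA800', 'B', np.nan, '400', np.nan, '500', 'UKK']}
df_in = pd.DataFrame(dic_in)

# DataFrame.apply passes each column (a Series) to clean_column
df_in[['B', 'C']] = df_in[['B', 'C']].apply(clean_column)
print(df_in)
```

Every step is a vectorized string method, so no value passes through a float and the columns stay plain strings.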

Assuming that all the values in your dataframe are strings (including the NaNs; otherwise you can convert them to an appropriate string with fillna), you can apply the following converter function with applymap to the two columns you want to convert.

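df = pd.DataFrame(dic_in, dtype=str).fillna('NAN')

def converter(x):
    if 'AA' in x:  # drop 'AA'; int() strips leading zeros; then x100
        return str(int(x.replace('AA', '')) * 100)
    return str(int(x)) if x.isdigit() else '0.0'  # plain digits, else '0.0'

# note: applymap is deprecated since pandas 2.1 (renamed DataFrame.map)
df[['B','C']] = df[['B','C']].applymap(converter)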

Contents of df:

     A      B      C
0   aa    200    0.0
1   bb    200    400
2   cc  20000    0.0
3   dd   4000   8000
4   ee    0.0  80000
5   ff    500    0.0
6   gg    0.0    0.0
7   uu    700    400
8   xx    900    0.0
9   yy    0.0    500
10  zz    200    0.0
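DataFrame.applymap was deprecated in pandas 2.1 and renamed DataFrame.map, which does not exist in older versions. A sketch that runs on either, by mapping the same converter logic over each column with Series.map:

```python
import numpy as np
import pandas as pd

dic_in = {'A': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'uu', 'xx', 'yy', 'zz'],
          'B': ['200', '200', 'AA200', 'AA040', np.nan, '500', np.nan, '0700', '900', 'UKK', '200'],
          'C': ['UNN', '400', np.nan, 'AA080', 'AA800', 'B', np.nan, '400', np.nan, '500', 'UKK']}


def converter(x):
    # Same logic as the answer's converter
    if 'AA' in x:
        return str(int(x.replace('AA', '')) * 100)
    return str(int(x)) if x.isdigit() else '0.0'


df = pd.DataFrame(dic_in, dtype=str).fillna('NAN')
# Series.map exists in every pandas version, so this sidesteps the
# applymap -> DataFrame.map rename
df[['B', 'C']] = df[['B', 'C']].apply(lambda col: col.map(converter))
print(df)
```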
