Removing characters from a DataFrame in Python

I want to remove part of a string from some columns of a table. For example, I want to remove b"SET and b"MULTISET from the df columns. How can I achieve that? The details and the output I need are below.

columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill'] 
df = pd.DataFrame(data=t, columns=columns)
df
    
        cust_id     cust_name                   vehicle                             details                                                 bill
0   101         b"SET{'Tom','C'}"           b"MULTISET{'Toyota','Cruiser'}"     b"ROW('Street 1','12345678','NewYork, US')"             1200.00
1   102         b"SET{'Rachel','Green'}"    b"MULTISET{'Ford','se'}"            b"ROW('Street 2','12344444','Florida, US')"             2400.00
2   103         b"SET{'Chandler','Bing'}"   b"MULTISET{'Dodge','mpv'}"          b"ROW('Street 1','12345555','Georgia, US')"             601.10 

Required output:

    cust_id     cust_name                   vehicle                             details                                         bill
0   101         {'Tom','C'}                 {'Toyota','Cruiser'}            ('Street 1','12345678','NewYork, US')               1200.00
1   102         {'Rachel','Green'}          {'Ford','se'}                   ('Street 2','12344444','Florida, US')               2400.00
2   103         {'Chandler','Bing'}         {'Dodge','mpv'}                 ('Street 1','12345555','Georgia, US')               601.10 

Here is a possible solution:

  • Define the columns of interest:
columns = ['cust_name', 'vehicle', 'details']
  • Use a regular expression to extract the value between {} or ():
regex_ = r"([\{|\(].*[\}|\)])"
  • Putting it together: str.decode('ascii') converts the column values from bytes to str, and str.extract keeps only the part matched by the capture group.
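columns = ['cust_name', 'vehicle', 'details']

# Capture everything from the first { or ( to the last } or )
regex_ = r"([\{|\(].*[\}|\)])"

for col in columns:
    # Decode bytes to str, then keep only the captured group
    df[col] = df[col].str.decode('ascii').str.extract(regex_, expand=False)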

   cust_id            cust_name  ...                                details    bill
0      101          {'Tom','C'}  ...  ('Street 1','12345678','NewYork, US')  1200.0
1      102   {'Rachel','Green'}  ...  ('Street 2','12344444','Florida, US')  2400.0
2      103  {'Chandler','Bing'}  ...  ('Street 1','12345555','Georgia, US')   601.1
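
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the three text columns hold bytes objects, which is what the b"..." display in the question suggests; the sample list t and the helper strip_prefix are made up for illustration and are not from the question. The helper decodes only values that are bytes, so it should also cope with columns that already contain plain strings. The regex is the same idea as above, written without the | inside the brackets, since | is a literal character in a character class.

```python
import pandas as pd

# Sample rows modelled on the question. Assumption: the text columns
# contain bytes objects, as the b"..." display suggests.
t = [
    (101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}",
     b"ROW('Street 1','12345678','NewYork, US')", 1200.00),
    (102, b"SET{'Rachel','Green'}", b"MULTISET{'Ford','se'}",
     b"ROW('Street 2','12344444','Florida, US')", 2400.00),
    (103, b"SET{'Chandler','Bing'}", b"MULTISET{'Dodge','mpv'}",
     b"ROW('Street 1','12345555','Georgia, US')", 601.10),
]
columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill']
df = pd.DataFrame(data=t, columns=columns)

# Capture from the first { or ( up to the last } or )
pattern = r"([{(].*[})])"


def strip_prefix(series: pd.Series) -> pd.Series:
    """Hypothetical helper: drop everything outside the outermost {} or ()."""
    # Decode bytes values to str; leave values that are already str untouched
    as_text = series.map(
        lambda v: v.decode('ascii') if isinstance(v, bytes) else v
    )
    # expand=False returns a Series instead of a one-column DataFrame
    return as_text.str.extract(pattern, expand=False)


for col in ['cust_name', 'vehicle', 'details']:
    df[col] = strip_prefix(df[col])

print(df.to_string())
```

If the columns turn out to hold str values that merely start with the characters b", calling str.decode on them directly would give NaN, which is why the sketch checks the type before decoding.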
