Removing characters from a DataFrame in Python
I want to replace a string in the columns of a table. For example, I want to remove b"SET and b"MULTISET from the df columns. How can I do this? I need the output shown below.
columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill']
df = pd.DataFrame(data=t, columns=columns)
df
cust_id cust_name vehicle details bill
0 101 b"SET{'Tom','C'}" b"MULTISET{'Toyota','Cruiser'}" b"ROW('Street 1','12345678','NewYork, US')" 1200.00
1 102 b"SET{'Rachel','Green'}" b"MULTISET{'Ford','se'}" b"ROW('Street 2','12344444','Florida, US')" 2400.00
2 103 b"SET{'Chandler','Bing'}" b"MULTISET{'Dodge','mpv'}" b"ROW('Street 1','12345555','Georgia, US')" 601.10
Desired output:
cust_id cust_name vehicle details bill
0 101 {'Tom','C'} {'Toyota','Cruiser'} ('Street 1','12345678','NewYork, US') 1200.00
1 102 {'Rachel','Green'} {'Ford','se'} ('Street 2','12344444','Florida, US') 2400.00
2 103 {'Chandler','Bing'} {'Dodge','mpv'} ('Street 1','12345555','Georgia, US') 601.10
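Here is one possible solution.
columns = ['cust_name', 'vehicle', 'details'] lists the columns to clean.
regex_ = r"([{(].*[})])" captures everything from the opening { or ( to the closing } or ).
str.decode('ascii') converts the column values from bytes to str so that the regex can be applied.

columns = ['cust_name', 'vehicle', 'details']
regex_ = r"([{(].*[})])"
for col in columns:
    # decode bytes to str, then keep only the bracketed part
    df[col] = df[col].str.decode('ascii').str.extract(regex_, expand=False)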
cust_id cust_name ... details bill
0 101 {'Tom','C'} ... ('Street 1','12345678','NewYork, US') 1200.0
1 102 {'Rachel','Green'} ... ('Street 2','12344444','Florida, US') 2400.0
2 103 {'Chandler','Bing'} ... ('Street 1','12345555','Georgia, US') 601.1
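For anyone who wants to try this end to end, here is a minimal self-contained sketch. The question does not show how t was built, so the sample rows below are an assumption: they use bytes values to match the b"..." display in the question. The character classes are written without |, because a | inside square brackets is matched literally and is not needed as an alternation.

```python
import pandas as pd

# Assumed sample data: bytes values chosen to match the b"..." display
# in the question (the original construction of `t` is not shown).
t = [
    (101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}",
     b"ROW('Street 1','12345678','NewYork, US')", 1200.00),
    (102, b"SET{'Rachel','Green'}", b"MULTISET{'Ford','se'}",
     b"ROW('Street 2','12344444','Florida, US')", 2400.00),
    (103, b"SET{'Chandler','Bing'}", b"MULTISET{'Dodge','mpv'}",
     b"ROW('Street 1','12345555','Georgia, US')", 601.10),
]
columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill']
df = pd.DataFrame(data=t, columns=columns)

# Capture from the first { or ( through the last } or ).
pattern = r"([{(].*[})])"

for col in ['cust_name', 'vehicle', 'details']:
    # Decode bytes to str, then keep only the bracketed part.
    # expand=False returns a Series instead of a one-column DataFrame.
    df[col] = df[col].str.decode('ascii').str.extract(pattern, expand=False)

print(df.to_string())
```

If the columns already hold str values rather than bytes, the str.decode('ascii') step should be dropped, since it only applies to bytes.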