Removing characters from a DataFrame in Python
I want to replace strings in some columns of a table. For example, I want to remove b"SET and b"MULTISET from the df columns. How can I do this? The output I need is shown below.
import pandas as pd
# t holds the row data (not shown in the question)
columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill']
df = pd.DataFrame(data=t, columns=columns)
df
cust_id cust_name vehicle details bill
0 101 b"SET{'Tom','C'}" b"MULTISET{'Toyota','Cruiser'}" b"ROW('Street 1','12345678','NewYork, US')" 1200.00
1 102 b"SET{'Rachel','Green'}" b"MULTISET{'Ford','se'}" b"ROW('Street 2','12344444','Florida, US')" 2400.00
2 103 b"SET{'Chandler','Bing'}" b"MULTISET{'Dodge','mpv'}" b"ROW('Street 1','12345555','Georgia, US')" 601.10
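The question does not show how t is built, so the following is only a sketch for reproducing the frame above. It assumes the three text columns hold Python bytes objects (which is what the b"..." display suggests) and that the rows are plain tuples; the real data may come from a database driver and look different.

```python
import pandas as pd

# Hypothetical reconstruction of `t`; the original question does not show it.
# Assumption: the text columns are bytes objects, as the b"..." display suggests.
t = [
    (101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}",
     b"ROW('Street 1','12345678','NewYork, US')", 1200.00),
    (102, b"SET{'Rachel','Green'}", b"MULTISET{'Ford','se'}",
     b"ROW('Street 2','12344444','Florida, US')", 2400.00),
    (103, b"SET{'Chandler','Bing'}", b"MULTISET{'Dodge','mpv'}",
     b"ROW('Street 1','12345555','Georgia, US')", 601.10),
]

columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill']
df = pd.DataFrame(data=t, columns=columns)

# Confirm that the text columns really contain bytes, not str.
print(df['cust_name'].map(type).unique())
```

If the columns turn out to hold str values that merely start with the characters b", decoding is not needed and a plain string replacement would be the more fitting approach.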
Desired output:
cust_id cust_name vehicle details bill
0 101 {'Tom','C'} {'Toyota','Cruiser'} ('Street 1','12345678','NewYork, US') 1200.00
1 102 {'Rachel','Green'} {'Ford','se'} ('Street 2','12344444','Florida, US') 2400.00
2 103 {'Chandler','Bing'} {'Dodge','mpv'} ('Street 1','12345555','Georgia, US') 601.10
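Here is one possible solution.
First, define the columns to clean: columns = ['cust_name', 'vehicle', 'details']
Next, use a regular expression to extract the value between {} or (): regex_ = r"([\{|\(].*[\}|\)])"
str.decode('ascii') converts the column values from bytes to str.
columns = ['cust_name', 'vehicle', 'details']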
regex_ = r"([\{|\(].*[\}|\)])"
for col in columns:
df[col] = df[col].str.decode('ascii').str.extract(regex_)
cust_id cust_name ... details bill
0 101 {'Tom','C'} ... ('Street 1','12345678','NewYork, US') 1200.0
1 102 {'Rachel','Green'} ... ('Street 2','12344444','Florida, US') 2400.0
2 103 {'Chandler','Bing'} ... ('Street 1','12345555','Georgia, US') 601.1
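For anyone who wants to try the approach end to end, here is a self-contained sketch of the same decode-then-extract idea. The helper name strip_prefix and the two-row sample frame are made up for illustration, and the sample assumes the columns hold bytes. Two small changes from the answer's code: the character classes are written as [{(] and [})], because a | inside square brackets is a literal pipe rather than an alternation, and expand=False is passed so that str.extract returns a Series for the single capture group.

```python
import pandas as pd


def strip_prefix(df, cols, pattern=r"([{(].*[})])"):
    """Return a copy of df where each column in cols is decoded from bytes
    and reduced to the part from the first { or ( to the last } or ).

    Hypothetical helper, not part of pandas.
    """
    out = df.copy()
    for col in cols:
        # bytes -> str, then keep only the captured group
        out[col] = out[col].str.decode('ascii').str.extract(pattern, expand=False)
    return out


# Small illustrative sample; values are assumed to be bytes objects.
sample = pd.DataFrame({
    'cust_id': [101, 102],
    'cust_name': [b"SET{'Tom','C'}", b"SET{'Rachel','Green'}"],
    'details': [b"ROW('Street 1','12345678','NewYork, US')",
                b"ROW('Street 2','12344444','Florida, US')"],
})

cleaned = strip_prefix(sample, ['cust_name', 'details'])
print(cleaned)
```

Rows where the pattern does not match come back as missing values, so it may be worth checking for NaN in the cleaned columns before relying on them.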