Searching and splitting strings with special characters in Python

Question

I'm scraping the timetable on the Medmar website and would like to split each route from:

"Pozzuoli » Ischia"
"Pozzuoli - Procida"

to

"DEPARTURE PORT": 'Pozzuoli'
"ARRIVAL PORT": 'Ischia'
"DEPARTURE PORT": 'Pozzuoli'
"ARRIVAL PORT": 'Procida'

I've tried splitting the text from the list in two different ways, depending on whether the divider between the two ports is "»" or "-". First I search for "»" or "-", then I split the string accordingly. For some reason, I'm getting a re error on the search:

re.error: unterminated character set at position 0

Code:

def port_name_regex(port_name, index):
     if re.search("[^\x00-\x7f",port_name):
        port_name = departure_port = re.split("[^\x00-\x7f]",port_name,1)[index].capitalize
        return port_name
     else:
        port_name = re.split("\w",port_name,1)[index].capitalize
        return port_name

medmar_live_departures_table = list(soup.select('li.tratta'))
for li in medmar_live_departures_table:
    next_li = li.find_next_sibling("li")
    while next_li and next_li.get("data-toggle"):
        if next_li.get("class") == ["corsa-yes"]:
            medmar_live_departures_data.append({
                'DEPARTURE PORT': port_name_regex(li.text, 0),
                'ARRIVAL PORT': port_name_regex(li.text, -1),
                'DEPARTURE TIME': next_li.strong.text,
                'FERRY TYPE': "Traghetto",
                'STATUS': "Active",
                'OTHER INFO': "Next departure"
            })
        elif next_li.get("class") == ["corsa-no"]:
            medmar_live_departures_data.append({
                'DEPARTURE PORT': port_name_regex(li.text, 0),
                'ARRIVAL PORT': port_name_regex(li.text, -1),
                'DEPARTURE TIME': next_li.strong.text,
                'FERRY TYPE': "Traghetto",
                'STATUS': "Cancelled"
            })
        else:
            medmar_live_departures_data.append({
                'DEPARTURE PORT': port_name_regex(li.text, 0),
                'ARRIVAL PORT': port_name_regex(li.text, -1),
                'DEPARTURE TIME': next_li.strong.text,
                'FERRY TYPE': "Traghetto",
                'STATUS': "Active"
            })
        next_li = next_li.find_next_sibling("li")

How do I solve this problem?

Answer 1

I ran into the same error and solved it by stripping brackets and parentheses from the text, like so:

re.sub(r'\(|\)|\]|\[', '', word.lower())

The error message hints at the cause (unterminated character set): parentheses and brackets come in opening and closing pairs, so one of them is missing. Check your data for these characters.
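
For the code in the question specifically, the pattern passed to re.search, "[^\x00-\x7f", opens a character set with "[" and never closes it with "]", which is enough to raise this error regardless of the data. Below is a minimal sketch of one possible fix, not necessarily how the asker resolved it. It keeps the question's function name and signature; the DIVIDER pattern, and the requirement of whitespace on both sides of "-", are assumptions made here so that a hyphen inside a port name is not treated as a divider.

```python
import re

# Reproduce the error: "[" opens a character set that is never closed.
try:
    re.compile("[^\x00-\x7f")
except re.error as exc:
    print("pattern rejected:", exc)

# Assumed divider pattern (not from the original post): "»" with optional
# surrounding whitespace, or "-" with whitespace on both sides.
DIVIDER = re.compile(r"\s*»\s*|\s+-\s+")


def port_name_regex(port_name, index):
    """Return one side of a route string such as 'Pozzuoli » Ischia'.

    index 0 gives the departure port, index -1 the arrival port.
    """
    parts = DIVIDER.split(port_name.strip(), maxsplit=1)
    # capitalize() must be called; the original referenced the method
    # without parentheses and so returned the method object itself.
    return parts[index].strip().capitalize()


print(port_name_regex("Pozzuoli » Ischia", 0))    # Pozzuoli
print(port_name_regex("Pozzuoli » Ischia", -1))   # Ischia
print(port_name_regex("Pozzuoli - Procida", -1))  # Procida
```

If a string contains neither divider, this sketch returns the whole string (capitalized) for both indexes, so a caller may want to check for that case. Note also that str.capitalize lowercases everything after the first character, which would turn a multi-word name into, for example, "Ischia porto".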

Answer 1, posted 2019-05-03 23:46:49