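Python re.sub remove starting from specific index?
I have a script that takes data and essentially cleans it by removing elements that are purely numeric (digits and dots only). How can I adjust remove='^[0-9.]+$' so that it only applies from a specific index onward, say index 4? Right now it scans every index.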
import re
import tkinter as tk

# txt_edit is assumed to be a tk.Text widget created elsewhere in the script.
def split_lines(fp, delimiter, remove='^[0-9.]+$'):
    with open(fp, mode="r", encoding="utf-8") as file:
        clean_list = []
        for line in file:
            tokens = line.split(delimiter)
            # Blank out every token that consists only of digits and dots
            tokens = [re.sub(remove, "", token) for token in tokens]
            # Drop tokens that are empty or whitespace-only after cleaning
            clean_list.append(list(filter(lambda e: e.strip(), tokens)))
    txt_edit.delete("1.0", tk.END)
    # Count how many times each cleaned line occurs
    unique_data = {}
    for item in clean_list:
        key = str(item)
        if not unique_data.get(key):
            unique_data[key] = 1, item
        else:
            unique_data[key] = (unique_data[key][0] + 1), item
    for k, v in unique_data.items():
        txt_edit.insert(tk.END, f"{v[1]}x {v[0]} \n")
The simplest approach is probably to run the cleanup on only part of the string:
my_string = "wowMuchCool"
part_1, part_2 = my_string[:4], my_string[4:]  # split at index 4, so "wowM" and "uchCool"
part_2 = clean_function(part_2)  # clean_function is a placeholder; say it removes "o", so part_2 = "uchCl"
my_string_cleaned = part_1 + part_2  # "wowMuchCl", the first "o" is untouched
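The same slicing idea can be applied to the token list from the question rather than to a single string. The following is only a sketch: the helper name clean_tokens and its start parameter are hypothetical, and it assumes "index 4" refers to the position of a token within each split line.

```python
import re

def clean_tokens(tokens, remove=r'^[0-9.]+$', start=4):
    """Apply re.sub only to tokens at position `start` or later (hypothetical helper)."""
    # Tokens before `start` are kept as they are; only the tail is cleaned
    head, tail = tokens[:start], tokens[start:]
    cleaned_tail = [re.sub(remove, "", token) for token in tail]
    # Drop tokens that are empty or whitespace-only, as the original script does
    return [t for t in head + cleaned_tail if t.strip()]

tokens = "10,20,abc,30,40,def,5.5".split(",")
print(clean_tokens(tokens))  # ['10', '20', 'abc', '30', 'def']
```

Inside split_lines, the list comprehension that calls re.sub on every token could be swapped for a call like clean_tokens(tokens, remove, 4), leaving the rest of the function unchanged.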