Forward slash "/" in string converted to "&#47;", is that platform independent behaviour?
I have a Python script that reads an HTML file into lines, keeps only the relevant lines, and then saves those lines back as an HTML file. I had some problems until I figured out that a "/" in the page text was being converted to "&#47;" when saved as a string.
The source HTML that I'm parsing has the following line:
<h3 style="text-align:left">SYDNEY/KINGSFORD SMITH (YSSY)</h3>
which, after passing through file.readlines(), comes out as:
<h3 style='text-align:left'>SYDNEY&#47;BANKSTOWN (YSBK)</h3>
This then trips up BeautifulSoup, because it gets confused by the "&" symbol, which breaks all subsequent tags.
What I want to know is whether this replacement value "&#47;" is platform independent.
It's not hard to run a .replace before saving each string, which avoids the issue while I'm coding and testing on Windows, but will it still work if I deploy my script on a Linux server?
Here's what I have now, which works fine when run under Windows:
    def getHTML(self, html_source):
        with open(html_source, 'r') as file:
            source_lines = file.readlines()
        relevant = False
        relevant_lines = []
        for line in source_lines:
            # Stop collecting at the end of the table.
            if "</table>" in line:
                relevant = False
            # Start collecting at the line that mentions the airport.
            if self.airport in line:
                relevant = True
            if relevant:
                # Drop the "&#47;" entity that trips up the parser.
                line = line.replace("&#47;", " ")
                relevant_lines.append(line)
        relevant_lines.append("</table>")
        # Assumes html_source ends in ".html" (5 characters).
        filename = f"{html_source[:-5]}_{self.airport}.html"
        with open(filename, 'w') as file:
            file.writelines(relevant_lines)
        with open(filename, 'r') as file:
            relevant_html = file.read()
        return relevant_html
Can anyone tell me, without my having to install a virtual machine with Linux, whether this will work cross-platform? I tried to look for documentation on this, but all I could find was about ways to explicitly escape a "/" when entering a string; nothing documented how to deal with "&#47;" or other invalid characters showing up when reading a source file into strings.
It should be OK everywhere; it is a standard. "&#47;" is the HTML numeric character reference for "/" (ASCII code 47), so it does not depend on the operating system. See https://www.w3schools.com/charsets/ref_html_ascii.asp
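As a rough illustration of why the platform should not matter, here is a minimal sketch that decodes the entity with the standard library's html.unescape instead of a hard-coded .replace. The names decode_entities and filter_lines are hypothetical helpers, not part of the original script, and the sample line is the one quoted in the question.

```python
import html


def decode_entities(line: str) -> str:
    """Decode HTML character references such as &#47; into plain characters."""
    return html.unescape(line)


def filter_lines(lines, marker):
    """Hypothetical helper mirroring the loop in the question.

    Keeps lines from the first one containing `marker` up to (but not
    including) the next line containing </table>, decoding entities on the way.
    """
    relevant = False
    kept = []
    for line in lines:
        if "</table>" in line:
            relevant = False
        if marker in line:
            relevant = True
        if relevant:
            kept.append(decode_entities(line))
    return kept


sample = "<h3 style='text-align:left'>SYDNEY&#47;BANKSTOWN (YSBK)</h3>"
print(decode_entities(sample))
```

The decoding is pure string processing, so it should behave the same on Windows and Linux. One caveat: html.unescape also decodes references such as &lt; and &amp;, which may not be what you want if the lines are written back out as HTML; in that case replacing only "&#47;" may be the safer choice.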