How to format a web.response stream before saving it to a text file

Question

I am collecting information with a web crawler that uses the web.response method. I read the response into a string and save it to a text file, then search that text file with regular expressions. The problem is that the search does not work properly, because the text file contains many random newlines.

My question: is there a way to format the XML (HTML) document returned by the web.response method before saving it to a text file, so that the text contains no random spaces and newlines? I cannot even post the unformatted HTML here; otherwise I would have.

Answer 1

The Internet may hate you for doing this, but if you have predefined conditions, you can transform the string, for example:

var formattedHtml = html.Replace(Environment.NewLine, "");
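One caveat with this one-liner: Environment.NewLine is "\r\n" on Windows and "\n" on Unix-like systems, while a web server may send either convention. A page that uses bare "\n" line breaks would pass through the call unchanged on Windows. The sketch below removes both characters explicitly; the class and method names are made up for illustration and are not part of the original answer.

```csharp
using System;

public static class NewlineStripper
{
    // Hypothetical helper: removes every carriage return and line feed,
    // regardless of which line-ending convention the server used.
    public static string RemoveLineBreaks(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        return html.Replace("\r", "").Replace("\n", "");
    }
}
```

Note that deleting line breaks outright can glue together words that were separated only by a newline; replacing them with a single space is the gentler choice if the text content matters.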

Answer 2

This could solve your problem, but from a performance point of view it is a bad solution.

Perform the following actions on the response:

Extract the content between the > and < symbols and trim its whitespace.
Remove any remaining newlines.
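
The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the names HtmlWhitespace and Compact are hypothetical, and the second step replaces each leftover line break with a single space instead of deleting it, so that attributes split across lines inside a tag are not glued together.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlWhitespace
{
    // Matches a run of text that sits directly between a '>' and the next '<'.
    private static readonly Regex TextBetweenTags =
        new Regex(@"(?<=>)[^<>]+(?=<)", RegexOptions.Compiled);

    // Hypothetical helper implementing the two steps described above.
    public static string Compact(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Step 1: trim leading and trailing whitespace of each text run.
        string trimmed = TextBetweenTags.Replace(html, m => m.Value.Trim());

        // Step 2: turn each remaining line break, together with the
        // whitespace around it, into one space (an assumption; see lead-in).
        return Regex.Replace(trimmed, @"\s*[\r\n]+\s*", " ");
    }
}
```

This treats HTML as plain text, so a literal > or < inside a script block or attribute value will confuse it, and trimming also removes spaces that separate inline elements from neighbouring words. It is reasonable only when the result is used for searching, not for display.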

A better solution would be to use a more tolerant regular expression when searching the string.
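
As a sketch of what a more tolerant pattern could look like, the example below searches for the contents of the title element without reformatting the file first. The choice of the title element and the names TolerantSearch and FindTitle are assumptions made for illustration; the original question does not say what is being searched for. RegexOptions.Singleline lets . match line breaks, and \s* absorbs whitespace around the value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TolerantSearch
{
    // Hypothetical search target: the text inside <title>...</title>.
    // Singleline makes '.' match '\n', so the value may span several lines.
    private static readonly Regex Title = new Regex(
        @"<title[^>]*>\s*(.*?)\s*</title\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the title with internal whitespace collapsed, or null if absent.
    public static string FindTitle(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        Match m = Title.Match(html);
        if (!m.Success) return null;

        // Collapse runs of whitespace (including line breaks) to one space.
        return Regex.Replace(m.Groups[1].Value, @"\s+", " ");
    }
}
```

With this approach the saved file can stay exactly as the server sent it, and only the captured value is normalized.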
