Is it possible to loop through a textbox's contents? If not, what's the best strategy to read line-by-line?

I am designing a crawler that will extract certain content from a webpage (using either string manipulation or regex).

I'm able to get the contents of the webpage as a response stream (using HttpWebRequest), and then, for testing/dev purposes, I write the stream content to a multi-line textbox in my ASP.NET webpage.

Is it possible for me to loop through the content of the textbox (or save the textbox text as a string variable) and then say "if textbox1.Text contains a certain string, then increment a count"? The problem with the textbox is that the string loses its formatting, so it ends up as one long line with no line breaks. Can that be changed?

I'd like to do this rather than write the content to a file, because writing to a file means I would have to handle all sorts of external issues. Of course, if this is the only way, then so be it. If I do have to write to a file, what's the best strategy to loop through each and every line, looking for a condition? (I'm a little overwhelmed and confused, as there are many logical and language methods to use.) So, say I want to look for the string "Hello" in the following text:

My name is xyz I am xyz years of age Hello blah blah blah Bye

When I reach "Hello", I want to increment an integer variable.

Thanks,

In my opinion, you can split the content of the text into words instead of lines:

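public int CountOccurences(string searchString)
{
    int i = 0; // must be initialized before it is incremented
    // Split on single spaces; Split(' ') is used because older .NET versions
    // have no Split(string) overload.
    var words = txtBox.Text.Split(' ');

    foreach (var s in words)
        if (s.Contains(searchString))
            i++;

    return i;
}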

There is no need to preserve line breaks, if I understand your purpose correctly.

Also note that this will not work for multi-word searches.
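
One way around the multi-word limitation is to search the whole text instead of individual words. The following is a minimal sketch, not the answerer's code: the class name `PhraseCounter` and method name `CountPhrase` are made up for illustration, and the search is assumed to be case-sensitive and to count non-overlapping matches.

```csharp
using System;

public static class PhraseCounter
{
    // Counts non-overlapping, case-sensitive occurrences of searchString in text.
    // Works for phrases containing spaces because the text is never split.
    public static int CountPhrase(string text, string searchString)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(searchString))
            return 0;

        int count = 0;
        int index = 0;
        // IndexOf returns -1 once no further match exists at or after index.
        while ((index = text.IndexOf(searchString, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += searchString.Length; // continue after the current match
        }
        return count;
    }
}
```

It could be called as `PhraseCounter.CountPhrase(txtBox.Text, "blah blah")`, where `txtBox` is the textbox from the snippet above.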

I do it this way in a project. There may be a better way to do it, but this works :)

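string template = txtTemplate.Text;
// Split on line-break sequences rather than on the individual '\r' and '\n'
// characters, which would yield an empty entry between every pair of lines.
string[] lines = template.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);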

That is a nice, creative way.

However, I am returning a complex HTML document (for testing purposes, I am using Microsoft's homepage, so I get all the HTML). Do I not have to specify where I want to break the line?

Given your method, if each line is in a collection (which is a thought I had), then I can loop through each member of the collection and look for the condition I want.
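
As a rough illustration of that idea: the split needs no explicit break positions, because it happens wherever the HTML source already contains a line break. The sketch below assumes the lines are separated by "\r\n" or "\n"; the names `LineScanner` and `CountMatchingLines` are hypothetical, and the comparison is made case-insensitive so that "Hello" and "hello" both match.

```csharp
using System;

public static class LineScanner
{
    // Splits text at existing line breaks and counts the lines containing term.
    public static int CountMatchingLines(string text, string term)
    {
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        int count = 0;
        foreach (string line in lines)
        {
            // IndexOf with OrdinalIgnoreCase acts as a case-insensitive Contains.
            if (line.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
                count++;
        }
        return count;
    }
}
```

Note that this counts lines, not occurrences: a line containing the term twice still adds one to the count.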

If textbox contents were returned with line breaks representing where word-wrapping occurs, the result would depend on style (e.g. font size, width of the textbox, etc.) rather than on what the user actually entered. Depending on what you actually want to do, this is almost certainly NOT what you want.

If the user physically presses the carriage return / Enter key, the relevant character(s) will be included in the string.

Why do you need to have a textbox at all? Your real goal is to increment a counter based on the text that the crawler finds. You can accomplish this just by examining the stream itself:

  Stream response = webRequest.GetResponse().GetResponseStream();
  StreamReader reader = new StreamReader(response);
  string line;

  // Each ReadLine() call returns the next line, or null at the end of the stream.
  while ((line = reader.ReadLine()) != null)
  {
    if (line.Contains("hello"))
    {
      // increment your counter
    }
  }

Extending this to handle a line that contains more than one instance of the string in question is left as an exercise for the reader :).
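
For what it's worth, one possible take on that exercise uses a regular expression to count every match on each line. This is only a sketch under assumptions: `StreamCounter` and `CountAll` are invented names, the search term is treated as literal text (hence `Regex.Escape`), and matching is case-sensitive. It accepts any `TextReader`, so the `StreamReader` from the snippet above can be passed in directly.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class StreamCounter
{
    // Reads the input line by line and totals every occurrence of term,
    // including several occurrences on the same line.
    public static int CountAll(TextReader reader, string term)
    {
        // Escape the term so characters such as '.' or '?' are matched literally.
        string pattern = Regex.Escape(term);

        int total = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            total += Regex.Matches(line, pattern).Count;
        }
        return total;
    }
}
```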

You can still write the contents to a textbox if you want to examine them manually, but attempting to iterate over the lines of the textbox simply obscures the problem.

The textbox was there to show the contents of the HTML page. This is for my own use, so that if I am running the webpage without any breakpoints, I can see whether the stream is being returned. Also, it's a client requirement, so they can see what is happening at every step. It's not really worth the extra lines of code, but it's trivial and the least of my concerns.

I don't understand the code in the while loop. Where is the instruction to go to the next line? This is my weakness with the ReadLine method, as I seldom see the logic that forces the next line to be read.
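
On the "next line" question: there is no separate instruction, because `ReadLine` itself consumes one line every time it is called, and it returns null once the input is exhausted. The sketch below tries to make that visible; `ReadLineDemo` and `ReadAllLines` are hypothetical names, and a `StringReader` can stand in for the response stream when trying it out.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Collects every line from the reader, in order.
    public static List<string> ReadAllLines(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        // The assignment inside the condition runs on every iteration, so each
        // pass through the loop reads (and moves past) exactly one line.
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

Calling `ReadLineDemo.ReadAllLines(new StringReader("a\nb\nc"))` would return the three lines "a", "b", and "c" without the loop ever advancing a position explicitly.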

I do need to store each line where a certain string is found in a string variable, as I will need to do some operations on it (get a certain part of the string), which is why I've always been looking at ReadLine.
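
A small sketch of how the matching lines might be kept and partly extracted is shown below. It assumes, purely for illustration, that the part of interest is whatever follows the first occurrence of the search string on the line; `MatchCollector` and `TextAfterMarker` are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MatchCollector
{
    // For each line containing marker, stores the trimmed text that follows
    // the first occurrence of marker on that line.
    public static List<string> TextAfterMarker(TextReader reader, string marker)
    {
        var results = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int index = line.IndexOf(marker, StringComparison.Ordinal);
            if (index >= 0)
            {
                // Substring from just past the marker to the end of the line.
                results.Add(line.Substring(index + marker.Length).Trim());
            }
        }
        return results;
    }
}
```

The number of matching lines is then simply the `Count` of the returned list.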

Thanks!

