
Comparing lines of strings from a file

I have read a file, and I want to treat each line as a string and search it for certain keywords. When I find those keywords, I want to take everything between them and make a new string out of it. More than one line can contain the same keywords, so I would like to create separate strings...

I have some ugly-looking code right now that would be embarrassing to post here. Can someone point me in the right direction on how to do this?

I can put my code here if you like, but I'll have to explain it a lot.

AGU UAC AUU GCG CGA UGG GCC UCG AGA CCC GGG UUU AAA GUA GGU GA

GUU ACA UUG CGC GAU GGG CCU CGA GAC CCG GGU UUA AAG UAG GUG A

UUA CAU UGC GCG M GGC CUC GAG ACC CGG GUU UAA AGU AGG UGA

UGG M AAA UUU GGG CCC AGA GCU CCG GGU AGC GCG UUA CAU UGA

This would be part of my text file. I want to find 'M' and then find instances of: 1) UAA, 2) UAG, or 3) UGA, and make each one a separate string so that I can compare their lengths. I tried using the assignment operator, but it would print out the same string every time.

Edit: I guess what I would like to do is just find any instance of 'M'; when I do, I would like to make that whole line into a string so I can compare the strings.

#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
    ifstream code_File("example.txt");   // open text file
    string line, code_Assign;
    if (code_File.is_open()) {
        while (getline(code_File, line)) {   // read one line; the loop ends at end of file or on a read error
            cout << line << endl;            // output contents of file on screen

            string::size_type found = line.find('M');   // find the start code
            if (found != string::npos) {
                // copy the line from the start code 'M' to its end, then print it
                code_Assign.assign(line, found, string::npos);
                cout << endl << "code_Assign: " << code_Assign << endl << endl;
            }
        }
    }
}

That seems like a good task for the standard POSIX utilities grep, sed, or awk.

If you want it (faster) inside a program, consider using standard parsing techniques, e.g. with ANTLR.

It's not at all clear to me what you are trying to do, however:

To read a line and put it in a single string: use std::getline.

To find a fixed string in another string, use std::search; for more complicated patterns, use boost::regex (or std::regex if you have a C++11 compiler). std::search will return an iterator, and two iterators can be used to construct a new string. The regex solutions can "capture", so you have direct access to the intervening string (or not: a lot depends on how complicated the pattern is, and the regex solution doesn't work when the string you want to capture is in a repeated pattern). Without more information, however, it is difficult to say more.
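To make the two approaches concrete, here is a minimal sketch under one guess at the requirements: take the text between the first "M" on a line and the nearest following UAA, UAG, or UGA. The function names between_with_search and between_with_regex are invented for this example, and the regex pattern is only one possible way to write it.

```cpp
#include <algorithm>
#include <regex>
#include <string>

// Hypothetical helper using std::search: returns the text between the first "M"
// and the nearest following stop codon, or an empty string if either is missing.
std::string between_with_search(const std::string& line) {
    const std::string start = "M";
    const std::string stops[] = {"UAA", "UAG", "UGA"};

    auto begin = std::search(line.begin(), line.end(), start.begin(), start.end());
    if (begin == line.end()) return std::string();   // no start marker
    begin += start.size();                            // skip past the marker itself

    auto end = line.end();
    for (const std::string& stop : stops) {
        auto hit = std::search(begin, line.end(), stop.begin(), stop.end());
        if (hit < end) end = hit;                     // keep the earliest stop codon
    }
    if (end == line.end()) return std::string();     // no stop codon after the marker

    return std::string(begin, end);                   // new string built from two iterators
}

// Hypothetical helper using std::regex: the first capture group holds the
// intervening text, with the surrounding whitespace left out of the capture.
std::string between_with_regex(const std::string& line) {
    static const std::regex pattern("M\\s+(.*?)\\s*(?:UAA|UAG|UGA)");
    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
        return match[1].str();
    }
    return std::string();
}
```

The std::search version keeps the spaces on either side of the extracted text, while the regex version drops them; which is preferable depends on what the strings are compared for. Both versions only handle the first "M" on a line.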

Try specifying your problem precisely; I think you'll find that that helps in finding a solution.
