
Reading parts of a file after a specific tag is found in C using fgets

I would like some suggestions on how to read an 'XML'-like file in such a way that the program only reads/stores elements found in a node that meets some requirements. I was thinking about using two fgets calls in the following way:

while (fgets(file_buffer, line_buffer, fp) != NULL)   /* line_buffer: size of file_buffer */
 {
   /* strstr returns NULL when the opening tag is not on this line */
   if ((p_str = strstr(file_buffer, "<element of interest opening")) != NULL)
    {
      // new fgets that starts at fp and runs only until the end of the node
       {
         // read and process
       }
    }
 }

Does this make sense or are there smarter ways of doing this?

Secondly (in my idea), will I have to define a new FILE* (like fr) and set fr to fp at the start of the second fgets, or can I somehow reuse the original file pointer for that?

Use an XML parser, for example libxml2: http://xmlsoft.org/xml.html

Your approach isn't bad for the job.

You could read a whole line from the file and then process it in memory using sscanf, strstr or whatever functions you like. This saves time and avoids unnecessary file I/O overhead.
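As a minimal sketch of that line-based processing, the helper below searches one already-read line with strstr and parses the number after the tag with sscanf. The tag name <value> and the function name parse_value_line are assumptions made up for illustration; they are not from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: looks for "<value>" in one line and parses the
 * integer that follows it. Returns 1 on success, 0 if the tag or the
 * number is missing. The tag name is an assumption for illustration. */
int parse_value_line(const char *line, int *out)
{
    const char *tag = "<value>";
    const char *p;

    assert(line != NULL && out != NULL);

    p = strstr(line, tag);          /* NULL when the tag is not on this line */
    if (p == NULL)
        return 0;

    p += strlen(tag);               /* step over the tag itself */
    return sscanf(p, "%d", out) == 1;
}
```

A caller would pass each buffer filled by fgets to this helper and act only on the lines for which it returns 1.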

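As for your second idea, you do not need an extra file pointer. Every fgets call on fp continues from the current position of the stream, so a second loop can keep reading from the same fp. If you need to jump back, you can use fseek() (refer: man fseek) or rewind() (refer: man rewind) on the same file pointer.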
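The following is a sketch of the two-loop idea on a single FILE*, under the assumptions that every tag sits on its own line and that no line is longer than the buffer. The function name count_lines_in_nodes and the constant LINE_MAX_LEN are hypothetical; counting lines stands in for whatever "read and process" means in your program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* assumed upper bound on line length */

/* Reads fp line by line. Once a line contains open_tag, the inner loop
 * keeps reading from the SAME stream until a line contains close_tag.
 * Returns the number of lines seen inside such nodes. */
int count_lines_in_nodes(FILE *fp, const char *open_tag, const char *close_tag)
{
    char line[LINE_MAX_LEN];
    int inside = 0;

    assert(fp != NULL);

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, open_tag) == NULL)
            continue;                   /* not a node of interest */

        /* Inner loop: same FILE*, no fseek needed; fgets simply
         * continues where the previous call stopped. */
        while (fgets(line, sizeof line, fp) != NULL) {
            if (strstr(line, close_tag) != NULL)
                break;                  /* end of the node */
            inside++;                   /* read and process here */
        }
    }
    return inside;
}
```

If a node has to be read twice, one option is to remember its start with ftell() before the inner loop and return to it with fseek(fp, pos, SEEK_SET).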

EDIT:

If you can change the tag format to adhere to the XML structure, you will be able to use libxml2 and similar libraries properly.

If that's not possible, then you have to write your own parser. A few pointers:

  1. First extract data from the file into a buffer. The size of the buffer, and whether it is allocated dynamically or statically, will depend on your specs.

  2. Check whether the first non-whitespace character in the buffer is < or whatever character your tags usually begin with. If not, you can just report an error and exit.

  3. The tag name follows, up to the first whitespace, / or > character. Store it. Process the =, strings and so on as you wish.

  4. If the next non-whitespace character is /, check that it is followed by > (or a similar pattern in your specs that marks the end of a tag). If so, you've finished parsing and can return your result. Otherwise, you've got a malformed tag and should exit with an error.

    If the character is >, then you've found the end of the opening tag, and the content follows. Otherwise what follows is an attribute. Parse that, store the result, and continue at step 4.

  5. Read the content until you find a < character.

  6. If that character is followed by /, it's the end tag. Check that it is followed by the tag name and >. If yes, return the result; otherwise, report an error.

  7. If you get here, you've found the beginning of a nested element. Parse that with this same algorithm and then continue at step 5.

Although it's quite a basic idea, I hope it helps you get started.
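The steps above can be sketched as a small recursive function working on a NUL-terminated buffer. This is only an illustration under simplifying assumptions: attributes are skipped rather than stored, text content is ignored, and the only result is a count of the elements parsed. The names parse_element, skip_ws and NAME_MAX_LEN are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define NAME_MAX_LEN 64   /* assumed upper bound on tag name length */

/* Skips whitespace and returns the new position. */
static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Parses one element starting at *pp. On success returns 0, moves *pp
 * past the element and adds the number of elements found (including
 * nested ones) to *count. Returns -1 on malformed input. */
int parse_element(const char **pp, int *count)
{
    const char *p;
    char name[NAME_MAX_LEN];
    size_t n = 0;

    assert(pp != NULL && *pp != NULL && count != NULL);
    p = skip_ws(*pp);

    if (*p != '<')                      /* step 2: a tag must start with '<' */
        return -1;
    p++;

    /* step 3: the name runs until whitespace, '/' or '>' */
    while (*p && !isspace((unsigned char)*p) && *p != '/' && *p != '>') {
        if (n + 1 >= sizeof name)
            return -1;
        name[n++] = *p++;
    }
    name[n] = '\0';
    if (n == 0)
        return -1;

    /* step 4: skip attributes; quoted values may contain '/' or '>' */
    while (*p && *p != '/' && *p != '>') {
        if (*p == '"') {
            p = strchr(p + 1, '"');
            if (p == NULL)
                return -1;
        }
        p++;
    }
    if (*p == '/') {                    /* self-closing tag such as <c/> */
        if (p[1] != '>')
            return -1;
        *pp = p + 2;
        (*count)++;
        return 0;
    }
    if (*p != '>')
        return -1;
    p++;

    for (;;) {
        p = strchr(p, '<');             /* step 5: skip text content */
        if (p == NULL)
            return -1;                  /* end tag is missing */
        if (p[1] == '/') {              /* step 6: end tag must match name */
            p += 2;
            if (strncmp(p, name, n) != 0 || p[n] != '>')
                return -1;
            *pp = p + n + 1;
            (*count)++;
            return 0;
        }
        if (parse_element(&p, count) != 0)   /* step 7: nested element */
            return -1;
    }
}
```

A real parser would also store names, attributes and content in some tree or callback; those are left out here to keep the control flow visible.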

EDIT: If you still want to reference the file through a pointer, consider using mmap().
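A minimal POSIX sketch of the mmap() idea follows: it maps a file read-only and returns the byte offset of the first occurrence of a tag. The function name find_tag_offset is hypothetical. Note that a mapping is not NUL-terminated, so this sketch uses a length-bounded memcmp scan instead of strstr.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps path read-only and returns the byte offset of the first
 * occurrence of tag, or -1 if it is not found or an error occurs. */
long find_tag_offset(const char *path, const char *tag)
{
    long result = -1;
    size_t tag_len;
    struct stat st;
    int fd;

    assert(path != NULL && tag != NULL);
    tag_len = strlen(tag);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    if (fstat(fd, &st) == 0 && st.st_size > 0 && tag_len > 0) {
        size_t size = (size_t)st.st_size;
        char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);

        if (data != MAP_FAILED) {
            /* The mapped bytes are not NUL-terminated, so compare with
             * an explicit length bound instead of calling strstr(). */
            for (size_t i = 0; i + tag_len <= size; i++) {
                if (memcmp(data + i, tag, tag_len) == 0) {
                    result = (long)i;
                    break;
                }
            }
            munmap(data, size);
        }
    }
    close(fd);
    return result;
}
```

Once the file is mapped, the pointer can be handed to any in-memory parser that works with an explicit length.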

If you combine mmap with some shared-memory IPC and adequate locking, you could write a program that processes the file in parallel, which may be faster for large files.
