
How would I compare a string (entered by the user) to the first word of a line in a file?

I am really struggling to understand how character arrays work in C. This seems like something that should be really simple, but I do not know what function to use, or how to use it.

I want the user to enter a string, and I want to iterate through a text file, comparing this string to the first word of each line in the file.

By "word" here, I mean a substring that consists of characters that aren't blanks.

Help is greatly appreciated!

Edit: To be clearer, I want to take a single input and search for it in a database that is stored as a text file. I know that if it is in the database, it will be the first word of a line, since that is how the database is formatted. I suppose I COULD iterate through every single word of the database, but this seems less efficient.

After finding the input in the database, I need to access the two words that follow it (on the same line) to achieve the program's ultimate goal (which is computational in nature).

Here is some code that will do what you are asking. I think it will help you understand how string functions work a little better. Note: I did not make many assumptions about how well conditioned the input and text file are, so there is a fair bit of code for removing whitespace from the input, and for checking that the match is truly "the first word", and not "the first part of the first word". So this code will not match the input "hello" to the line "helloworld 123 234", but it will match it to the line "hello world 123 234". Note also that it is currently case sensitive.

#include <ctype.h>   // isspace
#include <stdio.h>
#include <string.h>

int main(void) {
  char buf[100];     // declare space for the input string
  FILE *fp;          // pointer to the text file
  char fileBuf[256]; // space to keep a line from the file
  int ii, ll;

  printf("give a word to check:\n");
  if (!fgets(buf, sizeof buf, stdin)) return 1;  // fgets never reads more than the buffer holds
  printf("you entered: %s\n", buf);  // check we read correctly

  // see (for debug) if there are any odd characters:
  printf("In hex, that is ");
  ll = strlen(buf);
  for(ii = 0; ii < ll; ii++) printf("%02X ", (unsigned char)buf[ii]);
  printf("\n");

  // fgets keeps the trailing newline ('\n', hex 0A). Get rid of it!
  // every whitespace character becomes '\0', so buf ends at the first one.
  // note I could have used the fact that ii is already strlen(buf), but
  // that makes the code harder to understand
  for(ii = (int)strlen(buf) - 1; ii >=0; ii--) {
    if (isspace((unsigned char)buf[ii])) buf[ii]='\0';
  }

  // open the file:
  if((fp=fopen("myFile.txt", "r"))==NULL) {
    printf("cannot open file!\n");
    return 1;   // non-zero exit status signals failure
  }

  while( fgets(fileBuf, 256, fp) ) {   // read in one line at a time until eof
    printf("line read: %s", fileBuf);  // show we read it correctly
    // find whitespace: we need to keep only the first word.
    // stop at the end of the string too, in case the line has no whitespace
    ii = 0;
    while(fileBuf[ii] != '\0' && !isspace((unsigned char)fileBuf[ii])) ii++;
    // now compare input string with first word from input file:
    if (ii > 0 && strlen(buf)==(size_t)ii && strncmp(fileBuf, buf, ii) == 0) {
      printf("found a matching line: %s\n", fileBuf);
      break;
    }
  }
  fclose(fp);
  // if a match was found, fileBuf contains the line you are interested in;
  // the second and third word of the line are what you are really after.
  // (if no line matched, fileBuf just holds the last line read)
  return 0;
}
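
To get at the second and third words once the line is found, sscanf is one option. The following is only a sketch under a few assumptions: words are shorter than 64 characters, lines fit in 256 bytes, and every database line has at least three words. The names find_entry and WORD_MAX are made up for this example.

```c
#include <stdio.h>
#include <string.h>

#define WORD_MAX 64   /* assumed bound on word length, including '\0' */

/* Scan fp line by line. When the first word of a line equals key, the
   next two words are left in second and third and 1 is returned.
   Returns 0 if no line matches; second and third are then meaningless. */
int find_entry(FILE *fp, const char *key,
               char second[WORD_MAX], char third[WORD_MAX])
{
    char line[256];
    char first[WORD_MAX];

    while (fgets(line, sizeof line, fp)) {
        /* %63s skips leading blanks, stops at whitespace,
           and writes at most 63 characters plus '\0' */
        if (sscanf(line, "%63s %63s %63s", first, second, third) == 3 &&
            strcmp(first, key) == 0)
            return 1;
    }
    return 0;
}
```

Because strcmp compares the whole first word, "hello" still does not match a line that starts with "helloworld". If the two following words are numbers, they could then be converted with strtol or strtod.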

I think what you need is fseek().

1) Pre-process the database file as follows. Find the positions of all the '\n' characters (newlines), and store the offset of the character after each one in an array, say a, so that you know the i-th line starts at offset a[i] from the beginning of the file (the first line starts at offset 0).

2) fseek() is a library function declared in stdio.h. When you need to process an input string, start from the beginning of the file and check the first word only at the positions stored in the array a. To do that:

fseek(inFile , a[i] , SEEK_SET);

and then

fscanf(inFile, "%s %s %s", yourFirstWordHere, secondWord, thirdWord);

for checking the i-th line. Or, if nothing has been read since the position indicator was at a[i-1], you could seek relative to the current position instead:

fseek(inFile, a[i] - a[i-1], SEEK_CUR);

Explanation: fseek() sets the read/write position indicator associated with the file to the desired position. So, if you know at which point you need to read or write, you can go there and read or write directly. This way, you won't need to read whole lines just to get the first three words.
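
Steps 1) and 2) could be sketched roughly as follows. This is an illustration rather than a finished implementation: index_lines, find_line, and WORD_MAX are hypothetical names, words are assumed to be shorter than 64 characters, and every line is assumed to hold at least three words (otherwise fscanf would continue into the next line).

```c
#include <stdio.h>
#include <string.h>

#define WORD_MAX 64   /* assumed bound on word length, including '\0' */

/* Step 1: store the offset at which each line starts in a[].
   Returns the number of offsets stored (at most max_lines). */
size_t index_lines(FILE *fp, long a[], size_t max_lines)
{
    size_t n = 0;
    int c;
    int at_line_start = 1;
    long pos;

    rewind(fp);
    pos = ftell(fp);                 /* offset of the first line */
    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start) {
            if (n < max_lines) a[n++] = pos;
            at_line_start = 0;
        }
        if (c == '\n') {
            pos = ftell(fp);         /* next line starts right after '\n' */
            at_line_start = 1;
        }
    }
    return n;
}

/* Step 2: look at the first word only at the stored positions.
   Returns the index of the matching line, or -1 if there is none. */
long find_line(FILE *fp, const long a[], size_t n, const char *key,
               char second[WORD_MAX], char third[WORD_MAX])
{
    char first[WORD_MAX];
    size_t i;

    for (i = 0; i < n; i++) {
        if (fseek(fp, a[i], SEEK_SET) != 0) return -1;
        if (fscanf(fp, "%63s %63s %63s", first, second, third) == 3 &&
            strcmp(first, key) == 0)
            return (long)i;
    }
    return -1;
}
```

Building the index still reads the whole file once, so the stored offsets pay off mainly when the same file is searched many times, or when you can jump straight to a particular line.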

Your recent update states that the file is really a database, in which you are looking for a word. This is very important.

If you have enough memory to hold the whole database, you should do just that (read the whole database and arrange it for efficient searching), so you should probably not ask about searching in a file.

Good database designs involve data structures like trie and hash table. But for a start, you could use the most basic improvement of the database: holding the words in alphabetical order (use the somewhat tricky qsort function to achieve that).

#include <stddef.h>   // size_t
#include <string.h>   // strcmp

struct Database
{
    size_t count;
    struct Entry // declaring a struct inside another struct is valid C; Entry is visible at file scope
    {
        char *word;
        char *explanation;
    } *entries;
};

char *find_explanation_of_word(struct Database* db, char *word)
{
    for (size_t i = 0; i < db->count; i++)
    {
        int result = strcmp(db->entries[i].word, word);
        if (result == 0)
            return db->entries[i].explanation;
        else if (result > 0)
            break; // if the database is sorted, this means word is not found
    }
    return NULL; // not found
}
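
Since qsort was called somewhat tricky above, here is a small sketch of how the sorting could look, together with bsearch from the same header, which can replace the linear scan once the entries are sorted. The struct is repeated (with const members) so the example stands on its own; sort_entries, lookup_sorted, and compare_entries are names invented for this example.

```c
#include <stdlib.h>
#include <string.h>

struct Entry {
    const char *word;
    const char *explanation;
};

/* Comparison callback shared by qsort and bsearch: order entries by word. */
static int compare_entries(const void *a, const void *b)
{
    const struct Entry *ea = a;
    const struct Entry *eb = b;
    return strcmp(ea->word, eb->word);
}

/* Sort the entries alphabetically by word (byte order, case sensitive). */
void sort_entries(struct Entry *entries, size_t count)
{
    qsort(entries, count, sizeof entries[0], compare_entries);
}

/* Binary search; only valid after sort_entries has been called.
   Returns the explanation, or NULL if word is not in the database. */
const char *lookup_sorted(const struct Entry *entries, size_t count,
                          const char *word)
{
    struct Entry key = { word, NULL };
    const struct Entry *hit =
        bsearch(&key, entries, count, sizeof entries[0], compare_entries);
    return hit ? hit->explanation : NULL;
}
```

The tricky part of qsort is mostly the callback: it receives pointers to the array elements (here, pointers to struct Entry), not the elements themselves.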

If your database is too big to hold in memory, you should use a trie that holds just the beginnings of the words in the database; for each beginning of a word, have a file offset at which to start scanning the file.

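// Returns a pointer into a static buffer, so the result is only valid until the next call.
char* find_explanation_in_file(FILE *f, long offset, char *word)
{
    static char line[100]; // 100 should be greater than max line in file
    if (fseek(f, offset, SEEK_SET) != 0)
        return NULL;
    while (fgets(line, sizeof(line), f))
    {
        char *word_in_file = strtok(line, " ");
        if (word_in_file == NULL)
            continue; // no word on this line
        char *explanation = strtok(NULL, ""); // rest of the line (NULL if there is none)
        int result = strcmp(word_in_file, word);
        if (result == 0)
            return explanation;
        else if (result > 0)
            break; // the file is sorted, so word cannot appear later
    }
    return NULL; // not found
}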
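
A full trie is more than fits in a short example, so here is a much cruder stand-in that shows the same idea of mapping the beginning of a word to a file offset: a table keyed on the first character only. It is a sketch that assumes the file is sorted by first word and that no line is longer than 99 bytes; build_first_char_index is a hypothetical name.

```c
#include <stdio.h>

/* Fill start[c] with the offset of the first line whose first byte is c,
   or -1 if no line begins with c. Assumes the file is sorted, so all
   lines beginning with the same character are adjacent. */
void build_first_char_index(FILE *f, long start[256])
{
    char line[100];   /* assumed to be longer than the longest line */
    long pos;
    int c;

    for (c = 0; c < 256; c++)
        start[c] = -1;

    rewind(f);
    pos = ftell(f);                       /* offset of the line about to be read */
    while (fgets(line, sizeof line, f)) {
        unsigned char first = (unsigned char)line[0];
        if (start[first] == -1)
            start[first] = pos;           /* remember only the first such line */
        pos = ftell(f);
    }
}
```

A lookup would then take start[(unsigned char)word[0]], check that it is not -1, and pass it as the offset to find_explanation_in_file above. A real trie would refine this by indexing longer prefixes.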
