
fscanf() to read in only letters, with no punctuation marks

I would like to read in some words (in this example, the first 20) from a text file whose name is passed as a command-line argument. When I run the code below, I find that it also picks up punctuation marks along with the letters.

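#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int wordCap = 20;
    int wordc = 0;
    if (argc < 2) return 1;                     /* no file name given */
    char **ptr = calloc(wordCap, sizeof(char *));
    FILE *myFile = fopen(argv[1], "r");
    if (!ptr || !myFile) return 1;
    for (wordc = 0; wordc < wordCap; wordc++) {
        ptr[wordc] = malloc(30);                /* 30 chars per word */
        if (!ptr[wordc]) break;
        /* %s stops only at whitespace, so "him;" is read as one word.
           It also has no width limit, so a word of 30+ chars overflows. */
        if (fscanf(myFile, "%s", ptr[wordc]) != 1) break;   /* end of file */
        printf("word[%d] is %s\n", wordc, ptr[wordc]);
    }
    fclose(myFile);
    for (int i = 0; i < wordCap; i++)
        free(ptr[i]);                           /* unused entries are NULL */
    free(ptr);
    return 0;
}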

When I pass in the sentence "Once when a Lion was asleep a little Mouse began running up and down upon him;", the word "him" comes out followed by a semicolon ("him;").

I changed the call to fscanf(myFile, "[az | AZ]", ptr[wordc]), but then it takes the whole sentence as a single word.

How can I change it to produce the correct output?

You could accept the semicolon and then remove it later, like so:

After you've stored the word in ptr[wordc]:

size_t i = 0;
while (i < strlen(ptr[wordc]))
{
    if (strchr(".;,!?", ptr[wordc][i])) // add any char you want to delete to this string
        memmove(&ptr[wordc][i], &ptr[wordc][i + 1], strlen(ptr[wordc]) - i); // shift the rest, including '\0', one place left
    else
        i++;
}
if (strlen(ptr[wordc]) > 0) // skip words that consisted only of punctuation
    printf("word[%d] is %s\n", wordc,  ptr[wordc]);

I haven't tested this code, so it may contain a typo or a similar mistake.
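The same idea can be packaged as a reusable helper. The sketch below is a hypothetical strip_chars function (the name is illustrative, not from the original answer). Instead of calling memmove once per removed character, it compacts the string in a single pass with a read index and a write index, so the cost stays linear in the word length.

```c
#include <string.h>

/* Hypothetical helper (not from the original answer): remove, in place,
   every character of the NUL-terminated string `s` that appears in
   `reject`. `r` reads each character, `w` writes only the ones kept.
   Returns the new length of `s`. */
size_t strip_chars(char *s, const char *reject)
{
    size_t w = 0;
    for (size_t r = 0; s[r] != '\0'; r++) {
        /* s[r] is never '\0' here, so strchr cannot match the terminator */
        if (strchr(reject, s[r]) == NULL)
            s[w++] = s[r];
    }
    s[w] = '\0';
    return w;
}
```

In the question's loop it could replace the while loop above: if (strip_chars(ptr[wordc], ".;,!?") > 0) printf("word[%d] is %s\n", wordc, ptr[wordc]); still skips words that were nothing but punctuation.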

Alternatively, you could replace

fscanf(myFile, "%s", ptr[wordc]);

with

fscanf(myFile, "%29[a-zA-Z]%*[^a-zA-Z]", ptr[wordc]);

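to capture only letters. The 29 limits the word length so you don't overflow the buffer, since you're allocating space for only 30 chars (29 letters plus the terminating '\0'). The trailing %*[^a-zA-Z] then reads and discards the spaces and punctuation up to the next word. Note that this assumes the file starts with a letter: if it begins with a space or punctuation mark, the %29[a-zA-Z] conversion fails on every call, because nothing ever gets consumed.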
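A minimal sketch of the scanset approach that also copes with leading punctuation, assuming the same 30-char buffer per word as the question. The helper name read_words and the WORD_MAX macro are hypothetical. It uses a fixed 2D array instead of the question's malloc calls, and it splits the format into two fscanf calls. The first call skips non-letters. The second reads a word, so a file that starts with a quote or a space still works. Ranges such as a-z inside %[...] are implementation-defined in ISO C, although common C libraries such as glibc accept them.

```c
#include <stdio.h>

#define WORD_MAX 30  /* hypothetical: 29 letters + '\0', as in the question */

/* Hypothetical helper: read up to `max_words` letter-only words from `fp`
   into `words`, skipping runs of non-letters (spaces, punctuation, digits)
   between them. Returns the number of words stored. */
size_t read_words(FILE *fp, char words[][WORD_MAX], size_t max_words)
{
    size_t count = 0;
    while (count < max_words) {
        /* Skip any run of non-letters. Nothing is assigned, so this returns
           EOF only once input has run out (it may also return 0 there, in
           which case the next call catches the end of input). */
        if (fscanf(fp, "%*[^a-zA-Z]") == EOF)
            break;
        /* Read at most WORD_MAX - 1 = 29 letters; returns 1 on success.
           The width 29 must stay in sync with WORD_MAX. */
        if (fscanf(fp, "%29[a-zA-Z]", words[count]) != 1)
            break;
        count++;
    }
    return count;
}
```

In the question's program this would be used as char words[20][WORD_MAX]; size_t n = read_words(myFile, words, 20); followed by a loop printing words[0] to words[n - 1]. Two side effects of the letters-only rule: an apostrophe splits "don't" into "don" and "t", and a word longer than 29 letters comes back as two words.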

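Notice: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you reprint this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.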
