
Removing special characters from fscanf string in C

I'm currently using the following code to scan each word in a text file, put it into a variable, and then do some manipulations with it before moving on to the next word. This works fine, but I'm trying to remove all characters that don't fall under A-Z / a-z, e.g. if "he5llo" was entered I want the output to be "hello". If I can't modify fscanf to do it, is there a way of doing it to the variable once it's scanned? Thanks.

while (fscanf(inputFile, "%s", x) == 1)

You can pass x to a function like this. First, a simple version for the sake of understanding:

// header needed for isalpha()
#include <ctype.h>

void condense_alpha_str(char *str) {
  int source = 0; // index of copy source
  int dest = 0; // index of copy destination

  // loop until original end of str reached
  while (str[source] != '\0') {
    if (isalpha((unsigned char)str[source])) {
      // keep only chars matching isalpha(); the cast avoids undefined behavior for negative char values
      str[dest] = str[source];
      ++dest;
    }
    ++source; // advance source always, whether char was copied or not
  }
  str[dest] = '\0'; // add new terminating 0 byte, in case string got shorter
}

It goes through the string in place, copying chars that pass the isalpha() test and skipping (thus removing) those that do not. To understand the code, it's important to realize that C strings are just char arrays, with a byte value of 0 marking the end of the string. Another important detail is that in C, arrays and pointers are in many (not all!) ways the same thing, so a pointer can be indexed just like an array. Also, this simple version rewrites every byte in the string, even when the string doesn't actually change.
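The same in-place filter can also be written with two pointers instead of two indices, which shows directly how a pointer walks a char array up to its terminating 0 byte. This is a sketch, not part of the original answer; the name condense_alpha_ptr is hypothetical:

```c
#include <ctype.h>

/* Hypothetical variant of condense_alpha_str(): same in-place filtering,
   but walking the string with a read pointer and a write pointer. */
void condense_alpha_ptr(char *str) {
    const char *src = str; /* read position */
    char *dst = str;       /* write position, never ahead of src */

    for (; *src != '\0'; ++src) {
        if (isalpha((unsigned char)*src)) {
            *dst++ = *src; /* keep letters, advance the write position */
        }
    }
    *dst = '\0';           /* terminate the (possibly shorter) result */
}
```

Because the read pointer always stays at or ahead of the write pointer, overwriting the string in place never clobbers characters that have not been examined yet. Called on a buffer holding "he5llo", it leaves "hello".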


Then a more full-featured version, which takes the filter function as a parameter, only writes to memory if str actually changes, and returns a pointer to str like most library string functions do:

char *condense_str(char *str, int (*filter)(int)) {

  int source = 0; // index of character to copy

  // optimization: skip initial matching chars (also stop at '\0', in case filter('\0') is true)
  while (str[source] && filter((unsigned char)str[source])) {
    ++source; 
  }
  // source is now index of first non-matching char or end-of-string

  // optimization: only do condense loop if not at end of str yet
  if (str[source]) { // '\0' is same as false in C

    // start condensing the string from first non-matching char
    int dest = source; // index of copy destination
    do {
      if (filter((unsigned char)str[source])) {
        // keep only chars matching given filter function
        str[dest] = str[source];
        ++dest;
      }
      ++source; // advance source always, whether char was copied or not
    } while (str[source]);
    str[dest] = '\0'; // add terminating 0 byte to match condensed string

  }

  // follow convention of strcpy, strcat etc, and return the string
  return str;
}

Example filter function:

int isNotAlpha(int ch) { // int parameter matches the int (*)(int) filter type, like isalpha()
    return !isalpha(ch);
}

Example calls:

char sample[] = "1234abc";
condense_str(sample, isalpha); // use a library function from ctype.h
// note: return value ignored; it's just a convenience, not needed here
// sample is now "abc"
condense_str(sample, isNotAlpha); // use custom function
// sample is now "", empty

// fscanf code from question, with buffer overrun prevention
char x[100];
while (fscanf(inputFile, "%99s", x) == 1) {
  condense_str(x, isalpha); // x modified in-place
  /* ... */
}

Reference:

From the int isalpha(int c); manual:

Checks whether c is an alphabetic letter.
Return value:
A value different from zero (i.e., true) if c is indeed an alphabetic letter. Zero (i.e., false) otherwise.
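One detail the manual excerpt leaves out: ISO C requires the argument to isalpha() to be EOF or a value representable as unsigned char. Where plain char is signed, passing a char holding a byte above 127 directly is undefined behavior, so the usual idiom is to cast first. A minimal sketch (the wrapper name is_letter is hypothetical):

```c
#include <ctype.h>

/* Hypothetical helper: safe to call with any char value, including
   bytes >= 128 on platforms where plain char is signed. */
int is_letter(char ch) {
    /* isalpha() only promises "nonzero" for true, so normalize to 0/1 */
    return isalpha((unsigned char)ch) != 0;
}
```

In the default "C" locale only A-Z and a-z count as letters; other locales may classify additional bytes as alphabetic.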

luser droog's answer will work, but in my opinion it is more complicated than necessary.

For your simple example you could try this:

while (fscanf(inputFile, "%99[A-Za-z]", x) == 1) {   // read until a non-alpha character (assumes char x[100])
   fscanf(inputFile, "%*[^A-Za-z]");  // discard the non-alpha characters and continue
}
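Two caveats with that loop: if the input starts with a non-letter, the first %[A-Za-z] conversion matches nothing, fscanf returns 0 and the loop exits immediately; and "he5llo" comes back as two pieces, "he" and "llo", rather than "hello". A sketch that discards non-letters first, assuming the destination holds at least 100 chars (next_alpha_run is a hypothetical name). Note that ranges such as A-Z inside a scanset are implementation-defined in ISO C, although common C libraries such as glibc support them:

```c
#include <stdio.h>

/* Hypothetical helper: store the next run of letters from `in` into `out`
   (at least 100 chars). Returns 1 on success, 0 at end of input. */
int next_alpha_run(FILE *in, char *out) {
    /* Skip any non-letters first. A return of 0 just means there was
       nothing to skip; EOF means the input is exhausted. */
    if (fscanf(in, "%*[^A-Za-z]") == EOF) {
        return 0;
    }
    /* Read up to 99 letters; the 100th byte holds the '\0'. */
    return fscanf(in, "%99[A-Za-z]", out) == 1;
}
```

It would be used as while (next_alpha_run(inputFile, x)) { ... }. Each call still yields one run of letters, so joining the pieces of "he5llo" back into one word is left to the caller.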

You can use the isalpha() function to check every character contained in the string.

The scanf family of functions won't do this. You'll have to loop over the string and use isalpha to check each character, then "remove" each non-letter with memmove by copying the rest of the string forward over it.
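A sketch of that memmove approach, assuming the string is writable and NUL-terminated (remove_nonalpha_memmove is a hypothetical name, not from the answer):

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove non-letters from str in place by shifting
   the rest of the string (including its '\0') one position left. */
void remove_nonalpha_memmove(char *str) {
    size_t i = 0;
    while (str[i] != '\0') {
        if (isalpha((unsigned char)str[i])) {
            ++i; /* keep this letter and move on */
        } else {
            /* copy str[i+1 .. end], terminator included, over str[i];
               memmove (not memcpy) because the regions overlap */
            memmove(&str[i], &str[i + 1], strlen(&str[i + 1]) + 1);
            /* do not advance i: a new character now sits at str[i] */
        }
    }
}
```

Each removal shifts the whole tail, so a string with many non-letters costs O(n²) character moves; a single-pass read/write copy loop avoids that.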

Maybe scanf can do it after all. When a conversion stops at a character that doesn't match, scanf and friends leave that character unread on the input stream (in effect pushing it back), so the next scanf call sees it.
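To see the unread-character behavior in isolation, here is a small sketch using %d (char_after_int is a hypothetical name). ISO C only promises that the first character after the matched input item remains unread; it is not a general mechanism for pushing back everything that failed to match:

```c
#include <stdio.h>

/* Hypothetical demo: read an int from `in`, then return the next
   character, i.e. the one that ended the %d conversion. */
int char_after_int(FILE *in, int *value) {
    if (fscanf(in, "%d", value) != 1) {
        return EOF;
    }
    return fgetc(in); /* left unread by fscanf, so fgetc sees it */
}
```

With the input 42abc, *value becomes 42 and the function returns 'a'.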

This example uses scanf as a regex-like filter on the stream. Using the * (assignment-suppression) modifier means there's no storage destination for the negated pattern; the matching characters just get eaten.

#include <stdio.h>
#include <string.h>

int main(){
    enum { BUF_SZ = 80 };   // buffer size in one place
    char buf[BUF_SZ] = "";
    char fmtfmt[] = "%%%d[A-Za-z]";  // format string for the format string
    char fmt[sizeof fmtfmt + 3];     // storage for the real format string (sizeof the array, not of a pointer)
    char nfmt[] = "%*[^A-Za-z]";     // negated pattern

    char *p = buf;                               // initialize the pointer
    // width leaves room for the '\0' scanf appends; %d needs an int, not a size_t
    sprintf(fmt, fmtfmt, (int)(BUF_SZ - 1 - strlen(buf)));  // initialize the format string
    //printf("%s",fmt);
    while( strlen(buf) < BUF_SZ - 1              // stop when buf is full (a width of 0 is invalid)
        && scanf(fmt,p) != EOF                   // scan for format into buffer via pointer
        && scanf(nfmt) != EOF){                  // scan for negated format
        p += strlen(p);                          // adjust pointer
        sprintf(fmt, fmtfmt, (int)(BUF_SZ - 1 - strlen(buf)));   // adjust format string (re-init)
    }
    printf("%s\n",buf);
    return 0;
}

I'm working on a similar project, so you're in good hands! Split the word into separate parts.

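Blank spaces aren't an issue if you read each word with cin. You can use a

 if( !isalpha(x[i]) )

check (ispunct() would miss digits such as the 5 in "hell5o") to find a non-alpha character. Then split the string around that index: copy everything before it into one temporary string and everything from index + 1 onward into another, and join the two. You can select characters in a string like an array, so finding those non-alpha characters and building the new string is easy.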

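 std::string x = "hell5o";   // needs #include <string> and <cctype>
 for (std::string::size_type pos = 0; pos < x.size(); ) {
     if (!std::isalpha(static_cast<unsigned char>(x[pos]))) {  // non-alpha at pos
         std::string tempLeft = x.substr(0, pos);    // part before it
         std::string tempRight = x.substr(pos + 1);  // part after it
         x = tempLeft + tempRight;                   // join, dropping x[pos]
     } else {
         ++pos;                                      // letter: keep it, move on
     }
 }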

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM