I have to make a program that takes a file of DNA sequences and a DNA subsequence as command-line arguments, finds each occurrence of the subsequence, and counts how many times it occurs. I'm having trouble with the two strcmp calls (the ones that compare &inputLine[i] with the subsequence). Using GDB, I figured out that with my current code I am comparing the addresses of the strings and not the actual strings. But if I remove the &, I get an error. I'm not sure what the correct way to go about this is. TIA
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
    // place subsequence in string
    char *subsequence = argv[2];
    // get length of subsequence
    int seqLength = strlen(subsequence);
    // define file type and open for reading
    FILE *inputFile = fopen(argv[1], "r");
    // get each line using while loop
    char inputLine[200]; // string variable to store each line
    int i, j, lineLength, counter = 0, flag = -1;
    while (fgets(inputLine, 200, inputFile) != NULL) { // loop through each line
        lineLength = strlen(inputLine);
        for (i = 0; i < lineLength; i++) { // loop through each char in the line
            if (strcmp(&inputLine[i], &subsequence[0]) == 0) {
                // if current char matches beginning of sequence loop through
                // each of the remaining chars and check them against
                // corresponding chars in the sequence
                flag = 0;
                for (j = i + 1; j - i < seqLength; j++) {
                    if (strcmp(&inputLine[j], &subsequence[j - i]) != 0) {
                        flag = 1;
                        break;
                    }
                }
                if (flag == 0) {
                    counter++;
                }
            }
        }
    }
    fclose(inputFile);
    printf("%s appears %d time(s)\n", subsequence, counter);
    return 0;
}
dna.txt:
GGAAGTAGCAGGCCGCATGCTTGGAGGTAAAGTTCATGGTTCCCTGGCCC
input:
./dnaSearch dna.txt GTA
expected output:
GTA appears 2 times
Just do it like this:
if (inputLine[i] == subsequence[0]) {
if (inputLine[j] != subsequence[j - i]) {
You do not need library functions to compare single characters.
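Your inputLine is an array of characters terminated by the character '\0', and strcmp expects '\0'-terminated strings, which it compares in full.
Passing &inputLine[i] passes the address of the character at position i, so strcmp reads from that position all the way to the terminating '\0' (including the newline that fgets keeps) and compares that whole remainder against the whole of subsequence. The two are almost never equal, so no match is found. Removing the & does not help either: inputLine[i] is a single char, not a pointer, which is why the compiler complains when you pass it to strcmp.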
As suggested in the comments, you can either compare the individual characters with the ordinary operators:
if (inputLine[i] == subsequence[0]) {
    flag = 0;
    for (j = i + 1; j - i < seqLength; j++) { // check the remaining characters of the subsequence
        if (inputLine[j] != subsequence[j - i]) {
            flag = 1;
            break;
        }
    }
    if (flag == 0) {
        counter++;
    }
}
Or use strncmp, which compares at most seqLength characters, so only the part of the line that starts at position i is checked:
if (strncmp(&inputLine[i], subsequence, seqLength) == 0) {
    counter++;
}
As others have mentioned, you don't need to call strcmp the first time, since you're only checking a single character. You can just compare them directly:
if (inputLine[i] == subsequence[0]) {
However, there's a much simpler way of doing what you want. Since you're looking for a substring inside another string, you can use the strstr function:
while (fgets(inputLine, 200, inputFile) != NULL) { // loop through each line
    char *sub = inputLine;
    while ((sub = strstr(sub, subsequence)) != NULL) { // find the next match
        counter++;
        sub++; // continue searching just after the start of this match
    }
}
The strstr function returns a pointer to the first occurrence of the substring within the string being searched, or NULL if there is none. In the code above, each time the substring is found the counter is incremented, and then sub is moved forward by one character so the next search starts just after the beginning of the previous match (which also means overlapping matches are counted).