Search for all occurrences of a substring within a string

This program must search for all occurrences of string 2 in string 1.
It works fine with all the strings I have tried except with
s1="Ciao Cia Cio Ociao ciao Ocio CiCiao CieCiaCiu CiAo eeCCia"
s2="Cia"
In this case the correct result would be: 0 5 31 39 54
Instead, it prints 0 5 39.
I don't understand why; the operation seems the same as with
s1="Sette scettici sceicchi sciocchi con la sciatica a Shanghai"
s2="icchi"
with which the program works correctly.
I can't find the error!
The code:

#include <stdio.h>

void main()
{
    #define MAX_LEN 100

        // Input
    char s1[] = "Ciao Cia Cio Ociao ciao Ocio CiCiao CieCiaCiu CiAo eeCCia";
    unsigned int lengthS1 = sizeof(s1) - 1;
    char s2[] = "Cia";
    unsigned int lengthS2 = sizeof(s2) - 1;
    // Output
    unsigned int positions[MAX_LEN];
    unsigned int positionsLen;

    // Blocco assembler
    __asm
    {
        MOV ECX, 0
        MOV EAX, 0
        DEC lengthS1
        DEC lengthS2
        MOV EBX, lengthS1
        CMP EBX, 0
        JZ fine
        MOV positionsLen, 0
        XOR EBX, EBX
        XOR EDX, EDX




    uno: CMP ECX, lengthS1
    JG fine
    CMP EAX, lengthS2
    JNG restart
    XOR EAX, EAX


    restart : MOV BH, s1[ECX]
    CMP BH, s2[EAX]
    JE due
    JNE tre


    due : XOR EBX, EBX
    CMP EAX, 0
    JNE duedue
    MOV positions[EDX * 4], ECX
    INC ECX
    INC EAX
    JMP uno


    duedue : CMP EAX, lengthS2
    JNE duetre
    INC ECX
    INC EDX
    INC positionsLen
    XOR EAX, EAX
    JMP uno


    duetre : INC EAX
    INC ECX
    JMP uno


    tre : XOR EBX, EBX
    XOR EAX, EAX
    INC ECX
    JMP uno




fine:
    }

    // Stampa su video
    {
        unsigned int i;
        for (i = 0; i < positionsLen; i++)
            printf("Sottostringa in posizione=%d\n", positions[i]);
    }
}

Please help.

The trickier programming gets, the more systematic and thoughtful your approach should be. If you have programmed x86 assembly for a decade, you will be able to skip a few of the steps I outline below. But especially if you are a beginner, you are well advised not to expect that you can hack in assembly with confidence and without safety nets.

The code below is just a best guess (I did not compile, run or debug the C code). It is there to convey the idea.

  • Make a plan for your implementation.
    You will have 2 nested loops, comparing the characters and then collecting matches.
  • Implement the "assembly" in low-level C, which already resembles the end product.
    C is nearly an assembly language itself...
  • Write yourself tests; debug and analyze your "pseudo assembly" C version.
  • Translate the C lines step by step into assembly lines, "promoting" the C lines to comments.

This is my first shot at doing that: the initial C version, which might or might not work. It is still faster and easier to write (with the assembly code in mind), and easier to debug and step through. Once this works, it is time to "translate".

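#include <stddef.h>
#include <string.h>

size_t substring_positions(const char *s, const char *sub_string,
                           size_t *positions, size_t positions_capacity) {
  size_t positions_index = 0;
  size_t i = 0;
  size_t j = 0;
  size_t s_len = strlen(s);
  size_t sub_len = strlen(sub_string);  /* characters to compare per start */
  size_t i_max;

  if (sub_len == 0 || sub_len > s_len)  /* empty or too long: no matches */
    goto end;
  i_max = s_len - sub_len;              /* last possible start index */

 loop0:                                 /* outer loop: candidate start i */
  if (i > i_max)
    goto end;
  j = 0;
 loop1:                                 /* inner loop: compare character j */
  if (j == sub_len)                     /* all sub_len characters matched */
    goto match;
  if (s[i+j] == sub_string[j])
    goto go_on;
  i++;                                  /* mismatch: try the next start */
  goto loop0;
 go_on:
  j++;
  goto loop1;
 match:
  if (positions_index == positions_capacity)
    goto end;                           /* output array is full */
  positions[positions_index] = i;
  positions_index++;
  i++;                                  /* continue with the next start */
  goto loop0;

 end:
  return positions_index;
}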

As you can see, I did not use "higher-level language features" for this function (does C even have such things? :) ). Now you can start to "assemble". If RAX (or EAX in 32-bit code, which is what MSVC's __asm blocks require, since MSVC does not support inline assembly on x64) is supposed to hold your i variable, you could replace size_t i = 0; with XOR RAX,RAX. And so on.

With that approach, other people even have a chance to read the assembly code, and with the comments (the former C code) you state the intent of your instructions.

Thanks, everyone, for the answers. Here's how I solved it:

#include <stdio.h>


int main(void)
{
#define MAX_LEN 100
    //INPUT
    char s1[] = "Sette scettici sceicchi sciocchi con la sciatica a Shanghai";  //first string
    unsigned int lengthS1 = sizeof(s1) - 1;
    char s2[] = "icchi";   //second string
    unsigned int lengthS2 = sizeof(s2) - 1;
    //OUTPUT
    unsigned int positions[MAX_LEN];
    unsigned int positionsLen;

    _asm
    {
        XOR EAX, EAX    //letter
        XOR EBX, EBX    //i+j
        XOR EDX, EDX    //j
        XOR ESI, ESI    //number of occurrences
        XOR EDI, EDI    //length
        XOR ECX, ECX    //i

        //if lengthS1 < lengthS2 there can be no occurrences
        MOV EDI, lengthS1
        CMP EDI, lengthS2
        JB end
        SUB EDI, lengthS2       //lengthS1 - lengthS2 = last possible start
        MOV positionsLen, EDI   //positionsLen temporarily holds that limit

    loop0:
        CMP ECX, positionsLen   //if i > lengthS1-lengthS2 jump to end
        JA end
        XOR EDX, EDX            //j = 0

    loop1:
        CMP EDX, lengthS2       //if j == lengthS2 jump to check
        JE check
        XOR EBX, EBX            //EBX = 0
        MOV EBX, ECX            //EBX = i
        ADD EBX, EDX            //EBX = i+j
        MOV AL, s1[EBX]         //load s1[i+j] into AL (8 bit)
        CMP AL, s2[EDX]         //compare it with s2[j]
        JNE check               //if not equal jump to check
        INC EDX                 //j++
        JMP loop1               //restart from loop1

    check:
        CMP EDX, lengthS2       //if j == lengthS2 jump to equal
        JE equal
        XOR EDX, EDX            //j = 0
        INC ECX                 //i++
        JMP loop0               //restart from loop0

    equal:
        XOR EDX, EDX                  //j = 0
        MOV positions[ESI * 4], ECX   //positions[count] = i (4 bytes per element)
        INC ESI                       //count++
        INC ECX                       //i++
        JMP loop0                     //restart from loop0

    end:
        MOV positionsLen, ESI   //positionsLen = count
    }
    {
        unsigned int i;
        for (i = 0; i < positionsLen; i++)
            printf("substring in position-%u\n", positions[i]);
    }
    return 0;
}

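Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.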

 
© 2020-2024 STACKOOM.COM