
A Way To Merge Differences Between Two Strings?

Original post: 2012-01-22 01:41:28 | Tags: php, wordpress, html-parsing, string-parsing

I've created a WordPress plugin that automatically adds a more tag to your post based on certain criteria and places the tag intelligently inside the post. The problem is that I haven't come up with a proper way to deal with HTML code. Currently the plugin checks whether there is a < and, if so, finds the next > in the post.

What I'm hoping to do here is take the HTML out of the equation entirely. I was wondering whether there is any system, like a git for PHP, that would let me save the HTML version of the string, strip the HTML away and store the plain-text version in another variable, place the more tag into the plain-text version, and then compare the two versions to merge the HTML back into the plain text properly.

I've tried Google, I've spent about 100 hours on code changes, and I still haven't come up with a solution. So now I'm bowing to the power of the cloud. Is there anyone here who can come up with a solution?

1 Answer

I have only a very rough idea of what you're trying to implement, so here's a very crude way of doing it.

Instead of extracting the plain text separately and then doing all the calculations on it, you can do this "on the go".

Run a loop over all the characters in your post. If you find a <, ignore ("continue") whatever comes next until you find a >. That way you essentially get the plain text inside the loop, and you can do all your initial counting there (total number of characters, number of words, etc.). Then run the loop once more, add the more tag to the content based on the initial count, and break out of the second loop.
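The two passes described above could be sketched roughly as follows. This is only a minimal illustration, not the plugin's actual code: the function name insert_marker_by_text_ratio, the $ratio criterion (place the tag after a given fraction of the visible characters), and the default marker <!--more--> (WordPress's more tag) are assumptions made for the example. It counts bytes with strlen(), so multibyte text would need the mb_* functions, and it treats every < ... > pair as a tag, which will misfire on a literal < in the text or a > inside an attribute value.

```php
<?php
// Hypothetical helper (name and parameters are assumptions for this sketch).
// Inserts $marker after roughly $ratio of the characters that lie outside tags.
function insert_marker_by_text_ratio(
    string $html,
    string $marker = '<!--more-->',
    float $ratio = 0.5
): string {
    $length = strlen($html);

    // Pass 1: count the characters that are not part of a <...> tag.
    $visible = 0;
    $inTag = false;
    for ($i = 0; $i < $length; $i++) {
        $ch = $html[$i];
        if ($ch === '<') { $inTag = true; continue; }
        if ($ch === '>' && $inTag) { $inTag = false; continue; }
        if ($inTag) { continue; }
        $visible++;
    }

    // Decide how many visible characters should precede the marker.
    $target = (int) floor($visible * $ratio);
    if ($target === 0) {
        return $html; // nothing sensible to split
    }

    // Pass 2: walk the string again and stop at the target character.
    $seen = 0;
    $inTag = false;
    for ($i = 0; $i < $length; $i++) {
        $ch = $html[$i];
        if ($ch === '<') { $inTag = true; continue; }
        if ($ch === '>' && $inTag) { $inTag = false; continue; }
        if ($inTag) { continue; }
        $seen++;
        if ($seen === $target) {
            // Insert right after this character and leave the loop.
            return substr($html, 0, $i + 1) . $marker . substr($html, $i + 1);
        }
    }

    return $html;
}
```

For example, insert_marker_by_text_ratio('<p>abcd</p>') returns '<p>ab<!--more-->cd</p>'. The counting pass is the place to collect whatever other statistics the criteria need, such as word counts.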

Below is another idea, which is a lot more complicated and assumes that you can't do without the plain text.

Let M be the main string that contains the whole post content. Every time you find a <tag>, push it onto an array and push the location of that tag in M onto another array.

Once you have pushed all the tags in M onto an array along with their locations, what you have left is plain text. After you're done working with it, put all the tags from the array back into the plain text based on their locations. This, of course, needs a lot of refinement, but it's just an idea you can develop.
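One possible way to flesh out this second idea is sketched below. The helper names split_tags and merge_with_marker are hypothetical, and the sketch makes one deliberate change to the outline above: instead of remembering each tag's location in M, it remembers the offset in the plain text at which the tag sat, so the tags can still be put back correctly after the marker has shifted everything that follows it. The regular expression /<[^>]*>/ is a crude stand-in for real HTML parsing, offsets are byte offsets, and $position is assumed to lie between 0 and the length of the plain text.

```php
<?php
// Hypothetical helpers; names and signatures are assumptions for this sketch.

// Splits $html into [plain text, list of [tag, offset in plain text]].
function split_tags(string $html): array
{
    preg_match_all('/<[^>]*>/', $html, $matches, PREG_OFFSET_CAPTURE);

    $plain = '';
    $tags = [];
    $cursor = 0;
    foreach ($matches[0] as [$tag, $offset]) {
        // Copy the text between the previous tag and this one.
        $plain .= substr($html, $cursor, $offset - $cursor);
        // Remember where in the plain text this tag belongs.
        $tags[] = [$tag, strlen($plain)];
        $cursor = $offset + strlen($tag);
    }
    $plain .= substr($html, $cursor);

    return [$plain, $tags];
}

// Rebuilds the HTML from $plain and $tags, adding $marker at plain-text
// offset $position. A tag sitting exactly at $position ends up after the marker.
function merge_with_marker(string $plain, array $tags, int $position, string $marker): string
{
    $out = '';
    $cursor = 0;
    $placed = false;
    foreach ($tags as [$tag, $offset]) {
        if (!$placed && $offset >= $position) {
            $out .= substr($plain, $cursor, $position - $cursor) . $marker;
            $cursor = $position;
            $placed = true;
        }
        $out .= substr($plain, $cursor, $offset - $cursor) . $tag;
        $cursor = $offset;
    }
    if (!$placed) {
        // No tag at or after the position: place the marker in the tail text.
        $out .= substr($plain, $cursor, $position - $cursor) . $marker;
        $cursor = $position;
    }

    return $out . substr($plain, $cursor);
}
```

With '<p>ab</p><p>cd</p>' as input, split_tags yields the plain text 'abcd', and merging with the marker '<!--more-->' at position 2 gives '<p>ab<!--more--></p><p>cd</p>'. Whether the marker should land before or after a tag at the same offset is a design choice; this sketch puts it before, which can leave it inside a closing element as shown.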