
How to compare these strings in Python?

I have the following string:

1679.2235398,-1555.40390834,-1140.07728186,-1999.85500108

and I'm using a steganography technique to store it in an image. When I retrieve it from the image, I sometimes get it back complete, and I have no issue with that. On other occasions the data is only partially retrieved (because the image has been modified or altered), so the result looks something like this:

1679.2235398,-1555.I8\xf3\x1cj~\x9bc\x13\xac\x9e8I>[a\xfdV#\x1c\xe1\xea\xa0\x8ah\x02\xed\xd1\x1c\x84\x96\xe2\xfbk*8'l

Notice that only "1679.2235398,-1555." is retrieved correctly, while the rest is where the modification occurred. Now, how do I compute (as a percentage) how much I successfully retrieved? Since the lengths are not the same, I can't do a character-by-character comparison. It seems that I need to slice the modified data, or convert it into some other form, to match the length of the original data.

Any tips?

A lot of this depends on the context of your problem, but you have a number of options here.

If your results always look like that, you could just find the longest common subsequence, then divide its length by the length of the original string to get a percentage.
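As a rough sketch of that idea, the snippet below computes the length of the longest common subsequence with the standard dynamic-programming recurrence and divides it by the length of the original string. The function names `lcs_length` and `percent_recovered` are made up for illustration, and treating an empty original as 100% recovered is an assumption.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_recovered(original: str, retrieved: str) -> float:
    """Share of the original string matched by the LCS, as a percentage."""
    if not original:
        return 100.0  # assumption: nothing to recover counts as fully recovered
    return 100.0 * lcs_length(original, retrieved) / len(original)


original = "1679.2235398,-1555.40390834,-1140.07728186,-1999.85500108"
print(percent_recovered(original, original))  # a complete retrieval scores 100.0
```

Note that a subsequence match can slightly overestimate the result here, because digits that happen to appear in the corrupted tail may still be counted. If the damage always starts at some point and runs to the end, comparing only the common prefix would be a stricter measure.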

Levenshtein distance is a common way of comparing strings: it is the number of single-character edits (insertions, deletions, or substitutions) required to turn one string into another. This question has several answers discussing how to turn that into a percentage.
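For illustration, here is a minimal pure-Python sketch of the Levenshtein distance, together with one possible way to turn it into a percentage. Dividing by the length of the longer string is an assumption made here, not the only reasonable normalization, and the names `levenshtein` and `similarity_percent` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_percent(a: str, b: str) -> float:
    """100 * (1 - distance / length of the longer string); one possible normalization."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # assumption: two empty strings are identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))  # 3
```

With this normalization, identical strings score 100 and strings with nothing in common score 0.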

If you don't expect the strings to always come out in the same order, this answer suggests some algorithms used for DNA work.

Well, it really depends. My solution would be something like this:

I would start with the longest possible string and check whether it is in the new string (if original_string in new_string: 'something happens here'). That check would sit inside a loop that decreases the size of the original string and tries every possible substring of that size. So the next ones would be N-1 long, with 2 possible candidates (cutting off the first character or the last character), and so on, until you reach a specific threshold or strings of length 1.
The loop can record the longest string it finds inside the if conditional, and afterward you can just check the results. Hope that helps.
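The loop described above could be sketched roughly as follows. The function name `longest_common_substring` and the `min_len` threshold parameter are hypothetical, and this version stops at the first (longest) match instead of logging every match.

```python
def longest_common_substring(original: str, retrieved: str, min_len: int = 1) -> str:
    """Try substrings of original from longest to shortest and return the
    first one that also occurs in retrieved, or "" if none is found."""
    for size in range(len(original), min_len - 1, -1):
        # A window of this size can start at len(original) - size + 1 positions.
        for start in range(len(original) - size + 1):
            candidate = original[start:start + size]
            if candidate in retrieved:
                return candidate
    return ""


original = "1679.2235398,-1555.40390834"
retrieved = "1679.2235398,-1555.I8xx"  # shortened stand-in for the corrupted data
match = longest_common_substring(original, retrieved)
print(match)                               # 1679.2235398,-1555.
print(100.0 * len(match) / len(original))  # share of the original recovered
```

This brute-force search tries O(N^2) candidate substrings, which is fine for short strings like the one in the question but would be slow for long inputs.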
