
Indices of a substring in Smalltalk

It seems Smalltalk implementations are missing an algorithm that returns all the indices of a substring in a String. The most similar methods return only one index of an element, for example: firstIndexesOf:in:, findSubstring:, and the findAnySubstring: variants.

There are implementations in Ruby, but the first one relies on a Ruby hack, the second one ignores overlapping Strings, and the last one uses an Enumerator class which I don't know how to translate to Smalltalk. I wonder if this Python implementation is the best starting point, since it considers both cases (overlapping or not) and does not use regular expressions.

My goal is to find a package or method which provides the following behavior:

'ABDCDEFBDAC' indicesOf: 'BD'. "#(2 8)"

When overlapping is considered:

'nnnn' indicesOf: 'nn' overlapping: true. "#(0 2)"

When overlapping is not considered:

'nnnn' indicesOf: 'nn' overlapping: false. "#(0 1 2)"

In Pharo, when text is selected in a Playground, a scanner detects the substring and highlights matches. However, I couldn't find a String implementation of this.

My best effort so far is this implementation in String (Pharo 6):

indicesOfSubstring: subString
  | indices i |

  indices := OrderedCollection new: self size.
  i := 0.
  [ (i := self findString: subString startingAt: i + 1) > 0 ] whileTrue: [
    indices addLast: i ].
  ^ indices

Let me first clarify that Smalltalk collections are 1-based, not 0-based. Therefore your examples should read

'nnnn' indexesOf: 'nn' overlapping: false. "#(1 3)"
'nnnn' indexesOf: 'nn' overlapping: true. "#(1 2 3)"

Note that I've also taken @lurker's observation into account (and have tweaked the selector too).

Now, starting from your code, I would change it as follows:

indexesOfSubstring: subString overlapping: aBoolean
  | n indexes i |
  n := subString size.
  indexes := OrderedCollection new.                            "removed the size"
  i := 1.                                                      "1-based"
  [
    i := self findString: subString startingAt: i.             "split condition"
    i > 0]
  whileTrue: [
    indexes add: i.                                            "add: = addLast:"
    i := aBoolean ifTrue: [i + 1] ifFalse: [i + n]].           "new!"
  ^indexes

Make sure you write a few unit tests (and don't forget to exercise the border cases!).
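
A minimal sketch of such tests, assuming the method above is installed on String as indexesOfSubstring:overlapping: and using a hypothetical SUnit class named SubstringIndexesTest (the class name and the test selectors are illustrative, not part of the answer):

"TestCase subclass: #SubstringIndexesTest   (hypothetical class name)"

testNonOverlapping
  "Matches may not share characters: 'nn' at 1 and 3."
  self
    assert: ('nnnn' indexesOfSubstring: 'nn' overlapping: false) asArray
    equals: #(1 3)

testOverlapping
  "Matches may share characters: 'nn' at 1, 2 and 3."
  self
    assert: ('nnnn' indexesOfSubstring: 'nn' overlapping: true) asArray
    equals: #(1 2 3)

testMatchAtEnd
  "Border case: the only match ends on the last character."
  self
    assert: ('ABDCDEFBDAC' indexesOfSubstring: 'AC' overlapping: false) asArray
    equals: #(10)

testNotFound
  "Border cases: no match at all, and a substring longer than the receiver."
  self assert: ('abc' indexesOfSubstring: 'x' overlapping: true) isEmpty.
  self assert: ('ab' indexesOfSubstring: 'abc' overlapping: false) isEmpty

The empty substring is another border case worth deciding on, since the outcome depends on what findString:startingAt: answers for it.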

Edited

It would also be nice if you told us what you need to achieve in the "greater picture". Sometimes Smalltalk offers different approaches.

Leandro beat me to the code (and his code is more efficient), but I had already written mine, so I'll share it too. Heed his advice about Smalltalk being 1-based; I have rewritten the example accordingly.

I have used Smalltalk/X and Pharo 6.1 for the example.

The code would be:

indexesOfSubstring: substringToFind overlapping: aBoolean

    | substringPositions substringToFindSize aPosition currentPosition |

    substringPositions := OrderedSet new. "with overlap on you could get multiple same 
              positions in the result when there is more to find in the source string"

    substringToFindSize := substringToFind size. "speed up for large strings"
    aPosition := 1.

    [ self size >= aPosition ] whileTrue: [ "use >= so a match starting at the last character is not skipped"
        currentPosition := self findString: substringToFind startingAt: aPosition.
        (currentPosition = 0) ifTrue: [ aPosition := self size + 1 ] "ends the loop when substringToFind is not found"
                             ifFalse: [
                                 substringPositions add: currentPosition.
                                 aBoolean ifTrue: [ aPosition := aPosition + 1 ] "overlapping is on"
                                         ifFalse: [ aPosition := currentPosition + substringToFindSize ] "overlapping is off"
                             ]
    ].

    ^ substringPositions
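
To see why a set is used for the result, a hand trace of the overlapping case may help. This is only a sketch derived from the loop above, with 'xxnn' as a made-up receiver: because aPosition advances by one from the previous start rather than from the match, the same match is found several times.

"Hand trace of: 'xxnn' indexesOfSubstring: 'nn' overlapping: true"
"aPosition = 1 -> currentPosition = 3, added to the set;  aPosition becomes 2"
"aPosition = 2 -> currentPosition = 3, already in the set; aPosition becomes 3"
"aPosition = 3 -> currentPosition = 3, already in the set; aPosition becomes 4"
"After that the search finds nothing more and the loop ends."

('xxnn' indexesOfSubstring: 'nn' overlapping: true) asArray. "expected per the trace: #(3)"

With an OrderedCollection instead of an OrderedSet, the same trace would collect position 3 three times.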

I have fixed some issues that occurred to me. Don't forget to test it as much as you can!
