Finding common prefixes that end with multiple suffixes in Python

Question

I have a list of strings.

A = [
  'kite1.json',
  'kite1.mapping.json',
  'kite1.analyzer.json',
  'kite2.json',
  'kite3.mapping.json',
  'kite3.mapping.mapping.json',
  'kite3.mapping.analyzer.json',
 ]

I need to find the common prefixes that occur with all of the suffixes .json, .mapping.json, and .analyzer.json.

Here, kite1 and kite3.mapping qualify. kite2 does not, because it only appears with .json.

How can I find the prefixes that appear with all of .json, .mapping.json, and .analyzer.json?

Answer 1

If this were code golf, I might win:

def ew(sx): 
   return set([s[:-len(sx)] for s in A if s.endswith(sx)])

ew('.analyzer.json') & ew('.mapping.json') & ew('.json')

The ew() function loops through A, finds all elements that end with the given suffix, strips the suffix off, and returns the results as a set.

Using it, I just calculate the intersection of the sets produced from each of the three suffixes. (& is the set intersection operator.)

For brevity's sake, I abbreviated "ends with" to ew and "suffix" to sx.

The expression s[:-len(sx)] means "the substring of s starting at 0 and stopping len(sx) characters before the end", which has the effect of snipping the suffix off the end.
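For reuse outside an interactive session, the same idea can be written without relying on the global A. This is only a sketch: the function names prefixes_ending_with and common_prefixes are made up for illustration, and it assumes at least one suffix is given and that no suffix is empty (with an empty suffix, name[:-0] would yield '' rather than the full name).

```python
def prefixes_ending_with(names, suffix):
    """Return the prefixes left after stripping `suffix` from matching names."""
    return {name[:-len(suffix)] for name in names if name.endswith(suffix)}


def common_prefixes(names, suffixes):
    """Return the prefixes that occur with every suffix in `suffixes`."""
    # Assumption: `suffixes` is non-empty and contains no empty string.
    return set.intersection(*(prefixes_ending_with(names, s) for s in suffixes))


A = [
    'kite1.json',
    'kite1.mapping.json',
    'kite1.analyzer.json',
    'kite2.json',
    'kite3.mapping.json',
    'kite3.mapping.mapping.json',
    'kite3.mapping.analyzer.json',
]

result = common_prefixes(A, ['.json', '.mapping.json', '.analyzer.json'])
print(sorted(result))  # ['kite1', 'kite3.mapping']
```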

Answer 2

Well, all you need is to collect a set of prefixes for each suffix in ['.json', '.mapping.json', '.analyzer.json'] and then take the intersection of these sets:

In [1]: A = [
   ...:   'kite1.json',
   ...:   'kite1.mapping.json',
   ...:   'kite1.analyzer.json',
   ...:   'kite2.json',
   ...:   'kite3.mapping.json',
   ...:   'kite3.mapping.mapping.json',
   ...:   'kite3.mapping.analyzer.json',
   ...:  ]

In [2]: suffixes = ['.json', '.mapping.json', '.analyzer.json']

In [3]: prefixes = {s: set() for s in suffixes}

In [4]: for word in A:
   ....:     for suffix in suffixes:
   ....:         if word.endswith(suffix):
   ....:             prefixes[suffix].add(word[:-len(suffix)])
   ....:             

In [5]: prefixes
Out[5]: 
{'.analyzer.json': {'kite1', 'kite3.mapping'},
 '.json': {'kite1',
  'kite1.analyzer',
  'kite1.mapping',
  'kite2',
  'kite3.mapping',
  'kite3.mapping.analyzer',
  'kite3.mapping.mapping'},
 '.mapping.json': {'kite1', 'kite3', 'kite3.mapping'}}

In [6]: prefixes['.json'] & prefixes['.mapping.json'] & prefixes['.analyzer.json']
Out[6]: {'kite1', 'kite3.mapping'}
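A possible variation is to index the other way round: record, for each prefix, which suffixes were seen with it, and keep the prefixes that collected all of them. The sketch below uses a hypothetical function name, prefixes_with_all_suffixes, and assumes every suffix is a non-empty string.

```python
from collections import defaultdict


def prefixes_with_all_suffixes(names, suffixes):
    """Return prefixes that appear in `names` with every one of `suffixes`."""
    seen = defaultdict(set)  # prefix -> suffixes observed for that prefix
    for name in names:
        for suffix in suffixes:
            if name.endswith(suffix):
                seen[name[:-len(suffix)]].add(suffix)
    required = set(suffixes)
    return {prefix for prefix, found in seen.items() if found == required}


A = [
    'kite1.json',
    'kite1.mapping.json',
    'kite1.analyzer.json',
    'kite2.json',
    'kite3.mapping.json',
    'kite3.mapping.mapping.json',
    'kite3.mapping.analyzer.json',
]

result = prefixes_with_all_suffixes(A, ['.json', '.mapping.json', '.analyzer.json'])
print(sorted(result))  # ['kite1', 'kite3.mapping']
```

This avoids building one set per suffix and makes it easy to inspect which suffixes a rejected prefix is missing.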

Answer 3

Use re.match and capturing groups to extract all matches for each of your patterns. Then take the intersection of the resulting sets:

import re

s1, s2, s3 = (
    set(m.group(1) for m in (re.match(pattern, s) for s in A) if m) 
    for pattern in (
        r'^(.+)\.json$',          # group(1) is the part within '()'
        r'^(.+)\.mapping\.json$', 
        r'^(.+)\.analyzer\.json$'
    )
)

result = list(s1 & s2 & s3)  # intersection
# e.g. ['kite3.mapping', 'kite1'] (set order is arbitrary)
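If the suffixes live in a list rather than in hand-written patterns, the regular expressions can be built with re.escape, so the dots are treated literally. This is a sketch under that assumption; common_prefixes_re is a hypothetical name, and like the patterns above it requires a non-empty prefix because of `.+`.

```python
import re


def common_prefixes_re(names, suffixes):
    """Return prefixes that match '<prefix><suffix>' for every suffix."""
    sets = []
    for suffix in suffixes:
        # fullmatch anchors the pattern at both ends of the name.
        pattern = re.compile(r'(.+)' + re.escape(suffix))
        sets.append({m.group(1) for m in map(pattern.fullmatch, names) if m})
    return set.intersection(*sets)


A = [
    'kite1.json',
    'kite1.mapping.json',
    'kite1.analyzer.json',
    'kite2.json',
    'kite3.mapping.json',
    'kite3.mapping.mapping.json',
    'kite3.mapping.analyzer.json',
]

result = common_prefixes_re(A, ['.json', '.mapping.json', '.analyzer.json'])
print(sorted(result))  # ['kite1', 'kite3.mapping']
```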

Answer 4

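import re

# Join the names so each pattern can scan all of them in one pass.
string = "\n".join(A)

# Anchor each pattern to a whole line so only true suffixes match.
json_prefices = re.findall(r"^(.*?)\.json$", string, re.MULTILINE)
mapping_json_prefices = re.findall(r"^(.*?)\.mapping\.json$", string, re.MULTILINE)
analyzer_json_prefices = re.findall(r"^(.*?)\.analyzer\.json$", string, re.MULTILINE)

result = list(set(json_prefices) & set(mapping_json_prefices)
               & set(analyzer_json_prefices))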
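One thing to watch with re.findall on a joined string: a pattern such as (.*?)\.json without anchors matches anywhere in a line, not only at its end. The short sketch below, using a made-up name list, compares it with a line-anchored pattern under re.MULTILINE.

```python
import re

# Hypothetical input: the second name merely contains '.json'.
names = ['kite1.json', 'kite1.json.bak']
text = "\n".join(names)

# Without anchors, '.json' may be followed by anything.
unanchored = re.findall(r"(.*?)\.json", text)

# With ^ and $ under re.MULTILINE, '.json' must end the line.
anchored = re.findall(r"^(.*?)\.json$", text, flags=re.MULTILINE)

print(unanchored)  # ['kite1', 'kite1']
print(anchored)    # ['kite1']
```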

4 answers

Answer 1
3, accepted, 2016-06-07 06:59:36

Answer 2
1, 2016-06-07 06:43:48

Answer 3
1, 2016-06-07 06:44:17

Answer 4
0, 2016-06-07 06:59:59
