How to split a string, match, and output specific patterns?

Question

I'm trying to solve a problem that I have already solved in PHP, but I'm not sure how to do it in Python.

In the following three rows, we would like to match based on these two patterns:

only vine.co and twitter.com URLs (other domains should be ignored)
only URLs that come before a comma (the last URL in each row should be ignored)

Input

Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1

The output should be an array (a list) in Python. The output below comes from the PHP version:

array(3) {
  [0]=>
  string(30) "https://vine.co/v/5W2Dg3XPX7a
"
  [1]=>
  string(64) "https://twitter.com/dog_rates/status/836677758902222849/photo/1
"
  [2]=>
  string(63) "https://twitter.com/dog_rates/status/835264098648616962/photo/1"
}

PHP Code:

$input = 'Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1';

$array = preg_split('/Row\s\d:\s/s', $input);

$output = array();
foreach ($array as $key => $value) {
    if (strlen($value) > 1) {
        $URL_arrays = explode(',', $value);
        foreach ($URL_arrays as $key => $value) {
            if ($key = sizeof($URL_arrays) - 1) {
                unset($URL_arrays[sizeof($URL_arrays) - 1]);
            } else {
                $match = preg_match('/twitter\.com|vine\.co/s', $value);
                if ($match) {
                    array_push($output, $value);
                }
            }
        }
    }
}

var_dump($output);

This question is based on this RegEx problem; you may answer either one.

Answer 1

You can use this regex to capture all URLs on the vine.co or twitter.com domain that are immediately followed by a comma:

https:\/\/(?:www\.)?(?:vine\.co|twitter\.com)[^,\s]*(?=,)

As you wanted, the key point is the positive lookahead (?=,), which ensures that the URL is immediately followed by a comma.
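To make the effect of the lookahead concrete, here is a minimal sketch on a shortened, hypothetical line (the URLs `https://vine.co/a` and `https://vine.co/b` are made up for illustration). It runs the same pattern with and without (?=,), leaving out the optional www. part for brevity:

```python
import re

# Hypothetical sample line: two comma-separated URLs; the second one is last.
line = "https://vine.co/a,https://vine.co/b"

# Without the lookahead, every URL on an allowed domain is returned,
# including the last one in the line.
without_lookahead = re.findall(r"https://(?:vine\.co|twitter\.com)[^,\s]*", line)

# With (?=,), a URL only matches if a comma follows it. The comma is
# checked but not consumed, so it is not part of the returned match.
with_lookahead = re.findall(r"https://(?:vine\.co|twitter\.com)[^,\s]*(?=,)", line)

print(without_lookahead)  # ['https://vine.co/a', 'https://vine.co/b']
print(with_lookahead)     # ['https://vine.co/a']
```

The last URL is dropped simply because nothing after it satisfies the lookahead, so no extra "remove the last item" step is needed.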

Regex Demo

Python code extracting the URLs using re.findall:

import re

s = '''Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1'''

print(re.findall(r'https:\/\/(?:www\.)?(?:vine\.co|twitter\.com)[^,\s]*(?=,)', s))

Output:

['https://vine.co/v/5W2Dg3XPX7a', 'https://twitter.com/dog_rates/status/836677758902222849/photo/1', 'https://twitter.com/dog_rates/status/835264098648616962/photo/1']
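If you need to know which row each URL came from, one possible variation (a sketch, not part of the original answer) is to apply the same pattern to each line separately. The names `pattern` and `per_row` are illustrative:

```python
import re

s = '''Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1'''

# Same regex as above; "/" needs no escaping in a Python pattern.
pattern = re.compile(r"https://(?:www\.)?(?:vine\.co|twitter\.com)[^,\s]*(?=,)")

# One list of matches per input line, in the order the rows appear.
per_row = [pattern.findall(row) for row in s.splitlines()]

for row_matches in per_row:
    print(row_matches)
```

Each inner list holds the matches for one row, so a row with no qualifying URL would show up as an empty list rather than disappearing.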

Answer 2

Because you don't need to keep duplicates, I would suggest using a set instead of an array (note that the order is not preserved):

{url for x in s.split('\n') for url in x.split(': ')[1].split(',') if 'vine.co' in url or 'twitter.com' in url}

Code:

s = '''Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1'''

print({url for x in s.split('\n') for url in x.split(': ')[1].split(',') if 'vine.co' in url or 'twitter.com' in url})

# {'https://twitter.com/dog_rates/status/835264098648616962/photo/1', 
#  'https://twitter.com/dog_rates/status/836677758902222849/photo/1',
#  'https://vine.co/v/5W2Dg3XPX7a'}
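If the original order matters, or if the "ignore the last URL in each row" rule should be applied explicitly instead of relying on de-duplication, a split-based sketch could look like the following. The helper name `urls_before_commas` and the `ALLOWED` tuple are illustrative, and the substring test is as loose as the one in the set comprehension above:

```python
s = '''Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1'''

ALLOWED = ("vine.co", "twitter.com")  # the two domains named in the question


def urls_before_commas(text):
    """Return allowed URLs that are followed by a comma, in input order."""
    found = []
    for row in text.splitlines():
        # Drop the "Row N: " prefix, then split the remainder on commas.
        _, _, rest = row.partition(": ")
        urls = rest.split(",")
        # urls[:-1] skips the last URL of the row, as the question asks.
        for url in urls[:-1]:
            if any(domain in url for domain in ALLOWED):
                found.append(url)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))


result = urls_before_commas(s)
print(result)
```

For the sample input this yields the vine.co URL followed by the two twitter.com URLs, in row order. A stricter check could compare the parsed host name instead of using a substring test.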

2 answers

Answer 1
2 · accepted · 2019-04-28 06:50:08

Answer 2
1 · 2019-04-28 06:50:39