How to check whether a string contains a substring of a specific format?

Question

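Two strings. My item's name:

Parfume name EDT 50ml

And the competitor's item name:

Parfume another name EDP 60ml

I have a long list of these names in one column and the competitors' names in another column. I want to keep only those rows of the dataframe that have the same amount of ml in both my name and the competitor's name, no matter what everything else in the strings looks like. So how do I find a substring ending with 'ml' inside a bigger string? I could simply do

"**ml" in competitors_name

to see whether they contain the same amount of ml.

Thanks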

UPDATE

'ml' is not always at the end of the string. It might look like this:

Parfume another name 60ml EDP

Answer 1

Try this:

import re

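def same_measurement(my_item, competitor_item, unit="ml"):
    # Lazily skip leading text, then capture the digits directly before the unit.
    matcher = re.compile(r".*?(\d+){}".format(re.escape(unit)))
    my_match = matcher.match(my_item)
    competitor_match = matcher.match(competitor_item)
    # True only when both names contain an amount and the amounts are equal.
    return bool(my_match and competitor_match
                and my_match.group(1) == competitor_match.group(1))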

my_item = "Parfume name EDT 50ml"
competitor_item = "Parfume another name EDP 50ml"
assert same_measurement(my_item, competitor_item)

my_item = "Parfume name EDT 50ml"
competitor_item = "Parfume another name EDP 60ml"
assert not same_measurement(my_item, competitor_item)
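
Since the update says 'ml' may sit in the middle of the name, here is a minimal sketch of pulling the amount out with `re.search` instead of anchoring at the start. The helper name `extract_amount` is hypothetical (not part of the answer above), and the sketch assumes the amount is a whole number, optionally separated from the unit by whitespace, in any letter case.

```python
import re

# Hypothetical helper (an assumption, not from the original answer):
# returns the number in front of the unit as an int, or None if absent.
def extract_amount(name, unit="ml"):
    # \s* tolerates "50 ml"; \b keeps the unit from matching inside a longer word
    pattern = r"(\d+)\s*{}\b".format(re.escape(unit))
    match = re.search(pattern, name, re.IGNORECASE)
    return int(match.group(1)) if match else None

print(extract_amount("Parfume another name 60ml EDP"))  # 60
print(extract_amount("Parfume name EDT 50 ML"))         # 50
print(extract_amount("Parfume name EDT"))               # None
```

Returning an int means "050ml" and "50ml" compare equal, which may or may not be what you want for your data.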

Answer 2

You can use Python's regex library to pick out the 'xxml' value in each data row, then apply some logic to check whether the two values match.

import re

data_rows = [["Parfume name EDT", "Parfume another name EDP 50ml"]]

for data_pairs in data_rows:
    my_ml = None
    comp_ml = None

    # Check for my ml matches and set value.
    # \d+ (not \d{1,3}) so that "1000ml" is not cut down to "000ml";
    # lower() so that "50ML" and "50ml" compare equal.
    my_ml_matches = re.search(r'(\d+[Mm][Ll])', data_pairs[0])
    if my_ml_matches is not None:
        my_ml = my_ml_matches[0].lower()
    else:
        print("my_ml has no ml")

    # Check for comp ml matches and set value
    comp_ml_matches = re.search(r'(\d+[Mm][Ll])', data_pairs[1])
    if comp_ml_matches is not None:
        comp_ml = comp_ml_matches[0].lower()
    else:
        print("comp_ml has no ml")

    # Print outputs
    if (my_ml is not None) and (comp_ml is not None):
        if my_ml == comp_ml:
            print("my_ml: {0} == comp_ml: {1}".format(my_ml, comp_ml))
        else:
            print("my_ml: {0} != comp_ml: {1}".format(my_ml, comp_ml))

where data_rows = every row in your dataset

data_pairs = {your_item_name, competitor_item_name}
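
If the goal is to keep only the matching rows rather than print them, the same idea can be sketched as a small filter. The names `ML_PATTERN` and `keep_matching_pairs` are hypothetical, and the sketch assumes each row is a two-element list like the `data_rows` entries above.

```python
import re

# Hypothetical names, for illustration only.
ML_PATTERN = re.compile(r"(\d+)\s*ml", re.IGNORECASE)

def keep_matching_pairs(data_rows):
    """Return only the [my_name, competitor_name] pairs with equal ml amounts."""
    kept = []
    for my_name, comp_name in data_rows:
        my_match = ML_PATTERN.search(my_name)
        comp_match = ML_PATTERN.search(comp_name)
        # A pair is dropped when either name has no amount at all
        if my_match and comp_match and int(my_match.group(1)) == int(comp_match.group(1)):
            kept.append([my_name, comp_name])
    return kept

rows = [
    ["Parfume name EDT", "Parfume another name EDP 50ml"],
    ["Parfume name EDT 50ml", "Parfume another name 50ML EDP"],
    ["Parfume name EDT 50ml", "Parfume another name EDP 60ml"],
]
print(keep_matching_pairs(rows))
```

Only the second pair survives here: the first has no amount on my side, and the third has different amounts.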

Answer 3

You can use a lambda function to do this.

import pandas as pd
import re
d = {
    'Us':
        ['Parfume one 50ml', 'Parfume two 100ml'],
    'Competitor':
        ['Parfume uno 50ml', 'Parfume dos 200ml']
}
df = pd.DataFrame(data=d)

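def ml_amount(name):
    # Digits before 'ml', or None when the name has no such token
    # (calling .group(1) directly on a failed search would raise AttributeError).
    match = re.search(r'(\d+)ml', name)
    return match.group(1) if match else None

df['Eq'] = df.apply(lambda x: 'Yes' if ml_amount(x['Us']) is not None and ml_amount(x['Us']) == ml_amount(x['Competitor']) else 'No', axis=1)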

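Result: the first row gets 'Yes' (50ml in both names) and the second row gets 'No' (100ml vs. 200ml).

It doesn't matter whether 'ml' is at the end or in the middle of the string.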
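
To actually drop the non-matching rows, as the question asks, one possible sketch uses the vectorized `Series.str.extract` and a boolean mask instead of `apply`. The sample data and the variable names `us_ml`, `comp_ml`, and `same_ml` are assumptions made for illustration.

```python
import re
import pandas as pd

# Illustrative data (an assumption); the third row has no amount on our side.
df = pd.DataFrame({
    "Us": ["Parfume one 50ml", "Parfume two 100ml", "Parfume three"],
    "Competitor": ["Parfume uno 50ml EDP", "Parfume dos 200ml", "Parfume tres 30ml"],
})

pattern = r"(\d+)\s*ml"
# expand=False with a single capture group gives back a Series of digit strings
us_ml = df["Us"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
comp_ml = df["Competitor"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Keep rows where both names have an amount and the amounts are equal
mask = us_ml.notna() & comp_ml.notna() & (us_ml == comp_ml)
same_ml = df[mask]
print(same_ml)
```

Only the first row is kept. Rows where either name lacks an 'ml' amount are filtered out rather than raising an error.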

3 answers

Answer 1
score 3, 2019-04-18 10:23:20

Answer 2
score 1, accepted, 2019-04-18 10:25:00

Answer 3
score -1, 2019-04-18 10:41:38