
Python locate specific words without duplicates

I have a problem: I am trying to find device names in a string. All the device names that I am looking for are stored in a list. One requirement is very important:

  • A command can have multiple devices!!!

Now the problem I have is this:

I have two devices (Fan and Fan Light). When I give the command "Turn on Fan Light", both devices are found, but I want only Fan Light to be found. I tried collecting all the devices that were found and keeping only the longest one, like this:

# Create 2 dummy devices
device1 = {
    "name": "fan"
}

device2 = {
    "name": "fan light"
}


# Add devices to list
devices = []
devices.append(device1)
devices.append(device2)


# Given command
command = "Turn on fan light"
    
    
foundDevices = []

# Search devices in sentence
for device in devices:

    # Splits a device name if it has multiple words
    deviceSplit = device["name"].split()
    numOfSubNames = len(deviceSplit)

    # Checks for every sub-name if it is found in the string
    i = 0
    for subName in deviceSplit:
        if subName in command:
            i += 1

    # Checks if all names were located in string
    if i == numOfSubNames:
        foundDevices.append(device["name"])

# Checks if multiple devices have been found
if len(foundDevices) >= 2:
    largestNameLength = 0

    # Checks which device has the largest name
    for device in foundDevices:
        if (len(device) > largestNameLength):
            largestName = device
            largestNameLength = len(device)


    # Clears list and only add longest one
    foundDevices.clear()
    foundDevices.append(largestName)


print(foundDevices)

But that gives a problem when I say for example: "Turn on Fan Light and the Fan", because that command does contain multiple devices. How can I scan for devices the way I want?

A regular expression search is one way of quickly doing what you want, with a pattern made from the different device names.

import re

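def find_with_regex(command, pattern):
    # Lower-case the matches so "Fan" and "fan" count as one device, and
    # sort the de-duplicated names so the result order is deterministic.
    matches = re.findall(pattern, command, re.IGNORECASE)
    return sorted({match.lower() for match in matches})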

I would also suggest building a reversed dictionary with a device: name shape; it may help with quickly finding the code name of a given device.

devices = [{'name': 'fan light'}, {'name': 'fan'}]

# build a quick-reference dict with a device -> name structure
transformed = {dev: name for x in devices for name, dev in x.items()}
# this also weeds out duplicated devices: a repeated device name
# collapses into a single key (no error is raised)

# print(transformed)
# {'fan light': 'name', 'fan': 'name'}
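
Because each device dict here only has a name key, every value in the mapping above is the string 'name'. A variant that might be more useful keys the lookup by device name and stores the whole device dict. Note that a dict comprehension keeps the last entry for a repeated key without raising, so the sketch below checks for duplicates explicitly. The helper build_device_index and the "pin" field are hypothetical, added only for illustration:

```python
def build_device_index(devices):
    """Map each lower-cased device name to its device dict (hypothetical helper)."""
    index = {}
    for device in devices:
        key = device["name"].lower()
        if key in index:
            # Fail loudly instead of silently overwriting the earlier device.
            raise ValueError(f"duplicate device name: {key!r}")
        index[key] = device
    return index


# "pin" is an assumed extra field, not part of the original devices.
devices = [{"name": "fan", "pin": 1}, {"name": "fan light", "pin": 2}]
device_index = build_device_index(devices)
print(device_index["fan light"])
```

With an index like this, a matched name can be turned back into its device with a single dictionary lookup.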

Special thanks to buddemat for pointing out that the device names need to be in a particular order for this solution to work. I fixed it with reversed(sorted(...)) on the pattern-making line in the next code block.

Testing the function

test_cases = [
    'Turn on fan light',
    'Turn on fan light and fan',
    'Turn on fan and fan light',
    'Turn on fan and fan',
]

pattern = '|'.join(reversed(sorted(transformed)))
for command in test_cases:
    matches = find_with_regex(command, pattern)
    print(matches)

Output

['fan light']
['fan', 'fan light']
['fan', 'fan light']
['fan']
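
One caveat: a pattern that is a plain join of the names would also match fan inside a longer word such as fantastic, and a name containing regex metacharacters would be misread. A possible hardening, sketched under the assumption that device names are plain strings, is to escape each name, put longer names first, and wrap the alternation in word boundaries. The names build_pattern and find_devices are hypothetical:

```python
import re


def build_pattern(names):
    # Longest names first, so "fan light" is tried before "fan".
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # \b keeps "fan" from matching inside words such as "fantastic".
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def find_devices(command, pattern):
    # Lower-case the matches, drop repeats, and keep first-seen order.
    found = (match.lower() for match in pattern.findall(command))
    return list(dict.fromkeys(found))


pattern = build_pattern(["fan", "fan light"])
print(find_devices("Turn on Fan Light and the fan", pattern))  # ['fan light', 'fan']
print(find_devices("That is fantastic", pattern))              # []
```

Sorting by length rather than alphabetically is a design choice here: it does not depend on one name being a prefix of another.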

You can use the third-party regex module instead of the built-in re module (to improve upon RichieV's nice answer) if you don't want to rely on sorting the list of devices to ensure the correct result.

The problem with re is that it is not POSIX compliant, so the pipe operator | does not ensure that the leftmost longest match is returned (see also "How to order regular expression alternatives to get longest match?").
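
A small demonstration of that behaviour with the built-in re module, using the device names from the question: re tries the alternatives from left to right and stops at the first one that matches, so the order in which the names are written decides the result.

```python
import re

command = "Turn on fan light"

# Shorter alternative listed first: re stops at the first alternative that matches.
print(re.findall("fan|fan light", command))  # ['fan']

# Longer alternative listed first: the full device name is matched.
print(re.findall("fan light|fan", command))  # ['fan light']
```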

However, in regex you can put (?p) at the start of a pattern to enable POSIX matching.

Altogether

import regex

devices = [{'name': 'fan'}, {'name': 'fan light'}]

test_cases = [
    'Turn on fan light',
    'Turn on fan light and fan',
    'Turn on fan and fan light',
    'Turn on fan and fan',
]

transformed = {dev: name for x in devices for name, dev in x.items()}

pattern = '|'.join(transformed)

for command in test_cases:
    matches = regex.findall(r'(?p)' + pattern, command)
    print(matches)

will give you

['fan light']
['fan light', 'fan']
['fan', 'fan light']
['fan', 'fan']

regardless of the order of the dictionaries in devices.
