How can I split a string in Python that has multiple spaces between?

This is hard to explain in a title, so let me describe it here.

I'm working on a search interface for a utility I'm developing, one with Google-style filters.

It works fine when there's only one filter in the query, but when there are two or more, problems appear.

So, let's say I have a query like intitle:foo bar inbody: boo far

For example, the first filter reaches the second part of the loop and is correctly interpreted as {intitle:foo bar} , but the next one is printed in the first part of the loop as foo bar inbody , followed by its value boo far

What should happen is that each filter is recognized and isolated into its own pair (e.g. {intitle:foo bar} {inbody: boo far} )

Below is the code responsible for this problem.

def ParseFilters(query):
    filterVals = []

    if ":" in query:
        query = query.split(":")

        for part in query:
            # This is the first part of the loop
            print(part)
            if part in filters:
                # This is the second part of the loop
                listIndex = query.index(part)
                filtering = query[listIndex + 1]

                for f in filters:
                    filtering = filtering.strip(f).lstrip()

                pair = {
                    part: filtering
                }
                
                print(pair)

                filterVals.append(pair)
    return filterVals

The "filters" list is:

filters = [
    "intitle",
    "inbody"
]
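
To make the symptom concrete, here is a minimal sketch of what query.split(":") returns for the example query. It only illustrates the behaviour described above and is not a fix. Note also that str.strip(chars) removes any of the given characters from both ends rather than a whole substring, so the strip(f) call in the loop only trims the filter name by coincidence.

```python
query = "intitle:foo bar inbody: boo far"

# Splitting on ":" cuts at every colon, so the second filter name
# stays attached to the end of the first filter's value.
parts = query.split(":")
print(parts)  # ['intitle', 'foo bar inbody', ' boo far']

# "inbody" never appears as a list element of its own,
# so a membership test against the filter names fails for it.
print("inbody" in parts)  # False

# str.strip treats its argument as a set of characters, not a substring.
stripped = "foo bar inbody".strip("inbody")
print(repr(stripped))  # 'foo bar '
```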

If I understand your requirements correctly, I would write something like this:

from collections import defaultdict

filters = [
    "intitle",
    "inbody"
]

query = 'intitle:foo bar inbody: boo far '

result = defaultdict(list)
current_filter = None
for elem in query.split():
    left, _, right = elem.partition(':')
    if left in filters:
        current_filter = left
        if right:
            result[current_filter].append(right)
    else:
        result[current_filter].append(left)

print(result)

Output:

defaultdict(<class 'list'>, {'intitle': ['foo', 'bar'], 'inbody': ['boo', 'far']})

In my opinion this is slightly more declarative and easier to make more robust later. You can experiment with it until it meets your requirements. I suggest you check out str.partition , which is very useful for this kind of parsing. defaultdict behaves like a regular dictionary, except that a missing key is created with a default value (here an empty list) the first time it is accessed.
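
As a quick illustration of the two building blocks mentioned above, the sketch below shows what str.partition returns in the three cases the loop relies on, and how defaultdict(list) fills in missing keys. The final join step is an assumption about the desired output (one string per filter, as in the question), not part of the code above.

```python
from collections import defaultdict

# str.partition splits at the first occurrence of the separator and
# always returns a 3-tuple: (before, separator, after).
print("intitle:foo".partition(":"))  # ('intitle', ':', 'foo')
print("inbody:".partition(":"))      # ('inbody', ':', '')
print("bar".partition(":"))          # ('bar', '', '')

# defaultdict(list) creates an empty list the first time a key is used.
result = defaultdict(list)
result["intitle"].append("foo")
result["intitle"].append("bar")

# Join the collected words to get one string per filter.
pairs = {name: " ".join(words) for name, words in result.items()}
print(pairs)  # {'intitle': 'foo bar'}
```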

That's because when you do query.split(":") your program has no way of knowing that inbody is a filter and not part of the intitle value. A better way is to use regular expressions to find all filters and all values, store them in separate lists (i.e. query_filters and query_values ), and then make a dict :

import re


filter_table = ["intitle", "inbody"]
query = "intitle:foo bar inbody: boo far"

# Create a regular expression to match filters
filters_re = re.compile(r"\s*[a-zA-Z]+\:\s*")

# Find all filters
query_filters = filters_re.findall(query)
# Find all values by splitting query at the values matched by filters_re
query_values = filters_re.split(query)

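# Clean up the strings. split() always returns the text before the first
# filter as element 0, so drop it to keep names and values aligned
# (this also keeps empty values instead of silently skipping them).
query_filters = [x.strip().replace(":", "") for x in query_filters]
query_values = [x.strip() for x in query_values[1:]]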

# Make pairs
filter_pairs = zip(query_filters, query_values)

# Remove filters that are not in filter_table
filter_pairs = filter(lambda x: x[0] in filter_table, filter_pairs)

filter_dict = dict(filter_pairs)

print(filter_dict)

Or, if you like one-liners:

import re

filter_table = ["intitle", "inbody"]
query = "intitle:foo bar inbody: boo far"

filter_dict = dict(filter(lambda x: x[0] in filter_table, zip(re.findall(r"[a-zA-Z]+(?=\:)", query), (x.strip() for x in re.split(r"\s*[a-zA-Z]+\:\s*", query)[1:]))))

print(filter_dict)
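
A possible variation, offered as a sketch rather than a definitive implementation: build the pattern from the known filter names instead of matching any word followed by a colon. That way a colon inside a value is left alone. The function name parse_filters is hypothetical, and the sketch assumes each filter appears at most once (a repeated filter would overwrite the earlier value).

```python
import re

filter_table = ["intitle", "inbody"]  # same names as in the question


def parse_filters(query):
    """Return {filter_name: value} for each known filter in query."""
    # Alternation of the known names, e.g. r"\b(intitle|inbody):"
    names = "|".join(re.escape(name) for name in filter_table)
    pattern = re.compile(r"\b(" + names + r"):")

    # With one capture group, re.split returns
    # [text_before, name1, value1, name2, value2, ...]
    parts = pattern.split(query)
    names_found = parts[1::2]
    values = [value.strip() for value in parts[2::2]]
    return dict(zip(names_found, values))


print(parse_filters("intitle:foo bar inbody: boo far"))
# {'intitle': 'foo bar', 'inbody': 'boo far'}
print(parse_filters("intitle:a note: b inbody:"))
# {'intitle': 'a note: b', 'inbody': ''}
```

Any text before the first filter ends up in parts[0] and is ignored here; it could be returned separately if free-text search terms are needed.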
