Referencing list to remove string values

The following clean_sheet_title function uses INVALID_TITLE_CHARS and INVALID_TITLE_CHAR_MAP to strip out invalid characters and limit the title to 31 characters:

# This strips characters that are invalid to Excel
INVALID_TITLE_CHARS = ["]", "[", "*", ":", "?", "/", "\\", "'"]
INVALID_TITLE_CHAR_MAP = {ord(x): "" for x in INVALID_TITLE_CHARS}

# How would I remove strings, as well as the characters from INVALID_TITLE_CHARS?
INVALID_TITLE_NAMES = ["zz_ FeeRelationship", " Family"]

def clean_sheet_title(title):
    title = title or ""
    title = title.strip()
    title = title.translate(INVALID_TITLE_CHAR_MAP)
    return title[:31]

My question is: how would I expand this to also remove the strings in the INVALID_TITLE_NAMES list?

What I've tried:
I tried the following update to clean_sheet_title, but it makes no difference to title:

INVALID_TITLE_CHARS = ["]", "[", "*", ":", "?", "/", "\\", "'"]
INVALID_TITLE_CHAR_MAP = {ord(x): "" for x in INVALID_TITLE_CHARS}

INVALID_TITLE_NAMES = ["zz_ FeeRelationship", "Family"]


def clean_sheet_title(title):
    title = title or ""
    title = title.strip()
    title = title.translate(INVALID_TITLE_CHAR_MAP, "")
    for name in INVALID_TITLE_NAMES:
        title = title.replace(name, "")
    return title[:31]

Examples:

  • Current behaviour: if title is Courtenay:Family, clean_sheet_title currently returns CourtenayFamily, because the colon is mapped to an empty string.

  • Desired behaviour: the title is sometimes prefixed or suffixed with zz_ FeeRelationship or Family; in both cases these strings should be dropped. E.g. zz_ FeeRelationship Courtenay:Family would become Courtenay.

Try this:

for name in INVALID_TITLE_NAMES:
    title = title.replace(name, "")

Is that the result you are trying to achieve? It should replace each invalid name in title with an empty string.
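
For context, here is a minimal sketch of how that loop might sit inside the full function. It assumes the names list from the question's attempt ("Family" without a leading space) and moves the strip() call to the end, since dropping a prefix or suffix can leave stray whitespace behind. Both choices are assumptions rather than something the question states.

```python
# Characters Excel rejects in sheet titles (taken from the question).
INVALID_TITLE_CHARS = ["]", "[", "*", ":", "?", "/", "\\", "'"]
INVALID_TITLE_CHAR_MAP = {ord(x): "" for x in INVALID_TITLE_CHARS}

# Substrings to drop; "Family" has no leading space here (an assumption).
INVALID_TITLE_NAMES = ["zz_ FeeRelationship", "Family"]


def clean_sheet_title(title):
    title = title or ""
    # str.translate takes a single argument: the mapping table.
    title = title.translate(INVALID_TITLE_CHAR_MAP)
    for name in INVALID_TITLE_NAMES:
        title = title.replace(name, "")
    # Strip last, because the removals above can expose whitespace.
    title = title.strip()
    return title[:31]


print(clean_sheet_title("zz_ FeeRelationship Courtenay:Family"))  # Courtenay
```

Stripping before the replacements, as in the original function, would leave the space that followed zz_ FeeRelationship at the start of the result.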

You could use regular expressions to match any of your keywords or characters and replace them with an empty string:

import re

INVALID_TITLE_CHARS = ["]", "[", "*", ":", "?", "/", "\\", "'"]
INVALID_TITLE_NAMES = ["zz_ FeeRelationship", " Family"]

inv_char_grp = re.escape("".join(INVALID_TITLE_CHARS))
inv_name_grp = "|".join(re.escape(name) for name in INVALID_TITLE_NAMES)


regex = f"[{inv_char_grp}]|{inv_name_grp}"


title = "zz_ FeeRelationship Courtenay: Family"
result = re.sub(regex, "", title)
print(result)

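which prints Courtenay (with one leading space left over after the prefix is removed; calling .strip() on the result removes it).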


An explanation of the regular expressions:

  • Since we have special characters in INVALID_TITLE_CHARS , they need to be escaped so that the regex engine recognizes them as literal characters instead of using their special meaning. So we join all the characters in INVALID_TITLE_CHARS , then use re.escape to escape the resulting string. This gives us the regex inv_char_grp = r"\]\[\*:\?/\\'"
  • We wrap that in [ and ] to denote that we want to match any one of those characters, using f"[{inv_char_grp}]".
  • We also want to match any of the names in INVALID_TITLE_NAMES . Since these are whole strings, we won't use a character group for them. Instead, we can use the | operator to indicate that we want to match any of its operands. Also remember to escape the names in case they contain any special characters.

The final regex we get is

[\]\[\*:\?/\\']|zz_\ FeeRelationship|\ Family

[\]\[\*:\?/\\']                                : Any of these chars ][*:?/\'
               |                               : Or
                zz_\ FeeRelationship           : Exactly zz_, then a space, then FeeRelationship
                                    |          : Or
                                     \ Family  : Exactly one space, then Family
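
To tie this back to the question, the pattern could be compiled once and used inside clean_sheet_title. The following is only a sketch under a few assumptions: the module-level name INVALID_TITLE_RE is made up for illustration, "Family" is listed without its leading space so that Courtenay:Family is also handled, and the names are sorted longest first as a precaution in case one name is a prefix of another.

```python
import re

INVALID_TITLE_CHARS = ["]", "[", "*", ":", "?", "/", "\\", "'"]
# "Family" without the leading space is an assumption, so that
# "Courtenay:Family" is also handled.
INVALID_TITLE_NAMES = ["zz_ FeeRelationship", "Family"]

inv_char_grp = re.escape("".join(INVALID_TITLE_CHARS))
# Longest names first, so a longer name is tried before a shorter
# one that happens to be its prefix.
inv_name_grp = "|".join(
    re.escape(name) for name in sorted(INVALID_TITLE_NAMES, key=len, reverse=True)
)
# Compile once at import time rather than on every call.
INVALID_TITLE_RE = re.compile(f"[{inv_char_grp}]|{inv_name_grp}")


def clean_sheet_title(title):
    title = INVALID_TITLE_RE.sub("", title or "")
    # Strip after substituting: removing a prefix or suffix can leave spaces.
    return title.strip()[:31]


# Both calls print 'Courtenay'.
for raw in ("zz_ FeeRelationship Courtenay:Family", "Courtenay: Family"):
    print(repr(clean_sheet_title(raw)))
```

Truncating to 31 characters after the substitution keeps the limit from being spent on text that is about to be removed.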
