
How to find a string match in df col based on list of strings?

I have a list of 1000 corporate customers and a df of all previous transactions for the year. For every row whose customer name matches an entry in the list, I would like to set the value True in a new column, df['Covered'].

I am not sure why I keep getting the errors below. I looked at the following questions but have had no luck so far.

Match string to list of defined strings

Pandas extract rows from df where df['col'] values match df2['col'] values

Code Example: when I set regex=False

Customer_List = ['3M', 'Cargill', "Chili's"]  # --- (remaining names omitted)

df['Covered'] = df[df['End Customer Name'].str.contains('|'.join(Customer_List),case=False, na=False, regex=False)]

ValueError: Wrong number of items passed 32, placement implies 1

Code Example: when I set regex=True

error: bad character range HD at position 177825

 ~/opt/anaconda3/lib/python3.7/sre_parse.py in parse(str, flags, pattern)
    928 
    929     try:
--> 930         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    931     except Verbose:
    932         # the VERBOSE flag was switched on inside the pattern.  to be

~/opt/anaconda3/lib/python3.7/sre_parse.py in _parse_sub(source, state, verbose, nested)
    424     while True:
    425         itemsappend(_parse(source, state, verbose, nested + 1,
--> 426                            not nested and not items))
    427         if not sourcematch("|"):
    428             break

How about:

mask = df['End Customer Name'].isin(Customer_List)
df['covered'] = 0
df.loc[mask, 'covered'] = 1
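One caveat with this approach: isin() only matches whole values exactly and is case-sensitive, so it will not find a list entry that appears as a substring of a longer name. Below is a minimal sketch of that behaviour, plus a case-insensitive variant. The sample rows are made up for illustration; only the column name and list name come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real df has one row per transaction.
df = pd.DataFrame({"End Customer Name": ["3M", "Cargill Inc", "chili's", None]})
Customer_List = ["3M", "Cargill", "Chili's"]

# isin() compares whole values, case-sensitively.
exact = df["End Customer Name"].isin(Customer_List)

# Lower-casing both sides makes the whole-value match case-insensitive.
lowered = {name.lower() for name in Customer_List}
exact_ci = df["End Customer Name"].str.lower().isin(lowered)

# Store the result as 0/1, as in the answer above.
df["covered"] = exact_ci.astype(int)
```

Here "Cargill Inc" is not flagged in either variant, because isin() never does substring matching. If partial matches are needed, str.contains is the tool to reach for.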

Thanks everyone. The problem was that my Customer_List contains special regex characters, so I needed to apply re.escape to each name (via map) before joining them into the pattern.

This question helped me: Python regex bad character range.
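A minimal sketch of that fix, assuming a made-up customer name containing "[H-D]" to reproduce the "bad character range" error. The sample rows and the names raw_pattern, raw_compiles and pattern are illustrative, not from the original post. The sketch also assigns the boolean mask directly to the column. Wrapping it in df[...], as in the question, selects a whole DataFrame, which is the likely cause of the "Wrong number of items passed" ValueError. Note too that with regex=False the joined string is treated as one literal, so the "|" would not act as alternation.

```python
import re
import pandas as pd

# Hypothetical sample data; only the column and list names come from the question.
df = pd.DataFrame(
    {"End Customer Name": ["3M Company", "cargill", "Acme [H-D] Ltd", "Other", None]}
)
Customer_List = ["3M", "Cargill", "Acme [H-D]"]

# Joining the raw names fails: "[H-D]" is read as a reversed character range.
raw_pattern = "|".join(Customer_List)
try:
    re.compile(raw_pattern)
    raw_compiles = True
except re.error:
    raw_compiles = False

# Escaping each name first makes every character match literally.
pattern = "|".join(map(re.escape, Customer_List))

# Assign the boolean mask itself, not df[mask], to the new column.
df["Covered"] = df["End Customer Name"].str.contains(
    pattern, case=False, na=False, regex=True
)
```

With na=False, rows with a missing customer name come out as False rather than NaN, so the new column stays purely boolean.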


 