
How do I use Python to look at the values of one column in a dataframe and append a new value in another column based on those results?

I have a CSV file with one column full of values that resembles this...

Column A
Hi [First]!
Thank You!
What Car?
Are you [First] [Last]?
Did you know?
Save 25% [First]!
Get $2,000 Back!
Embrace the road ahead
Everyone saves 30%!

What I would like to do is build some logic that goes through column A and then assigns a new value to column B based on what the logic found.

For example the end result would look like this...

Column A                   Column B
Hi [First]!                Personalized
Thank You!                 Gratitude
What Car?                  Curiosity
Are you [First] [Last]?    Personalized
Did you know?              Curiosity
Save 25% [First]!          Personalized
Get $2,000 Back!           Offer
Embrace the road ahead     Generic
Everyone saves 30%!        Offer

I'd also like the logic to apply its checks in priority order, so that each row takes the first category it matches. For example, two values contain a percentage, but one of them is also personalized. I'd like to put Personalized at the top of the hierarchy, and then treat anything else that contains "save" or a percentage as an offer.

The beginning of my code looks similar to this. I know the logic is completely wrong, but I don't know what I need to do to fix it and get the results I'm looking for. Any help would be much appreciated!

import pandas as pd
import glob
import re
import numpy as np

header_names=['Line', 'Type']
df = pd.read_csv('2021 Lines Types.csv',header=None, skiprows=1, names=header_names)

for i in df:
    if re.search('[First]|[Last]', df.columnA):
      columnB.append("Personalized")
    elif re.search('Save', df.columnA):
        columnB.append('Savings')

I'd use np.select for this. It takes a list of conditions and a list of replacement values of the same length. Each row gets the value paired with the first condition it satisfies, so the order of the conditions defines the hierarchy: putting the personalized check first gives it priority over the offer check. The third argument is the default value used when none of the conditions match.
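To see the first-match-wins rule in isolation, here is a small sketch on a toy array. The names scores, tier_conds and tiers are made up for illustration and are not part of the question's data.

```python
import numpy as np

# Toy data, not from the question.
scores = np.array([1, 5, 10, 20])

# 10 and 20 satisfy both conditions, but the first condition listed wins.
tier_conds = [scores >= 10, scores >= 5]
tiers = ['large', 'medium']

# 'small' is the default for values that match no condition.
labels = np.select(tier_conds, tiers, default='small')
print(labels.tolist())  # ['small', 'medium', 'large', 'large']
```

Applied to the data in the question, the same idea looks like this: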

conds = [
    df['Column A'].str.contains(r'\[(?:First|Last)\]'),
    df['Column A'].str.contains('Save'),
    df['Column A'].str.contains('?', regex=False),
    df['Column A'].str.contains('Thank', regex=False),
]

vals = [
    'Personalized',
    'Offer',
    'Curiosity',
    'Gratitude',
]

df['Column B'] = np.select(conds, vals, 'Generic')

Output:

>>> df
                  Column A      Column B
0              Hi [First]!  Personalized
1               Thank You!     Gratitude
2                What Car?     Curiosity
3  Are you [First] [Last]?  Personalized
4            Did you know?     Curiosity
5        Save 25% [First]!  Personalized
6         Get $2,000 Back!       Generic
7   Embrace the road ahead       Generic
8      Everyone saves 30%!       Generic
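
Note that this output leaves "Get $2,000 Back!" and "Everyone saves 30%!" as Generic, because 'Save' is matched case-sensitively and nothing checks for percentages or amounts. If you want those rows labelled Offer, as in the desired result, one possible variant is sketched below. It assumes that "save" in any letter case, a number followed by %, or a dollar sign followed by a digit marks an offer. The dollar-amount rule is my guess from the expected output, not something stated in the question, so adjust the pattern to your data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'Column A': [
    'Hi [First]!',
    'Thank You!',
    'What Car?',
    'Are you [First] [Last]?',
    'Did you know?',
    'Save 25% [First]!',
    'Get $2,000 Back!',
    'Embrace the road ahead',
    'Everyone saves 30%!',
]})

col = df['Column A']

# Order matters: the first matching condition decides the label.
conds = [
    col.str.contains(r'\[(?:First|Last)\]'),
    # Assumption: "save" (any case), a percentage, or a dollar amount is an offer.
    col.str.contains(r'save|\d+%|\$\d', case=False),
    col.str.contains('?', regex=False),
    col.str.contains('Thank', regex=False),
]
vals = ['Personalized', 'Offer', 'Curiosity', 'Gratitude']

df['Column B'] = np.select(conds, vals, default='Generic')
print(df)
```

With this pattern, "Save 25% [First]!" is still Personalized because that condition comes first, while "Get $2,000 Back!" and "Everyone saves 30%!" become Offer.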
