
Create new column in DataFrame based on respective values in two different columns

I have the following dataframe:

import pandas as pd

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','No Brand'],
        'Country': ['Japan','No Country','United States','Germany']
        }

df = pd.DataFrame(cars, columns = ['Brand', 'Country'])
df.head(4)

            Brand        Country
0     Honda Civic          Japan
1  Toyota Corolla     No Country
2      Ford Focus  United States
3        No Brand        Germany

I would like to create a new column, Desc, that combines the values of the Brand and Country columns. If Brand is 'No Brand', Desc should take only the value from Country. If Country is 'No Country', Desc should take only the value from Brand. Desired output:

            Brand        Country    Desc
0     Honda Civic          Japan    Honda Civic Japan
1  Toyota Corolla     No Country    Toyota Corolla
2      Ford Focus  United States    Ford Focus United States
3        No Brand        Germany    Germany

I can do this when checking for a string in a single column, but I am not sure how to proceed with two columns. So far I can only compute a boolean mask for the condition I want:

df['Desc'] = df['Brand'].str.contains("No Brand") | df['Country'].str.contains("No Country")

            Brand        Country    Desc
0     Honda Civic          Japan    False
1  Toyota Corolla     No Country    True
2      Ford Focus  United States    False
3        No Brand        Germany    True

I have read that iterating over a DataFrame is not recommended, so I would like to avoid doing that.

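You can use Series.combine with a small helper function:

def get_desc(brand, country):
    # Keep only the parts that are not placeholders, then join them with a
    # single space so a missing brand does not leave a leading space.
    parts = []
    if brand != 'No Brand':
        parts.append(brand)
    if country != 'No Country':
        parts.append(country)
    return ' '.join(parts)


# Series.combine applies get_desc element-wise to pairs from both columns
df['Desc'] = df['Brand'].combine(df['Country'], get_desc)

print(df.head(4))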

Output:

            Brand        Country                      Desc
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany

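Concatenate the two columns, then use str.replace to replace the 'No Brand' and 'No Country' values with an empty string. Pass regex=True explicitly, because since pandas 2.0 str.replace treats the pattern as a literal string by default:

df['Desc'] = (df['Brand'] + ' ' + df['Country']).str.replace(r'No Brand\s*|\s*No Country', '', regex=True)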

Result:

            Brand        Country                      Desc
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany
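
To see why the regex flag matters for the concat-and-replace approach above, here is a minimal self-contained sketch that runs the same pattern in both modes. The variable names (combined, literal, with_regex) are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'No Brand'],
    'Country': ['Japan', 'No Country', 'United States', 'Germany'],
})

combined = df['Brand'] + ' ' + df['Country']
pattern = r'No Brand\s*|\s*No Country'

# regex=False: the pattern is searched for as a literal string, which never
# occurs in the data, so every value comes back unchanged.
literal = combined.str.replace(pattern, '', regex=False)

# regex=True: the alternation and \s* are interpreted, so the placeholder
# and its adjacent whitespace are removed.
with_regex = combined.str.replace(pattern, '', regex=True)

print(literal.tolist())
print(with_regex.tolist())
```

If the result looks unchanged after the replace, the pattern was most likely interpreted literally.
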
In [2]: cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','No Brand'],
   ...:         'Country': ['Japan','No Country','United States','Germany']
   ...:         }
   ...: 
   ...: df = pd.DataFrame(cars, columns = ['Brand', 'Country'])
   ...: df
Out[2]: 
            Brand        Country
0     Honda Civic          Japan
1  Toyota Corolla     No Country
2      Ford Focus  United States
3        No Brand        Germany

In [3]: df['New_col'] = (df.Brand + " " + df.Country).str.replace("No Brand", "").str.replace("No Country", "").str.strip()

In [4]: df
Out[4]: 
            Brand        Country                   New_col
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany
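
Both replace-based answers remove the placeholder text wherever it occurs in the combined string. If exact matches on the whole cell are preferred, one possible alternative (a different technique from the answers above, sketched here under that assumption) is to blank out the placeholders with Series.where before concatenating. The intermediate names brand and country are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'No Brand'],
    'Country': ['Japan', 'No Country', 'United States', 'Germany'],
})

# Series.where keeps values where the condition is True and substitutes
# the second argument elsewhere, so exact placeholders become ''.
brand = df['Brand'].where(df['Brand'] != 'No Brand', '')
country = df['Country'].where(df['Country'] != 'No Country', '')

# Join with a space, then strip the leftover space on either side.
df['Desc'] = (brand + ' ' + country).str.strip()

print(df)
```

This stays vectorized, like the boolean check in the question, and does not touch values that merely contain the placeholder text as a substring.
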


 