
Create new column in DataFrame based on respective values in two different columns

I have the following DataFrame:

import pandas as pd

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','No Brand'],
        'Country': ['Japan','No Country','United States','Germany']
        }

df = pd.DataFrame(cars, columns=['Brand', 'Country'])
df.head(4)

            Brand        Country
0     Honda Civic          Japan
1  Toyota Corolla     No Country
2      Ford Focus  United States
3        No Brand        Germany

I would like to create a new column, Desc, that combines the values of the 'Brand' and 'Country' columns. If Brand is 'No Brand', Desc should take only the value from Country. If Country is 'No Country', Desc should take only the value from Brand. Desired output:

            Brand        Country    Desc
0     Honda Civic          Japan    Honda Civic Japan
1  Toyota Corolla     No Country    Toyota Corolla
2      Ford Focus  United States    Ford Focus United States
3        No Brand        Germany    Germany

When the check involves a string in a single column I can do it, but I am not sure how to proceed for two columns. Right now I can only compute a boolean for the condition I want:

df['Desc'] = df['Brand'].str.contains("No Brand") | df['Country'].str.contains("No Country")

            Brand        Country    Desc
0     Honda Civic          Japan    False
1  Toyota Corolla     No Country    True
2      Ford Focus  United States    False
3        No Brand        Germany    True

I have read that iterating over a DataFrame is not recommended, so I would like to avoid doing so.

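def get_desc(brand, country):
    # Drop the placeholder values; strip() removes the leading space
    # that would otherwise remain when brand is 'No Brand'.
    return ((brand if brand != 'No Brand' else '') +
            (' ' + country if country != 'No Country' else '')).strip()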


df['Desc'] = df['Brand'].combine(df['Country'], get_desc)

print(df.head(4))

Output:

            Brand        Country                      Desc
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany
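
The same rule can also be written without a per-row Python function. The following is only a sketch of one possible alternative, not part of the answer above: it blanks out each placeholder with Series.mask and then joins the two columns. The intermediate names brand and country are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'No Brand'],
    'Country': ['Japan', 'No Country', 'United States', 'Germany'],
})

# Replace each placeholder with an empty string, one column at a time.
brand = df['Brand'].mask(df['Brand'] == 'No Brand', '')
country = df['Country'].mask(df['Country'] == 'No Country', '')

# Join with a space, then strip the stray space left when one side is empty.
df['Desc'] = (brand + ' ' + country).str.strip()

print(df)
```

Because the comparison uses ==, only cells that are exactly 'No Brand' or 'No Country' are blanked; other text that merely contains those words is left alone.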

Let's concatenate both columns, then use str.replace to replace the No Brand and No Country values with an empty string:

df['Desc'] = (df['Brand'] + ' ' + df['Country']).str.replace(r'No Brand\s*|\s*No Country', '', regex=True)

Result:

            Brand        Country                      Desc
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany
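
One thing to watch with this approach: whether the pattern is treated as a regular expression depends on the regex argument of str.replace, and to my knowledge the default became literal matching (regex=False) in pandas 2.0. The small check below passes regex explicitly both ways so the difference is visible; the sample Series s is made up for illustration.

```python
import pandas as pd

# Two sample strings, as they look after the columns have been concatenated.
s = pd.Series(['No Brand Germany', 'Toyota Corolla No Country'])
pattern = r'No Brand\s*|\s*No Country'

# Literal matching: the pattern text never occurs verbatim, so nothing changes.
as_literal = s.str.replace(pattern, '', regex=False)

# Regex matching: each placeholder and its adjacent whitespace are removed.
as_regex = s.str.replace(pattern, '', regex=True)

print(as_literal.tolist())
print(as_regex.tolist())
```

Passing regex=True explicitly keeps the behaviour the same regardless of which pandas version is installed.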

In [2]: cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','No Brand'],
   ...:         'Country': ['Japan','No Country','United States','Germany']
   ...:         }
   ...: 
   ...: df = pd.DataFrame(cars, columns = ['Brand', 'Country'])
   ...: df
Out[2]: 
            Brand        Country
0     Honda Civic          Japan
1  Toyota Corolla     No Country
2      Ford Focus  United States
3        No Brand        Germany

In [3]: df['New_col'] = (df.Brand + " " + df.Country).str.replace("No Brand", "").str.replace("No Country", "").str.strip()

In [4]: df
Out[4]: 
            Brand        Country                   New_col
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany
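
The replace-based answers remove the placeholder text wherever it appears inside the joined string. If exact matches on whole cells are preferred, the two conditions from the question can be stated directly. This is a sketch of an alternative using numpy.select, not something taken from the answers above; the names no_brand and no_country are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'No Brand'],
    'Country': ['Japan', 'No Country', 'United States', 'Germany'],
})

# Boolean masks for the two placeholder conditions.
no_brand = df['Brand'].eq('No Brand')
no_country = df['Country'].eq('No Country')

# Conditions are checked in order; rows matching none get the default.
df['Desc'] = np.select(
    [no_brand, no_country],
    [df['Country'], df['Brand']],
    default=df['Brand'] + ' ' + df['Country'],
)

print(df)
```

Note that a row with both 'No Brand' and 'No Country' would match the first condition and end up with 'No Country' in Desc; the question does not say what should happen in that case, so it is left unhandled here.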
