Create new column in DataFrame based on respective values in two different columns
I have the following dataframe:
import pandas as pd

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','No Brand'],
        'Country': ['Japan','No Country','United States','Germany']
        }
df = pd.DataFrame(cars, columns = ['Brand', 'Country'])
df.head(4)
            Brand        Country
0     Honda Civic          Japan
1  Toyota Corolla     No Country
2      Ford Focus  United States
3        No Brand        Germany
I would like to create a new column, Desc, that combines the values of the 'Brand' and 'Country' columns. If column Brand contains 'No Brand', Desc takes only the value from column Country. If column Country contains 'No Country', Desc takes only the value from column Brand.
Desired output:
            Brand        Country                      Desc
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany
If I only had to check the string in one column I could do it, but I am not sure how to proceed for two columns. Right now I can only compute a boolean for the condition I want:
df['Desc'] = df['Brand'].str.contains("No Brand") | df['Country'].str.contains("No Country")
            Brand        Country   Desc
0     Honda Civic          Japan  False
1  Toyota Corolla     No Country   True
2      Ford Focus  United States  False
3        No Brand        Germany   True
I have read that iterating over a dataframe is not recommended, so I would like to avoid doing so.
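def get_desc(brand, country):
    # Drop each placeholder value, join what is left with a space,
    # and strip the leading space that remains when the brand is missing.
    return ((brand if brand != 'No Brand' else '') +
            (' ' + country if country != 'No Country' else '')).strip()

# Series.combine calls get_desc element-wise on the paired values
df['Desc'] = df['Brand'].combine(df['Country'], get_desc)
print(df.head(4))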
Output:
            Brand        Country                      Desc
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany
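The two conditions from the question can also be written out explicitly without a per-row Python function. The following is a minimal sketch using numpy.select, not part of the answer above; the names no_brand, no_country and empty are illustrative, and returning an empty string when both placeholders appear in the same row is an assumption, since the question does not cover that case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'No Brand'],
    'Country': ['Japan', 'No Country', 'United States', 'Germany'],
})

# Boolean masks for whole-cell matches on the placeholder values
no_brand = df['Brand'].eq('No Brand')
no_country = df['Country'].eq('No Country')

# Assumption: a row with both placeholders gets an empty description
empty = pd.Series('', index=df.index)

# Conditions are checked in order; the first one that is True picks the value
df['Desc'] = np.select(
    [no_brand & no_country, no_brand, no_country],
    [empty, df['Country'], df['Brand']],
    default=df['Brand'] + ' ' + df['Country'],
)
print(df)
```

Because the masks compare whole cell values with eq, a brand that merely contains the text "No Brand" inside a longer name would be left untouched, unlike the str.contains check in the question.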
Let's concat both columns, then use str.replace to replace the No Brand and No Country values with an empty string:
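# regex=True is required in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['Desc'] = (df['Brand'] + ' ' + df['Country']).str.replace(r'No Brand\s*|\s*No Country', '', regex=True)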
Result:
            Brand        Country                      Desc
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany
In [2]: cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','No Brand'],
...: 'Country': ['Japan','No Country','United States','Germany']
...: }
...:
...: df = pd.DataFrame(cars, columns = ['Brand', 'Country'])
...: df
Out[2]:
            Brand        Country
0     Honda Civic          Japan
1  Toyota Corolla     No Country
2      Ford Focus  United States
3        No Brand        Germany
In [3]: df['New_col'] = (df.Brand + " " + df.Country).str.replace("No Brand", "").str.replace("No Country", "").str.strip()
In [4]: df
Out[4]:
            Brand        Country                   New_col
0     Honda Civic          Japan         Honda Civic Japan
1  Toyota Corolla     No Country            Toyota Corolla
2      Ford Focus  United States  Ford Focus United States
3        No Brand        Germany                   Germany
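The chained str.replace calls above remove the placeholder text wherever it occurs inside the combined string. If only exact cell values should be treated as placeholders, one possible variation is to blank them out per column first and concatenate afterwards. This is a sketch under that assumption; the intermediate name cleaned is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'No Brand'],
    'Country': ['Japan', 'No Country', 'United States', 'Germany'],
})

# Nested dict form: {column: {old_value: new_value}}.
# Only cells equal to the placeholder are replaced, column by column.
cleaned = df[['Brand', 'Country']].replace(
    {'Brand': {'No Brand': ''}, 'Country': {'No Country': ''}}
)

# Concatenate, then strip the space left over when one side is empty
df['New_col'] = (cleaned['Brand'] + ' ' + cleaned['Country']).str.strip()
print(df)
```

The original Brand and Country columns are left unchanged, because replace returns a new DataFrame rather than modifying df in place.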