Creating a new column conditional on two other columns in pandas
I have a dataframe with two columns. I want to create a new column that holds, for each row, whichever of the two columns has the longer string. So:
column_a column_b column_c
0 'dog is fast' 'dog is faster' 'dog is faster' (desired output)
I tried the code below but got an error saying that int is not iterable. I was planning to merge the resulting Series into the df afterwards, since I wasn't sure how to write the result directly into a column of the df.
column_c = pd.Series()
for i in len(df.column_a):
    if len(df.column_a.iloc[i]) >= len(df.column_b.iloc[0]):
        column_c.append(df.column_a.iloc[i])
    else:
        column_c.append(df.column_b.iloc[i])
Any help is appreciated.
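For reference, the "int is not iterable" error comes from `for i in len(df.column_a)`: `len(...)` returns a single integer, so the loop needs `range(len(...))`. Two other things would bite afterwards: the comparison reads `column_b` with `.iloc[0]` instead of `.iloc[i]`, and `Series.append` returned a new Series rather than modifying in place (it was removed in pandas 2.0). Below is a minimal sketch of a corrected loop that collects results in a plain list; the two-row sample frame is made up for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "column_a": ["dog is fast", "cats are slower"],
    "column_b": ["dog is faster", "cats are slow"],
})

longest = []
for i in range(len(df)):          # range(...) makes the row count iterable
    a = df["column_a"].iloc[i]
    b = df["column_b"].iloc[i]    # same row i for both columns
    longest.append(a if len(a) >= len(b) else b)

# Assign the collected values as a new column in one step.
df["column_c"] = longest
print(df)
```

The loop works, but the vectorized and `apply`-based answers are shorter and usually preferable on larger frames.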
Use pandas.DataFrame.apply.
Given sample data:
import pandas as pd
df = pd.DataFrame([['fast', 'faster'], ['slower', 'slow']])
        0       1
0    fast  faster
1  slower    slow
df['column_c'] = df.apply(lambda x: max(x, key=len), axis=1)
Output:
        0       1 column_c
0    fast  faster   faster
1  slower    slow   slower
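The `max(..., key=len)` approach looks at every column in the row, so it would fail with a `TypeError` if the frame also held a non-string column. A small sketch of the same idea restricted to the two columns from the question follows; the `other` column and the sample values are assumptions added only to show why the column selection matters.

```python
import pandas as pd

df = pd.DataFrame({
    "column_a": ["dog is fast", "slower"],
    "column_b": ["dog is faster", "slow"],
    "other": [1, 2],  # hypothetical non-string column; len() would fail on it
})

# Only compare the string columns; max() picks the longest value per row.
cols = ["column_a", "column_b"]
df["column_c"] = df[cols].apply(lambda row: max(row, key=len), axis=1)
print(df)
```

Because `max` returns the first of several equally long values, ties go to the column listed first in `cols`. The same line also works unchanged if `cols` names more than two columns.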
Using np.where with str.len:
import numpy as np
df['column_c'] = np.where(df.column_a.str.len() > df.column_b.str.len(), df.column_a, df.column_b)
df
Out[301]:
column_a column_b column_c
0 'dog is fast' 'dog is faster' 'dog is faster'
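A self-contained version of the `np.where` approach might look like the sketch below. The sample rows, including the tied third row, are assumptions made up to show the tie behaviour: with a strict `>`, rows where both strings have equal length take the value from `column_b`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row has strings of equal length.
df = pd.DataFrame({
    "column_a": ["dog is fast", "slower", "same"],
    "column_b": ["dog is faster", "slow", "tied"],
})

# Element-wise string lengths for each column.
len_a = df["column_a"].str.len()
len_b = df["column_b"].str.len()

# Where column_a is strictly longer take it, otherwise take column_b.
df["column_c"] = np.where(len_a > len_b, df["column_a"], df["column_b"])
print(df)
```

Switching the comparison to `>=` would prefer `column_a` on ties, which matches the `>=` used in the loop from the question.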
You can use DataFrame.apply. If your dataframe has more than two columns, apply it only to the columns you want to compare.
df['column_c'] = df.apply(lambda x: x[0] if len(x[0]) > len(x[1]) else x[1], axis=1)
column_a column_b column_c
0 'dog is fast' 'dog is faster' 'dog is faster'
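One caveat with `x[0]` and `x[1]`: when the columns have string names, each row is a Series indexed by those names, and looking up an integer key by position is something recent pandas releases warn about (deprecated since pandas 2.1, as far as I know). A hedged sketch using column labels instead is shown here; the `extra` column and the helper name `longer` are hypothetical and only illustrate applying to specific columns.

```python
import pandas as pd

df = pd.DataFrame({
    "column_a": ["dog is fast", "slower"],
    "column_b": ["dog is faster", "slow"],
    "extra": ["x", "y"],  # hypothetical third column that should be ignored
})

def longer(row):
    """Return the longer of the two strings in a row (column_b on ties)."""
    a, b = row["column_a"], row["column_b"]
    return a if len(a) > len(b) else b

# Select the two columns first so the function never sees "extra".
df["column_c"] = df[["column_a", "column_b"]].apply(longer, axis=1)
print(df)
```

Accessing values by label keeps the result independent of column order in the frame.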