Create a new column from two existing text columns in a DataFrame using pandas/python
I have a DataFrame with two columns, "Start_location" and "End_location". I want to create a new column called "Location" from these two columns, subject to the following conditions.
If "Start_location" == "End_location", then "Location" takes that shared value (either column will do).
Otherwise, if "Start_location" and "End_location" are different, "Location" should be "Start_location"-"End_location".
Here is an example of the input.
+---+--------------------+-----------------------+
| | Start_location | End_location |
+---+--------------------+-----------------------+
| 1 | Stratford | Stratford |
| 2 | Bromley | Stratford |
| 3 | Brighton | Manchester |
| 4 | Delaware | Delaware |
+---+--------------------+-----------------------+
The result I want is this.
+---+--------------------+-----------------------+--------------------+
| | Start_location | End_location | Location |
+---+--------------------+-----------------------+--------------------+
| 1 | Stratford | Stratford | Stratford |
| 2 | Bromley | Stratford | Bromley-Stratford |
| 3 | Brighton | Manchester | Brighton-Manchester|
| 4 | Delaware | Delaware | Delaware |
+---+--------------------+-----------------------+--------------------+
I would be happy if anyone can help.
PS: forgive me if this is a very basic question. I have gone through some similar questions on this topic but couldn't make any headway.
You can write your own function that does this and then use apply with a lambda function:
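def get_location(start, end):
    # Identical start and end: keep a single name.
    if start == end:
        return start
    # Otherwise join the two names with a hyphen.
    return start + '-' + end

# axis=1 passes each row to the lambda.
df['Location'] = df.apply(lambda x: get_location(x.Start_location, x.End_location), axis=1)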
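To try the row-wise approach end to end, here is a minimal sketch that rebuilds the sample table from the question. It swaps apply for a list comprehension over zip, which is an alternative I am assuming is acceptable here, not something from the answer above; the logic inside get_location is the same.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame(
    {
        "Start_location": ["Stratford", "Bromley", "Brighton", "Delaware"],
        "End_location": ["Stratford", "Stratford", "Manchester", "Delaware"],
    },
    index=[1, 2, 3, 4],
)

def get_location(start, end):
    """Return start alone when both match, otherwise 'start-end'."""
    return start if start == end else f"{start}-{end}"

# Pair the two columns with zip and build the new column in one pass.
df["Location"] = [
    get_location(s, e) for s, e in zip(df["Start_location"], df["End_location"])
]
print(df)
```

The list is assigned by position, so it only needs to have the same length as the DataFrame.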
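# The same logic as a single apply call; .iloc picks the two values in each row by position.
df['Location'] = df[['Start_location', 'End_location']].apply(lambda x: x.iloc[0] if x.iloc[0] == x.iloc[1] else x.iloc[0] + '-' + x.iloc[1], axis=1)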
Use np.select(condition, choice). To join the two columns, use the .str.cat() method.
import numpy as np

# Conditions are checked in order; the first one that is True selects the matching choice.
condition = [df['Start_location'] == df['End_location'],
             df['Start_location'] != df['End_location']]
choice = [df['Start_location'],
          df['Start_location'].str.cat(df['End_location'], sep='-')]
df['Location'] = np.select(condition, choice)
df

Output:
  Start_location End_location             Location
1      Stratford    Stratford            Stratford
2        Bromley    Stratford    Bromley-Stratford
3       Brighton   Manchester  Brighton-Manchester
4       Delaware     Delaware             Delaware
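For anyone unfamiliar with .str.cat(), this small sketch shows the joining step on its own. The two Series are made-up stand-ins for the DataFrame columns, and the separator is an assumption chosen to match the hyphenated result the question asks for.

```python
import pandas as pd

# Two hypothetical Series standing in for the Start_location and End_location columns.
start = pd.Series(["Bromley", "Brighton"])
end = pd.Series(["Stratford", "Manchester"])

# Concatenate element-wise, placing sep between each pair of values.
joined = start.str.cat(end, sep="-")
print(joined.tolist())
```

Changing sep is all it takes to switch between "Bromley-Stratford" and "Bromley_Stratford".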
You can use NumPy to compare both columns. Use this code:
import numpy as np

# np.where(condition, value_if_true, value_if_false), evaluated element-wise
df["Location"] = np.where(df['Start_location'] == df['End_location'],
                          df['Start_location'],
                          df['Start_location'] + "-" + df['End_location'])
df
Output:
  Start_location End_location             Location
0      Stratford    Stratford            Stratford
1        Bromley    Stratford    Bromley-Stratford
2       Brighton   Manchester  Brighton-Manchester
3       Delaware     Delaware             Delaware
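If you would rather stay within pandas, the same comparison can probably be written with Series.where. This is a sketch under the assumption that the column names match the question's table; it is a swapped-in variant, not the np.where call from the answer above.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame(
    {
        "Start_location": ["Stratford", "Bromley", "Brighton", "Delaware"],
        "End_location": ["Stratford", "Stratford", "Manchester", "Delaware"],
    }
)

# Boolean mask: True where both columns hold the same value.
same = df["Start_location"] == df["End_location"]
# Hyphen-joined version of every row, used only where the columns differ.
joined = df["Start_location"] + "-" + df["End_location"]

# Series.where keeps the original value where the mask is True
# and takes the value from `joined` everywhere else.
df["Location"] = df["Start_location"].where(same, joined)
print(df)
```

Unlike np.where, which returns a plain NumPy array, Series.where returns a Series that keeps the DataFrame's index.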