
Python Pandas - Merge two Data Frames and Substring on columns
I have two data frames in Python, as shown below.
df1
CUSTOMER_KEY LAST_NAME FIRST_NAME
30 f2b6769129 97bb97bebc
46 ca0464878d e276539bc2
51 62f2905a7a 8dfabd6d61
57 21032ca3bc 1f7e5e0c6e
62 f7e7fdd8ce eb6cf4af99
64 f536998bbb 7fc39eacd1
80 6069198f63 d873a71620
99 0ba61a6f66 a6cf7af3eb
102 e8b579b776 c8048fd459
df2
CUSTOMER_KEY LAST_NAME FIRST_NAME
30 Arthur Anderson
46 Teresa Johns
51 Louise Hurwitz
57 Timothy Addy
62 Jeffery Wilson
64 Andres Tuller
80 Daniel Green
99 Frank Nader
102 Faith Young
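I want to join these two data frames on CUSTOMER_KEY (which I can do with merge), and then concatenate substrings of a few columns to form new strings in the resulting data frame. From the data frames above, the result I am looking for is as follows: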
result_df
CUSTOMER_KEY LAST_NAME FIRST_NAME
30 Artf2b676 And97bb97
46 Terca0464 Johe27653
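Basically, I want substring(last_name,1,4) from df2 and substring(last_name,1,6) from df1, concatenated into a new column, and likewise for the other columns. (In the sample output above, this works out to the first 3 characters from df2 followed by the first 6 characters from df1.)
How can I achieve this?
Thanks and regards,
Bala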
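Use the str accessor. Note that this relies on df1 and df2 having the same index, with rows in the same order: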
df2['LAST_NAME']=df2['LAST_NAME'].str[:3]+df1['LAST_NAME'].str[:6]
df2['FIRST_NAME']=df2['FIRST_NAME'].str[:3]+df1['FIRST_NAME'].str[:6]
df2
Out[768]:
CUSTOMER_KEY LAST_NAME FIRST_NAME
0 30 Artf2b676 And97bb97
1 46 Terca0464 Johe27653
2 51 Lou62f290 Hur8dfabd
3 57 Tim21032c Add1f7e5e
4 62 Jeff7e7fd Wileb6cf4
5 64 Andf53699 Tul7fc39e
6 80 Dan606919 Gred873a7
7 99 Fra0ba61a Nada6cf7a
8 102 Faie8b579 Youc8048f
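The assignment above adds the two Series row by row according to their index, so it only pairs the right customers when both frames share the same index and row order. As a minimal sketch of a variant that does not depend on row order, the frames can first be indexed by CUSTOMER_KEY. The three-row sample and the names `a` and `b` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hashed values (df1) and plain names (df2).
# df2 is deliberately in a different row order than df1.
df1 = pd.DataFrame({
    "CUSTOMER_KEY": [30, 46, 51],
    "LAST_NAME": ["f2b6769129", "ca0464878d", "62f2905a7a"],
    "FIRST_NAME": ["97bb97bebc", "e276539bc2", "8dfabd6d61"],
})
df2 = pd.DataFrame({
    "CUSTOMER_KEY": [51, 30, 46],
    "LAST_NAME": ["Louise", "Arthur", "Teresa"],
    "FIRST_NAME": ["Hurwitz", "Anderson", "Johns"],
})

# Index both frames by the key so that + aligns on CUSTOMER_KEY,
# not on row position.
a = df1.set_index("CUSTOMER_KEY")
b = df2.set_index("CUSTOMER_KEY")

result_df = (
    pd.DataFrame({
        "LAST_NAME": b["LAST_NAME"].str[:3] + a["LAST_NAME"].str[:6],
        "FIRST_NAME": b["FIRST_NAME"].str[:3] + a["FIRST_NAME"].str[:6],
    })
    .rename_axis("CUSTOMER_KEY")  # make the index name explicit
    .reset_index()
)
print(result_df)
```

Keys that appear in only one of the frames would come out as NaN here, since the addition aligns on the union of the two indexes.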
If you need to merge:
result=df1.merge(df2,on=['CUSTOMER_KEY'])
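Because both frames have LAST_NAME and FIRST_NAME columns, a merge like this one keeps both copies and appends the default suffixes `_x` (left frame) and `_y` (right frame). The sketch below, on a two-row sample chosen for illustration, shows the resulting column names and one way the combined string could then be built from them.

```python
import pandas as pd

df1 = pd.DataFrame({
    "CUSTOMER_KEY": [30, 46],
    "LAST_NAME": ["f2b6769129", "ca0464878d"],
    "FIRST_NAME": ["97bb97bebc", "e276539bc2"],
})
df2 = pd.DataFrame({
    "CUSTOMER_KEY": [30, 46],
    "LAST_NAME": ["Arthur", "Teresa"],
    "FIRST_NAME": ["Anderson", "Johns"],
})

result = df1.merge(df2, on=["CUSTOMER_KEY"])

# Overlapping non-key columns get the default suffixes _x (df1) and _y (df2).
print(list(result.columns))
# ['CUSTOMER_KEY', 'LAST_NAME_x', 'FIRST_NAME_x', 'LAST_NAME_y', 'FIRST_NAME_y']

# Name prefix from df2 (_y) followed by hash prefix from df1 (_x).
result["LAST_NAME"] = result["LAST_NAME_y"].str[:3] + result["LAST_NAME_x"].str[:6]
print(result["LAST_NAME"].tolist())
# ['Artf2b676', 'Terca0464']
```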
Use merge + str:
import pandas as pd
df = pd.DataFrame([
['30','f2b6769129','97bb97bebc'],
['46','ca0464878d','e276539bc2'],
['51','62f2905a7a','8dfabd6d61'],
['57','21032ca3bc','1f7e5e0c6e'],
['62','f7e7fdd8ce','eb6cf4af99'],
['64','f536998bbb','7fc39eacd1'],
['80','6069198f63','d873a71620'],
['99','0ba61a6f66','a6cf7af3eb'],
['102','e8b579b776','c8048fd459']]
)
df2 = pd.DataFrame([
['30','Arthur','Anderson'],
['46','Teresa','Johns'],
['51','Louise','Hurwitz'],
['57','Timothy','Addy'],
['62','Jeffery','Wilson'],
['64','Andres','Tuller'],
['80','Daniel','Green'],
['99','Frank','Nader'],
['102','Faith','Young']]
)
keys = ['CUSTOMER_KEY','LAST_NAME','FIRST_NAME']
df.columns = keys
df2.columns = keys
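# Join on the key; _1 marks columns from df (hashes), _2 columns from df2 (names)
df_join = pd.merge(df, df2, on="CUSTOMER_KEY", suffixes=('_1', '_2'))
# First 3 characters of the name + first 6 characters of the hash, as in the expected output
df_join['LAST_NAME'] = df_join['LAST_NAME_2'].str.slice(0,3)+df_join['LAST_NAME_1'].str.slice(0,6)
df_join['FIRST_NAME'] = df_join['FIRST_NAME_2'].str.slice(0,3)+df_join['FIRST_NAME_1'].str.slice(0,6)
result_df = df_join[keys]
result_df.head()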
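If more than two columns need the same treatment, the merge-then-slice steps could be wrapped in a small function. This is only a sketch: `combine_prefixes` and its parameters are hypothetical names introduced here, and the `validate="one_to_one"` argument is an added safeguard that assumes each CUSTOMER_KEY appears at most once per frame.

```python
import pandas as pd

def combine_prefixes(hashed, names, key, columns, n_name=3, n_hash=6):
    """Hypothetical helper: join on `key`, then build
    name-prefix + hash-prefix strings for each column in `columns`."""
    joined = names.merge(
        hashed,
        on=key,
        suffixes=("_name", "_hash"),
        validate="one_to_one",  # raise if a key is duplicated in either frame
    )
    out = joined[[key]].copy()
    for col in columns:
        out[col] = (
            joined[f"{col}_name"].str[:n_name]
            + joined[f"{col}_hash"].str[:n_hash]
        )
    return out

hashed = pd.DataFrame({
    "CUSTOMER_KEY": ["30", "57"],
    "LAST_NAME": ["f2b6769129", "21032ca3bc"],
    "FIRST_NAME": ["97bb97bebc", "1f7e5e0c6e"],
})
names = pd.DataFrame({
    "CUSTOMER_KEY": ["30", "57"],
    "LAST_NAME": ["Arthur", "Timothy"],
    "FIRST_NAME": ["Anderson", "Addy"],
})

result_df = combine_prefixes(hashed, names, "CUSTOMER_KEY", ["LAST_NAME", "FIRST_NAME"])
print(result_df)
```

The prefix lengths are keyword arguments so that the 3 and 6 from the question can be changed without touching the loop.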