pandas new column equals another column with condition
I have a pandas DataFrame called data:
   transaction_id  house_id    date_sale  sale_price  boolean_2015
0               1         1  31 Mar 2016    £880,000          True
3               4         2  31 Mar 2016    £450,000          True
4               5         3  31 Mar 2016    £680,000          True
6               7         4  31 Mar 2016  £1,850,000          True
7               8         5  31 Mar 2016    £420,000          True
and another one called houses:
   id                                            address  postcode postcode first
0   1  Flat 78, Andrewes House, Barbican, London, Gre...  EC2Y 8AY           EC2Y
1   2  Flat 35, John Trundle Court, Barbican, London,...  EC2Y 8DJ           EC2Y
My question is: how do I add a column to data called 'postcode_first', where I look up data['house_id'] in houses and put the first part of the matching postcode in each row of data['postcode_first']?
The closest I got was
data['postcode'] = np.where(houses['id'] == data['house_id'])
but this doesn't make sense at all. Any help?
EDIT: I also tried data['postcode'] = houses.loc[houses['id'] == data['house_id']]['postcode_first']
but this raised:
Traceback (most recent call last):
File "/Users/saminahbab/Documents/House_Prices/final project/sql_functions.py", line 30, in <module>
data['postcode'] = houses.loc[houses['id'] == data['house_id']]['postcode_first']
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/ops.py", line 735, in wrapper
raise ValueError('Series lengths must match to compare')
ValueError: Series lengths must match to compare
which shouldn't matter, because what I am essentially trying to say is: data['postcode'] equals houses['postcode_first'] WHERE houses['id'] equals data['house_id'].
You can use the map() method:
In [108]: df['postcode'] = df.house_id.map(h.set_index('id')['postcode first'])
In [109]: df
Out[109]:
   transaction_id  house_id    date_sale  sale_price  boolean_2015 postcode
0               1         1  31 Mar 2016    £880,000          True     EC2Y
3               4         2  31 Mar 2016    £450,000          True     EC2Y
4               5         3  31 Mar 2016    £680,000          True      NaN
6               7         4  31 Mar 2016  £1,850,000          True      NaN
7               8         5  31 Mar 2016    £420,000          True      NaN
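For anyone who wants to run this outside IPython, here is a minimal self-contained sketch of the same idea. The frames are cut-down stand-ins for the ones in the question (only the columns needed for the lookup), and the name lookup is just an illustrative variable. It assumes houses['id'] has no duplicates, since map() needs a uniquely indexed Series to look values up in.

```python
import pandas as pd

# Cut-down stand-ins for the frames in the question (assumed, not the real data).
houses = pd.DataFrame({
    "id": [1, 2],
    "postcode": ["EC2Y 8AY", "EC2Y 8DJ"],
    "postcode first": ["EC2Y", "EC2Y"],
})
data = pd.DataFrame(
    {"transaction_id": [1, 4, 5, 7, 8], "house_id": [1, 2, 3, 4, 5]},
    index=[0, 3, 4, 6, 7],
)

# Build a Series indexed by house id, then look each house_id up in it.
lookup = houses.set_index("id")["postcode first"]
data["postcode_first"] = data["house_id"].map(lookup)

print(data)
```

Rows whose house_id has no match in houses get NaN, which is why only the first two rows are filled in the output above.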
Data:
In [110]: h
Out[110]:
   id                                         address  postcode postcode first
0   1  Flat 78, Andrewes House, Barbican, London, Gre  EC2Y 8AY           EC2Y
1   2    Flat 35, John Trundle Court, Barbican, London  EC2Y 8DJ           EC2Y
In [113]: df
Out[113]:
   transaction_id  house_id    date_sale  sale_price  boolean_2015
0               1         1  31 Mar 2016    £880,000          True
3               4         2  31 Mar 2016    £450,000          True
4               5         3  31 Mar 2016    £680,000          True
6               7         4  31 Mar 2016  £1,850,000          True
7               8         5  31 Mar 2016    £420,000          True
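If you prefer the SQL-style "WHERE houses.id = data.house_id" phrasing from the question, a left merge is another option. This is only a sketch on cut-down stand-in frames, not part of the original answer; the name merged is illustrative. Note that, unlike map(), merge returns a new frame with a fresh 0..n-1 index, so the original row labels (0, 3, 4, 6, 7) are not kept.

```python
import pandas as pd

# Cut-down stand-ins for the frames shown above (assumed, not the real data).
houses = pd.DataFrame({
    "id": [1, 2],
    "postcode first": ["EC2Y", "EC2Y"],
})
data = pd.DataFrame(
    {"transaction_id": [1, 4, 5, 7, 8], "house_id": [1, 2, 3, 4, 5]},
    index=[0, 3, 4, 6, 7],
)

# Left join keeps every row of data; unmatched house ids get NaN.
merged = (
    data.merge(houses, how="left", left_on="house_id", right_on="id")
        .drop(columns="id")
        .rename(columns={"postcode first": "postcode_first"})
)

print(merged)
```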