Conditionally replacing values based on values in another column

I have a very large dataframe (~1.7 million rows x 6 columns). A simplified example of the relevant data is:

City        Borough

Brooklyn    Brooklyn
Astoria     Queens
Astoria     Unspecified
Ridgewood   Unspecified
Ridgewood   Queens

I'm trying to fill the 'Unspecified' values based on the information in the City column. For example, the city Ridgewood has an Unspecified Borough in one row, but has the Borough correctly listed as Queens elsewhere in the dataset.

I've already explored pandas' fillna, but it doesn't seem to meet my needs. I've also considered np.where, but I'm not sure how it would work in this situation. I'm pretty new to pandas, but maybe the map/apply functions are what I need? This can probably be accomplished a thousand different ways, but I'm looking for something that won't crawl given the size of the data.

EDIT: I was able to create a dictionary which contains the highest occurring "pairs" between cities and boroughs with the following code:

specified = data[['Borough','City']][data['Borough']!= 'Unspecified']
paired = specified.Borough.groupby(specified.City).max()
paired = paired.to_dict()
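One caveat worth checking on the snippet above: on a column of strings, `.max()` returns the alphabetically largest value in each group, not the most frequent one. A minimal sketch of a frequency-based alternative follows, assuming the most common borough per city is what is wanted. The sample rows, including the mislabeled "Staten Island" row, are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: one Ridgewood row carries a wrong borough on purpose.
data = pd.DataFrame({
    "City": ["Brooklyn", "Astoria", "Astoria",
             "Ridgewood", "Ridgewood", "Ridgewood", "Ridgewood"],
    "Borough": ["Brooklyn", "Queens", "Unspecified",
                "Unspecified", "Queens", "Queens", "Staten Island"],
})

specified = data[data["Borough"] != "Unspecified"]

# .max() compares strings alphabetically, so the rare label wins for Ridgewood.
by_max = specified.groupby("City")["Borough"].max().to_dict()

# .mode() returns the most frequent value(s) in sorted order;
# .iloc[0] picks the alphabetically first one when there is a tie.
paired = (
    specified.groupby("City")["Borough"]
    .agg(lambda s: s.mode().iloc[0])
    .to_dict()
)

print(by_max["Ridgewood"])
print(paired["Ridgewood"])
```

The per-group lambda may be slow when there are very many distinct cities; it only runs once per city, though, not once per row.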

The paired dict has the city as the key and the borough as the value. Now the last step is to apply/map it back to the Borough column. How do I do that?

Here's one way:

>>> import pandas
>>> d
         City      Borough
0   Brooklyn     Brooklyn
1    Astoria       Queens
2    Astoria  Unspecified
3  Ridgewood  Unspecified
4  Ridgewood       Queens
>>> realData = d[d.Borough != "Unspecified"]
>>> realData = pandas.Series(data=realData.Borough.values, index=realData.City)
>>> d['Borough'] = d.City.map(realData)
>>> d
         City   Borough
0   Brooklyn  Brooklyn
1    Astoria    Queens
2    Astoria    Queens
3  Ridgewood    Queens
4  Ridgewood    Queens

This assumes that every City has exactly one non-Unspecified Borough value. (If a city has no value other than Unspecified, its borough will show up as NaN.)
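If those assumptions may not hold, one possible variation is to overwrite only the Unspecified rows and leave them as "Unspecified" when no borough is known for the city. The sketch below assumes that keeping the first borough seen per city is acceptable. The "Flushing" row and the names `known`, `lookup`, and `unspecified` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; "Flushing" never appears with a known borough.
d = pd.DataFrame({
    "City": ["Brooklyn", "Astoria", "Astoria",
             "Ridgewood", "Ridgewood", "Flushing"],
    "Borough": ["Brooklyn", "Queens", "Unspecified",
                "Unspecified", "Queens", "Unspecified"],
})

known = d[d["Borough"] != "Unspecified"]

# Keep the first borough seen for each city so the lookup index is unique.
lookup = known.drop_duplicates("City").set_index("City")["Borough"]

unspecified = d["Borough"] == "Unspecified"

# Map only the rows that need filling; cities missing from the lookup
# come back as NaN and fall back to "Unspecified".
d.loc[unspecified, "Borough"] = (
    d.loc[unspecified, "City"].map(lookup).fillna("Unspecified")
)

print(d)
```

Both the mapping and the boolean mask are vectorized, so this should scale reasonably to a frame with millions of rows.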

Edit: If you've already created your dict as in your edited post, just use d['Borough'] = d.City.map(paired) to map each city to its borough from your dict. map is a useful method to know about: it can map values using a pandas Series, a dict, or a function that returns the mapped value for a given key.
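As a small illustration of those three forms, here is a sketch using a made-up `borough_by_city` dict. The variable names are illustrative, not from the original post.

```python
import pandas as pd

cities = pd.Series(["Brooklyn", "Astoria", "Ridgewood"])

# Hypothetical lookup table keyed by city.
borough_by_city = {
    "Brooklyn": "Brooklyn",
    "Astoria": "Queens",
    "Ridgewood": "Queens",
}

# 1. Map with a dict: keys are matched against the Series values.
via_dict = cities.map(borough_by_city)

# 2. Map with a Series: its index plays the role of the dict keys.
via_series = cities.map(pd.Series(borough_by_city))

# 3. Map with a function: called once per value, so it can supply a default.
via_function = cities.map(lambda city: borough_by_city.get(city, "Unspecified"))

# A key missing from the dict maps to NaN rather than raising an error.
missing = pd.Series(["Flushing"]).map(borough_by_city)

print(via_dict.tolist())
print(missing.isna().all())
```

The dict and Series forms are usually the faster choice on large data, since the function form calls Python code once per element.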
