
Pandas: Conditionally fill column using a function based on other columns values

I have a Pandas DataFrame that contains two sets of coordinates (lat1, lon1, lat2, lon2). I have a function that computes the distance from these coordinates, but some of the rows in the dataframe are invalid. I would like to apply my function only to valid rows and save the result to a 'dist' column (the column already exists in the dataframe). I want something like this SQL:

UPDATE dataframe
SET dist=calculate_dist(lat1, lon1, lat2, lon2)
WHERE lat1 IS NOT NULL AND lat2 IS NOT NULL AND user_id>100;

How can I achieve this?

I tried using df = df.apply(calculate_dist, axis=1), but with this approach I have to process all rows, not only the rows that match my conditions, and I need an if statement inside calculate_dist that ignores invalid rows. Is there a better way?

I know that similar questions have already appeared on StackOverflow, but I could not find one that combines a function with conditional selection of rows.

I think you need to filter by boolean indexing first:

mask = (df.lat1.notnull()) & (df.lat2.notnull()) & (df.user_id>100)

df['dist'] = df[mask].apply(calculate_dist, axis=1)
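
The assignment above relies on index alignment: df[mask].apply(...) returns a Series indexed only by the rows that passed the mask, and assigning it to a column leaves every other row as NaN. A minimal sketch of that behaviour follows. The frame `demo` and the stand-in `calculate_dist` are illustrative assumptions, not the asker's data or distance function.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({'lat1': [1.0, np.nan, 3.0],
                     'lat2': [4.0, 5.0, 6.0],
                     'user_id': [150, 200, 50]})

def calculate_dist(row):
    # Stand-in for a real distance function.
    return row['lat2'] - row['lat1']

mask = demo['lat1'].notnull() & demo['lat2'].notnull() & (demo['user_id'] > 100)

# Only rows where the mask is True are passed to the function.
partial = demo[mask].apply(calculate_dist, axis=1)
print(partial.index.tolist())  # labels of the rows that were processed

# Assignment aligns on the index; rows absent from `partial` become NaN.
demo['dist'] = partial
print(demo)
```

Because the function never sees the filtered-out rows, it does not need its own if statement for invalid input.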

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'lat1':[1,2,np.nan,1],
                   'lon1':[4,5,6,2],
                   'lat2':[7,np.nan,9,3],
                   'lon2':[1,3,5,1],
                   'user_id':[200,30,60,50]})

print (df)
   lat1  lat2  lon1  lon2  user_id
0   1.0   7.0     4     1      200
1   2.0   NaN     5     3       30
2   NaN   9.0     6     5       60
3   1.0   3.0     2     1       50

# function applied to each row; apply(axis=1) collects the results into a Series
def calculate_dist(x):
    return x.lat2 - x.lat1

mask = (df.lat1.notnull()) & (df.lat2.notnull()) & (df.user_id>100)
df['dist'] = df[mask].apply(calculate_dist, axis=1)
print (df)
   lat1  lat2  lon1  lon2  user_id  dist
0   1.0   7.0     4     1      200   6.0
1   2.0   NaN     5     3       30   NaN
2   NaN   9.0     6     5       60   NaN
3   1.0   3.0     2     1       50   NaN
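
Note that assigning to df['dist'] replaces the whole column, so any values already stored in dist for non-matching rows become NaN. If the goal is closer to the SQL UPDATE in the question, where untouched rows keep their old value, one option is to assign through .loc with the same mask. A sketch, assuming a pre-existing dist column filled with placeholder values (the data and the stand-in `calculate_dist` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'lat1': [1, 2, np.nan, 1],
                   'lat2': [7, np.nan, 9, 3],
                   'user_id': [200, 30, 60, 50],
                   'dist': [0.0, 10.0, 20.0, 30.0]})  # placeholder existing values

def calculate_dist(row):
    # Stand-in for a real distance function.
    return row['lat2'] - row['lat1']

mask = df['lat1'].notnull() & df['lat2'].notnull() & (df['user_id'] > 100)

# Update only the matching rows; the others keep their previous dist.
df.loc[mask, 'dist'] = df.loc[mask].apply(calculate_dist, axis=1)
print(df)
```

Here only row 0 satisfies the mask, so its dist is recomputed while the placeholder values in rows 1 to 3 are left alone.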


