Pandas DataFrame: create a new column by comparison

Question

I'm trying to create a column called 'city_code' with values from the 'code' column. To do this, I need to check whether the 'ds_city' and 'city' values are equal.

Here is a table sample:

https://i.imgur.com/093GJF1.png

I've tried this:

def find_code(data):
    if data['ds_city'] == data['city'] :
        return data['code']
    else:
        return 'UNKNOWN'

df['code_city'] = df.apply(find_code, axis=1)

But since there are duplicates in the 'ds_city' column, this is the result:

https://i.imgur.com/geHyVUA.png

Here is an image of the expected result:

https://i.imgur.com/HqxMJ5z.png

How can I work around this?

Answer 1

You can use pandas merge:

import pandas as pd

# Left-join df with its own (code, city) pairs where ds_city equals city.
df = pd.merge(df, df[['code', 'city']], how='left',
              left_on='ds_city', right_on='city',
              suffixes=('', '_right')).drop(columns='city_right')

# output:
#   code    city        ds_city     code_right
# 0 1500107 ABAETETUBA  ABAETETUBA  1500107
# 1 2900207 ABARE       ABAETETUBA  1500107
# 2 2100055 ACAILANDIA  ABAETETUBA  1500107
# 3 2300309 ACOPIARA    ABAETETUBA  1500107
# 4 5200134 ACREUNA     ABARE       2900207

See the pandas.merge documentation. The call takes the input DataFrame and left-joins it with its own code and city columns, matching rows where ds_city equals city.

When no matching city is found, the code above fills code_right with NaN. You can then replace those values with 'UNKNOWN':

df['code_right'] = df['code_right'].fillna('UNKNOWN')
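For reference, here is a minimal, self-contained sketch of the merge-and-fill steps above. The sample frame is an assumption: it borrows the codes and city names from the output comment, and 'NOT_A_CITY' is a made-up value added only so that one row has no match.

```python
import pandas as pd

# Hypothetical sample data; 'NOT_A_CITY' is made up to force a missing match.
df = pd.DataFrame({
    'code': [1500107, 2900207, 2100055, 2300309, 5200134],
    'city': ['ABAETETUBA', 'ABARE', 'ACAILANDIA', 'ACOPIARA', 'ACREUNA'],
    'ds_city': ['ABAETETUBA', 'ABAETETUBA', 'ABARE', 'ABARE', 'NOT_A_CITY'],
})

# Left-join the frame with its own (code, city) pairs on ds_city == city.
merged = pd.merge(df, df[['code', 'city']], how='left',
                  left_on='ds_city', right_on='city',
                  suffixes=('', '_right')).drop(columns='city_right')

# Rows without a match come back as NaN; replace them with a label.
merged['code_right'] = merged['code_right'].fillna('UNKNOWN')
print(merged)
```

Because the unmatched row introduces NaN, the matched integer codes in code_right are stored as floats before the fill. If city contained repeated names, the left join would also repeat the matching rows, so this sketch assumes city names are unique.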

Answer 2

This looks like a job for np.where:

import numpy as np

df['code_city'] = np.where(df['ds_city'] == df['city'], df['code'], 'UNKNOWN')
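Note that np.where compares the two columns row by row, so it acts as a vectorized form of the apply call in the question rather than a lookup across rows. A small sketch on made-up data (the frame below is an assumption, not the asker's table; the codes are cast to strings so both branches share one type):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'code': [1500107, 2900207, 2100055],
    'city': ['ABAETETUBA', 'ABARE', 'ACAILANDIA'],
    'ds_city': ['ABAETETUBA', 'ABAETETUBA', 'ABARE'],
})

# Keep the code where ds_city equals city in the same row, else 'UNKNOWN'.
same_row = df['ds_city'] == df['city']
df['code_city'] = np.where(same_row, df['code'].astype(str), 'UNKNOWN')
print(df['code_city'].tolist())
```

Only the first row gets a code here, because 'ABARE' in ds_city (third row) sits next to a different city even though it exists elsewhere in the city column.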

Answer 3

You could try this out:

# Begin with a column of only 'UNKNOWN' values.
data['code_city'] = "UNKNOWN"
# Build the list of city names once instead of on every iteration.
cities = data['city'].tolist()
# Iterate through the cities in the ds_city column.
for i, lookup_city in enumerate(data['ds_city']):
  # Rows whose ds_city has no match in the city column keep 'UNKNOWN'.
  if lookup_city in cities:
    # Note the row which contains the corresponding city name in the city column.
    row = cities.index(lookup_city)
    # Assign by position to avoid chained assignment on a copy.
    data.iloc[i, data.columns.get_loc('code_city')] = data['code'].iloc[row]
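The same lookup can also be written without an explicit loop. The sketch below swaps in Series.map with a city-to-code mapping, which is not part of the answer above. It assumes each city name appears only once in the city column, and the sample frame (including 'NOT_A_CITY') is made up.

```python
import pandas as pd

# Hypothetical sample data; 'NOT_A_CITY' is made up to show the fallback.
data = pd.DataFrame({
    'code': [1500107, 2900207, 2100055],
    'city': ['ABAETETUBA', 'ABARE', 'ACAILANDIA'],
    'ds_city': ['ABARE', 'ABAETETUBA', 'NOT_A_CITY'],
})

# Series that maps each city name to its code (assumes unique city names).
city_to_code = data.set_index('city')['code']

# Look up every ds_city; names with no match become NaN, then 'UNKNOWN'.
data['code_city'] = data['ds_city'].map(city_to_code).fillna('UNKNOWN')
print(data['code_city'].tolist())
```

Building the mapping once avoids rescanning the city column for every row, which is where the loop version spends most of its time on larger frames.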

Answer 1: score 2, accepted, 2019-04-04 01:13:11

Answer 2: score 0, 2019-04-04 01:02:28

Answer 3: score 0, 2019-04-04 01:18:16