
Create new dataframe column based on values in multiple columns

FYI, performance/speed is not important for this question.

I have an existing pandas dataframe named cost_table ...

+----------+---------+------+-------------------------+-----------------+
| material | percent | qty  | price_control_indicator | acct_assign_cat |
+----------+---------+------+-------------------------+-----------------+
| abc111   | 1.00    |   50 | v                       | #               |
| abc222   | 0.25    | 2000 | s                       | #               |
| xyz789   | 0.45    |    0 | v                       | m               |
| def456   | 0.9     |    0 | v                       | #               |
| 123xyz   | 0.2     |    0 | v                       | m               |
| lmo888   | 0.6     |    0 | v                       | m               |
+----------+---------+------+-------------------------+-----------------+

I need to add a field cost_source based on values in multiple fields.

Most answers that come up on Google involve a list comprehension or a ternary operator, but those only include logic based on a value in one column. For example,

cost_table['cost_source'] = ['map' if qty > 0 else None for qty in cost_table['qty']]

This works based on a value in one column, but I don't know how to expand it to include logic on multiple columns (or whether that is even possible). It also doesn't seem like a very readable/maintainable solution.

I tried using a for ... in loop with an if/elif statement, but the value in cost_table['cost_source'] remains unchanged and is None for all rows. However, if I print each individual row within my loop, row['cost_source'] has the desired value.

import pandas as pd

d = {
  'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
  'percent': [1, .25, .45, .9, .2, .6],
  'qty': [50, 2000, 0, 0, 0, 0],
  'price_control_indicator': ['v', 's','v', 'v', 'v', 'v'],
  'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm']
}

cost_table = pd.DataFrame(data=d)

cost_table['cost_source'] = None

for index, row in cost_table.iterrows():
  if (row['qty'] > 0) or (row['price_control_indicator'] == "s") or (row['acct_assign_cat'] == "#"):
    row['cost_source'] = "map"
  elif (row['percent'] >= 40) and (row['acct_assign_cat'] == "m"):
    row['cost_source'] = "vendor"
  else:
    row['cost_source'] = None

  print(row['cost_source']) # outputs map, vendor, or None as expected

print(cost_table)

Which outputs ...

+----------+---------+------+-------------------------+-----------------+-------------+
| material | percent | qty  | price_control_indicator | acct_assign_cat | cost_source |
+----------+---------+------+-------------------------+-----------------+-------------+
| abc111   | 1.00    |   50 | v                       | #               | None        |
| abc222   | 0.25    | 2000 | s                       | #               | None        |
| xyz789   | 0.45    |    0 | v                       | m               | None        |
| def456   | 0.9     |    0 | v                       | #               | None        |
| 123xyz   | 0.2     |    0 | v                       | m               | None        |
| lmo888   | 0.6     |    0 | v                       | m               | None        |
+----------+---------+------+-------------------------+-----------------+-------------+

And this is my desired result ...

+----------+---------+------+-------------------------+-----------------+-------------+
| material | percent | qty  | price_control_indicator | acct_assign_cat | cost_source |
+----------+---------+------+-------------------------+-----------------+-------------+
| abc111   | 1.00    |   50 | v                       | #               | map         |
| abc222   | 0.25    | 2000 | s                       | #               | map         |
| xyz789   | 0.45    |    0 | v                       | m               | vendor      |
| def456   | 0.9     |    0 | v                       | #               | map         |
| 123xyz   | 0.2     |    0 | v                       | m               | None        |
| lmo888   | 0.6     |    0 | v                       | m               | vendor      |
+----------+---------+------+-------------------------+-----------------+-------------+

As @bazinga stated, use df.apply(lambda x: fun(x)), but with the parameter axis=1, so that the lambda function is applied row by row (the default is column by column).

import pandas as pd

d = {
  'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
  'percent': [100, 25, 45, 90, 20, 60],
  'qty': [50, 2000, 0, 0, 0, 0],
  'price_control_indicator': ['v', 's','v', 'v', 'v', 'v'],
  'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm']
}

cost_table = pd.DataFrame(data=d)

def process_row(row):
    if (row['qty'] > 0) or (row['price_control_indicator'] == "s") or (row['acct_assign_cat'] == "#"):
        return "map"
    elif (row['percent'] >= 40) and (row['acct_assign_cat'] == "m"):
        return "vendor"
    else:
        return None

cost_table['cost_source'] = cost_table.apply(lambda row: process_row(row), axis=1)

print(cost_table)

(I also corrected an inconsistency: the condition compares percent against 40, so the percent values in the data should probably be multiplied by 100.)
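As for why the original loop left cost_source unchanged: according to the pandas documentation, iterrows() may hand back a copy of each row, so assigning to row['cost_source'] has no effect on cost_table itself. If you want to keep the explicit loop, a minimal sketch (assuming the same rules and the percent values scaled to 0-100, as above) is to write back through the dataframe with .at instead of through row:

```python
import pandas as pd

cost_table = pd.DataFrame({
    'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
    'percent': [100, 25, 45, 90, 20, 60],
    'qty': [50, 2000, 0, 0, 0, 0],
    'price_control_indicator': ['v', 's', 'v', 'v', 'v', 'v'],
    'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm'],
})
cost_table['cost_source'] = None

for index, row in cost_table.iterrows():
    if (row['qty'] > 0) or (row['price_control_indicator'] == 's') or (row['acct_assign_cat'] == '#'):
        # Write to the dataframe itself; assigning to `row` only changes a copy
        cost_table.at[index, 'cost_source'] = 'map'
    elif (row['percent'] >= 40) and (row['acct_assign_cat'] == 'm'):
        cost_table.at[index, 'cost_source'] = 'vendor'
    # Rows matching neither rule keep the initial None

print(cost_table)
```

This is slower than apply or a vectorized approach, but the question says speed is not a concern.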

If you wish to use np.select:

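import numpy as np
# np.select uses the first condition that matches, so cond1 takes priority over cond2
cond1 = cost_table.qty.gt(0) | cost_table.price_control_indicator.eq('s') | cost_table.acct_assign_cat.eq('#')
cond2 = cost_table.percent.ge(0.4) & cost_table.acct_assign_cat.eq('m')
# default='None' stores the string 'None', not the None object
cost_table['cost_source'] = np.select([cond1, cond2], ['map', 'vendor'], default='None')
print(cost_table)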

  material  percent   qty price_control_indicator acct_assign_cat cost_source
0   abc111     1.00    50                       v               #         map
1   abc222     0.25  2000                       s               #         map
2   xyz789     0.45     0                       v               m      vendor
3   def456     0.90     0                       v               #         map
4   123xyz     0.20     0                       v               m        None
5   lmo888     0.60     0                       v               m      vendor
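
Note that the None printed above is the string 'None', because that is what was passed as default. If you need a real None in the unmatched rows, one possible variant (a sketch, not part of the answer above) is to start from a column of None and fill it with boolean masks through .loc. The lower-priority rule is assigned first so that "map" overrides it, which mirrors the if/elif order in the question:

```python
import pandas as pd

cost_table = pd.DataFrame({
    'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
    'percent': [1, .25, .45, .9, .2, .6],
    'qty': [50, 2000, 0, 0, 0, 0],
    'price_control_indicator': ['v', 's', 'v', 'v', 'v', 'v'],
    'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm'],
})

# Same two rules as the np.select version (percent stored as a fraction)
cond1 = (cost_table['qty'] > 0) | (cost_table['price_control_indicator'] == 's') | (cost_table['acct_assign_cat'] == '#')
cond2 = (cost_table['percent'] >= 0.4) & (cost_table['acct_assign_cat'] == 'm')

cost_table['cost_source'] = None
# Assign the lower-priority rule first, then let "map" overwrite where cond1 holds
cost_table.loc[cond2, 'cost_source'] = 'vendor'
cost_table.loc[cond1, 'cost_source'] = 'map'

print(cost_table)
```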
