Update values in one column based on a condition in another column of a CSV using pandas

Question

I have a CSV file with the columns Position and Menge (quantity). I need to duplicate each row whose Menge is greater than 1, which I did using:

file_to_check = file_to_check.loc[file_to_check.index.repeat(file_to_check.Menge)].reset_index(drop=True)
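To make the effect of that line concrete, here is a minimal sketch on a tiny made-up frame (the values are assumptions chosen for illustration, not the real CSV). Each row is repeated Menge times, and the index is then renumbered:

```python
import pandas as pd

# Hypothetical sample data standing in for the real CSV
file_to_check = pd.DataFrame({
    "Position": [200, 400],
    "Menge": [3, 2],
    "Product": ["a", "b"],
})

# index.repeat(...) repeats each index label Menge times;
# .loc then selects those labels, duplicating the rows.
expanded = file_to_check.loc[
    file_to_check.index.repeat(file_to_check["Menge"])
].reset_index(drop=True)

print(expanded)
```

Rows with Menge equal to 1 are kept once, so only rows with a larger quantity gain copies.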

This works well and copies the rows exactly as I wanted, but I also have to update the Position values of the copies. For example:

Position  Menge  Product
200        3       a
200        3       a
200        3       a
400        7       b
400        7       b
400        7       b
400        7       b
400        7       b
400        7       b
400        7       b
200        4       c
200        4       c
200        4       c
200        4       c

I want it to look like this:

Position  Menge   Product
200.1        3       a
200.2        3       a 
200.3        3       a
400.1        7       b
400.2        7       b
400.3        7       b
400.4        7       b
400.5        7       b
400.6        7       b
400.7        7       b
200.1        4       c
200.2        4       c
200.3        4       c
200.4        4       c
.
.
.

Afterwards, I know I can set Menge (quantity) to 1 using:

selected.loc[:, 'Menge'] = 1

I tried using a loop with loc:

counter=0
if selected.loc[selected.Menge >1]:
    counter=selected['Menge']
    i=1
    while counter>=1:
        selected['Pos.']+=i/10
        i+=1
        counter-=1

But I keep getting the error:

'ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'

I searched for an answer, but nothing I found really helped. I would appreciate some help, since I am pretty new to Python and pandas.

Edit: I have different products, and some of them share the same Position. How can I number the Position separately for each product, rather than as if all rows with the same Position belonged to one product, as shown in the table above?

Answer 1

You can do it like this:

In[75]:
df['Position'] = df['Position'] + df.groupby('Position')['Position'].rank(method='first')/10
df

Out[75]: 
   Position  Menge
0     200.1      3
1     200.2      3
2     200.3      3
3     400.1      7
4     400.2      7
5     400.3      7
6     400.4      7
7     400.5      7
8     400.6      7
9     400.7      7

Here I group by 'Position' and call rank with method='first', so that equal values are ranked in order of appearance. Within each group this numbers the rows 1, 2, 3, and so on, which is the same as a counter.
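To see the intermediate values, the sketch below prints the per-group rank on its own, using a small made-up frame. It also shows groupby(...).cumcount() + 1, which is not part of the answer above but is another way to get the same per-group counter as integers:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Position": [200, 200, 200, 400, 400],
    "Menge": [3, 3, 3, 2, 2],
})

# Rank within each Position group; ties are broken by order of appearance.
rank = df.groupby("Position")["Position"].rank(method="first")

# cumcount() numbers rows within each group starting at 0, so +1 matches the rank.
counter = df.groupby("Position").cumcount() + 1

print(rank.tolist())     # [1.0, 2.0, 3.0, 1.0, 2.0]
print(counter.tolist())  # [1, 2, 3, 1, 2]
```

Dividing either result by 10 and adding it to Position gives 200.1, 200.2, 200.3, 400.1, 400.2.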

Your error comes from this:

counter=selected['Menge']

and then doing a comparison using:

while counter>=1:

The error is telling you that pandas doesn't know how to interpret a Series as a single truth value: counter >= 1 returns a boolean Series, whereas while expects a scalar boolean. You would have to iterate row by row to get scalar values that can be interpreted correctly. Besides, you should avoid loops where possible, as they are slow.
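The sketch below reproduces the error on a small made-up frame. Note that the message quoted in the question mentions a DataFrame, which suggests the `if selected.loc[...]` line fails first; the `while counter >= 1` comparison would fail the same way, with a Series in the message. Reducing the boolean Series with .any() or .all() gives the scalar that `if` and `while` need:

```python
import pandas as pd

# Hypothetical sample data
selected = pd.DataFrame({"Menge": [3, 1, 7]})

errors = []
# First object: a filtered DataFrame (as in the `if` line of the question).
# Second object: a boolean Series (as in the `while` comparison).
for obj in (selected.loc[selected["Menge"] > 1], selected["Menge"] >= 1):
    try:
        if obj:  # pandas refuses to pick a single True/False here
            pass
    except ValueError as exc:
        errors.append(str(exc))

for message in errors:
    print(message)

# Reduce the per-row booleans to one scalar explicitly
mask = selected["Menge"] > 1
has_multiples = bool(mask.any())  # at least one row has Menge > 1
all_multiples = bool(mask.all())  # every row has Menge > 1
```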

EDIT

Based on your new data, you can just group by multiple columns:

In[81]:
df['Position'] = df['Position'] + df.groupby(['Position','Menge'])['Position'].rank(method='first')/10
df

Out[81]: 
    Position  Menge Product
0      200.1      3       a
1      200.2      3       a
2      200.3      3       a
3      400.1      7       b
4      400.2      7       b
5      400.3      7       b
6      400.4      7       b
7      400.5      7       b
8      400.6      7       b
9      400.7      7       b
10     200.1      4       c
11     200.2      4       c
12     200.3      4       c

EDIT

OK, to handle the situation where you have 10 or more occurrences, so that the tenth copy does not roll over to 201, for instance:

In[98]:
df['Position'] = (df['Position'].astype(str) + '.' + df.groupby(['Position','Menge'])['Position'].rank(method='first').astype(int).astype(str)).astype(float)
df

Out[98]: 
    Position  Menge Product
0      200.1      3       a
1      200.2      3       a
2      200.3      3       a
3      400.1      7       b
4      400.2      7       b
5      400.3      7       b
6      400.4      7       b
7      400.5      7       b
8      400.6      7       b
9      400.7      7       b
10     200.1      4       c
11     200.2      4       c
12     200.3      4       c
13     200.4      4       c

This first converts the output of rank to an int (rank produces a float), then casts it to a str so we can build a string such as '200' + '.' + '1', and finally casts back to a float if necessary.
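One caveat with the final cast, sketched below on made-up data: as floats, '200.1' and '200.10' are the same number, so with 10 or more copies the labels only stay distinct if they are kept as strings. This sketch uses cumcount() + 1 in place of rank purely for brevity; that substitution is an assumption of the example, not part of the answer above.

```python
import pandas as pd

# Hypothetical data: one position repeated 11 times
df = pd.DataFrame({"Position": [200] * 11, "Menge": [11] * 11})

# Per-group counter 1..11 as strings
suffix = (df.groupby(["Position", "Menge"]).cumcount() + 1).astype(str)

# Build labels such as '200.1', ..., '200.10', '200.11'
labels = df["Position"].astype(str) + "." + suffix

print(labels.iloc[0], labels.iloc[9])

# As strings every label is unique; as floats the 1st and 10th collide
print(labels.is_unique)
print(labels.astype(float).duplicated().any())
```

If the sub-position is only a label, leaving the column as strings avoids the collision.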

Answer 2

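import pandas as pd

df = pd.DataFrame({'Position':[200,200,200,400,400,400,400,400,400,400],'Menge':[3,3,3,7,7,7,7,7,7,7]})

# For each distinct position, number its rows 1, 2, 3, ... and append
# the number as a decimal (200 -> 200.1, 200.2, ...).
for pos in df.Position.unique():
    counter = 1
    for idx, row in df.iterrows():
        if row['Position'] == pos:
            # An integer counter divided by 10 avoids the rounding error
            # that builds up when repeatedly adding 0.1
            df.at[idx, 'Position_1'] = row['Position'] + counter / 10
            counter += 1

# Replace the original column with the numbered one
df.drop(['Position'], axis=1, inplace=True)
df.columns = ['Menge', 'Position']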

Output:

print(df)

  Menge Position
0   3      200.1
1   3      200.2
2   3      200.3
3   7      400.1
4   7      400.2
5   7      400.3
6   7      400.4
7   7      400.5
8   7      400.6
9   7      400.7
