Update a value in one column by a condition in another in csv using pandas

I have a CSV file with the columns Position and Quantity (Menge). I have to add copies of the rows where the quantity (in my case Menge) is > 1, and I did it using:

file_to_check = file_to_check.loc[file_to_check.index.repeat(file_to_check.Menge)].reset_index(drop=True)

This works great and copies the rows exactly as I wanted, but I additionally have to update the positions for them. For example:

Position  Menge  Product
200        3       a
200        3       a
200        3       a
400        7       b
400        7       b
400        7       b
400        7       b
400        7       b
400        7       b
400        7       b
200        4       c
200        4       c
200        4       c
200        4       c

I want it to look like this:

Position  Menge   Product
200.1        3       a
200.2        3       a 
200.3        3       a
400.1        7       b
400.2        7       b
400.3        7       b
400.4        7       b
400.5        7       b
400.6        7       b
400.7        7       b
200.1        4       c
200.2        4       c
200.3        4       c
200.4        4       c
.
.
.

Afterwards, I know I can change Menge (quantity) by using:

selected.loc[:, 'Menge'] = 1
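For reference, the two steps described so far (expanding each row Menge times, then setting Menge to 1) can be sketched on a small hand-built frame. The sample values are hypothetical stand-ins for the real CSV, and working on a copy is an assumption that keeps the original quantities available in case they are needed later, for example to group by Menge.

```python
import pandas as pd

# Hypothetical input: one row per product, before expansion
df = pd.DataFrame({'Position': [200, 400, 200],
                   'Menge': [3, 7, 4],
                   'Product': ['a', 'b', 'c']})

# Repeat each row Menge times, then renumber the index 0..n-1
expanded = df.loc[df.index.repeat(df.Menge)].reset_index(drop=True)

# Work on a copy so `expanded` keeps its original quantities
selected = expanded.copy()
selected.loc[:, 'Menge'] = 1

print(len(expanded))  # 14
print(selected['Menge'].unique())
```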

I tried using a loop and loc like this:

counter=0
if selected.loc[selected.Menge >1]:
    counter=selected['Menge']
    i=1
    while counter>=1:
        selected['Pos.']+=i/10
        i+=1
        counter-=1

But I keep getting the error:

'ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'

I searched for an answer, but nothing really helped. I need some help since I am pretty new to Python and pandas.

So I edited my question. I have different products, but some of them share the same Pos. How can I change the Pos. for each product separately, instead of numbering them as if they were all one product, as shown in the table?

You can do it like this:

In[75]:
df['Position'] = df['Position'] + df.groupby('Position')['Position'].rank(method='first')/10
df

Out[75]: 
   Position  Menge
0     200.1      3
1     200.2      3
2     200.3      3
3     400.1      7
4     400.2      7
5     400.3      7
6     400.4      7
7     400.5      7
8     400.6      7
9     400.7      7

So here I group by 'Position' and call rank with the parameter method='first', so that equal values are ranked in order of appearance. This effectively numbers the rows within each group 1, 2, 3, ..., just like a counter; dividing by 10 and adding the result to the position gives 200.1, 200.2, and so on.
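The same per-group counter can also be obtained with groupby(...).cumcount(), which numbers the rows within each group 0, 1, 2, ... in order of appearance; adding 1 gives the 1-based counter that rank(method='first') produces here. This is a swapped-in alternative on the answer's sample data, not what the answer itself uses.

```python
import pandas as pd

df = pd.DataFrame({'Position': [200, 200, 200, 400, 400, 400, 400, 400, 400, 400],
                   'Menge': [3, 3, 3, 7, 7, 7, 7, 7, 7, 7]})

# cumcount() numbers rows within each group 0, 1, 2, ... in order of appearance
counter = df.groupby('Position').cumcount() + 1

# Same values as rank(method='first'), but already integer-valued
ranked = df.groupby('Position')['Position'].rank(method='first')
assert (counter == ranked).all()

df['Position'] = df['Position'] + counter / 10
print(df['Position'].tolist())
```

Because cumcount returns integers directly, there is no float-to-int cast needed if you later build string labels from the counter.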

Your error comes from using whole pandas objects where Python expects a single True/False. The message mentions a DataFrame because the first such place is:

if selected.loc[selected.Menge >1]:

The same problem would occur next with:

counter=selected['Menge']

and then doing a comparison using:

while counter>=1:

In both cases pandas doesn't know how to interpret the object as one truth value: selected.loc[...] returns a DataFrame, and counter >= 1 returns a boolean Series, whereas if and while need a scalar boolean. You'd have to iterate row-wise to get scalar values that can be interpreted correctly; besides, you should avoid loops where possible because they are slow.
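A minimal sketch of where this ValueError comes from and of the explicit reductions the message suggests. The frame is a hypothetical stand-in for the question's selected, and the helper truth_value_error is an illustration written for this sketch, not a pandas function.

```python
import pandas as pd

# Hypothetical stand-in for the question's `selected` frame
selected = pd.DataFrame({'Pos.': [200, 300, 400], 'Menge': [3, 1, 7]})

def truth_value_error(obj):
    """Return the ValueError message raised when obj is used as an if-condition."""
    try:
        if obj:
            pass
    except ValueError as exc:
        return str(exc)
    return None

# `if` on a filtered DataFrame: the error shown in the question
print(truth_value_error(selected.loc[selected.Menge > 1]))
# `while counter >= 1` on a column: same problem, reported for a Series
print(truth_value_error(selected['Menge'] >= 1))

# Reduce the boolean Series to a single bool explicitly ...
mask = selected['Menge'] > 1
print(mask.any())  # True: at least one row has Menge > 1
print(mask.all())  # False: the row with Menge == 1 fails

# ... or work row by row, where each comparison is a plain scalar
rows_to_copy = [pos for pos, menge in zip(selected['Pos.'].tolist(), selected['Menge'].tolist())
                if menge > 1]
print(rows_to_copy)  # [200, 400]
```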

EDIT

Based on your new data, you just group by multiple columns:

In[81]:
df['Position'] = df['Position'] + df.groupby(['Position','Menge'])['Position'].rank(method='first')/10
df

Out[81]: 
    Position  Menge Product
0      200.1      3       a
1      200.2      3       a
2      200.3      3       a
3      400.1      7       b
4      400.2      7       b
5      400.3      7       b
6      400.4      7       b
7      400.5      7       b
8      400.6      7       b
9      400.7      7       b
10     200.1      4       c
11     200.2      4       c
12     200.3      4       c
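One caveat with grouping on ['Position', 'Menge']: if two different products share both the same position and the same quantity, their copies are numbered as one group. Because the question's index.repeat step keeps the original row label on every copy until it is reset, grouping on that label numbers the copies of each original row separately. This is a swapped-in alternative, sketched on hypothetical rows where products 'd' and 'e' collide.

```python
import pandas as pd

# Hypothetical rows: products 'd' and 'e' share both Position and Menge
df = pd.DataFrame({'Position': [200, 400, 200, 300, 300],
                   'Menge': [3, 7, 4, 2, 2],
                   'Product': ['a', 'b', 'c', 'd', 'e']})

# Grouping on column values: the copies of 'd' and 'e' fall into one group
merged = df.loc[df.index.repeat(df.Menge)].reset_index(drop=True)
merged_no = merged.groupby(['Position', 'Menge']).cumcount() + 1
print(merged_no.tolist()[-4:])  # [1, 2, 3, 4]

# Grouping on the original row label: reset_index() keeps it in an 'index' column
expanded = df.loc[df.index.repeat(df.Menge)].reset_index()
copy_no = expanded.groupby('index').cumcount() + 1
print(copy_no.tolist()[-4:])  # [1, 2, 1, 2]

expanded['Position'] = expanded['Position'] + copy_no / 10
expanded = expanded.drop(columns='index')
print(expanded.tail(4))
```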

EDIT

OK, to handle the situation where you have 10 or more occurrences, so that it avoids producing e.g. 201 (200 + 10/10):

In[98]:
df['Position'] = (df['Position'].astype(str) + '.' + df.groupby(['Position','Menge'])['Position'].rank(method='first').astype(int).astype(str)).astype(float)
df

Out[98]: 
    Position  Menge Product
0      200.1      3       a
1      200.2      3       a
2      200.3      3       a
3      400.1      7       b
4      400.2      7       b
5      400.3      7       b
6      400.4      7       b
7      400.5      7       b
8      400.6      7       b
9      400.7      7       b
10     200.1      4       c
11     200.2      4       c
12     200.3      4       c
13     200.4      4       c

So this first converts the output of rank to an int (rank returns a float), then casts it to a str so we can simply build a string, e.g. '200' + '.' + '1', and then cast back to a float if necessary. Be aware that as floats 200.10 and 200.1 are the same number, so with 10 or more copies you should keep the result as a string.
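To see what the final astype(float) does once a group reaches ten copies, here is a minimal sketch with a hypothetical group of 11 identical rows: as floats, '200.10' collapses to 200.1, while the string labels stay distinct.

```python
import pandas as pd

# Hypothetical group of 11 copies of the same position
df = pd.DataFrame({'Position': [200] * 11, 'Menge': [11] * 11})

copy_rank = df.groupby(['Position', 'Menge'])['Position'].rank(method='first').astype(int)
labels = df['Position'].astype(str) + '.' + copy_rank.astype(str)

print(labels.iloc[[0, 9, 10]].tolist())         # ['200.1', '200.10', '200.11']
print(labels.astype(float).duplicated().any())  # True: 200.10 became 200.1

# Keeping strings preserves all 11 distinct labels
assert labels.is_unique
```

This also assumes Position is stored as integers: a float column would turn into strings such as '200.0.1', which cannot be converted back to a number.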

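import pandas as pd
df = pd.DataFrame({'Position':[200,200,200,400,400,400,400,400,400,400],'Menge':[3,3,3,7,7,7,7,7,7,7]})

for pos in df.Position.unique():
    counter = 1  # integer counter; dividing by 10 avoids accumulating 0.1 rounding error
    for idx, row in df.iterrows():
        if row['Position'] == pos:
            # write into a new column so 'Position' stays intact for the comparison above
            df.at[idx, 'Position_1'] = df.at[idx, 'Position'] + counter / 10
            counter += 1

# replace the old column with the numbered one
df = df.drop(columns=['Position']).rename(columns={'Position_1': 'Position'})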

Output:

print(df)

  Menge Position
0   3      200.1
1   3      200.2
2   3      200.3
3   7      400.1
4   7      400.2
5   7      400.3
6   7      400.4
7   7      400.5
8   7      400.6
9   7      400.7
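Putting the pieces together, a minimal end-to-end sketch that reads the CSV, expands the rows, numbers the copies and resets Menge. It assumes a comma-separated file with the columns Position, Menge and Product (the inline csv_text is a hypothetical stand-in; pass your real file path to read_csv instead), and it numbers copies per original row rather than per Position value, which is a swapped-in alternative to the groupby/rank approaches above.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for the real file
csv_text = """Position,Menge,Product
200,3,a
400,7,b
200,4,c
"""

file_to_check = pd.read_csv(io.StringIO(csv_text))

# 1. One row per unit: repeat each row Menge times, keeping the old row label as 'index'
expanded = file_to_check.loc[file_to_check.index.repeat(file_to_check.Menge)].reset_index()

# 2. Number the copies of each original row 1..Menge (string labels keep 200.10 distinct from 200.1)
copy_no = expanded.groupby('index').cumcount() + 1
expanded['Position'] = expanded['Position'].astype(str) + '.' + copy_no.astype(str)

# 3. Each expanded row now represents a single unit
expanded['Menge'] = 1
expanded = expanded.drop(columns='index')

out = io.StringIO()
expanded.to_csv(out, index=False)
print(out.getvalue())
```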

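Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.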

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM