Update a value in one column by a condition in another in csv using pandas
I have a CSV file with the columns Position and Quantity (Menge). I have to add copies of the rows where Quantity (Menge, in my case) is greater than 1, and I did it using:
file_to_check = file_to_check.loc[file_to_check.index.repeat(file_to_check.Menge)].reset_index(drop=True)
This works great and copies the rows exactly as I wanted, but I additionally have to update the Position of each copy. For example:
Position Menge Product
200 3 a
200 3 a
200 3 a
400 7 b
400 7 b
400 7 b
400 7 b
400 7 b
400 7 b
400 7 b
200 4 c
200 4 c
200 4 c
200 4 c
I want it to look like this:
Position Menge Product
200.1 3 a
200.2 3 a
200.3 3 a
400.1 7 b
400.2 7 b
400.3 7 b
400.4 7 b
400.5 7 b
400.6 7 b
400.7 7 b
200.1 4 c
200.2 4 c
200.3 4 c
200.4 4 c
.
.
.
Afterwards, I know I can change Menge (Quantity) by using:
selected.loc[:, 'Menge'] = 1
I tried using a for loop and loc:
counter=0
if selected.loc[selected.Menge >1]:
    counter=selected['Menge']
    i=1
    while counter>=1:
        selected['Pos.']+=i/10
        i+=1
        counter-=1
But I keep getting this error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I searched for an answer, but nothing I found really helped. I would appreciate some help, since I am pretty new to Python and pandas.
So I edited my question. I have different products, but some of them have the same Pos. How can I change the Pos. for each product separately, instead of numbering the rows as if they all belonged to one product, as shown in the table?
You can do it like this:
In[75]:
df['Position'] = df['Position'] + df.groupby('Position')['Position'].rank(method='first')/10
df
Out[75]:
Position Menge
0 200.1 3
1 200.2 3
2 200.3 3
3 400.1 7
4 400.2 7
5 400.3 7
6 400.4 7
7 400.5 7
8 400.6 7
9 400.7 7
So here I groupby on 'Position' and call rank with param method='first' so that equal values are ranked in order of appearance. This effectively numbers the values within each group in order, which is the same as a counter.
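As a small sketch of why rank behaves like a counter here, the snippet below compares it with groupby(...).cumcount(), which numbers the rows of each group from 0. The sample data is made up for illustration, and cumcount is offered only as an alternative, not as the method used in the answer.

```python
import pandas as pd

# Illustrative data only: two Position groups of different sizes
df = pd.DataFrame({
    "Position": [200, 200, 200, 400, 400],
    "Menge": [3, 3, 3, 2, 2],
})

# Within a group every Position value is equal, so method='first'
# ranks the rows purely by order of appearance: 1.0, 2.0, 3.0, ...
rank_counter = df.groupby("Position")["Position"].rank(method="first")

# cumcount() numbers the rows of each group starting at 0, so add 1 to match
cumcount_counter = df.groupby("Position").cumcount() + 1

print(rank_counter.tolist())      # [1.0, 2.0, 3.0, 1.0, 2.0]
print(cumcount_counter.tolist())  # [1, 2, 3, 1, 2]
```

The only practical difference in this case is the dtype: rank returns floats, while cumcount returns integers.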
Your error comes from this:
counter=selected['Menge']
and then doing a comparison using:
while counter>=1:
So the error is telling you that it doesn't understand how to interpret a Series: doing counter >= 1 returns a boolean Series, whereas while wants a single scalar boolean value. The earlier line if selected.loc[selected.Menge >1]: has the same problem with a whole DataFrame, which is why the message in your traceback mentions a DataFrame. You'd have to iterate row-wise so that you get a scalar value that can be interpreted correctly; besides, you should avoid loops where possible, as they are slow.
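A minimal reproduction of the ambiguity, using made-up data: putting a boolean Series into an if or while raises ValueError, and reducing it with .any() or .all() gives the single boolean that those statements need. Whether .any() or .all() is the right reduction depends on what you actually want to test.

```python
import pandas as pd

# Illustrative data only
selected = pd.DataFrame({"Position": [200, 400], "Menge": [3, 1]})

mask = selected["Menge"] > 1  # boolean Series, one value per row

raised = False
message = ""
try:
    if mask:  # a Series has no single truth value, so this raises
        pass
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)

# Reduce the Series to one boolean explicitly
any_above_one = bool(mask.any())  # True: at least one row has Menge > 1
all_above_one = bool(mask.all())  # False: not every row has Menge > 1
```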
EDIT
Based on your new data, you just groupby on multiple columns:
In[81]:
df['Position'] = df['Position'] + df.groupby(['Position','Menge'])['Position'].rank(method='first')/10
df
Out[81]:
Position Menge Product
0 200.1 3 a
1 200.2 3 a
2 200.3 3 a
3 400.1 7 b
4 400.2 7 b
5 400.3 7 b
6 400.4 7 b
7 400.5 7 b
8 400.6 7 b
9 400.7 7 b
10 200.1 4 c
11 200.2 4 c
12 200.3 4 c
EDIT
OK, to handle the situation where you have 10 or more occurrences, so that the tenth one does not produce 201 (200 + 10/10), for instance:
In[98]:
df['Position'] = (df['Position'].astype(str) + '.' + df.groupby(['Position','Menge'])['Position'].rank(method='first').astype(int).astype(str)).astype(float)
df
Out[98]:
Position Menge Product
0 200.1 3 a
1 200.2 3 a
2 200.3 3 a
3 400.1 7 b
4 400.2 7 b
5 400.3 7 b
6 400.4 7 b
7 400.5 7 b
8 400.6 7 b
9 400.7 7 b
10 200.1 4 c
11 200.2 4 c
12 200.3 4 c
13 200.4 4 c
So this first converts the output from rank to an int, as rank produces a float, then casts it to a str so we can just create a string, e.g. '200' + '.' + '1', and then casts back to a float if necessary.
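One caveat worth checking before casting back to float: as strings, '200.1' and '200.10' are different labels, but as floats they are the same number. The sketch below uses a made-up group of 12 rows to show this; if a group can reach 10 copies and the labels must stay unique, keeping the column as strings may be the safer choice.

```python
import pandas as pd

# Illustrative data only: one group with 12 copies
df = pd.DataFrame({"Position": [200] * 12, "Menge": [12] * 12})

# Same construction as in the answer: per-group rank as an integer string
counter = (
    df.groupby(["Position", "Menge"])["Position"]
    .rank(method="first")
    .astype(int)
    .astype(str)
)
as_text = df["Position"].astype(str) + "." + counter

print(as_text.iloc[0], as_text.iloc[9])  # 200.1 200.10

# Casting to float collapses '200.10' onto '200.1'
as_float = as_text.astype(float)
print(as_float.iloc[0] == as_float.iloc[9])  # True
```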
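import pandas as pd
df = pd.DataFrame({'Position':[200,200,200,400,400,400,400,400,400,400],'Menge':[3,3,3,7,7,7,7,7,7,7]})
# For each distinct Position, walk the rows and add 0.1, 0.2, ... to the matches
for pos in df.Position.unique():
    counter = 0.1
    for idx,row in df.iterrows():
        if row['Position'] == pos:
            # Write the result to a new column so the comparison above stays valid
            df.at[idx,'Position_1'] = df.at[idx,'Position']+counter
            counter+=0.1
# Replace the old Position column with the new one
df.drop(['Position'],axis=1,inplace=True)
df.columns = ['Menge','Position']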
Output:
print(df)
Menge Position
0 3 200.1
1 3 200.2
2 3 200.3
3 7 400.1
4 7 400.2
5 7 400.3
6 7 400.4
7 7 400.5
8 7 400.6
9 7 400.7
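For the case raised in the edited question, where different products share a Position, one possible sketch is to number the copies per source row rather than per Position value. It relies on the fact that index.repeat keeps the original index labels until reset_index is called, so grouping on the index groups the copies of each row. The variable names (orig, expanded, copy_no) and the sample data are assumptions for illustration; this is not taken from either answer, and dividing by 10 still only works for fewer than 10 copies per row.

```python
import pandas as pd

# Illustrative data only: products a and c share Position 200
orig = pd.DataFrame({
    "Position": [200, 400, 200],
    "Menge": [3, 2, 4],
    "Product": ["a", "b", "c"],
})

# Repeat each row Menge times; the repeated index labels still
# identify which source row every copy came from
expanded = orig.loc[orig.index.repeat(orig["Menge"])].copy()

# Number the copies of each source row: 1, 2, 3, ...
copy_no = expanded.groupby(level=0).cumcount() + 1

# Positional (NumPy) arithmetic sidesteps alignment on the duplicated index
expanded["Position"] = expanded["Position"].to_numpy() + copy_no.to_numpy() / 10

# Every expanded row now stands for a single unit
expanded["Menge"] = 1
expanded = expanded.reset_index(drop=True)

print(expanded)
```

Because the grouping key is the source row, product c restarts at 200.1 even though product a already used Position 200.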