Set value of first item in slice in python pandas
So I would like to make a slice of a dataframe and then set the value of the first item in that slice without copying the dataframe. For example:
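import numpy
import pandas
df = pandas.DataFrame(numpy.random.rand(3,1))
# chained indexing: the assignment goes to a temporary copy and triggers SettingWithCopyWarning
df[df[0]>0][0] = 0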
The slice here is irrelevant and just for the example; it will return the whole data frame again. The point is that, doing it as in the example, you get a SettingWithCopyWarning (understandably). I have also tried slicing first and then using iloc/ix/loc, and using iloc twice, i.e. something like:
df.iloc[df[0]>0,:][0] = 0
df[df[0]>0,:].iloc[0] = 0
Neither of these works. Again, I don't want to make a copy of the dataframe, even if it is just the sliced version.
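Why the chained form cannot work can be seen in a small sketch (the data here is made up for illustration): a boolean row selection such as `df[mask]` always builds a new DataFrame, so a second `[...] = ...` writes into that temporary object. Passing the row mask and the column label to a single `.loc` call writes into `df` itself. Note this still sets every matching row, not just the first one.

```python
import warnings

import pandas as pd

df = pd.DataFrame({0: [0.5, 0.2, 0.9]})
mask = df[0] > 0.3  # True for rows 0 and 2

# Chained indexing: df[mask] builds a new DataFrame, and "[0] = 0" then
# writes into that temporary object, so df itself is left untouched.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about this pattern
    df[mask][0] = 0
print(df[0].tolist())  # [0.5, 0.2, 0.9]

# One indexing call with both the row mask and the column label writes into df.
df.loc[mask, 0] = 0
print(df[0].tolist())  # [0.0, 0.2, 0.0]
```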
EDIT: It seems there are two ways: using a mask or idxmax. The idxmax method seems to work if your index is unique, and the mask method if it is not. In my case the index is not unique, which I forgot to mention in the initial post.
I think you can use idxmax to get the index of the first True value and then set it with loc:
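import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print (df)
#    0
# 0  1
# 1  3
# 2  0
# 3  0
# 4  3

# idxmax of a boolean Series returns the label of the first True
print ((df[0] == 0).idxmax())
# 2

df.loc[(df[0] == 0).idxmax(), 0] = 100
print (df)
#      0
# 0    1
# 1    3
# 2  100
# 3    0
# 4    3

# starting again from the original df
np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
df.loc[(df[0] == 3).idxmax(), 0] = 200
print (df)
#      0
# 0    1
# 1  200
# 2    0
# 3    0
# 4    3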
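One thing to keep in mind with `idxmax`: if the mask contains no True at all, it still returns a label (the first one), so the `.loc` assignment would silently overwrite an unrelated row. A minimal guarded sketch follows; `set_first_by_label` is a hypothetical helper, and like the answer above it assumes a unique index.

```python
import pandas as pd


def set_first_by_label(df, mask, col, value):
    """Hypothetical helper: set `col` in the first row where `mask` is True.

    Uses label-based .loc, so it assumes df has a unique index.
    Returns False (and changes nothing) when the mask has no True values.
    """
    if not mask.any():
        # idxmax() of an all-False mask would return the first label,
        # which would silently overwrite an unrelated row.
        return False
    df.loc[mask.idxmax(), col] = value
    return True


df = pd.DataFrame({0: [1, 3, 0, 0, 3]})
print((df[0] == 5).idxmax())  # 0  (no 5 anywhere, yet a label comes back)
print(set_first_by_label(df, df[0] == 5, 0, 100))  # False
print(set_first_by_label(df, df[0] == 0, 0, 100))  # True
print(df[0].tolist())  # [1, 3, 100, 0, 3]
```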
EDIT:
Solution with a non-unique index:
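np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
#    0
# 1  1
# 2  3
# 2  0
# 3  0
# 4  3

# temporarily switch to a unique 0..n-1 index; the old labels move to an 'index' column
df = df.reset_index()
df.loc[(df[0] == 3).idxmax(), 0] = 200
# restore the original (non-unique) labels
df = df.set_index('index')
df.index.name = None
print (df)
#      0
# 1    1
# 2  200
# 2    0
# 3    0
# 4    3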
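If the reset_index/set_index round trip is unwanted, a positional variant also sidesteps the duplicate labels. This is a sketch using `np.flatnonzero` and `DataFrame.iloc`, not something from the answer itself; the values are hard-coded to match the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 3, 0, 0, 3]}, index=[1, 2, 2, 3, 4])

mask = (df[0] == 3).to_numpy()   # plain boolean array, no index labels
hits = np.flatnonzero(mask)      # integer positions of the True values
if hits.size:
    # iloc addresses rows by position, so the duplicated label 2 is harmless
    df.iloc[hits[0], df.columns.get_loc(0)] = 200

print(df[0].tolist())     # [1, 200, 0, 0, 3]
print(df.index.tolist())  # [1, 2, 2, 3, 4]
```

The `if hits.size` check also covers the case where nothing matches, which the `idxmax` versions do not.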
EDIT1:
Solution with a MultiIndex:
np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
#    0
# 1  1
# 2  3
# 2  0
# 3  0
# 4  3

# add a position level in front, so every row gets a unique (position, label) key
df.index = [np.arange(len(df.index)), df.index]
print (df)
#      0
# 0 1  1
# 1 2  3
# 2 2  0
# 3 3  0
# 4 4  3

df.loc[(df[0] == 3).idxmax(), 0] = 200
# drop the helper position level again
df = df.reset_index(level=0, drop=True)
print (df)
#      0
# 1    1
# 2  200
# 2    0
# 3    0
# 4    3
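The same idea can be spelled out with explicit constructors, `pd.MultiIndex.from_arrays` to build the two-level index and `DataFrame.droplevel` to remove the helper level. This is a sketch of an equivalent formulation, with the example values hard-coded.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 3, 0, 0, 3]}, index=[1, 2, 2, 3, 4])

# Prepend a position level so that every row has a unique (position, label) key
df.index = pd.MultiIndex.from_arrays([np.arange(len(df)), df.index])

key = (df[0] == 3).idxmax()  # first matching key as a tuple, here (1, 2)
df.loc[key, 0] = 200

df = df.droplevel(0)         # remove the helper position level again
print(df[0].tolist())        # [1, 200, 0, 0, 3]
print(df.index.tolist())     # [1, 2, 2, 3, 4]
```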
EDIT2:
Solution with a double cumsum:
df = pd.DataFrame([4,0,4,7,4], index=[1,2,2,3,4])
print (df)
#    0
# 1  4
# 2  0
# 2  4
# 3  7
# 4  4

# first cumsum: 0 before the first match, >= 1 from it on;
# second cumsum: equals 1 only at the first match
mask = (df[0] == 0).cumsum().cumsum()
print (mask)
# 1    0
# 2    1
# 2    2
# 3    3
# 4    4
# Name: 0, dtype: int32

df.loc[mask == 1, 0] = 200
print (df)
#      0
# 1    4
# 2  200
# 2    4
# 3    7
# 4    4
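An equivalent mask for the first match is the boolean mask and-ed with a single cumsum, and that form extends naturally to the n-th match. A sketch under that assumption, with `nth_true` as a hypothetical helper name; like the answer above, it relies on the mask sharing `df`'s index exactly, which holds here even with duplicate labels.

```python
import pandas as pd


def nth_true(mask, n):
    """Hypothetical helper: a mask that is True only at the n-th True of `mask` (1-based)."""
    # cumsum counts the matches seen so far; and-ing with mask keeps only matching rows
    return mask & (mask.cumsum() == n)


df = pd.DataFrame({0: [4, 0, 4, 0, 4]}, index=[1, 2, 2, 3, 4])
m = df[0] == 0
df.loc[nth_true(m, 1), 0] = 200  # first 0
df.loc[nth_true(m, 2), 0] = 300  # second 0
print(df[0].tolist())  # [4, 200, 4, 300, 4]
```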
Consider the dataframe df
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 3, 4, 5]))
print(df)
#    A
# 0  1
# 1  2
# 2  3
# 3  4
# 4  5
Create some arbitrary slice slc
slc = df[df.A > 2]
print(slc)
#    A
# 2  3
# 3  4
# 4  5
Access the first row of slc within df by using index[0] and loc:
df.loc[slc.index[0]] = 0
print(df)
#    A
# 0  1
# 1  2
# 2  0
# 3  4
# 4  5
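Because `loc` is label-based, this relies on the index being unique. With duplicate labels (as in the asker's case) every row carrying the slice's first label is overwritten, including rows that were never in the slice. A small sketch with a made-up duplicated index shows the effect:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 3, 4, 5]), index=[0, 1, 2, 2, 3])
slc = df[df.A > 3]        # rows with A == 4 (label 2) and A == 5 (label 3)

df.loc[slc.index[0]] = 0  # label 2 appears twice in df
print(df['A'].tolist())   # [1, 2, 0, 0, 5]  (the A == 3 row was hit as well)
```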
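import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(6,1),index=[1,2,2,3,3,3])
df[1] = 0
df.columns=['a','b']
# label rows with a >= 0.5 in one .loc call (avoids chained assignment)
df.loc[df['a']>=0.5, 'b'] = 1
# DataFrame.sort no longer exists in current pandas; sort_values replaces it
df = df.sort_values(['b','a'],ascending=[False,True])
df.loc[df[df['b']==0].index.tolist()[0],'a']=0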
In this method no extra copy of the dataframe is kept (sorting returns a new, reordered frame that replaces df), but an extra column is introduced, which can be dropped after processing. To choose any item instead of the first one, you can change the last line as follows:
df.loc[df[df['b']==0].index.tolist()[n],'a']=0
to change the nth item in the slice (n is zero-based; since loc is label-based, every row sharing that index label is changed).
df:
a
1 0.111089
2 0.255633
2 0.332682
3 0.434527
3 0.730548
3 0.844724
df after slicing and labelling:
a b
1 0.111089 0
2 0.255633 0
2 0.332682 0
3 0.434527 0
3 0.730548 1
3 0.844724 1
After changing the value of the first item in the slice (labelled 0) to 0:
a b
3 0.730548 1
3 0.844724 1
1 0.000000 0
2 0.255633 0
2 0.332682 0
3 0.434527 0
So, using some of the answers, I managed to find a one-liner way to do this:
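import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print(df)
#    0
# 0  1
# 1  3
# 2  0
# 3  0
# 4  3

# cumsum()==1 alone is also True for non-matching rows between the first and
# second match, so combine it with the mask itself
df.loc[(df[0] == 0) & ((df[0] == 0).cumsum() == 1), 0] = 1
print(df)
#    0
# 0  1
# 1  3
# 2  1
# 3  0
# 4  3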
Essentially, this uses the boolean mask inline with a cumsum.
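The cumsum trick needs the mask and-ed back in whenever the matches are not adjacent: `cumsum() == 1` stays True from the first match up to (but not including) the second one. In the example above the two zeros happen to be neighbours, which hides this. A sketch with made-up data where they are not:

```python
import pandas as pd

df = pd.DataFrame({0: [0, 5, 0, 7]})
m = df[0] == 0                 # [True, False, True, False]

bare = m.cumsum() == 1         # also True for the non-matching row after the first match
fixed = m & (m.cumsum() == 1)  # True only at the first match

print(bare.tolist())   # [True, True, False, False]
print(fixed.tolist())  # [True, False, False, False]

df.loc[fixed, 0] = 1
print(df[0].tolist())  # [1, 5, 0, 7]
```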