Set value of first item in slice in python pandas

I would like to take a slice of a dataframe and then set the value of the first item in that slice without copying the dataframe. For example:

df = pandas.DataFrame(numpy.random.rand(3,1))
df[df[0]>0][0] = 0

The slice here is irrelevant and only serves the example; it returns the whole dataframe again. The point is that doing it as in the example gives a SettingWithCopyWarning (understandably). I have also tried slicing first and then using iloc/ix/loc, and using iloc twice, i.e. something like:

df.iloc[df[0]>0,:][0] = 0
df[df[0]>0,:].iloc[0] = 0

Neither of these works. Again, I don't want to make a copy of the dataframe, even if it is just the sliced version.
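To make the failure concrete, here is a minimal sketch of what the chained assignment does. The names `before` and `unchanged` are illustrative and not part of the original post; the warning is silenced only to keep the output clean.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 1))
before = df.copy()

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the chained-assignment warning
    df[df[0] > 0][0] = 0             # the write lands in a temporary object

# The original dataframe still holds its random values.
unchanged = df.equals(before)
print(unchanged)
```

The boolean selection produces a new object, so the second indexing step never reaches `df` itself.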

EDIT: It seems there are two ways: using a mask or idxmax. The idxmax method seems to work if your index is unique, and the mask method if it is not. In my case the index is not unique, which I forgot to mention in the initial post.
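A small sketch of why the unique-index condition matters for idxmax, using made-up data with a repeated label (the values and the name `rows_changed` are assumptions for illustration only):

```python
import pandas as pd

# Label 2 appears twice in the index.
df = pd.DataFrame({0: [1, 3, 0, 0, 3]}, index=[1, 2, 2, 3, 4])

label = (df[0] == 3).idxmax()   # label of the first match, here 2
original = df.copy()
df.loc[label, 0] = 200          # .loc is label-based: every row labelled 2 is written

rows_changed = int((df[0] != original[0]).sum())
print(label)              # 2
print(df[0].tolist())     # [1, 200, 200, 0, 3]
print(rows_changed)       # 2
```

Because `.loc` resolves labels rather than positions, the second row carrying label 2 is overwritten as well, even though it never matched the condition.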

I think you can use idxmax to get the index of the first True value and then set the value with loc:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print (df)
   0
0  1
1  3
2  0
3  0
4  3

print ((df[0] == 0).idxmax())
2

df.loc[(df[0] == 0).idxmax(), 0] = 100
print (df)
     0
0    1
1    3
2  100
3    0
4    3

df.loc[(df[0] == 3).idxmax(), 0] = 200
print (df)
     0
0    1
1  200
2  100
3    0
4    3
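One caveat with this approach: when no row satisfies the condition, idxmax on an all-False Series still returns the first label, so the first row would be overwritten. A hedged sketch of a guard follows; the helper name `set_first_match` is hypothetical and a unique index is assumed.

```python
import pandas as pd


def set_first_match(df, column, condition, value):
    """Set `value` at the first row where `condition` holds (unique index assumed)."""
    if not condition.any():
        return False                      # nothing matched; leave df untouched
    df.loc[condition.idxmax(), column] = value
    return True


df = pd.DataFrame({0: [1, 3, 0, 0, 3]})
changed_seven = set_first_match(df, 0, df[0] == 7, 100)  # there is no 7
changed_zero = set_first_match(df, 0, df[0] == 0, 100)   # first 0 is at label 2

print(changed_seven, changed_zero)  # False True
print(df[0].tolist())               # [1, 3, 100, 0, 3]
```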

EDIT:

Solution for a non-unique index:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df = df.reset_index()
df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.set_index('index')
df.index.name = None
print (df)
     0
1    1
2  200
2    0
3    0
4    3
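As an alternative sketch that leaves the index alone, the first match can be located by integer position and written with iloc. This is not the approach used above, just one possible variant; `positions` and `col` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 3, 0, 0, 3]}, index=[1, 2, 2, 3, 4])

# Integer positions (not labels) of the rows that match the condition.
positions = np.flatnonzero((df[0] == 3).to_numpy())
if positions.size:                       # skip the write when nothing matches
    col = df.columns.get_loc(0)          # integer position of column 0
    df.iloc[positions[0], col] = 200

print(df[0].tolist())     # [1, 200, 0, 0, 3]
print(df.index.tolist())  # [1, 2, 2, 3, 4]
```

Since iloc addresses rows by position, the duplicated label 2 does not cause a second row to be written.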

EDIT1:

Solution with MultiIndex:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df.index = [np.arange(len(df.index)), df.index]
print (df)
     0
0 1  1
1 2  3
2 2  0
3 3  0
4 4  3

df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.reset_index(level=0, drop=True)

print (df)
     0
1    1
2  200
2    0
3    0
4    3

EDIT2:

Solution with double cumsum:

np.random.seed(1)
df = pd.DataFrame([4,0,4,7,4], index=[1,2,2,3,4])
print (df)
   0
1  4
2  0
2  4
3  7
4  4

mask = (df[0] == 0).cumsum().cumsum()
print (mask)
1    0
2    1
2    2
3    3
4    4
Name: 0, dtype: int32

df.loc[mask == 1, 0] = 200
print (df)
     0
1    4
2  200
2    4
3    7
4    4
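To see why the second cumsum is needed, this sketch compares a single and a double cumsum on the same data as above (the names `match`, `single` and `double` are illustrative):

```python
import pandas as pd

df = pd.DataFrame([4, 0, 4, 7, 4], index=[1, 2, 2, 3, 4])
match = df[0] == 0

single = match.cumsum()            # stays at 1 for every row after the first match
double = match.cumsum().cumsum()   # keeps growing, so the value 1 appears only once

print(single.tolist())             # [0, 1, 1, 1, 1]
print(double.tolist())             # [0, 1, 2, 3, 4]
print(int((single == 1).sum()))    # 4
print(int((double == 1).sum()))    # 1
```

With a single cumsum, comparing against 1 would select four rows here; the double cumsum narrows that to the first match only.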

Consider the dataframe df:

df = pd.DataFrame(dict(A=[1, 2, 3, 4, 5]))

print(df)

   A
0  1
1  2
2  3
3  4
4  5

Create some arbitrary slice slc:

slc = df[df.A > 2]

print(slc)

   A
2  3
3  4
4  5

Access the first row of slc within df by using index[0] and loc:

df.loc[slc.index[0]] = 0
print(df)

   A
0  1
1  2
2  0
3  4
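4  5

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(6, 1), index=[1, 2, 2, 3, 3, 3])
df[1] = 0
df.columns = ['a', 'b']
# label rows outside the slice with 1; a single .loc call avoids chained assignment
df.loc[df['a'] >= 0.5, 'b'] = 1
# DataFrame.sort was removed from pandas; sort_values is its replacement
df = df.sort_values(['b', 'a'], ascending=[False, True])
# note: .loc is label-based, so a duplicated label updates every row carrying it
df.loc[df[df['b'] == 0].index.tolist()[0], 'a'] = 0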

With this method no extra copy of the dataframe is created, but an extra column is introduced, which can be dropped after processing. To choose any other item instead of the first one, change the last line as follows:

df.loc[df[df['b']==0].index.tolist()[n],'a']=0

This changes the item at position n (counting from 0) in the slice.

df:

          a  
1  0.111089  
2  0.255633  
2  0.332682  
3  0.434527  
3  0.730548  
3  0.844724  

df after slicing and labelling:

          a  b
1  0.111089  0
2  0.255633  0
2  0.332682  0
3  0.434527  0
3  0.730548  1
3  0.844724  1

After changing the value of the first item in the slice (labelled 0) to 0:

          a  b
3  0.730548  1
3  0.844724  1
1  0.000000  0
2  0.255633  0
2  0.332682  0
3  0.434527  0
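The nth-item idea can also be sketched with a boolean mask, which avoids the extra column and the sort and does not depend on labels being unique. This is a different technique from the one above; the helper name `set_nth_match` and the sample data are hypothetical, and n is counted from 1 here.

```python
import pandas as pd


def set_nth_match(df, column, condition, n, value):
    """Set `value` at the n-th (1-based) row where `condition` is True."""
    # True at exactly one row when an n-th match exists, otherwise all False.
    nth = condition & (condition.cumsum() == n)
    df.loc[nth, column] = value
    return bool(nth.any())


df = pd.DataFrame({'a': [5, 0, 7, 0, 0]}, index=[1, 2, 2, 3, 3])
found = set_nth_match(df, 'a', df['a'] == 0, 2, 99)   # second zero

print(found)              # True
print(df['a'].tolist())   # [5, 0, 7, 99, 0]
```

Combining the running count with the condition itself keeps non-matching rows out of the mask, so only the requested match is written.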

Using some of the answers, I managed to find a one-liner to do this:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print(df)
   0
0  1
1  3
2  0
3  0
4  3
df.loc[(df[0] == 0).cumsum() == 1, 0] = 1
print(df)
   0
0  1
1  3
2  1
3  0
4  3

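Essentially this uses the mask inline with a cumsum. Note that a single cumsum equals 1 for every row from the first match up to the row before the second match, so it isolates the first match here only because the two zeros are adjacent; the double cumsum mask does not have that limitation.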

