Copy rows into python dataframe using column logical indexing
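Hello fellow programmers,
I am trying to copy rows into a dataframe using logical indexing, but I get NaNs. The idea is to use logical indexing to quickly replace many rows in a dataframe:
For example, this is how I expect it to work: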
# pseudo code: example to create logical vector
logical_vector = df.loc[:,'colname']==x
# pseudo code: example to use logical vector to index a dataframe
df[logical_vector,:]=df1.loc[logical_vector2,:]
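If it is possible, this is one of the basic operations that allow fast matrix computation. How can I solve this, preferably without loops?
I created an example to illustrate the problem: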
# create example 10x5 dataframe containing random numbers
import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.randn(10, 5),
                 columns=['ac', 'bc', 'cc', 'dc', 'ec'],
                 index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
# add a column containing information for use in example of logical indexing
x['t'] = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
x
Out[76]:
ac bc cc dc ec t
a -1.029517 1.936904 1.143655 0.708996 -1.218484 1
b 1.836638 -0.723243 -0.501546 -2.046355 0.248156 0
c 2.369828 0.559880 -0.878904 0.673454 -0.630927 1
d -0.629210 1.261608 -0.190508 -0.582700 0.068166 0
e 1.500134 0.534379 0.375362 0.849761 -1.675824 1
f 1.399520 0.038366 -0.137986 0.156580 -0.674619 0
g -1.359863 0.433721 -0.625973 -0.477530 -0.542612 1
h -0.694573 -0.196907 -0.372210 0.464188 -1.217399 0
i 1.357809 -0.017611 0.539137 -1.016894 0.172672 1
j 0.366195 0.750404 -0.055895 0.358795 0.181593 0
Then I tried to do the replacement using the logical indexes, and I got this:
# Use logical indexes x.loc[:,'t']==0 and x.loc[:,'t']==1 to point and get
# data into x. This should replace all row values that contain '0' in column
# t with row values from rows that have '1' for column t
x.loc[x.loc[:,'t']==0,:]=x.loc[x.loc[:,'t']==1,:]
x
Out[78]:
ac bc cc dc ec t
a -1.029517 1.936904 1.143655 0.708996 -1.218484 1.0
b NaN NaN NaN NaN NaN NaN
c 2.369828 0.559880 -0.878904 0.673454 -0.630927 1.0
d NaN NaN NaN NaN NaN NaN
e 1.500134 0.534379 0.375362 0.849761 -1.675824 1.0
f NaN NaN NaN NaN NaN NaN
g -1.359863 0.433721 -0.625973 -0.477530 -0.542612 1.0
h NaN NaN NaN NaN NaN NaN
i 1.357809 -0.017611 0.539137 -1.016894 0.172672 1.0
j NaN NaN NaN NaN NaN NaN
Whereas I expected this:
Out[76]:
ac bc cc dc ec t
a -1.029517 1.936904 1.143655 0.708996 -1.218484 1
b -1.029517 1.936904 1.143655 0.708996 -1.218484 1
c 2.369828 0.559880 -0.878904 0.673454 -0.630927 1
d 2.369828 0.559880 -0.878904 0.673454 -0.630927 1
e 1.500134 0.534379 0.375362 0.849761 -1.675824 1
f 1.500134 0.534379 0.375362 0.849761 -1.675824 1
g -1.359863 0.433721 -0.625973 -0.477530 -0.542612 1
h -1.359863 0.433721 -0.625973 -0.477530 -0.542612 1
i 1.357809 -0.017611 0.539137 -1.016894 0.172672 1
j 1.357809 -0.017611 0.539137 -1.016894 0.172672 1
What am I missing?
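Good question. The problem is that the index of the right-hand side does not match the index of the left-hand side: pandas aligns on row labels during assignment, so any target row whose label is missing from the right-hand side is filled with NaN. Here is a simpler example that solves it: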
df=pd.DataFrame({'a':[1,0,1,0],'b':[2,0,2,0]})
a b
0 1 2
1 0 0
2 1 2
3 0 0
df.loc[df['a']==0,:]=df.loc[df['a']==1,:].set_index(df.loc[df['a']==0,:].index)
a b
0 1 2
1 1 2
2 1 2
3 1 2
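To make the alignment issue concrete, here is a minimal sketch that runs both the failing assignment and the relabeled one side by side. The names `target`, `source`, `misaligned`, `fixed`, and `relabeled` are illustrative and not part of the original answer, and the columns are floats (an assumption made here) so that NaN can be stored without a dtype change.

```python
import pandas as pd

# Float columns (assumption for this sketch) so NaN fits without upcasting.
df = pd.DataFrame({"a": [1.0, 0.0, 1.0, 0.0], "b": [2.0, 0.0, 2.0, 0.0]})
target = df["a"] == 0   # rows to overwrite (labels 1 and 3)
source = df["a"] == 1   # rows to copy from (labels 0 and 2)

# Label-aligned assignment: labels 1 and 3 do not exist on the right-hand
# side, so the target rows are filled with NaN.
misaligned = df.copy()
misaligned.loc[target, :] = df.loc[source, :]

# Relabel the source rows with the target labels so alignment finds a match.
fixed = df.copy()
relabeled = df.loc[source, :].set_index(df.loc[target, :].index)
fixed.loc[target, :] = relabeled

print(misaligned)
print(fixed)
```

Relabeling keeps pandas' label alignment in play, which seems the safer choice when the order of the source rows matters and you want that correspondence to be explicit.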
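Alternatively, if you know the two selections have the same shape, you can simply take the values, which drops the index so that rows are copied by position: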
df=pd.DataFrame({'a':[1,0,1,0],'b':[2,0,2,0]})
df.loc[df['a']==0,:]=df.loc[df['a']==1,:].values
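Applied to the 10-row frame from the question, the same idea might look like the sketch below. It uses `.to_numpy()`, the accessor that current pandas documentation recommends over `.values`; the seed, the names `target` and `source`, and the row-count check are assumptions added here for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = pd.DataFrame(
    rng.standard_normal((10, 5)),
    columns=["ac", "bc", "cc", "dc", "ec"],
    index=list("abcdefghij"),
)
x["t"] = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

target = x["t"] == 0   # rows b, d, f, h, j
source = x["t"] == 1   # rows a, c, e, g, i

# A positional copy only makes sense when both selections have equally many rows.
assert target.sum() == source.sum()

# A plain array carries no index, so rows are copied by position:
# a -> b, c -> d, e -> f, g -> h, i -> j.
x.loc[target, :] = x.loc[source, :].to_numpy()
print(x)
```

Because the array has no labels, the copy depends entirely on row order, so this variant is only appropriate when the positional pairing is what you intend.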