
Pandas: How to add columns to a superset from a subset of the superset?

In the code below, df4['t']=2 (In [15]) did not affect df3. This is not what I want: I want the column to be added to df3 too, not just to df4. (Also, df4['t']=2 did not add a column named t but added a row instead, which confused me.)

In addition, I noticed that it warned: A value is trying to be set on a copy of a slice from a DataFrame.

Any idea how to solve this problem?

In [6]: df2 = pandas.DataFrame(np.random.randn(10, 5))

In [7]: df2
Out[7]:
          0         1         2         3         4
0  0.222512 -0.907183  0.516238 -1.307885  1.604694
1 -0.648315  0.024165  0.487837 -0.374203 -0.193131
2  0.961563  1.847492 -1.773695 -0.791906 -0.458998
3  0.550847  2.221003  0.197836 -1.260352  0.794854
4 -0.211655  0.555512  0.832657 -0.703831 -0.586403
5 -0.384389  1.622995 -0.858065 -0.455278 -1.354076
6 -0.331782  1.256876 -1.080412  1.425681  0.017413
7 -1.008093  0.914414  2.023874 -0.004319  0.733349
8 -0.038734 -0.771304 -0.644371 -0.492886  2.111187
9 -2.812306 -1.434702 -0.074720  1.413066 -0.160265

In [8]: df3=df2

In [9]: df3
Out[9]:
          0         1         2         3         4
0  0.222512 -0.907183  0.516238 -1.307885  1.604694
1 -0.648315  0.024165  0.487837 -0.374203 -0.193131
2  0.961563  1.847492 -1.773695 -0.791906 -0.458998
3  0.550847  2.221003  0.197836 -1.260352  0.794854
4 -0.211655  0.555512  0.832657 -0.703831 -0.586403
5 -0.384389  1.622995 -0.858065 -0.455278 -1.354076
6 -0.331782  1.256876 -1.080412  1.425681  0.017413
7 -1.008093  0.914414  2.023874 -0.004319  0.733349
8 -0.038734 -0.771304 -0.644371 -0.492886  2.111187
9 -2.812306 -1.434702 -0.074720  1.413066 -0.160265

In [10]: df3['d']=1

In [11]: df3
Out[11]:
          0         1         2         3         4  d
0  0.222512 -0.907183  0.516238 -1.307885  1.604694  1
1 -0.648315  0.024165  0.487837 -0.374203 -0.193131  1
2  0.961563  1.847492 -1.773695 -0.791906 -0.458998  1
3  0.550847  2.221003  0.197836 -1.260352  0.794854  1
4 -0.211655  0.555512  0.832657 -0.703831 -0.586403  1
5 -0.384389  1.622995 -0.858065 -0.455278 -1.354076  1
6 -0.331782  1.256876 -1.080412  1.425681  0.017413  1
7 -1.008093  0.914414  2.023874 -0.004319  0.733349  1
8 -0.038734 -0.771304 -0.644371 -0.492886  2.111187  1
9 -2.812306 -1.434702 -0.074720  1.413066 -0.160265  1

In [12]: df2
Out[12]:
          0         1         2         3         4  d
0  0.222512 -0.907183  0.516238 -1.307885  1.604694  1
1 -0.648315  0.024165  0.487837 -0.374203 -0.193131  1
2  0.961563  1.847492 -1.773695 -0.791906 -0.458998  1
3  0.550847  2.221003  0.197836 -1.260352  0.794854  1
4 -0.211655  0.555512  0.832657 -0.703831 -0.586403  1
5 -0.384389  1.622995 -0.858065 -0.455278 -1.354076  1
6 -0.331782  1.256876 -1.080412  1.425681  0.017413  1
7 -1.008093  0.914414  2.023874 -0.004319  0.733349  1
8 -0.038734 -0.771304 -0.644371 -0.492886  2.111187  1
9 -2.812306 -1.434702 -0.074720  1.413066 -0.160265  1

In [13]: df4=df3.loc[:,'d']

In [14]: df4
Out[14]:
0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
9    1
Name: d, dtype: int64

In [15]: df4['t']=2
C:\Users\jiahao\AppData\Local\Programs\Python\Python35\Scripts\ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [16]: df4
Out[16]:
0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
9    1
t    2
Name: d, dtype: int64

In [17]: df3
Out[17]:
          0         1         2         3         4  d
0  0.222512 -0.907183  0.516238 -1.307885  1.604694  1
1 -0.648315  0.024165  0.487837 -0.374203 -0.193131  1
2  0.961563  1.847492 -1.773695 -0.791906 -0.458998  1
3  0.550847  2.221003  0.197836 -1.260352  0.794854  1
4 -0.211655  0.555512  0.832657 -0.703831 -0.586403  1
5 -0.384389  1.622995 -0.858065 -0.455278 -1.354076  1
6 -0.331782  1.256876 -1.080412  1.425681  0.017413  1
7 -1.008093  0.914414  2.023874 -0.004319  0.733349  1
8 -0.038734 -0.771304 -0.644371 -0.492886  2.111187  1
9 -2.812306 -1.434702 -0.074720  1.413066 -0.160265  1

In [18]:

There are a couple of misunderstandings here. The statement df4=df3.loc[:,'d'] returns a Series, not a DataFrame, so df4 is now a Series. A Series doesn't have columns; it has values referenced by an index. On a Series, the bracket operator looks up labels in that index, so your next statement, df4['t'] = 2, adds a new index label t to the Series and assigns it the value 2. That is why a row appeared instead of a column.
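A minimal sketch of the behaviour described above, assuming a reasonably recent pandas imported as pd. The frame is rebuilt from random data, so the names df3 and df4 only mirror the question; older pandas versions may also print a SettingWithCopyWarning at the assignment.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question: columns 0-4 plus 'd'
df3 = pd.DataFrame(np.random.randn(10, 5))
df3['d'] = 1

# A single label inside .loc returns a one-dimensional Series
df4 = df3.loc[:, 'd']
print(type(df4).__name__)  # Series

# On a Series, [] addresses index labels, so this appends a *row* labelled 't'
df4['t'] = 2
print(df4.tail(2))

# The enlargement happens on df4 only; df3 gains no column
print('t' in df3.columns)  # False
```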

You can keep df4 as a DataFrame by passing a list of column names to .loc, like this: df4=df3.loc[:,['d']]. df4 is then a DataFrame, and running df4['t'] = 2 appends a column to df4.
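A short sketch of the list-based selection, again assuming pandas imported as pd and a randomly generated frame standing in for the question's df3:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.random.randn(10, 5))
df3['d'] = 1

# A list of labels inside .loc keeps the result two-dimensional
df4 = df3.loc[:, ['d']]
print(type(df4).__name__)  # DataFrame

# Now [] assignment addresses columns, so this adds a column named 't'
df4['t'] = 2
print(list(df4.columns))  # ['d', 't']
```

The only change from the question's code is the extra pair of brackets around 'd'; that alone decides whether the result is a Series or a DataFrame.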

You are getting the SettingWithCopyWarning because, it appears, the statement df4=df3.loc[:,'d'] may not make a new copy of column d, so df4 may still reference df3's data. By contrast, df4=df3.loc[:,['d']] appears to be a completely independent DataFrame: adding a column to it will not trigger the warning, but it also will not modify df3. Adding the column to df3 has to be done with an additional line of code.
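One possible form of that additional line, sketched under the same assumptions (pandas as pd, random data in place of the question's frame). Copying the column back with df3['t'] = df4['t'] is just one option chosen for illustration; because Series assignment aligns on the index, the rows are matched by label rather than by position.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.random.randn(10, 5))
df3['d'] = 1

# Work on an independent two-dimensional subset
df4 = df3.loc[:, ['d']]
df4['t'] = 2

# df3 is still unchanged here; one extra assignment copies the new column back,
# aligning rows on the shared index labels
df3['t'] = df4['t']
print(list(df3.columns))  # [0, 1, 2, 3, 4, 'd', 't']
```

If the value does not depend on the subset at all, assigning it on df3 directly (df3['t'] = 2) and then re-selecting df4 is simpler.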
