简体   繁体   中英

How to subtract two partial columns with pandas?

I'm just getting started with Pandas so I may be missing something important, but I can't seem to successfully subtract two columns I'm working with. I have a spreadsheet in excel that I imported as follows:

df = pd.read_excel('/path/to/file.xlsx',sheetname='Sheet1')

My table when doing df.head() looks similar to the following:

      a         b        c        d
0     stuff     stuff    stuff    stuff
1     stuff     stuff    stuff    stuff
2     data      data     data     data
...   ...       ...      ...      ...
89    data      data     data     data

I don't care about the "stuff;" I would like to subtract two columns of just the data and make this its own column. Therefore, it seemed obvious that I should trim off the rows I'm not interested in and work with what remains, so I have tried the following:

dataCol1 = df.ix[2:,0:1]
dataCol2 = df.ix[2:,1:2]
print(dataCol1.sub(dataCol2,axis=0))

But it results in

             a          b
2           NaN        NaN
3           NaN        NaN
4           NaN        NaN
...         ...        ...
89          NaN        NaN

I get the same result if I also simply try print(dataCol1-dataCol2) . I really don't understand how both of these subtraction operations not only result in all NaN's, but also two columns instead of just one with the end result. Because when I print(dataCol1) , for example, I do obtain the column I want to work with:

      a      
2     data     
3     data   
4     data     
...   ...
89    data      

Is there any way to both work simply and directly from an Excel spreadsheet and perform basic operations with a truncated portion of the columns of said spreadsheet? Maybe there is a better way to go about this than using df.ix and I am definitely open to those methods as well.

The problem is the misallignment of your indices.

One thing to do would be to subtract the values, so you don't have to deal with alignment issues:

dataCol1 = df.iloc[2: , 0:1]   # ix is deprecated
dataCol2 = df.iloc[2: , 1:2]

result = pd.DataFrame(dataCol1.values - dataCol2.values)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM