
Python Pandas Linear Interpolate Y over X

I'm trying to answer this Udacity question: https://www.udacity.com/course/viewer#!/c-st101/l-48696651/e-48532778/m-48635592

I like Python & Pandas, so I'm using Pandas (version 0.14).

I have this DataFrame:

import pandas as pd

df = pd.DataFrame(dict(size=(1400,
                             2400,
                             1800,
                             1900,
                             1300,
                             1100),
                       cost=(112000,
                             192000,
                             144000,
                             152000,
                             104000,
                             88000)))

I added a row for a 2,100 sq ft house to my DataFrame. Notice that it has no cost; that is the question: what would you expect to pay for a house of 2,100 sq ft?

df = df.append(pd.DataFrame({'size':(2100,)}), ignore_index=True)

The question wants you to answer what cost/price you would expect to pay, using linear interpolation.

Can Pandas interpolate? And how?

I tried this:

df.interpolate(method='linear')

But it gave me a cost of 88,000, which is just the last cost value repeated.

I tried this:

df.sort('size').interpolate(method='linear')

But it gave me a cost of 172,000, which is just halfway between the costs of 152,000 and 192,000. Closer, but not what I want. The correct answer is 168,000 (because there is a "slope" of $80/sqft).
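To make the two numbers concrete, here is the arithmetic written out as a small sketch. It uses only the two known rows on either side of 2,100 sq ft from the data above; the variable names are just for illustration.

```python
# Known neighbours of 2,100 sq ft, taken from the question's data
x0, y0 = 1900, 152000
x1, y1 = 2400, 192000

# Interpolating over size: use the slope in dollars per sq ft
slope = (y1 - y0) / (x1 - x0)          # 40000 / 500
expected = y0 + slope * (2100 - x0)    # 152000 + slope * 200

# Treating the rows as evenly spaced instead gives the plain midpoint
midpoint = (y0 + y1) / 2

print(slope, expected, midpoint)
```

The first result is the 168,000 the question expects; the second is the 172,000 that the sorted method='linear' attempt produced.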

EDIT:

I checked these SO questions

Pandas' method='linear' interpolation does what I call "1D" interpolation: it ignores the index and treats the values as equally spaced.

If you want to interpolate a "dependent" variable over an "independent" variable, make the "independent" variable the index of a Series, and use method='index' (or method='values'; they're the same).

In other words:

(pd.Series(index=df['size'].values, data=df['cost'].values) # Make size the independent variable; use df['size'], since df.size is the DataFrame's element count
    # SEE ANSWER BELOW; order() method is deprecated; use sort_values() instead
    .order() # Sorts by value (cost), which here also puts the known sizes in ascending order; interpolation depends on order (see OP)
    .interpolate(method='index')[2100]) # Interpolate using method 'index'

This returns the correct answer 168,000
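The difference between the two methods can be seen side by side in a short sketch. This is not the code from the answer: it builds the Series directly from the question's numbers, sorts by the index with sort_index(), and assumes a reasonably recent pandas with NumPy available.

```python
import numpy as np
import pandas as pd

# Sizes become the index; the 2,100 sq ft house has no known cost
sizes = [1400, 2400, 1800, 1900, 1300, 1100, 2100]
costs = [112000, 192000, 144000, 152000, 104000, 88000, np.nan]

s = pd.Series(costs, index=sizes).sort_index()

# 'linear' ignores the index and treats rows as equally spaced
by_position = s.interpolate(method='linear').loc[2100]

# 'index' uses the numeric index values (size) as the x-axis
by_index = s.interpolate(method='index').loc[2100]

print(by_position, by_index)
```

With the data sorted by size, by_position is the midpoint of the neighbouring costs, while by_index follows the $80/sqft slope.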

This was not clear to me from the example in the Pandas documentation, where the Series' data and index are the same list of values.

With my version of Pandas (0.19.2), index=df.size breaks. The column name is an unlucky choice: df.size is the size of the table (its number of elements), not the 'size' column. So this works:

df=df.append(pd.DataFrame({'size':(2100,)}), True)
pd.Series(index=df['size'].values, 
data=df['cost'].values).order().interpolate(method='index')[2100]

=168000.0

In my version of Pandas (1.1.1), order() is deprecated. You should use sort_values() instead. This does the job:

df = df.append(pd.DataFrame({'size':(2100,)}), True)
pd.Series(index=df['size'].values,
          data=df['cost'].values).sort_values().interpolate(method='index')[2100]

=168000.0
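For newer releases, note that DataFrame.append was removed in pandas 2.0, so the snippets above will not run there as written. The following is a minimal sketch of one way to get the same result without appending a row. It is not taken from any of the answers; the names cost_by_size, target, with_gap and estimate are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'size': [1400, 2400, 1800, 1900, 1300, 1100],
    'cost': [112000, 192000, 144000, 152000, 104000, 88000],
})

# Make size the index so it acts as the independent variable
cost_by_size = df.set_index('size')['cost']

# Add the size we want a price for; reindexing leaves its cost as NaN.
# Index.union also returns the labels in sorted order.
target = 2100
with_gap = cost_by_size.reindex(cost_by_size.index.union([target]))

# Interpolate over the index values rather than over row positions
estimate = with_gap.interpolate(method='index').loc[target]
print(estimate)
```

Reindexing on the union of the existing sizes and the target both inserts the missing row and sorts by size, so no separate sort step is needed.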
