I'm trying to answer this Udacity question: https://www.udacity.com/course/viewer#!/c-st101/l-48696651/e-48532778/m-48635592
I like Python and Pandas, so I'm using Pandas (version 0.14).
I have this DataFrame:
import pandas as pd
df = pd.DataFrame(dict(size=(1400, 2400, 1800, 1900, 1300, 1100),
                       cost=(112000, 192000, 144000, 152000, 104000, 88000)))
I added a 2,100 sq ft house to my DataFrame. Notice there is no cost; that is the question: what would you expect to pay for a house of 2,100 sq ft?
df = df.append(pd.DataFrame({'size': (2100,)}), ignore_index=True)
The question wants you to answer what price you expect to pay, using linear interpolation.
Can Pandas interpolate? And how?
I tried this:
df.interpolate(method='linear')
But it gave me a cost of 88,000; just the last cost value repeated.
I tried this:
df.sort('size').interpolate(method='linear')
But it gave me a cost of 172,000; just halfway between the costs of 152,000 and 192,000. Closer, but not what I want. The correct answer is 168,000 (because there is a "slope" of $80/sqft).
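For reference, the expected figure can be checked by hand. This is only a sketch of the arithmetic, using the two rows of the table that bracket 2,100 sq ft; the variable names are illustrative.

```python
# Two known points that bracket 2,100 sq ft (taken from the table above)
x0, y0 = 1900, 152000
x1, y1 = 2400, 192000

slope = (y1 - y0) / (x1 - x0)        # dollars per square foot
expected = y0 + slope * (2100 - x0)  # linear interpolation at 2,100 sq ft
print(slope, expected)
```

The slope comes out at $80/sqft, and 200 sq ft above the 1,900 sq ft house adds $16,000 to its cost.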
EDIT:
I checked these SO questions
quantities
library.
Pandas' method='linear' interpolation does what I call "1D" interpolation: it treats the values as equally spaced and ignores the index.
If you want to interpolate a "dependent" variable over an "independent" variable, make the "independent" variable the index of a Series, and use method='index' (or method='values'; they're the same).
In other words:
# Make size the independent variable (the index) and cost the data.
# df['size'] rather than df.size, which pandas resolves to the DataFrame's element count.
s = pd.Series(index=df['size'].values, data=df['cost'].values)
# Sort by the index (size in sq ft), then interpolate against the index values.
# This answer originally used .order(), which later pandas releases dropped.
s.sort_index().interpolate(method='index')[2100]
This returns the correct answer 168,000
This was not clear to me from the example in the Pandas documentation, where the Series' data and index are the same list of values.
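To see the difference between the two methods side by side, here is a minimal sketch. It builds the Series directly, already sorted by size, with NaN standing in for the unknown cost; the names s, by_position and by_index are just illustrative.

```python
import numpy as np
import pandas as pd

# size (sq ft) as the index, cost as the values; NaN marks the unknown cost
s = pd.Series(
    [88000, 104000, 112000, 144000, 152000, np.nan, 192000],
    index=[1100, 1300, 1400, 1800, 1900, 2100, 2400],
    dtype=float,
)

by_position = s.interpolate(method='linear').loc[2100]  # ignores the index
by_index = s.interpolate(method='index').loc[2100]      # uses the index as x
print(by_position, by_index)
```

With method='linear' the gap is filled halfway between its neighbours (172,000), while method='index' weights by the actual sizes and gives 168,000.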
With my version of Pandas (0.19.2), index=df.size breaks. It is an unlucky choice of column name: df.size is the size of the table, not the 'size' column. So this works:
df = df.append(pd.DataFrame({'size': (2100,)}), True)
pd.Series(index=df['size'].values,
          data=df['cost'].values).order().interpolate(method='index')[2100]
This gives 168000.0
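The name clash is easy to reproduce. The following small sketch uses the same data as the question and shows that attribute access returns the element count, while bracket access returns the column.

```python
import pandas as pd

df = pd.DataFrame(dict(size=(1400, 2400, 1800, 1900, 1300, 1100),
                       cost=(112000, 192000, 144000, 152000, 104000, 88000)))

n_elements = df.size          # DataFrame attribute: rows * columns
sizes = list(df['size'])      # the column that happens to be named 'size'
print(n_elements, sizes)
```

Bracket access is the safer habit whenever a column name could collide with a DataFrame attribute or method.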
In my version of Pandas (1.1.1), order() is no longer available (it was deprecated and later removed). You should use sort_values() instead. This does the job:
df = df.append(pd.DataFrame({'size': (2100,)}), True)
pd.Series(index=df['size'].values,
          data=df['cost'].values).sort_values().interpolate(method='index')[2100]
This gives 168000.0
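On recent pandas releases DataFrame.append is gone as well (it was removed in 2.0). A possible equivalent, sketched here with pd.concat and with sort_index() so the rows are ordered by size explicitly, would be the following; the variable name estimate is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(size=(1400, 2400, 1800, 1900, 1300, 1100),
                       cost=(112000, 192000, 144000, 152000, 104000, 88000)))

# pd.concat stands in for DataFrame.append; the new row has no cost, so it becomes NaN
df = pd.concat([df, pd.DataFrame({'size': (2100,)})], ignore_index=True)

estimate = (df.set_index('size')['cost']    # size becomes the x-axis
              .sort_index()                 # order rows by size
              .interpolate(method='index')  # interpolate against the index values
              .loc[2100])
print(estimate)
```

Using set_index('size') keeps size and cost paired by construction, which avoids passing the wrong column as data.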