Below is the dataframe:
   CNSSSBDVSN     CNSSSBDVS1                CNMCRGNNM  \
0     5941833      Kluskus 1                  Cariboo
1     5949832        Iskut 6  North Coast / Cote-nord
2     5941016      Cariboo H                  Cariboo
3     5955040  Peace River B     Northeast / Nord-est
4     5941801  Alkali Lake 1                  Cariboo

                         CNSSSBDVS3  instagram_posts  airports  \
0                    Indian Reserve                0         0
1                    Indian Reserve                0         0
2  Regional District Electoral Area                0         0
3  Regional District Electoral Area                1        17
4                    Indian Reserve                0         0

   railway_stations  accommodations  visitor_centers  festivals  \
0                 0               0                0          0
1                 0               0                0          0
2                 0               5                0          0
3                11               0                0          0
4                 0               0                0          0

   ports_and_ferry_terminals  attractions
0                          0            0
1                          0            0
2                          0            0
3                          0            0
4                          0            0
Below is the code. Before you read it, two points I would like to mention: 1. I believe something went wrong with the residuals or the indexing. 2. CNSSSBDVSN can be used as the index if needed.
# -*- coding: utf-8 -*-
import pandas as pd
import statsmodels.formula.api as sm
import matplotlib.pyplot as plt
import scipy.stats as stats
from tabulate import tabulate
if __name__ == "__main__":
    # Read data
    census_subdivision_without_lower_mainland_and_van_island = pd.read_csv('../data/augmented/census_subdivision_without_lower_mainland_and_van_island.csv')
    # Select data
    cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City']
    non_cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] != 'City']
    # Fit
    fit_cities = sm.ols(formula="instagram_posts ~ airports + railway_stations + ports_and_ferry_terminals + accommodations + visitor_centers + festivals + attractions", data=cities).fit()
    fit_non_cities = sm.ols(formula="instagram_posts ~ airports + railway_stations + ports_and_ferry_terminals + accommodations + visitor_centers + festivals + attractions", data=non_cities).fit()
    print(fit_cities.summary())
    print(fit_non_cities.summary())
    # Residual
    cities['residual'] = fit_cities.resid
    non_cities['residual'] = fit_non_cities.resid
Running it gives the following warning:
/Users/Chu/Documents/dssg/done/linear_model_cities.py:27: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
cities['residual'] = fit_cities.resid
/Users/Chu/Documents/dssg/done/linear_model_cities.py:28: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
non_cities['residual'] = fit_non_cities.resid
Your issue is that cities is a slice of census_subdivision_without_lower_mainland_and_van_island, so pandas cannot tell whether the assignment is meant to reach the original dataframe. If you want to use cities as its own dataframe from here on, you can just create a copy with:
cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City'].copy()
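To see the effect of .copy() in isolation, here is a minimal sketch on a made-up four-row dataframe. The name df, the sample values, and the fake_resid Series (standing in for fit_cities.resid, which carries the same index as the data it was fitted on) are assumptions for illustration only, not taken from the real dataset.

```python
import pandas as pd

# Hypothetical miniature of the census dataframe (values are made up).
df = pd.DataFrame({
    "CNSSSBDVS3": ["City", "Indian Reserve", "City",
                   "Regional District Electoral Area"],
    "instagram_posts": [10, 0, 4, 1],
})

# .copy() makes `cities` an independent dataframe, so adding a column
# to it is unambiguous and does not trigger SettingWithCopyWarning.
cities = df[df["CNSSSBDVS3"] == "City"].copy()

# Stand-in for fit_cities.resid: a Series indexed like `cities`.
fake_resid = pd.Series([0.5, -0.5], index=cities.index)

# Assignment aligns on the index labels (0 and 2 here).
cities["residual"] = fake_resid

print(cities)
print("residual" in df.columns)  # the original dataframe is untouched
```

The copy keeps the original row labels, so the residuals line up with the right rows even though the subset is not numbered consecutively.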
Alternatively, if you wish to modify the original dataframe, you can insert the results using .loc, as the warning suggests:
census_subdivision_without_lower_mainland_and_van_island.loc[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City', 'residual'] = fit_cities.resid
The same applies to the non-cities. As an FYI, I'd use shorter dataframe names to keep your code readable and within the recommended Python line length.
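Put together, the .loc approach for both groups might look like the sketch below, using a shorter dataframe name. The name subdivisions, the sample values, and the two placeholder Series (standing in for fit_cities.resid and fit_non_cities.resid) are assumptions for illustration; with the real models you would assign the actual residuals instead.

```python
import pandas as pd

# Hypothetical miniature of the census dataframe (values are made up).
subdivisions = pd.DataFrame({
    "CNSSSBDVS3": ["City", "Indian Reserve", "City",
                   "Regional District Electoral Area"],
    "instagram_posts": [10, 0, 4, 1],
})

# Boolean mask reused for both groups.
is_city = subdivisions["CNSSSBDVS3"] == "City"

# Placeholders for fit_cities.resid and fit_non_cities.resid; each is
# indexed by the rows of the subset its model would have been fitted on.
resid_cities = pd.Series([0.5, -0.5], index=subdivisions.index[is_city])
resid_non_cities = pd.Series([1.0, -1.0], index=subdivisions.index[~is_city])

# .loc writes straight into the original dataframe; the Series values
# are aligned to the selected rows by index label.
subdivisions.loc[is_city, "residual"] = resid_cities
subdivisions.loc[~is_city, "residual"] = resid_non_cities

print(subdivisions)
```

Because both assignments target the same column of the original dataframe, every row ends up with the residual from the model for its own group.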