I'm a beginner to python and trying to set instance of dataframe with only a subset of columns (slicing?) and have two methods where I think both should work but only one seems to work and trying to understand why. Method1 works but method2 returns an error KeyError: ('Name', 'Cost') method1:
import pandas as pd
purchase_1 = pd.Series({'Name': 'Chris',
'Item Purchased': 'Dog Food',
'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
'Item Purchased': 'Kitty Litter',
'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
'Item Purchased': 'Bird Seed',
'Cost': 5.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
columns_to_keep = ['Name','Cost']
df = df[columns_to_keep]
method 2:
import pandas as pd
purchase_1 = pd.Series({'Name': 'Chris',
'Item Purchased': 'Dog Food',
'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
'Item Purchased': 'Kitty Litter',
'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
'Item Purchased': 'Bird Seed',
'Cost': 5.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
columns_to_keep = ['Name','Cost']
df = df['Name','Cost']
As far as I can see, both seem to set the instance df with list of columns. Would like to understand why method2 doesn't work?
That's how the advanced index slicing in numpy/pandas works.
Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool)
Note that in Method 2 df = df['Name','Cost']
is the same as df = df[('Name','Cost')]
- which implies using a tuple as the selection object; referred to as basic indexing.
In Python,
x[(exp1, exp2, ..., expN)]
is equivalent tox[exp1, exp2, ..., expN]
; the latter is just syntactic sugar for the former.
You need to put the columns in an array or list (as in your method 1) not a tuple to trigger the advanced indexing that will select items from multiple columns at a go:
>>> df = df[['Name','Cost']] # also df[np.array(['Name','Cost'])] works
>>> df
Name Cost
Store 1 Chris 22.5
Store 1 Kevyn 2.5
Store 2 Vinod 5.0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.