I am running into a ValueError when trying to create a new column in my DataFrame. The DataFrame looks like this:
      state  veteran_pop  pct_gulf  pct_vietnam
0    Alaska        70458      20.0         31.2
1   Arizona       532634       8.8         15.8
2  Colorado       395350      10.1         20.8
3   Georgia       693809      10.8         21.8
4      Iowa       234659       7.1         13.7
So I have a function that looks like this:
def addProportions(table, col1, col2, new_col):
    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)
Where table is the table above, col1 = "pct_gulf", col2 = "pct_vietnam", and new_col = "total_pct", like so:
addProportions(table, "pct_gulf", "pct_vietnam", "total_pct")
But when I run this function I get this error message:
ValueError: Wrong number of items passed 2, placement implies 1
---- Alternatively ----
I have made my addProportions function like this:
def addProportions(table, col1, col2, new_col):
    table[new_col] = 0
    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)
And I get this output, which seems like a step in the right direction.
      state  veteran_pop  pct_gulf  pct_vietnam  total_pct
0    Alaska        70458      20.0         31.2        NaN
1   Arizona       532634       8.8         15.8        NaN
2  Colorado       395350      10.1         20.8        NaN
3   Georgia       693809      10.8         21.8        NaN
4      Iowa       234659       7.1         13.7        NaN
But the problem is that when I use type() on the two columns I am trying to add, each one comes up as a DataFrame, and I think that's why I'm getting NaN.
---- Table Info ----
t.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 4 columns):
(state,) 55 non-null object
(veteran_pop,) 55 non-null int64
(pct_gulf,) 55 non-null float64
(pct_vietnam,) 55 non-null float64
dtypes: float64(2), int64(1), object(1)
memory usage: 1.8+ KB
t.index
RangeIndex(start=0, stop=55, step=1)
t.columns
MultiIndex(levels=[[u'pct_gulf', u'pct_vietnam', u'state', u'veteran_pop']], codes=[[2, 3, 0, 1]])
You don't need a loop. You only need (table is the name of your dataframe):
table.columns=table.columns.droplevel()
table['total_pct']=(table['pct_gulf']+table['pct_vietnam'])/100
print(table)
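As a hedged sketch of the same idea: the snippet below rebuilds the five sample rows with single-level MultiIndex columns (an assumption about how the asker's frame was shaped, based on the t.columns output), flattens them, and adds the column without a loop. It uses get_level_values(0) instead of droplevel(), because some pandas versions refuse to drop the only level of a MultiIndex. That substitution is an alternative, not what the answer above uses.

```python
import pandas as pd

# Assumption: the asker's columns are a MultiIndex with a single level.
cols = pd.MultiIndex.from_arrays(
    [["state", "veteran_pop", "pct_gulf", "pct_vietnam"]]
)
table = pd.DataFrame(
    [
        ["Alaska", 70458, 20.0, 31.2],
        ["Arizona", 532634, 8.8, 15.8],
        ["Colorado", 395350, 10.1, 20.8],
        ["Georgia", 693809, 10.8, 21.8],
        ["Iowa", 234659, 7.1, 13.7],
    ],
    columns=cols,
)

# Flatten the columns to a plain Index of strings if they are a MultiIndex.
if isinstance(table.columns, pd.MultiIndex):
    table.columns = table.columns.get_level_values(0)

# Vectorized: the whole column is computed at once, no iterrows() needed.
table["total_pct"] = (table["pct_gulf"] + table["pct_vietnam"]) / 100
print(table)
```

After flattening, table["pct_gulf"] is a Series, so the sum aligns row by row instead of producing NaN.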
I think the problem is that you have a MultiIndex.
My DataFrame, when I construct one from your info, looks like this:
import pandas as pd

table = pd.DataFrame(data={
    "state": ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
    "veteran_pop": [70458, 532634, 395350, 693809, 234659],
    "pct_gulf": [20.0, 8.8, 10.1, 10.8, 7.1],
    "pct_vietnam": [31.2, 15.8, 20.8, 21.8, 13.7]})
And table.info(), after running addProportions so that total_pct is present, shows this:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
state 5 non-null object
veteran_pop 5 non-null int64
pct_gulf 5 non-null float64
pct_vietnam 5 non-null float64
total_pct 5 non-null float64
dtypes: float64(3), int64(1), object(1)
memory usage: 280.0+ bytes
If I construct the same data with MultiIndex columns instead:
multi = pd.DataFrame(data={
    ("state",): ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
    ("veteran_pop",): [70458, 532634, 395350, 693809, 234659],
    ("pct_gulf",): [20.0, 8.8, 10.1, 10.8, 7.1],
    ("pct_vietnam",): [31.2, 15.8, 20.8, 21.8, 13.7]})
If I run addProportions(table) on my regular DataFrame, I get the right answer:
      state  veteran_pop  pct_gulf  pct_vietnam  total_pct
0    Alaska        70458      20.0         31.2      0.512
1   Arizona       532634       8.8         15.8      0.246
2  Colorado       395350      10.1         20.8      0.309
3   Georgia       693809      10.8         21.8      0.326
4      Iowa       234659       7.1         13.7      0.208
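but running it on the MultiIndex version throws an error:
TypeError: addProportions() missing 3 required positional arguments:
'col1', 'col2', and 'new_col'
(Note that this particular TypeError is a symptom of calling the function with only the table argument, not of the MultiIndex itself.)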
Somehow, you ended up with a MultiIndex in your columns, even though you don't have hierarchical categories here. You'd only want one if you were breaking down percentages by, for example, service branch and year:
columns = pd.MultiIndex.from_product([["percentage","veteran_pop"], ["army","navy"], ["2010", "2015"]])
pd.DataFrame( columns=columns, index=pd.RangeIndex(start=0, stop=5))
  percentage                veteran_pop
        army      navy             army      navy
        2010 2015 2010 2015        2010 2015 2010 2015
0        NaN  NaN  NaN  NaN         NaN  NaN  NaN  NaN
1        NaN  NaN  NaN  NaN         NaN  NaN  NaN  NaN
...
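To illustrate why type() reported a DataFrame in the question, here is a small sketch using the hierarchical columns from this example. The variable names partial and full are just for illustration. Selecting by a top-level label alone returns a sub-DataFrame holding the remaining levels, while a full key tuple returns a Series.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["percentage", "veteran_pop"], ["army", "navy"], ["2010", "2015"]]
)
df = pd.DataFrame(columns=columns, index=pd.RangeIndex(start=0, stop=5))

# Only the first level is given: pandas returns everything under it.
partial = df["percentage"]

# All three levels are given: pandas returns a single column.
full = df[("percentage", "army", "2010")]

print(type(partial).__name__, partial.shape)
print(type(full).__name__, full.shape)
```

Adding two such sub-DataFrames aligns them on their column labels as well as their rows, which is one way to end up with NaN where a plain Series sum would have worked.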
You'll need to reshape your DataFrame to use the function you've written. The function works, but you have the wrong kind of index in your columns.
If you want to keep the data as a MultiIndex, change the function to:
def addProportions(table, col1, col2, new_col):
    table[new_col] = ((table[(col1,)] + table[(col2,)])/100)
    # uncomment the return line if you need the table returned
    # return table
If you want to reshape the data into a regular, flat-column DataFrame:
def addProportions(table, col1, col2, new_col):
    table[new_col] = ((table[col1] + table[col2])/100)
    # uncomment the return line if you need the table returned
    # return table

# build a new df without the multi-index
new_cols = [i[0] for i in multi.columns]
new_df = pd.DataFrame(multi.values, columns=new_cols)
# call the function
addProportions(new_df, "pct_gulf", "pct_vietnam", "total_pct")
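One thing to watch when rebuilding a frame from .values: for a mixed-type frame, .values is a single object array, so the rebuilt columns may lose their numeric dtypes. The sketch below compares that with reassigning the column labels on a copy. The two-row sample and the names via_values and via_rename are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

flat = pd.DataFrame({
    "state": ["Alaska", "Iowa"],
    "pct_gulf": [20.0, 7.1],
    "pct_vietnam": [31.2, 13.7],
})

# Give the sample single-level MultiIndex columns, as in the question.
multi = flat.copy()
multi.columns = pd.MultiIndex.from_arrays([list(flat.columns)])

names = [c[0] for c in multi.columns]

# Rebuilding from .values goes through one mixed-type array.
via_values = pd.DataFrame(multi.values, columns=names)

# Relabelling a copy keeps each column's original dtype.
via_rename = multi.copy()
via_rename.columns = names

print(via_values.dtypes)
print(via_rename.dtypes)
```

Either frame works with addProportions, but the relabelled copy keeps pct_gulf and pct_vietnam as float64, which matters for later numeric operations.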