
Value error when trying to create a new dataframe column with a function

I am running into a ValueError when trying to create a new column in my DataFrame. The DataFrame looks like this:

      state  veteran_pop  pct_gulf  pct_vietnam
0    Alaska        70458      20.0         31.2
1   Arizona       532634       8.8         15.8
2  Colorado       395350      10.1         20.8
3   Georgia       693809      10.8         21.8
4      Iowa       234659       7.1         13.7

So I have a function that looks like this:

def addProportions(table, col1, col2, new_col):

    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)

Where table is the table above, col1 = "pct_gulf", col2 = "pct_vietnam", and new_col = "total_pct", like so:

addProportions(table, "pct_gulf", "pct_vietnam", "total_pct")

But when I run this function I get this error message:

ValueError: Wrong number of items passed 2, placement implies 1

---- Alternatively ----

I have made my addProportions function like this:

def addProportions(table, col1, col2, new_col):
    table[new_col] = 0
    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)

And I get this output, which seems like a step in the right direction.

      state veteran_pop pct_gulf pct_vietnam total_pct
0    Alaska       70458     20.0        31.2       NaN
1   Arizona      532634      8.8        15.8       NaN
2  Colorado      395350     10.1        20.8       NaN
3   Georgia      693809     10.8        21.8       NaN
4      Iowa      234659      7.1        13.7       NaN

But when I call type() on either of the two columns I am trying to add, it comes back as a DataFrame rather than a Series, and I think that is why I'm getting NaN.

---- Table Info ----

t.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 4 columns):
(state,)          55 non-null object
(veteran_pop,)    55 non-null int64
(pct_gulf,)       55 non-null float64
(pct_vietnam,)    55 non-null float64
dtypes: float64(2), int64(1), object(1)
memory usage: 1.8+ KB

t.index

RangeIndex(start=0, stop=55, step=1)

t.columns

MultiIndex(levels=[[u'pct_gulf', u'pct_vietnam', u'state', u'veteran_pop']], codes=[[2, 3, 0, 1]])

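You don't need a loop. Flatten the column index and add the two columns directly (table is the name of your dataframe):

# droplevel() cannot remove the only level of a one-level MultiIndex,
# so take that level's values instead
table.columns=table.columns.get_level_values(0)
table['total_pct']=(table['pct_gulf']+table['pct_vietnam'])/100
print(table)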
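
As a self-contained sketch of that idea: the frame below is rebuilt from the sample rows in the question, and the one-level MultiIndex is attached by hand with `pd.MultiIndex.from_arrays`, which is only an assumption about how the original columns came to look that way. The addition itself is a single vectorized expression, so no row loop is involved.

```python
import pandas as pd

# Sample rows copied from the question.
table = pd.DataFrame({
    "state": ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
    "veteran_pop": [70458, 532634, 395350, 693809, 234659],
    "pct_gulf": [20.0, 8.8, 10.1, 10.8, 7.1],
    "pct_vietnam": [31.2, 15.8, 20.8, 21.8, 13.7],
})

# Assumption: reproduce the question's one-level MultiIndex columns by hand.
table.columns = pd.MultiIndex.from_arrays([list(table.columns)])

# Flatten the columns back to plain labels.
table.columns = table.columns.get_level_values(0)

# Vectorized column arithmetic: the whole column is computed at once.
table["total_pct"] = (table["pct_gulf"] + table["pct_vietnam"]) / 100
print(table)
```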

I think the problem is that you have a MultiIndex.

My DataFrame, when I construct one from your info, looks like this:

    import pandas as pd

    table = pd.DataFrame(data={
        "state": ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
        "veteran_pop": [70458, 532634, 395350, 693809, 234659],
        "pct_gulf": [20.0, 8.8, 10.1, 10.8, 7.1],
        "pct_vietnam": [31.2, 15.8, 20.8, 21.8, 13.7],
    })

And table.info(), run after addProportions has added total_pct, shows this:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 5 columns):
    state          5 non-null object
    veteran_pop    5 non-null int64
    pct_gulf       5 non-null float64
    pct_vietnam    5 non-null float64
    total_pct      5 non-null float64
    dtypes: float64(3), int64(1), object(1)
    memory usage: 280.0+ bytes

If I build the same data with a MultiIndex for the columns instead:

    multi = pd.DataFrame(data={
        ("state",): ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
        ("veteran_pop",): [70458, 532634, 395350, 693809, 234659],
        ("pct_gulf",): [20.0, 8.8, 10.1, 10.8, 7.1],
        ("pct_vietnam",): [31.2, 15.8, 20.8, 21.8, 13.7],
    })
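
A quick way to tell the two kinds of frame apart is to inspect the type of `df.columns`. The sketch below uses a hypothetical helper name, `has_multiindex_columns`, and a two-row sample; neither comes from the question.

```python
import pandas as pd


def has_multiindex_columns(df):
    """Return True when the frame's columns are a MultiIndex (hypothetical helper)."""
    return isinstance(df.columns, pd.MultiIndex)


# A frame with ordinary, flat column labels.
flat = pd.DataFrame({"pct_gulf": [20.0, 8.8], "pct_vietnam": [31.2, 15.8]})

# The same data with the labels wrapped in a one-level MultiIndex.
nested = flat.copy()
nested.columns = pd.MultiIndex.from_arrays([list(flat.columns)])

print(has_multiindex_columns(flat))    # False
print(has_multiindex_columns(nested))  # True
print(nested.columns.nlevels)          # 1
```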

If I run addProportions(table, "pct_gulf", "pct_vietnam", "total_pct") on my regular DataFrame, I get the right answer:

          state  veteran_pop  pct_gulf  pct_vietnam  total_pct
    0    Alaska        70458      20.0         31.2      0.512
    1   Arizona       532634       8.8         15.8      0.246
    2  Colorado       395350      10.1         20.8      0.309
    3   Georgia       693809      10.8         21.8      0.326
    4      Iowa       234659       7.1         13.7      0.208

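but running it on the MultiIndex version throws an error. (The message below is the one Python raises when the function is called with only the table argument; for the full four-argument call, the question reports a ValueError instead.)

    TypeError: addProportions() missing 3 required positional arguments: 
    'col1', 'col2', and 'new_col'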

Somehow, you ended up with a MultiIndex in your columns, even though you don't have hierarchical categories here. You'd only want one if you were breaking the numbers down further, for example by branch and year:

    columns = pd.MultiIndex.from_product([["percentage","veteran_pop"], ["army","navy"], ["2010", "2015"]])
    pd.DataFrame( columns=columns, index=pd.RangeIndex(start=0, stop=5))

       percentage                    veteran_pop
             army        navy               army        navy
             2010  2015  2010  2015         2010  2015  2010  2015
    0         NaN   NaN   NaN   NaN          NaN   NaN   NaN   NaN
    1         NaN   NaN   NaN   NaN          NaN   NaN   NaN   NaN
    ...
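
With genuinely hierarchical columns like these, the shape of the key decides what comes back: a partial key selects a whole sub-frame, while a complete tuple selects a single column. That may be related to the DataFrame type the question reports from type(), though this sketch only demonstrates the three-level case. The fill value 0.0 is an arbitrary placeholder.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["percentage", "veteran_pop"], ["army", "navy"], ["2010", "2015"]]
)
# Placeholder data: every cell is 0.0.
df = pd.DataFrame(0.0, columns=columns, index=pd.RangeIndex(start=0, stop=5))

# Partial key (first level only): a DataFrame holding the four matching columns.
partial = df["percentage"]

# Complete key (one label per level): a single Series.
full = df[("percentage", "army", "2010")]

print(type(partial).__name__, partial.shape)
print(type(full).__name__, full.shape)
```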

You'll need to reshape your DataFrame to use the function you've written. The function works, but you have the wrong kind of index in your columns.

If you want to keep the data as multi-index, change the function to:

def addProportions(table, col1, col2, new_col):
    # with one-level MultiIndex columns, each column label is a 1-tuple
    table[new_col] = ((table[(col1,)] + table[(col2,)])/100)
    # uncomment the return line if you need the table returned
    # return table

If you want to reshape the data into a normal (single-level) DataFrame:

def addProportions(table, col1, col2, new_col):
    table[new_col] = ((table[col1] + table[col2])/100)
    # uncomment the return line if you need the table returned
    # return table

# build a new df without the multi-index; relabeling a copy keeps the column
# dtypes, whereas pd.DataFrame(multi.values, ...) would turn them all into object
new_col = [i[0] for i in multi.columns]
new_df = multi.copy()
new_df.columns = new_col

# call the function
addProportions(new_df, "pct_gulf", "pct_vietnam", "total_pct")
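
One detail worth checking when a frame is rebuilt from `.values`: a frame that mixes strings and numbers yields an object array, so the numeric columns come back with object dtype. The sketch below, on a made-up two-row sample, compares that with relabeling a copy and shows `infer_objects()` as one way to recover the numeric dtypes.

```python
import pandas as pd

# Made-up sample mixing a string column with float columns.
mixed = pd.DataFrame({
    "state": ["Alaska", "Arizona"],
    "pct_gulf": [20.0, 8.8],
    "pct_vietnam": [31.2, 15.8],
})

# Rebuilding from .values goes through a single object-typed array.
rebuilt = pd.DataFrame(mixed.values, columns=list(mixed.columns))

# Relabeling a copy leaves each column's dtype untouched.
relabeled = mixed.copy()
relabeled.columns = list(mixed.columns)

# infer_objects() re-infers better dtypes for object columns.
recovered = rebuilt.infer_objects()

print(rebuilt.dtypes)
print(relabeled.dtypes)
print(recovered.dtypes)
```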
