Value error when trying to create a new dataframe column with a function

I get a ValueError when trying to create a new column in a dataframe. The dataframe looks like this:

      state  veteran_pop  pct_gulf  pct_vietnam
0    Alaska        70458      20.0         31.2
1   Arizona       532634       8.8         15.8
2  Colorado       395350      10.1         20.8
3   Georgia       693809      10.8         21.8
4      Iowa       234659       7.1         13.7

So I have a function that looks like this:

def addProportions(table, col1, col2, new_col):

    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)

where table is the table above, col1 = "pct_gulf", col2 = "pct_vietnam", and new_col = "total_pct", called like this:

addProportions(table, "pct_gulf", "pct_vietnam", "total_pct")

However, when I run this function, I get this error message:

ValueError: Wrong number of items passed 2, placement implies 1

--- OR ---

I have also written my addProportions function like this:

def addProportions(table, col1, col2, new_col):
    table[new_col] = 0
    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)

and I get this output, which seems like a step in the right direction.

      state veteran_pop pct_gulf pct_vietnam total_pct
0    Alaska       70458     20.0        31.2       NaN
1   Arizona      532634      8.8        15.8       NaN
2  Colorado      395350     10.1        20.8       NaN
3   Georgia      693809     10.8        21.8       NaN
4      Iowa      234659      7.1        13.7       NaN

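But the problem is that when I call type() on the two columns I am trying to add, it says each one is a DataFrame, and I think that is why I am getting NaN.

---- Table info ----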

t.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 4 columns):
(state,)          55 non-null object
(veteran_pop,)    55 non-null int64
(pct_gulf,)       55 non-null float64
(pct_vietnam,)    55 non-null float64
dtypes: float64(2), int64(1), object(1)
memory usage: 1.8+ KB

t.index

RangeIndex(start=0, stop=55, step=1)

t.columns

MultiIndex(levels=[[u'pct_gulf', u'pct_vietnam', u'state', u'veteran_pop']], codes=[[2, 3, 0, 1]])

You don't need a loop. All you need is this (table is the name of your dataframe):

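# flatten the single-level MultiIndex columns into plain labels
# (droplevel() cannot remove the only level of an index)
table.columns=table.columns.get_level_values(0)
table['total_pct']=(table['pct_gulf']+table['pct_vietnam'])/100
print(table)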
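
To see why the loop in the question is unnecessary, here is a minimal, self-contained sketch. It assumes a frame with ordinary (flat) column labels, built from the five rows shown in the question. The `looped` list is a hypothetical name used only for comparison.

```python
import pandas as pd

# Sample rows copied from the question's table (flat column labels assumed).
table = pd.DataFrame({
    "state": ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
    "veteran_pop": [70458, 532634, 395350, 693809, 234659],
    "pct_gulf": [20.0, 8.8, 10.1, 10.8, 7.1],
    "pct_vietnam": [31.2, 15.8, 20.8, 21.8, 13.7],
})

# One vectorized assignment: pandas adds the two columns element-wise.
table["total_pct"] = (table["pct_gulf"] + table["pct_vietnam"]) / 100

# Row-by-row equivalent, computed only to confirm both give the same numbers.
looped = [(g + v) / 100 for g, v in zip(table["pct_gulf"], table["pct_vietnam"])]

print(table)
```

Because the whole column is assigned in one statement, wrapping it in iterrows() just repeats the same assignment once per row.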

I think the problem is that you have a MultiIndex.

When I construct a DataFrame from your information, it looks like this:

    import pandas as pd

    table = pd.DataFrame(data={
        "state": ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
        "veteran_pop": [70458, 532634, 395350, 693809, 234659],
        "pct_gulf": [20.0, 8.8, 10.1, 10.8, 7.1],
        "pct_vietnam": [31.2, 15.8, 20.8, 21.8, 13.7]})

And table.info() shows this (run after total_pct was added):

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 5 columns):
    state          5 non-null object
    veteran_pop    5 non-null int64
    pct_gulf       5 non-null float64
    pct_vietnam    5 non-null float64
    total_pct      5 non-null float64
    dtypes: float64(3), int64(1), object(1)
    memory usage: 280.0+ bytes

If I construct one with a MultiIndex instead, I get an error:

    multi = pd.DataFrame(data={
        ("state",): ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
        ("veteran_pop",): [70458, 532634, 395350, 693809, 234659],
        ("pct_gulf",): [20.0, 8.8, 10.1, 10.8, 7.1],
        ("pct_vietnam",): [31.2, 15.8, 20.8, 21.8, 13.7]})

If I run addProportions(table) on my regular DataFrame, I get the correct answer:

          state  veteran_pop  pct_gulf  pct_vietnam  total_pct
    0    Alaska        70458      20.0         31.2      0.512
    1   Arizona       532634       8.8         15.8      0.246
    2  Colorado       395350      10.1         20.8      0.309
    3   Georgia       693809      10.8         21.8      0.326
    4      Iowa       234659       7.1         13.7      0.208

But running it on the MultiIndex version raises an error.

    TypeError: addProportions() missing 3 required positional arguments: 
    'col1', 'col2', and 'new_col'

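Somehow you have ended up with a MultiIndex on your columns, even though there are no hierarchical categories here. You would only need one if you were breaking the percentages down further, for example, by year: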

    columns = pd.MultiIndex.from_product([["percentage","veteran_pop"], ["army","navy"], ["2010", "2015"]])
    pd.DataFrame( columns=columns, index=pd.RangeIndex(start=0, stop=5))

      percentage                veteran_pop
            army      navy             army      navy
            2010 2015 2010 2015        2010 2015 2010 2015
    0        NaN  NaN  NaN  NaN         NaN  NaN  NaN  NaN
    1        NaN  NaN  NaN  NaN         NaN  NaN  NaN  NaN
    ...

You need to reshape your DataFrame to use the function you wrote. The function works, but the index on your columns is the wrong type.
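
A quick way to check which kind of column index you have is sketched below. `describe_columns` is a hypothetical helper name, and the single-level MultiIndex is built explicitly with `pd.MultiIndex.from_tuples` purely for illustration; how the original data acquired one is not known from the question.

```python
import pandas as pd

def describe_columns(df):
    """Hypothetical helper: report whether df.columns is a MultiIndex, and its depth."""
    cols = df.columns
    return isinstance(cols, pd.MultiIndex), cols.nlevels

flat = pd.DataFrame({"pct_gulf": [20.0, 8.8], "pct_vietnam": [31.2, 15.8]})

# Same data, but each label is wrapped in a 1-tuple, which gives a
# single-level MultiIndex like the one t.columns shows in the question.
wrapped = flat.copy()
wrapped.columns = pd.MultiIndex.from_tuples([(c,) for c in flat.columns])

print(describe_columns(flat))
print(describe_columns(wrapped))
```

Both frames report one level, so the number of levels alone does not distinguish them; the isinstance check does.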

If you want to keep the data as a multi-index, change the function to:

def addProportions(table, col1, col2, new_col):
    # the columns are a single-level MultiIndex, so each key is a 1-tuple
    table[new_col] = ((table[(col1,)] + table[(col2,)])/100)
    # uncomment the return line if you need it
    # return table
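
As a usage sketch for the tuple-key version: the snippet below assumes a two-row frame whose columns are a single-level MultiIndex (constructed here with `pd.MultiIndex.from_tuples` for illustration), and it enables the return line so the result can be inspected. Behaviour may differ on very old pandas releases.

```python
import pandas as pd

def addProportions(table, col1, col2, new_col):
    # Tuple keys address the columns of a single-level MultiIndex.
    table[new_col] = (table[(col1,)] + table[(col2,)]) / 100
    return table

# Illustrative data: the first two rows of the question's percentages.
multi = pd.DataFrame({"pct_gulf": [20.0, 8.8], "pct_vietnam": [31.2, 15.8]})
multi.columns = pd.MultiIndex.from_tuples([(c,) for c in multi.columns])

result = addProportions(multi, "pct_gulf", "pct_vietnam", "total_pct")
totals = result.iloc[:, -1].tolist()  # the newly appended column
print(totals)
```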

If you want to reshape the data into a normal (flat-column) DataFrame instead:

def addProportions(table, col1, col2, new_col):
    table[new_col] = ((table[col1] + table[col2])/100)
    # uncomment the return line if you need it
    # return table

# build a new df without the multi-index; copying and renaming (rather than
# rebuilding from multi.values) keeps each column's original dtype
flat_cols = [i[0] for i in multi.columns]
new_df = multi.copy()
new_df.columns = flat_cols

# call the function
addProportions(new_df, "pct_gulf", "pct_vietnam", "total_pct")
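
One thing to watch when flattening: rebuilding a frame from `.values` of mixed-type data goes through an object array, so numeric columns come back as object dtype. The sketch below compares that with copying and renaming. The names `from_values`, `renamed`, and `flat_names` are illustrative, and the small frame is an assumption standing in for the real data.

```python
import pandas as pd

# Illustrative frame with a string column and two float columns.
multi = pd.DataFrame({
    "state": ["Alaska", "Arizona"],
    "pct_gulf": [20.0, 8.8],
    "pct_vietnam": [31.2, 15.8],
})
multi.columns = pd.MultiIndex.from_tuples([(c,) for c in multi.columns])

# Iterating a MultiIndex yields tuples; take the first (only) element of each.
flat_names = [c[0] for c in multi.columns]

# Rebuilding from raw values: the mixed-type array is object-typed,
# so the float columns come back as object.
from_values = pd.DataFrame(multi.values, columns=flat_names)

# Copying and renaming leaves each column's dtype untouched.
renamed = multi.copy()
renamed.columns = flat_names

print(from_values.dtypes)
print(renamed.dtypes)
```

The arithmetic in addProportions still runs on object columns, but keeping float64 avoids slower element-wise operations and surprises in later steps.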
