Value error when trying to create a new dataframe column with a function
I get a ValueError when trying to create a new column in a dataframe. The dataframe looks like this:
state veteran_pop pct_gulf pct_vietnam
0 Alaska 70458 20.0 31.2
1 Arizona 532634 8.8 15.8
2 Colorado 395350 10.1 20.8
3 Georgia 693809 10.8 21.8
4 Iowa 234659 7.1 13.7
So I have a function that looks like this:
def addProportions(table, col1, col2, new_col):
    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)
where table is the table above, col1 = "pct_gulf", col2 = "pct_vietnam", and new_col = "total_pct", like so:
addProportions(table, "pct_gulf", "pct_vietnam", "total_pct")
However, when I run this function, I get this error message:
ValueError: Wrong number of items passed 2, placement implies 1
-- OR --
I have also written my addProportions function like this:
def addProportions(table, col1, col2, new_col):
    table[new_col] = 0
    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)
and I get this output, which seems like a step in the right direction.
state veteran_pop pct_gulf pct_vietnam total_pct
0 Alaska 70458 20.0 31.2 NaN
1 Arizona 532634 8.8 15.8 NaN
2 Colorado 395350 10.1 20.8 NaN
3 Georgia 693809 10.8 21.8 NaN
4 Iowa 234659 7.1 13.7 NaN
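But the problem is that when I use type() on the two columns I am trying to add, each one comes back as a DataFrame, and I think that is why I am getting NaN.
---- Table info ----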
t.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 4 columns):
(state,) 55 non-null object
(veteran_pop,) 55 non-null int64
(pct_gulf,) 55 non-null float64
(pct_vietnam,) 55 non-null float64
dtypes: float64(2), int64(1), object(1)
memory usage: 1.8+ KB
t.index
RangeIndex(start=0, stop=55, step=1)
t.columns
MultiIndex(levels=[[u'pct_gulf', u'pct_vietnam', u'state', u'veteran_pop']], codes=[[2, 3, 0, 1]])
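You don't need a loop. All you need is this (table is the name of your dataframe):
# droplevel() cannot remove the only level of a MultiIndex, so take that level's values instead
table.columns=table.columns.get_level_values(0)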
table['total_pct']=(table['pct_gulf']+table['pct_vietnam'])/100
print(table)
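A self-contained sketch of this approach on the five sample rows follows. The frame construction with pd.MultiIndex.from_tuples is an assumption about how the single-level MultiIndex might have arisen, and get_level_values(0) is used to flatten the columns because a MultiIndex has to keep at least one level.

```python
import pandas as pd

# Hypothetical reconstruction of the sample rows with single-level
# MultiIndex columns, mirroring the t.columns output in the question.
columns = pd.MultiIndex.from_tuples(
    [("state",), ("veteran_pop",), ("pct_gulf",), ("pct_vietnam",)]
)
rows = [
    ["Alaska", 70458, 20.0, 31.2],
    ["Arizona", 532634, 8.8, 15.8],
    ["Colorado", 395350, 10.1, 20.8],
    ["Georgia", 693809, 10.8, 21.8],
    ["Iowa", 234659, 7.1, 13.7],
]
table = pd.DataFrame(rows, columns=columns)

# Replace the MultiIndex with a flat Index of plain strings.
table.columns = table.columns.get_level_values(0)

# Vectorized: the whole column is computed in one step, no loop needed.
table["total_pct"] = (table["pct_gulf"] + table["pct_vietnam"]) / 100
print(table)
```

Because the arithmetic runs on whole columns, the iterrows() loop in the question only repeats the same assignment once per row.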
I think the problem is that you have a MultiIndex.
When I construct a DataFrame from the information you gave, it looks like this:
table = pd.DataFrame(data={"state":["Alaska", "Arizona", "Colorado",
                                    "Georgia", "Iowa"],
                           "veteran_pop":[70458, 532634, 395350, 693809, 234659],
                           "pct_gulf": [20.0, 8.8, 10.1, 10.8, 7.1],
                           "pct_vietnam": [31.2, 15.8, 20.8, 21.8, 13.7]})
and table.info() shows this:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
state 5 non-null object
veteran_pop 5 non-null int64
pct_gulf 5 non-null float64
pct_vietnam 5 non-null float64
total_pct 5 non-null float64
dtypes: float64(3), int64(1), object(1)
memory usage: 280.0+ bytes
If I instead construct one with MultiIndex columns, I run into an error:
multi = pd.DataFrame(data={("state",):["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
                           ("veteran_pop",):[70458, 532634, 395350, 693809, 234659],
                           ("pct_gulf",): [20.0, 8.8, 10.1, 10.8, 7.1],
                           ("pct_vietnam",): [31.2, 15.8, 20.8, 21.8, 13.7]})
If I run addProportions(table) on my regular DataFrame, I get the right answer:
state veteran_pop pct_gulf pct_vietnam total_pct
0 Alaska 70458 20.0 31.2 0.512
1 Arizona 532634 8.8 15.8 0.246
2 Colorado 395350 10.1 20.8 0.309
3 Georgia 693809 10.8 21.8 0.326
4 Iowa 234659 7.1 13.7 0.208
But running it on the MultiIndex version raises an error.
TypeError: addProportions() missing 3 required positional arguments:
'col1', 'col2', and 'new_col'
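Somehow you ended up with a MultiIndex in your columns, even though you have no hierarchical categories here. You would only need one if you were breaking the figures down further, for example by service branch and year: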
columns = pd.MultiIndex.from_product([["percentage","veteran_pop"], ["army","navy"], ["2010", "2015"]])
pd.DataFrame( columns=columns, index=pd.RangeIndex(start=0, stop=5))
percentage veteran_pop
army navy army navy
2010 2015 2010 2015 2010 2015 2010 2015
0 NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN
...
You need to reshape your DataFrame to use the function as you wrote it. The function itself works, but your columns have the wrong kind of index.
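A quick way to check which situation you are in is to look at the type of the columns and at what a column lookup returns. The sketch below uses a small hypothetical two-column frame built with pd.MultiIndex.from_tuples. As far as I can tell from current pandas behavior, selecting with a plain string on single-level MultiIndex columns gives back a one-column DataFrame, while selecting with the full tuple gives a Series.

```python
import pandas as pd

# Hypothetical two-column frame with single-level MultiIndex columns.
columns = pd.MultiIndex.from_tuples([("pct_gulf",), ("pct_vietnam",)])
multi = pd.DataFrame([[20.0, 31.2], [8.8, 15.8]], columns=columns)

by_string = multi["pct_gulf"]    # partial key on a MultiIndex
by_tuple = multi[("pct_gulf",)]  # full key, one element per level

# Print the class names to see what each lookup produced.
print(type(multi.columns).__name__)
print(type(by_string).__name__)
print(type(by_tuple).__name__)
```

That matches the type() result reported in the question and explains why the arithmetic does not produce an ordinary column.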
If you want to keep the data as a MultiIndex, change the function to:
def addProportions(table, col1, col2, new_col):
    table[new_col] = ((table[(col1,)] + table[(col2,)])/100)
    # uncomment the return line if you need it
    # return table
If you want to reshape the data into a plain DataFrame instead:
def addProportions(table, col1, col2, new_col):
    table[new_col] = ((table[col1] + table[col2])/100)
    # uncomment the return line if you need it
    # return table
# build a new df without the multi-index
new_col = [i[0] for i in multi.columns]
new_df = pd.DataFrame(multi.values, columns = new_col)
# call the function
addProportions(new_df, "pct_gulf", "pct_vietnam", "total_pct")
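One caveat with rebuilding the frame from multi.values: on a frame that mixes strings and numbers, .values is an object array, so the rebuilt columns may come out as object dtype rather than int64 and float64. A minimal sketch of an alternative that only relabels the columns on a copy, and so keeps the original dtypes, is shown below. The two-row frame and the name flat are illustrative assumptions.

```python
import pandas as pd

# Hypothetical two-row frame with single-level MultiIndex columns.
columns = pd.MultiIndex.from_tuples(
    [("state",), ("veteran_pop",), ("pct_gulf",), ("pct_vietnam",)]
)
multi = pd.DataFrame(
    [["Alaska", 70458, 20.0, 31.2], ["Arizona", 532634, 8.8, 15.8]],
    columns=columns,
)

# Copy the frame and relabel the columns with the first element of each
# tuple; the underlying data is untouched, so each column keeps its dtype.
flat = multi.copy()
flat.columns = [name[0] for name in multi.columns]

flat["total_pct"] = (flat["pct_gulf"] + flat["pct_vietnam"]) / 100
print(flat.dtypes)
```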