How to output a pd.crosstab in Python using survey weights?
I am trying to run a weighted crosstab in pandas/Python as follows:
import pandas as pd
pd.crosstab(df.income1, df.benefits1,
values=df.survey_weight, aggfunc=sum)
However, I'm receiving the following error message:
pd.crosstab(df.income1, df.benefits1,
values=df.survey_weight, aggfunc=sum)
File "<ipython-input-57-6e8cfb6762b2>", line 1
pd.crosstab(df.income1, df.benefits1,
^
SyntaxError: invalid character in identifier
Any suggestions, please? I can output the crosstab when I pass only the first two arguments inside the parentheses.
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6471 entries, 0 to 11549
Data columns (total 3 columns):
survey_weight 6471 non-null float64
income1 3703 non-null float64
benefits1 588 non-null category
dtypes: category(1), float64(2)
memory usage: 467.8 KB
It turns out the whitespace issue was caused by copying and pasting the code. I typed it out by hand and it worked. Thanks to those who posted.
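For anyone who hits the same SyntaxError after pasting code: a quick way to find the offending character is to scan the pasted text for anything outside plain ASCII. This is only a rough sketch. The helper name `find_suspicious_chars` is made up for illustration, and the non-breaking space in the sample string is an assumption, since the question does not say which character was actually pasted.

```python
import unicodedata


def find_suspicious_chars(source):
    """Return (line, column, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append(
                    (line_no, col_no, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
                )
    return hits


# Hypothetical pasted snippet: the second line is indented with two
# non-breaking spaces (U+00A0) instead of ordinary spaces.
pasted = (
    "pd.crosstab(df.income1, df.benefits1,\n"
    "\u00a0\u00a0values=df.survey_weight, aggfunc=sum)"
)

hits = find_suspicious_chars(pasted)
for hit in hits:
    print(hit)
```

Any line it reports can then be retyped by hand, which is what fixed it in this case.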
Adding an example, since this question comes up in Google searches. For a single weighted crosstab, the following works:
dt = pd.DataFrame(
{"a": [1, 1, 1, 1, 2, 2, 2, 2], "b": [1, 2, 2, 2, 1, 1, 2, 2]}
).assign(weight=1)
Data:
| a | b | weight |
|----:|----:|---------:|
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 2 | 1 |
| 1 | 2 | 1 |
| 2 | 1 | 1 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 2 | 1 |
Compute the crosstab, passing the weights as values and summing them within each cell:
pd.crosstab(dt["a"], dt["b"], values=dt["weight"], aggfunc="sum")
Output:
| a | 1 | 2 |
|----:|----:|----:|
| 1 | 1 | 3 |
| 2 | 2 | 2 |
For percentages, use the normalize argument; see https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html
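To illustrate the normalize suggestion, here is a minimal sketch of weighted row percentages. It reuses the a and b columns from the example above, but the unequal weights are invented for illustration so that the weighting visibly changes the result; with real data you would pass your own survey weight column instead.

```python
import pandas as pd

# Same a/b columns as the example above; the weights are hypothetical
# (the first respondent counts double) so the effect of weighting is visible.
dt = pd.DataFrame(
    {
        "a": [1, 1, 1, 1, 2, 2, 2, 2],
        "b": [1, 2, 2, 2, 1, 1, 2, 2],
        "weight": [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    }
)

# Sum the weights in each cell, then divide each row by its weighted total.
row_shares = pd.crosstab(
    dt["a"], dt["b"], values=dt["weight"], aggfunc="sum", normalize="index"
)
print(row_shares)
```

With normalize="index" each row sums to 1; normalize="columns" or normalize="all" divide by the column totals or the grand total instead.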