
How to output a pd.crosstab in Python using survey weights?

I am trying to run a weighted crosstab in pandas/python as follows:

import pandas as pd
pd.crosstab(df.income1, df.benefits1, 
            values=df.survey_weight, aggfunc=sum)

However, I'm receiving the following error message:

pd.crosstab(df.income1, df.benefits1, 
            values=df.survey_weight, aggfunc=sum)
  File "<ipython-input-57-6e8cfb6762b2>", line 1
    pd.crosstab(df.income1, df.benefits1,
                                                            ^
SyntaxError: invalid character in identifier

Any suggestions, please? I can output the crosstab when I pass only the first two arguments inside the parentheses.

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6471 entries, 0 to 11549
Data columns (total 3 columns):
survey_weight     6471 non-null float64
income1       3703 non-null float64
benefits1       588 non-null category
dtypes: category(1), float64(2)
memory usage: 467.8 KB

It turns out the whitespace issue was caused by copying and pasting the code. I typed it out by hand and it worked. Thanks to those who posted.
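For anyone hitting the same "invalid character in identifier" error, here is a minimal sketch for locating invisible non-ASCII characters in pasted code. The helper name `find_non_ascii` is made up for illustration, and the no-break space (U+00A0) in the sample string is only an assumption about what a copy-paste might introduce; the original post does not say which character was involved.

```python
import unicodedata


def find_non_ascii(source: str):
    """Return (line, column, code point, name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


# Hypothetical pasted snippet: a no-break space hides after the trailing comma
# on the first line (an assumption, not taken from the original question).
pasted = (
    "pd.crosstab(df.income1, df.benefits1,\u00a0\n"
    "            values=df.survey_weight, aggfunc=sum)"
)

hits = find_non_ascii(pasted)
for hit in hits:
    print(hit)
```

Any hit reported this way can be deleted and retyped as a normal space, which is effectively what retyping the whole statement does.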

Adding an example, since this question comes up in Google searches. For a single weighted crosstab, the following works:

import pandas as pd

# Toy data: two categorical columns plus a constant weight of 1 per row
dt = pd.DataFrame(
    {"a": [1, 1, 1, 1, 2, 2, 2, 2], "b": [1, 2, 2, 2, 1, 1, 2, 2]}
).assign(weight=1)

Data:

|   a |   b |   weight |
|----:|----:|---------:|
|   1 |   1 |        1 |
|   1 |   2 |        1 |
|   1 |   2 |        1 |
|   1 |   2 |        1 |
|   2 |   1 |        1 |
|   2 |   1 |        1 |
|   2 |   2 |        1 |
|   2 |   2 |        1 |

Compute the crosstab:

pd.crosstab(dt["a"], dt["b"], values=dt["weight"], aggfunc="sum")

Output:

| a \ b |   1 |   2 |
|------:|----:|----:|
|     1 |   1 |   3 |
|     2 |   2 |   2 |
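To make clear what the weighted crosstab is doing, each cell is simply the sum of the weights of the rows falling into that (a, b) combination. A rough sketch of the same computation with `groupby`, using the toy data from this answer (the variable names `ct` and `gb` are just for illustration):

```python
import pandas as pd

# Same toy data as above: every row has weight 1
dt = pd.DataFrame(
    {"a": [1, 1, 1, 1, 2, 2, 2, 2], "b": [1, 2, 2, 2, 1, 1, 2, 2]}
).assign(weight=1)

# Weighted crosstab: sum of weights per (a, b) cell
ct = pd.crosstab(dt["a"], dt["b"], values=dt["weight"], aggfunc="sum")

# Equivalent view: group by both columns, sum the weights, pivot b into columns
gb = dt.groupby(["a", "b"])["weight"].sum().unstack(fill_value=0)

print(ct)
print(gb)
```

With real survey weights the only change is that the `weight` column holds the actual weights instead of 1.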

For percentages, use the normalize argument; see https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html
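As a sketch of how that might look for weighted row percentages: the weights below are invented for illustration (they are not from the question's data), and `normalize="index"` is one of the documented options, dividing each row by its row total.

```python
import pandas as pd

# Hypothetical, non-uniform weights purely for illustration
dt = pd.DataFrame(
    {
        "a": [1, 1, 1, 1, 2, 2, 2, 2],
        "b": [1, 2, 2, 2, 1, 1, 2, 2],
        "weight": [1.0, 2.0, 1.0, 1.0, 0.5, 0.5, 1.0, 2.0],
    }
)

# Weighted totals per cell
weighted = pd.crosstab(dt["a"], dt["b"], values=dt["weight"], aggfunc="sum")

# Weighted shares within each row of a (each row sums to 1)
pct = pd.crosstab(
    dt["a"], dt["b"], values=dt["weight"], aggfunc="sum", normalize="index"
)

print(weighted)
print(pct)
```

Passing `normalize="columns"` or `normalize="all"` instead would give column shares or shares of the grand total.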
