Python - How can I make this repetitive code shorter by using a loop?

I'm very new to Python. The following is an example of my data:

Category    May  June  July
Product1    32   41    43
Product2    74   65    65
Product3    17   15    18
Product4    14   13    14

I have many sets of data and I'd like to calculate the chi-square test for each set. The code is as follows:

import scipy.stats

Product1 = [32,41,43]
chi2, p = scipy.stats.chisquare(Product1)
print('Product1')
if p > 0.05:
    print('Same')
else:
    print('Different')

Product2 = [74,65,65]
chi2, p = scipy.stats.chisquare(Product2)
print('Product2')
if p > 0.05:
    print('Same')
else:
    print('Different')

Product3 = [17,15,18]
chi2, p = scipy.stats.chisquare(Product3)
print('Product3')
if p > 0.05:
    print('Same')
else:
    print('Different')

Product4 = [14,13,14]
chi2, p = scipy.stats.chisquare(Product4)
print('Product4')
if p > 0.05:
    print('Same')
else:
    print('Different')

I used "df = pd.read_excel" to load the data table. It comes with an index, and I don't know how to access each row for the calculation.

How can I make this repetitive code shorter by using a loop and pulling the data from the table? Thank you so much for your help.

You could use a loop to repeat the steps above, but you might as well leverage scipy's ability to work with pandas dataframes! You can apply the chisquare test to every row of a dataframe by passing axis=1. For example:

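import numpy as np
from scipy.stats import chisquare

# Run one chi-square test per row (axis=1) and keep the p-values (index 1 of the result)
df['p'] = chisquare(df[['May', 'June', 'July']], axis=1)[1]

# Label each row by comparing its p-value with the 0.05 threshold
df['same_diff'] = np.where(df['p'] > 0.05, 'same', 'different')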

>>> df
   Category  May  June  July         p same_diff
0  Product1   32    41    43  0.411506      same
1  Product2   74    65    65  0.672294      same
2  Product3   17    15    18  0.869358      same
3  Product4   14    13    14  0.975905      same

Now your dataframe has the p-values as one column and the "same" or "different" labels as another.
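
For a self-contained check, the sketch below rebuilds the sample table inline rather than reading it from Excel (an assumption, since the original file is not shown) and keeps both the test statistic and the p-value. The names `month_cols` and the `chi2` column are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd
from scipy.stats import chisquare

# Sample data from the question, built inline instead of via pd.read_excel
df = pd.DataFrame({
    'Category': ['Product1', 'Product2', 'Product3', 'Product4'],
    'May': [32, 74, 17, 14],
    'June': [41, 65, 15, 13],
    'July': [43, 65, 18, 14],
})

month_cols = ['May', 'June', 'July']

# axis=1 runs one test per row; the result unpacks into
# an array of statistics and an array of p-values
stat, p = chisquare(df[month_cols].to_numpy(), axis=1)
df['chi2'] = stat
df['p'] = p
df['same_diff'] = np.where(df['p'] > 0.05, 'same', 'different')

print(df)
```

Since no expected frequencies are passed, each row is tested against equal counts across the three months, which is the same comparison the question's code makes.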

I will start from the point where the data is already loaded into a pandas data frame.

Then, you can do:

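import scipy.stats

for _, row in df.iterrows():
    product = row['Category']
    # Use only the monthly counts, converted to float for the test
    counts = row[['May', 'June', 'July']].astype(float)
    chi, p = scipy.stats.chisquare(counts)
    print(product, ":", "same" if p > 0.05 else "different")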

This will print:

Product1 : same
Product2 : same
Product3 : same
Product4 : same
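
If the index is what makes row access awkward, one option is to move `Category` into the index so that each row holds only the monthly counts. The following is a minimal sketch under that assumption; the helper name `label_rows` and its `alpha` parameter are hypothetical, and the table is rebuilt inline rather than read from Excel.

```python
import pandas as pd
from scipy.stats import chisquare


def label_rows(df, alpha=0.05):
    """Return {category: 'same' or 'different'} from a per-row chi-square test."""
    results = {}
    # With 'Category' as the index, each row contains only numeric counts
    for name, counts in df.set_index('Category').iterrows():
        _, p = chisquare(counts.to_numpy(dtype=float))
        results[name] = 'same' if p > alpha else 'different'
    return results


# Sample data from the question, built inline instead of via pd.read_excel
df = pd.DataFrame({
    'Category': ['Product1', 'Product2', 'Product3', 'Product4'],
    'May': [32, 74, 17, 14],
    'June': [41, 65, 15, 13],
    'July': [43, 65, 18, 14],
})

for name, verdict in label_rows(df).items():
    print(name, ":", verdict)
```

Returning a dictionary instead of printing inside the loop keeps the results available for later steps, such as writing them back to a column.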
