Conditional Sum/Average/etc… CSV file in Python

First off, I've found similar articles, but I haven't been able to figure out how to translate the answers from those questions to my own problem. Secondly, I'm new to Python, so I apologize for being a noob.

Here's my question: I want to perform conditional calculations (average/proportion/etc.) on values within a text file.

More concretely, I have a file that looks something like this:

0    Diamond    Correct
0    Cross      Incorrect
1    Diamond    Correct
1    Cross      Correct

Thus far, I am able to read in the file and collect all of the rows.

import pandas as pd
fileLocation = r'C:/Users/Me/Desktop/LogFiles/SubjectData.txt'
# read_csv takes the column labels through `names` (plural)
df = pd.read_csv(fileLocation, header=None, sep='\t', index_col=False,
                 names=["Session Number", "Image", "Outcome"])

I'm looking to query the file such that I can ask questions like:

--What is the proportion of "Correct" values in the 'Outcome' column when the first column ('Session Number') is 0? So this would be 0.5, because there is one "Correct" and one "Incorrect".

I have other calculations I'd like to perform, but I should be able to figure out where to go once I know how to do this, hopefully simple, command.

Thanks!

# select only the rows that have 0 for 'Session Number'
session_zero = df[df['Session Number'] == 0]

# getting the total number of rows in that session
total = len(session_zero)

# getting the number of those rows that have 'Correct' for 'Outcome'
correct_and_session_zero = len(session_zero[session_zero['Outcome'] == 'Correct'])

# if you're using python 2 you might need to convert correct_and_session_zero or total
# to float so you won't lose precision
print(correct_and_session_zero / total)
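The proportion asked about in the question can also be computed without counting rows by hand, because the mean of a boolean Series is the fraction of True values. Here is a minimal sketch, assuming the four sample rows from the question are rebuilt in memory. The `io.StringIO` stand-in for the log file and the `proportion_correct` helper are illustrative names, not part of the original post.

```python
import io
import pandas as pd

# Stand-in for the tab-separated log file from the question (assumption).
sample = (
    "0\tDiamond\tCorrect\n"
    "0\tCross\tIncorrect\n"
    "1\tDiamond\tCorrect\n"
    "1\tCross\tCorrect\n"
)
df = pd.read_csv(io.StringIO(sample), header=None, sep="\t",
                 names=["Session Number", "Image", "Outcome"])

def proportion_correct(frame, session):
    """Fraction of 'Correct' outcomes among rows of one session (hypothetical helper)."""
    outcomes = frame.loc[frame["Session Number"] == session, "Outcome"]
    # The mean of a boolean Series is the share of True values.
    return (outcomes == "Correct").mean()

print(proportion_correct(df, 0))
print(proportion_correct(df, 1))
```

With the sample rows this prints 0.5 for session 0 and 1.0 for session 1, matching the value expected in the question.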

You can also do it this way:

In [467]: df.groupby('Session#')['Outcome'].apply(lambda x: (x == 'Correct').sum()/len(x))
Out[467]:
Session#
0    0.5
1    1.0
Name: Outcome, dtype: float64

This groups your DataFrame by the session column (named Session# in this example) and calculates the ratio of correct outcomes for each group.
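For the other per-session calculations mentioned in the question (sums, counts, averages), the grouped approach extends naturally. The following is a sketch under assumptions: it uses the question's column names rather than Session#, builds the sample rows inline, and the output column names `n_correct`, `n_trials`, and `proportion_correct` are made up for illustration.

```python
import pandas as pd

# Sample rows from the question, built inline (assumption: same column names).
df = pd.DataFrame({
    "Session Number": [0, 0, 1, 1],
    "Image": ["Diamond", "Cross", "Diamond", "Cross"],
    "Outcome": ["Correct", "Incorrect", "Correct", "Correct"],
})

# Boolean Series: True where the trial was answered correctly.
is_correct = df["Outcome"].eq("Correct")

# Per session: number correct, number of trials, and proportion correct.
summary = is_correct.groupby(df["Session Number"]).agg(["sum", "count", "mean"])
summary.columns = ["n_correct", "n_trials", "proportion_correct"]  # illustrative names
print(summary)
```

Grouping the boolean Series by the session column keeps the condition and the aggregation separate, so swapping in a different condition or a different statistic only touches one line.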
