Averaging specific list elements iteratively?

Say I have a dataset with a variable, lines, that looks like this:

lines = ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']

How do I, if and only if lines[0] == lines[0], meaning only if the first element of each list is exactly the same, average specific values in the rest of the list and combine the result into one averaged list? Of course, I will have to convert all numbers into floats.

In this specific example, I want a single list in which all the numeric values besides lines[1] and lines[-1] are averaged. Is there an easy way?

Expected output

['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, avg_of_var, avg_of_var, avg, , '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']

Basically (and I see now that my example data is unfortunate, as all values are the same), I want a single list containing the average of the numeric values of the four lines in the example.

Will this simple Python snippet work?

# I am assuming lines is a list of lines
lines = [['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']]


# Use a dict keyed on line[0] so there is one running entry per distinct key.
# Each entry stores the column-wise sums (as floats) and the number of
# lines seen, so the averages can be computed at the end.
average = {}
for line in lines:
    key = line[0]
    # convert every column after the key to float
    values = [float(x) for x in line[1:]]

    # first time this key is seen: store its values and set qty to 1
    if key not in average:
        average[key] = {'data': values, 'qty': 1}
        continue

    # otherwise add each column to the running sum for this key
    for i, value in enumerate(values):
        average[key]['data'][i] += value
    average[key]['qty'] += 1

# now create another list with the required merged lines
merged_lines = []
for key in average:
    line = [key]
    # divide each sum by the number of lines to get the average
    for element in average[key]['data']:
        line.append(element / average[key]['qty'])
    merged_lines.append(line)

print(merged_lines)
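The snippet above averages every column, while the question asks to leave lines[1] and lines[-1] alone. A more compact variant of the same grouping idea might look like the sketch below. The function name `average_by_key`, its `keep` parameter and the shortened sample rows are made up for illustration, and the sketch assumes that all rows have the same length and that every averaged column parses as a float.

```python
from collections import defaultdict
from statistics import mean


def average_by_key(rows, keep=(1, -1)):
    """Group rows by their first element and average the other columns.

    Columns listed in `keep` (negative indices allowed) are copied
    unchanged from the first row of each group. Hypothetical helper,
    not part of the original answer.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    merged = []
    for key, group in groups.items():
        width = len(group[0])
        # normalise negative indices such as -1 to positive positions
        kept = {i % width for i in keep}
        out = [key]
        for col in range(1, width):
            if col in kept:
                out.append(group[0][col])
            else:
                out.append(mean(float(r[col]) for r in group))
        merged.append(out)
    return merged


# shortened, made-up rows: key, kept column, two averaged columns, kept column
rows = [
    ['k1', '1', '10', '38', '281.6'],
    ['k1', '1', '20', '40', '281.6'],
    ['k2', '2', '5', '7', '100.0'],
]
result = average_by_key(rows)
print(result)
```

Collecting the rows per key first and averaging afterwards uses more memory than keeping running sums, but it keeps the per-column logic in one place.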

You can use pandas to create a dataframe. You can then group by lines[0] and aggregate by mean (for the desired columns only). However, you also need to specify an aggregation method for the other columns. I will assume you also need the mean for these columns.

import pandas as pd
from numpy import mean

lines = [['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9, 20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
         ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9, 20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
         ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9, 20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
         ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9, 20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6]]
# I have removed the quotes around numbers for simplification but this can also be handled by pandas.

# create a data frame and give names to your fields.
# Here 'KEY' is the name of the first field we will use for grouping 
df = pd.DataFrame(lines,columns=['KEY','a','b','c','d','e','f','g','h','i','j','k','l','m','n'])

This yields something like this:

    KEY                                             a   b   c   d   e   f   g   h   i   j   k   l   m   n
0   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6
1   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6
2   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6
3   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6
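If the numbers are still quoted strings, as in the question, they would need converting before any mean can be taken. One possible way is sketched below with `DataFrame.astype`; the names `df_str`, `df_num` and `value_cols` and the shortened rows are made up for illustration, and the sketch assumes every non-key column holds a valid number.

```python
import pandas as pd

# shortened, made-up rows with the numbers still quoted as strings
rows = [['k1', '1', '10', '281.6'],
        ['k1', '1', '20', '281.6']]
df_str = pd.DataFrame(rows, columns=['KEY', 'a', 'b', 'n'])

# convert every column except KEY from string to float
value_cols = df_str.columns.drop('KEY')
df_num = df_str.astype({col: float for col in value_cols})

print(df_num.dtypes)
```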

This is the operation you are looking for:

data = df.groupby('KEY',as_index=False).aggregate(mean)

This yields:

    KEY                                             a   b   c   d   e   f   g   h   i   j   k   l   m   n
0   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6

You can specify the aggregation type per field by using a dictionary (assuming 'mean' for every field):

data = df.groupby('KEY',as_index=False).aggregate({'a':mean,'b':mean,'c':mean,'d':mean,'e':mean,'f':mean,'g':mean,'h':mean,'i':mean,'j':mean,'k':mean,'l':mean,'m':mean,'n':mean})
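The dictionary does not have to use the same aggregation for every field. As a rough sketch of what the question asks for (leaving lines[1] and lines[-1] unaveraged), the first and last value columns could use 'first' while the rest use 'mean'. The names `df_small`, `keep_first` and `agg_spec` and the shortened rows are made up for illustration; 'a' and 'n' stand in for lines[1] and lines[-1].

```python
import pandas as pd

# shortened, made-up rows: KEY, a (kept), b and c (averaged), n (kept)
rows = [['k1', 1, 10, 38, 281.6],
        ['k1', 1, 20, 40, 281.6],
        ['k2', 2, 5, 7, 100.0]]
df_small = pd.DataFrame(rows, columns=['KEY', 'a', 'b', 'c', 'n'])

# keep the first value of 'a' and 'n'; average every other column
keep_first = ['a', 'n']
agg_spec = {col: ('first' if col in keep_first else 'mean')
            for col in df_small.columns if col != 'KEY'}

data_small = df_small.groupby('KEY', as_index=False).agg(agg_spec)
print(data_small)
```

Building the dictionary with a comprehension avoids spelling out every column by hand when there are many fields.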

More information about groupby can be found here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html
