
Averaging specific list elements iteratively?

Suppose I have a dataset held in a variable, lines, that looks like this:

lines = ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']

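How can I average specific values across these lists if and only if their first elements match (that is, only when lines[0] is exactly the same in each list), and then combine the result into one averaged list? Of course, I will have to convert all the numbers to floats.

In this specific example, I want a single list in which all numeric values except lines[1] and lines[-1] are averaged. Is there an easy way to do this?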

Expected output

['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, avg_of_var, avg_of_var, avg, , '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']

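Basically (and I see now that my example data is unfortunate, since all the values are identical) I want a single list containing the averages of the numeric values from the four lines in the example.

Would this simple Python snippet work?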

# I am assuming lines is a list of lines
lines = [['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']]


# Use a dict keyed on line[0] to tell the groups apart.
# The first time a key is seen, store its values converted to float;
# afterwards, add each value to the running sum at the same index.
# Also keep track of the number of lines per key to compute the average at the end.
average = {}
for line in lines:
    key = line[0]
    # first time: store a float copy of the data (so the input list
    # is not modified) and initialise qty as 1
    if key not in average:
        average[key] = {
            'data': [float(value) for value in line[1:]],
            'qty': 1
        }
        continue

    # add column data after type conversion to float
    for i, value in enumerate(line[1:]):
        average[key]['data'][i] += float(value)

    average[key]['qty'] += 1

# now create another list of required lines
merged_lines = []
for key in average:
    merged_line = [key]
    # divide each running sum by the line count to get the average
    for element in average[key]['data']:
        merged_line.append(element / average[key]['qty'])

    merged_lines.append(merged_line)

print(merged_lines)
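The snippet above averages every column after the key, while the question asks for some columns (lines[1] and lines[-1]) to be left alone. A minimal sketch of one way to do that follows. The function name `average_by_key`, its `skip` parameter, and the shortened sample rows are assumptions made for illustration, and skipped columns are simply copied from the first row of each group.

```python
from collections import defaultdict


def average_by_key(rows, skip=(1, -1)):
    """Average the columns of rows that share the same first element.

    `skip` (a hypothetical parameter) lists column indices that are copied
    from the first row of each group instead of being averaged.
    """
    # collect the rows of each group under their key (the first element)
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    merged = []
    for key, members in groups.items():
        width = len(members[0])
        # normalise negative indices such as -1 to their positive position
        keep = {index % width for index in skip}
        out = [key]
        for col in range(1, width):
            if col in keep:
                out.append(members[0][col])
            else:
                values = [float(member[col]) for member in members]
                out.append(sum(values) / len(values))
        merged.append(out)
    return merged


# shortened, made-up rows so that the averages are visible
rows = [
    ['k1', '1', '10', '38', '281.6'],
    ['k1', '1', '20', '40', '281.6'],
    ['k2', '1', '5', '7', '100.0'],
]
result = average_by_key(rows)
print(result)
```

Grouping first and averaging afterwards keeps the original rows untouched and makes it easy to change which columns are skipped.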

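You can use pandas to create a DataFrame. You can then group by line[0] and aggregate with the mean (only for the columns you want). However, you also need to specify an aggregation method for the other columns. I am assuming you want the mean for those columns as well.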

import pandas as pd
from numpy import mean

lines = [['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9, 
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
     ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9, 
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
     ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9, 
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
     ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9, 
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6]]
# I have removed the quotes around numbers for simplification but this can also be handled by pandas.

# create a data frame and give names to your fields.
# Here 'KEY' is the name of the first field we will use for grouping 
df = pd.DataFrame(lines,columns=['KEY','a','b','c','d','e','f','g','h','i','j','k','l','m','n'])

This produces something like the following:

    KEY                                             a   b   c   d   e   f   g   h   i   j   k   l   m   n
0   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6
1   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6
2   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6
3   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6

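This is the operation you are looking for:

# the string 'mean' selects pandas' built-in group mean
data = df.groupby('KEY', as_index=False).aggregate('mean')

This produces: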

    KEY                                             a   b   c   d   e   f   g   h   i   j   k   l   m   n
0   QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=    1   10  38  0.0 9   20050407    20170319    0   0   0   0   1   1   281.6
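The comment in the code above notes that pandas can also handle the quoted numbers from the question. A minimal sketch of that conversion follows, assuming every column other than the key holds a string that parses as a number. The shortened rows and the column names are made up for illustration.

```python
import pandas as pd

# shortened, made-up rows with the numbers still stored as strings
rows = [
    ['k1', '1', '10', '38.5'],
    ['k1', '1', '20', '39.5'],
]
df = pd.DataFrame(rows, columns=['KEY', 'a', 'b', 'c'])

# convert the value columns from strings to floats before aggregating
value_columns = ['a', 'b', 'c']
df[value_columns] = df[value_columns].astype(float)

# one row per key, each value column replaced by its group mean
data = df.groupby('KEY', as_index=False).mean()
print(data)
```

Without the conversion the columns would keep the object dtype, so the mean could not be computed numerically.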

You can use a dictionary to specify the aggregation type per field (assuming 'mean' for every field):

data = df.groupby('KEY', as_index=False).aggregate({'a': 'mean', 'b': 'mean', 'c': 'mean', 'd': 'mean', 'e': 'mean', 'f': 'mean', 'g': 'mean', 'h': 'mean', 'i': 'mean', 'j': 'mean', 'k': 'mean', 'l': 'mean', 'm': 'mean', 'n': 'mean'})

For more information on groupby, see: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html
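Since the question asks for only some columns to be averaged, the per-field dictionary can mix aggregation types. The sketch below is one possible choice rather than the only one: it assumes that taking the first value of each group is acceptable for the columns that should not be averaged. The shortened rows, the column names, and the `averaged` list are illustrative.

```python
import pandas as pd

# shortened, made-up rows; 'a' and 'n' stand in for lines[1] and lines[-1]
rows = [
    ['k1', 1, 10, 38, 281.6],
    ['k1', 1, 20, 40, 281.6],
    ['k2', 1, 5, 7, 100.0],
]
df = pd.DataFrame(rows, columns=['KEY', 'a', 'b', 'c', 'n'])

# columns to average; every other value column keeps its first value per group
averaged = ['b', 'c']
how = {
    col: ('mean' if col in averaged else 'first')
    for col in df.columns
    if col != 'KEY'
}

data = df.groupby('KEY', as_index=False).agg(how)
print(data)
```

Building the dictionary with a comprehension avoids spelling out every column by hand when the frame is wide.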
