Averaging specific list elements iteratively?
Suppose I have a dataset containing a variable, lines, that looks like this:
lines = ['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
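How can I, if and only if lines[0] == lines[0] (that is, only when the first element of the lists is exactly the same), average specific values from the rest of the lists and combine them into a single list of averages? Of course, I will have to convert all the numbers to floats.
In this particular example, I need a single list in which all numeric values except lines[1] and lines[-1] are averaged. Is there a simple way to do this?
Expected output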
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, avg_of_var, avg_of_var, avg, , '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']
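Basically (and I now see that my example data is unfortunate, since all the values are the same) I want a single list containing the averages of the numbers across the four example lines.
Would this simple Python snippet work?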
# I am assuming lines is a list of line
lines = [['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6'],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', '1', '10', '38', '0.0', '9', '20050407', '20170319', '0', '0', '0', '0', '1', '1', '281.6']]
# Use a dict keyed by line[0] to group the lines.
# For each key, keep running column sums and a count of lines,
# so the averages can be computed at the end.
average = {}
for line in lines:
    key = line[0]
    # convert the numeric columns to float (this also avoids mutating the input)
    values = [float(x) for x in line[1:]]
    if key not in average:
        # first time this key is seen: store its values and initialise qty as 1
        average[key] = {'data': values, 'qty': 1}
        continue
    # otherwise add each column value to the running sum at the same index
    for i, value in enumerate(values):
        average[key]['data'][i] += value
    average[key]['qty'] += 1

# now create another list of required lines
merged_lines = []
for key in average:
    line = [key]
    # divide each column sum by the number of lines to get the average
    for element in average[key]['data']:
        line.append(element / average[key]['qty'])
    merged_lines.append(line)
print(merged_lines)
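The question asks to leave lines[1] and lines[-1] out of the averaging, which the snippet above does not do (it averages every column). A minimal sketch of one way to do that with only the standard library follows. The function name `average_by_key`, the parameter `keep_first`, and the shortened sample rows are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict
from statistics import mean


def average_by_key(rows, keep_first=(1, -1)):
    """Group rows by their first element and average the other columns.

    keep_first is a hypothetical parameter: column indexes whose value is
    copied from the first row of each group instead of being averaged.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    merged = []
    for key, members in groups.items():
        width = len(members[0])
        # normalise negative indexes such as -1 to absolute positions
        keep = {i % width for i in keep_first}
        out = [key]
        for i in range(1, width):
            if i in keep:
                out.append(members[0][i])
            else:
                out.append(mean(float(m[i]) for m in members))
        merged.append(out)
    return merged


# Shortened, made-up rows so that the averages are visible.
rows = [
    ['k1', '1', '10', '38', '281.6'],
    ['k1', '1', '20', '40', '281.6'],
    ['k2', '2', '5', '7', '100.0'],
]
result = average_by_key(rows)
print(result)
```

Columns listed in `keep_first` stay as the original strings, while averaged columns come back as floats.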
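You can use pandas to create a data frame. You can then group by line[0] and aggregate with the mean (only for the columns you need). However, you also need to specify an aggregation method for the other columns. I am assuming you want the mean of those columns as well.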
import pandas as pd
from numpy import mean
lines = [['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9,
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9,
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9,
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6],
['QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ=', 1, 10, 38, 0.0, 9,
20050407, 20170319, 0, 0, 0, 0, 1, 1, 281.6]]
# I have removed the quotes around numbers for simplification but this can also be handled by pandas.
# create a data frame and give names to your fields.
# Here 'KEY' is the name of the first field we will use for grouping
df = pd.DataFrame(lines,columns=['KEY','a','b','c','d','e','f','g','h','i','j','k','l','m','n'])
This produces something like:
KEY a b c d e f g h i j k l m n
0 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
1 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
2 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
3 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
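The comment in the code above notes that quoted numbers can also be handled by pandas. A minimal sketch of that conversion, assuming the values arrive as strings as in the question; the shortened rows and the names `df_raw`, `df_num`, and `value_cols` are made up for illustration:

```python
import pandas as pd

# Hypothetical shortened rows: numbers arrive as strings, as in the question.
raw = [['k1', '1', '10', '281.6'],
       ['k1', '1', '20', '281.6']]
df_raw = pd.DataFrame(raw, columns=['KEY', 'a', 'b', 'n'])

# Convert every column except the key to a numeric dtype.
value_cols = df_raw.columns.drop('KEY')
df_num = df_raw.copy()
df_num[value_cols] = df_num[value_cols].apply(pd.to_numeric)

print(df_num.dtypes)
```

After this step the numeric columns can be grouped and averaged like the unquoted data.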
This is the operation you are looking for:
data = df.groupby('KEY',as_index=False).aggregate(mean)
This produces:
KEY a b c d e f g h i j k l m n
0 QA7uiXy8vIbUSPOkCf9RwQ3FsT8jVq2OxDr8zqa7bRQ= 1 10 38 0.0 9 20050407 20170319 0 0 0 0 1 1 281.6
You can use a dictionary to specify the aggregation type per field (assuming "mean" for every field):
data = df.groupby('KEY',as_index=False).aggregate({'a':mean,'b':mean,'c':mean,'d':mean,'e':mean,'f':mean,'g':mean,'h':mean,'i':mean,'j':mean,'k':mean,'l':mean,'m':mean,'n':mean})
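The same dictionary form can mix aggregations, which is one way to leave some columns out of the averaging, as the question asks for lines[1] and lines[-1]. The sketch below assumes that taking the first value in each group is acceptable for those columns. The shortened rows and the names `df_small`, `agg_spec`, and `merged_rows` are illustrative only.

```python
import pandas as pd

# Hypothetical shortened rows with values that actually differ.
rows = [['k1', 1, 10, 38, 281.6],
        ['k1', 1, 20, 40, 281.6],
        ['k2', 2, 5, 7, 100.0]]
df_small = pd.DataFrame(rows, columns=['KEY', 'a', 'b', 'c', 'n'])

# 'a' and 'n' keep the first value of each group; 'b' and 'c' are averaged.
agg_spec = {'a': 'first', 'b': 'mean', 'c': 'mean', 'n': 'first'}
merged = df_small.groupby('KEY', as_index=False).agg(agg_spec)

# Convert back to a plain list of lists, one per key.
merged_rows = merged.values.tolist()
print(merged_rows)
```

Passing the aggregation names as strings ('first', 'mean') avoids the separate numpy import.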
For more information on groupby, see: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html