Find max of a column in a csv file using python

I am trying to find the max of the column below in a CSV file:

list['1154293', '885773', '-448704', '563679', '555394', '631974', '957395', '1104047', '693464', '454932', '727272', '125016', '339251', '78523', '977084', '1158718', '332681', '-341227', '173826', '742611', '1189806', '607363', '-1172384', '587993', '295198', '-300390', '468995', '698452', '967828', '-454873', '375723', '1140526', '83836', '413189', '551363', '1195111', '657081', '66659', '803301', '-953301', '883934']

I ran the code I wrote:

for row in csvReader:
    Revenue.append(row[1])
    max_revenue=max(Revenue)
    print("max revenue"+str(max_revenue))

But somehow it's not fetching the max value. The output I am getting is:

        max revenue 977084

Please advise.

The problem here is that you're building a list of the column-1 strings, but then expecting to find the max as a number, not as a string.
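To see why the string comparison picks 977084, here is a small illustration using a few of the values from the question (the name `values` is just for this example). Python compares strings character by character, so '977084' beats '1195111' because '9' sorts after '1'.

```python
# A handful of the revenue strings from the question.
values = ['1154293', '977084', '1195111', '-1172384']

# Compared as strings: character by character, so '9...' beats '1...'.
max_as_text = max(values)

# Compared as numbers: convert each string to int first.
max_as_number = max(int(v) for v in values)

print(max_as_text)    # 977084
print(max_as_number)  # 1195111
```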

You could fix that by building a list of the column-1 strings mapped to integers, as other answers show:

for row in csvReader:
    Revenue.append(int(row[1]))
max_revenue=max(Revenue)

But another way is to use a key function for max:

for row in csvReader:
    Revenue.append(row[1])
max_revenue = max(Revenue, key=int)

Even better, you can use the same idea to avoid needing a separate Revenue list at all:

max_revenue_row = max(csvReader, key=lambda row: int(row[1]))

This means you get the whole original row, not just the integer value. So, if, say, column 2 is the username that goes with the revenue in column 1, you can do this:

max_revenue_row = max(csvReader, key=lambda row: int(row[1]))
best_salesman_name = max_revenue_row[2]
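As a self-contained illustration of the row-based approach, here is a sketch that runs without a file on disk. The three sample rows and the names in column 2 are made up for the example, and `io.StringIO` stands in for the opened CSV file.

```python
import csv
import io

# Hypothetical sample data: id, revenue, salesperson name (no header row).
sample = io.StringIO(
    "1,885773,alice\n"
    "2,1195111,bob\n"
    "3,-448704,carol\n"
)

csvReader = csv.reader(sample)

# max returns the whole row whose column 1, read as an int, is largest.
max_revenue_row = max(csvReader, key=lambda row: int(row[1]))
best_salesman_name = max_revenue_row[2]

print(max_revenue_row)     # ['2', '1195111', 'bob']
print(best_salesman_name)  # bob
```

If your file starts with a header row, int() will fail on it, so skip it with next(csvReader) before calling max.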

This also avoids building a whole extra giant list in memory; it just reads each row into memory one at a time and then discards them, and only remembers the biggest one.

That is usually great, but it has one potential problem: if you actually need to scan the values two or more times instead of just once, the first scan already consumed all the rows, so the second one won't find any. For example, this will raise a ValueError in the second call, because min is handed an exhausted (empty) reader:

max_revenue_row = max(csvReader, key=lambda row: int(row[1]))
min_revenue_row = min(csvReader, key=lambda row: int(row[1]))
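A quick way to observe this behavior is the sketch below. The sample data is invented and `io.StringIO` stands in for a real file; the flag `second_pass_failed` exists only for this demonstration.

```python
import csv
import io

# Hypothetical two-column sample: id, revenue.
sample = io.StringIO("1,885773\n2,1195111\n3,-448704\n")
csvReader = csv.reader(sample)

# First pass consumes every row of the reader.
max_revenue_row = max(csvReader, key=lambda row: int(row[1]))

second_pass_failed = False
try:
    # Second pass: the reader is already exhausted, so min sees no rows.
    min_revenue_row = min(csvReader, key=lambda row: int(row[1]))
except ValueError as exc:
    second_pass_failed = True
    print("second pass failed:", exc)
```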

The ideal solution is to reorganize your code to only scan the rows once. For example, if you understand how min and max work, you could build your own min_and_max function that does both at the same time, and then use it like this:

min_revenue_row, max_revenue_row = min_and_max(
    csvReader, key=lambda row: int(row[1]))

But sometimes that's not possible, or at least not in a way you can figure out how to write readably. I'll assume you don't know how to write min_and_max. So, what can you do?

You have two less than ideal, but often still acceptable, options: either read the entire file into memory, or read the file multiple times. Here are both.


# Option 1: read the entire file into memory
rows = list(csvReader) # now it's in memory, so we can reuse it
max_revenue_row = max(rows, key=lambda row: int(row[1]))
min_revenue_row = min(rows, key=lambda row: int(row[1]))

# Option 2: read the file multiple times
with open(csvpath) as f:
    csvReader = csv.reader(f)
    max_revenue_row = max(csvReader, key=lambda row: int(row[1]))
with open(csvpath) as f:
    # whole new reader, so it doesn't matter that we used up the first
    csvReader = csv.reader(f)
    min_revenue_row = min(csvReader, key=lambda row: int(row[1]))

In your case, if the CSV file is as small as it seems, it doesn't really matter that much, but I'd probably do the first one.
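For completeness, here is one way the min_and_max function mentioned above might be written. This is only a sketch: min_and_max is not part of the standard library, and the sample rows are made up. It walks the iterable once, tracking the smallest and largest items by their key.

```python
def min_and_max(iterable, key=None):
    """Return (smallest, largest) item in a single pass. Hypothetical helper."""
    if key is None:
        key = lambda item: item
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("min_and_max() arg is an empty iterable") from None
    lo = hi = first
    lo_key = hi_key = key(first)
    for item in iterator:
        k = key(item)
        if k < lo_key:
            lo, lo_key = item, k
        elif k > hi_key:
            hi, hi_key = item, k
    return lo, hi


# Made-up rows in the shape csv.reader would yield: id, revenue.
rows = iter([['1', '885773'], ['2', '1195111'], ['3', '-448704']])
min_revenue_row, max_revenue_row = min_and_max(
    rows, key=lambda row: int(row[1]))

print(min_revenue_row)  # ['3', '-448704']
print(max_revenue_row)  # ['2', '1195111']
```

Because it only iterates once, it works on a csv.reader directly without loading the file into a list first.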

This should work. Since the elements of your list are strings, you need to convert them to int using map(int, a) first.

a=['1154293', '885773', '-448704', '563679', '555394', '631974', '957395', '1104047', '693464', '454932', '727272', '125016', '339251', '78523', '977084', '1158718', '332681', '-341227', '173826', '742611', '1189806', '607363', '-1172384', '587993', '295198', '-300390', '468995', '698452', '967828', '-454873', '375723', '1140526', '83836', '413189', '551363', '1195111', '657081', '66659', '803301', '-953301', '883934']
print(max(map(int, a)))

I think the problem is with the data type. Because your numbers are wrapped in quotes, they are interpreted as strings, so max compares them as text rather than by numeric value.

You may want to convert each string to an integer, like this:

new_list = [int(number) for number in old_list]

Hope this helps.

Thank you all.

I converted to int:

Revenue.append(int(row[1]))

Now it works fine.

Thanks again.
