Python: unable to store a function's result in a variable

Question

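I wrote the following code to find the duplicate lines in a file and list the line number of each duplicate.

The code works when it is not inside a function. However, when I put it into a function as shown below, it does not behave as I expect.

I want to store the value returned by the "getallDups" function in the variable data.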

#!/usr/bin/env python

filename = '/tmp/test.txt'
f = open(filename, "r")
contentAslist = f.read().splitlines()
def getallDups():
    lc = 0
    mystring = ""
    for eitem in contentAslist:
        lc += 1
        if contentAslist.count(eitem) > 1:
            mystring = lc,eitem
            return(mystring)

data = getallDups()
print(data)

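The code above stores only the first duplicate line. It does not list all of the duplicate lines.

How can I modify this code to do exactly what I want? How can I store the value of the function in the variable "data" so that I can use it afterwards?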

Answer 1

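Your problem is that you return inside the loop, which means you never get the rest of the data. You can fix this simply by replacing return with yield and changing the call that retrieves the data to: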

data = list(getallDups())

This lets the loop run to completion.
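
To make the difference concrete, here is a minimal sketch contrasting the two behaviors. The sample lines and the function names first_dup and all_dups are made up for illustration; they are not from the question.

```python
# Hypothetical sample data standing in for the file contents.
lines = ["abcd", "efgh", "abcd", "ijk", "ijk"]

def first_dup(content):
    # return exits the function on the first match, so the loop never continues
    for lc, eitem in enumerate(content, 1):
        if content.count(eitem) > 1:
            return (lc, eitem)

def all_dups(content):
    # yield hands back one match and resumes the loop on the next request
    for lc, eitem in enumerate(content, 1):
        if content.count(eitem) > 1:
            yield (lc, eitem)

print(first_dup(lines))       # (1, 'abcd')
print(list(all_dups(lines)))  # [(1, 'abcd'), (3, 'abcd'), (4, 'ijk'), (5, 'ijk')]
```

Calling list() on the generator is what drives the loop to the end; without it, data would hold a generator object rather than the results.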

Answer 2

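You put the return statement inside the loop in your function: return makes the function end as soon as the first duplicate is found... Possible fixes are to return a list (collecting the matches in the loop) or to change the function into a generator.

Returning a list: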

filename = '/tmp/test.txt'
with open(filename, "r") as f:              # the file is closed once it has been read
    contentAslist = f.read().splitlines()

def getallDups():
    mylist = []
    lc = 0
    for eitem in contentAslist:
        lc += 1
        if contentAslist.count(eitem) > 1:
            mylist.append((lc, eitem))      # append the duplicated line to a list
    return mylist                           # return the fully populated list

data = getallDups()
print(data)

Generator version:

filename = '/tmp/test.txt'
with open(filename, "r") as f:   # the file is closed once it has been read
    contentAslist = f.read().splitlines()

def getallDups():
    lc = 0
    for eitem in contentAslist:
        lc += 1
        if contentAslist.count(eitem) > 1:
            yield (lc, eitem)    # yield duplicate lines one at a time

data = list(getallDups())        # build a list from the generator values
print(data)

Answer 3

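If you want it to return more results, it needs to compute more results. Instead of returning the first match you find, add each match to a list and return the list: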

contentAslist = [
    "abcd",
    "efgh",
    "abcd",
    "ijk",
    "lmno",
    "ijk",
    "lmno",
    "ijk",
]

def getallDups():
    lc = 0
    result = []
    for eitem in contentAslist:
        lc += 1
        if contentAslist.count(eitem) > 1:
            result.append((lc, eitem))
    return result

data = getallDups()
print(data)

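However, this approach is quite inefficient, O(N^2), because the list.count() method is O(N) for a list of N items and we call it N times.

A better approach is to use a hash. Note that the return type here is quite different, but it may be more useful and can easily be converted to the original format.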

import collections
contentAslist = [
    "abcd",
    "efgh",
    "abcd",
    "ijk",
    "lmno",
    "ijk",
    "lmno",
    "ijk",
]
def getallDups():
    lc = 1
    # OrderedDict is same as "{}" except that when we iterate them later they're in the order that we added them.
    lhash = collections.OrderedDict()
    for line in contentAslist:
        # get list of line numbers matching this line, or empty list if it's the first
        line_numbers = lhash.get(line, [])
        # add this line number to the list
        line_numbers.append(lc)
        # Store the list of line numbers matching this line in the hash
        lhash[line] = line_numbers
        lc += 1

    return lhash

data = getallDups()

for line, line_numbers in data.items():
    if len(line_numbers) > 1:
        # join the line numbers with spaces, e.g. "abcd : 1 3"
        print(line + " : " + " ".join(str(ln) for ln in line_numbers))

The solution above is O(N).

Sample input:

abcd
efgh
abcd
ijk
lmno
ijk
lmno
ijk

Output:

abcd : 1 3
ijk : 4 6 8
lmno : 5 7
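
Converting this dictionary back to the original list of (line number, line) pairs, as mentioned above, could look like the sketch below. The helper name to_pairs is hypothetical, and the dictionary literal just restates the result for the sample input.

```python
from collections import OrderedDict

# Same mapping that getallDups() builds for the sample input: line -> line numbers.
data = OrderedDict([
    ("abcd", [1, 3]),
    ("efgh", [2]),
    ("ijk", [4, 6, 8]),
    ("lmno", [5, 7]),
])

def to_pairs(lhash):
    # Keep only lines seen more than once, then flatten to (line_number, line).
    pairs = [
        (ln, line)
        for line, line_numbers in lhash.items()
        if len(line_numbers) > 1
        for ln in line_numbers
    ]
    # Sort by line number to match the order of the list-based versions.
    return sorted(pairs)

print(to_pairs(data))
# [(1, 'abcd'), (3, 'abcd'), (4, 'ijk'), (5, 'lmno'), (6, 'ijk'), (7, 'lmno'), (8, 'ijk')]
```

The flattening touches each stored line number once, and the final sort adds O(N log N) in the worst case, so the whole thing stays well below the O(N^2) of the count-based version.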

Answer 1
1 2018-06-11 21:24:53

Answer 2
1 Accepted 2018-06-11 21:25:42

Answer 3
1 2018-06-11 21:40:52
