Find all possible paths in a csv file

Question

I have a csv file in the following format:

header1,header2,header3,header4
1,4,2,5
1,4,0,5
0,4,2,5

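The information relevant to my problem is only in columns 1 and 3. I'm trying to find all possible paths in this csv file, where two values are connected if they appear in the same row. For example, in the data above: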

1 is connected to 2
1 is connected to 0
0 is connected to 2

Then all the possible paths are:

[1,2]
[1,0,2]
[0,2]

With the help of online resources (in particular this), I have been able to find all paths for a specified start node and end node. Here is my code:

import csv


def main():
    inputFile = "file_directory"
    a = []
    with open(inputFile) as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row
        for line in reader:
            # keep only columns 1 and 3
            a.append([line[0], line[2]])
    # This will print all the paths starting with 1 and ending with 2
    print(str(getAllSimplePaths('1', '2', a)))


def getAllSimplePaths(originNode, targetNode, a):
    # {originNode} rather than set(originNode): set('10') would give {'1', '0'}
    return helpGetAllSimplePaths(targetNode,
                                 [originNode],
                                 {originNode},
                                 a,
                                 list())


def helpGetAllSimplePaths(targetNode, currentPath, usedNodes, a, answerPaths):
    lastNode = currentPath[-1]
    if lastNode == targetNode:
        answerPaths.append(list(currentPath))
    else:
        for elem in a:
            if elem[0] == lastNode:
                if elem[1] not in usedNodes:
                    currentPath.append(elem[1])
                    usedNodes.add(elem[1])
                    helpGetAllSimplePaths(targetNode, currentPath, usedNodes, a, answerPaths)
                    usedNodes.remove(elem[1])
                    currentPath.pop()
    return answerPaths


if __name__ == '__main__':
    main()

When I run it, I correctly get the following result:

[['1', '2'], ['1', '0', '2']]

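However, what I really want to do is iterate over all the elements in the second column of my csv file and find all possible paths for each element. I have been working on this for days and can't figure out a way to do it. My csv file has about 2000 rows. Any help or suggestions would be greatly appreciated. Thank you!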

Update: additional information

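Every row in the csv file is already a path between two elements, so the number of paths I start with equals the number of rows in my csv file. Now, starting from the first row of the example in my question, 1 is connected to 2, so ['1', '2'] is a path. For each row, I take the element in the first column (elem1), look at the third column of the same row (elem2), and then search all rows of the csv file for elem2 in the first column. If some row has elem2 in its first column, then elem2 is connected to the element in the third column of that row (elem3). In that case we have the path [elem1, elem2, elem3]. Similarly, for elem3 I have to go through all the rows to see whether it appears in the first column. If it does not, I'm done with the first path and move on to the second path.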

The desired output for the example above is:

[['1','2'], ['1', '0', '2'], ['0', '2'], ['1','0']]

I'm using Python 3.5.1.

Answer 1

Edited

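Here is a version I have optimized. Before using it on a very large csv file, I suggest removing some or most of the printing it does; that will not affect the final result.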

import csv
from pprint import pformat

def main():
    inputFile = "paths.csv"
    with open(inputFile, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row
        a = [[row[0], row[2]] for row in reader]
    print('a:\n', pformat(a))

    # construct an adjacency *dictionary*
    nodeToNodes = {}
    for src, dst in a:
        nodeToNodes.setdefault(src, []).append(dst)
    print('\nnodeToNodes:\n', pformat(nodeToNodes))

    print('\ngathering results:')
    all_paths = []
    for src, dst in a:
        print('  {} <-> {}'.format(src, dst))
        more_paths = getAllSimplePaths(dst, [src], {src}, nodeToNodes, [])
        print('    {}'.format(pformat(more_paths)))
        all_paths.extend(more_paths)

    print('\nall paths: {}'.format(pformat(all_paths)))

def getAllSimplePaths(targetNode, currentPath, usedNodes, nodeToNodes, answerPaths):
    lastNode = currentPath[-1]
    if lastNode == targetNode:
        answerPaths.append(currentPath[:])
    elif lastNode in nodeToNodes:
        for neighbor in nodeToNodes[lastNode]:
            if neighbor not in usedNodes:
                currentPath.append(neighbor)
                usedNodes.add(neighbor)
                getAllSimplePaths(targetNode, currentPath, usedNodes, nodeToNodes,
                                  answerPaths)
                usedNodes.remove(neighbor)
                currentPath.pop()

    return answerPaths

if __name__ == '__main__':
    main()

Output:

a:
 [['1', '2'], ['1', '0'], ['0', '2']]

nodeToNodes:
 {'0': ['2'], '1': ['2', '0']}

gathering results:
  1 <-> 2
    [['1', '2'], ['1', '0', '2']]
  1 <-> 0
    [['1', '0']]
  0 <-> 2
    [['0', '2']]

all paths: [['1', '2'], ['1', '0', '2'], ['1', '0'], ['0', '2']]
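
As a follow-up to the note about printing: for a file with around 2000 rows, a print-free variant may be easier to work with. The following is only a sketch under assumptions, not part of the original answer. The names read_edges, iter_simple_paths, all_paths and SAMPLE are made up for illustration. Duplicate (column 1, column 3) pairs are dropped so that repeated rows do not yield repeated paths, and the recursion is replaced by an explicit stack so that long chains do not run into Python's recursion limit. On the sample data it should give the same paths, in the same order, as the output above.

```python
import csv
import io

# Hypothetical sample data, copied from the question's example.
SAMPLE = "header1,header2,header3,header4\n1,4,2,5\n1,4,0,5\n0,4,2,5\n"


def read_edges(csvfile):
    """Return the unique (column 1, column 3) pairs in first-seen order."""
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    edges = []
    seen = set()
    for row in reader:
        edge = (row[0], row[2])
        if edge not in seen:
            seen.add(edge)
            edges.append(edge)
    return edges


def iter_simple_paths(src, dst, node_to_nodes):
    """Yield every simple path from src to dst without recursion."""
    if src == dst:
        yield [src]
        return
    path = [src]
    used = {src}
    # stack[i] is an iterator over the not-yet-tried neighbors of path[i].
    stack = [iter(node_to_nodes.get(src, ()))]
    while stack:
        for neighbor in stack[-1]:
            if neighbor == dst:
                yield path + [neighbor]
            elif neighbor not in used:
                # Go one level deeper and resume this iterator later.
                path.append(neighbor)
                used.add(neighbor)
                stack.append(iter(node_to_nodes.get(neighbor, ())))
                break
        else:
            # All neighbors of the last node were tried: backtrack.
            stack.pop()
            used.discard(path.pop())


def all_paths(edges):
    """Collect the simple paths between the two ends of every edge."""
    node_to_nodes = {}
    for src, dst in edges:
        node_to_nodes.setdefault(src, []).append(dst)
    paths = []
    for src, dst in edges:
        paths.extend(iter_simple_paths(src, dst, node_to_nodes))
    return paths


if __name__ == '__main__':
    # For a real file, open("paths.csv", newline='') would replace StringIO.
    print(all_paths(read_edges(io.StringIO(SAMPLE))))
```

Keep in mind that the number of simple paths can grow exponentially with the number of connections, so on a densely connected file even this version may take a long time.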
