Optimizing file and line counting in Python

Question

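I have a Python project containing many folders, files (.css, .py, .yml, etc.) and lines of code. For this project I made a tool called "statistics" that gives me information about the whole project, such as: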

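Global statistics:

Entire project:: 32329 lines
Project main files (.py, .yml):: 8420 lines
Project without vendor parts:: 1070 lines
Core (src directory):: 394 lines
Core compared to project main files:: 5%
Kraken Framework (vendor/*.py):: 7350 lines
Main files Python code:: 93%
Vendor Python code:: 87%
Entire project size:: 37M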

To get all these numbers, I mainly use two functions:

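import glob

def count_folder_lines(self, path):
    # path is a glob pattern; '**' is expanded because recursive=True
    files = glob.glob(path, recursive=True)
    number = 0
    for file in files:
        # the with block closes each file instead of leaking the handle
        with open(file) as f:
            number += sum(1 for line in f)
    return number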

and

def count_number_of_files(self, path):
    files = glob.glob(path, recursive=True)
    return len(files)

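The first counts the lines in a folder, and the second counts the number of files matching a pattern (for example: src/*.py). But getting the statistics for the project takes 4.9 to 5.3 seconds, which is a lot.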
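For anyone who wants to reproduce this kind of measurement, here is a minimal standalone sketch. It assumes plain functions instead of methods, and `time_count` and `demo` are hypothetical helper names; the temporary files only stand in for a real project tree.

```python
import glob
import os
import tempfile
import time


def count_folder_lines(pattern):
    """Count lines in every file matching a glob pattern (text mode)."""
    total = 0
    for name in glob.glob(pattern, recursive=True):
        # errors="replace" is an assumption so odd bytes do not abort the count
        with open(name, encoding="utf-8", errors="replace") as handle:
            total += sum(1 for _ in handle)
    return total


def time_count(pattern):
    """Hypothetical helper: return (line_count, elapsed_seconds)."""
    start = time.perf_counter()
    lines = count_folder_lines(pattern)
    return lines, time.perf_counter() - start


def demo():
    """Build a tiny throwaway tree and time the count over it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "src", "pkg"))
        for rel, n in (("src/a.py", 3), ("src/pkg/b.py", 5)):
            with open(os.path.join(root, *rel.split("/")), "w") as out:
                out.write("x = 1\n" * n)
        return time_count(os.path.join(root, "**", "*.py"))


if __name__ == "__main__":
    lines, seconds = demo()
    print(f"{lines} lines in {seconds:.4f} s")
```

Timing each pattern separately like this shows which part of the tree dominates the total before any optimization is attempted.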

Is there any way to make it faster? Would parallel programming or using Cython change anything?

Have a nice day, and thank you.

Answer 1

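I finally found the solution that works best for me: I use the multiprocessing module to count the lines in parallel, with one task per file pattern.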

import glob
from itertools import repeat, takewhile

def count_folder_lines(self, path):
    """
        Use a buffer to count the number of lines of each file matching path.
        :param path: string pattern of a file type
        :return: number of lines in matching files
    """
    files = glob.glob(path, recursive=True)
    number = 0
    for file in files:
        # read raw 1 MiB chunks and count newline bytes; with closes the file
        with open(file, 'rb') as f:
            bufgen = takewhile(lambda x: x,
                               (f.raw.read(1024 * 1024) for _ in repeat(None)))
            number += sum(buf.count(b'\n') for buf in bufgen if buf)
    return number

def count_number_of_files(self, path):
    """
        Count number of files for a string pattern
        :param path: files string pattern
        :return: number of files matching the pattern
    """
    files = glob.glob(path, recursive=True)
    return len(files)

import multiprocessing as mp

def multiproc(self):
    """
        Multiprocessing to launch several processes to count number of
        lines of each string pattern in self.files
        :return: List of number of lines per string pattern
                    (list of int).
    """
    # the with block shuts the pool down once the results are collected
    with mp.Pool() as pool:
        asyncResult = pool.map_async(self.count_folder_lines, self.files)
        return asyncResult.get()

With this solution, counting takes about 1.2 s, compared with about 5 s before.
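The same idea can be tried outside the class with a minimal standalone sketch. It assumes module-level functions instead of methods, and the names `count_newlines`, `count_pattern_lines`, `count_patterns_parallel` and `make_demo_tree` are hypothetical, as is the throwaway file tree. It reads through the buffered file object with `iter()` and a sentinel rather than `f.raw` with `takewhile`, which is a swapped-in but equivalent way of reading fixed-size chunks.

```python
import glob
import multiprocessing as mp
import os
import tempfile


def count_newlines(path, chunk_size=1024 * 1024):
    """Count newline bytes in one file, reading fixed-size binary chunks."""
    total = 0
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            total += chunk.count(b"\n")
    return total


def count_pattern_lines(pattern):
    """Sum the newline counts of all files matching one glob pattern."""
    return sum(count_newlines(p) for p in glob.glob(pattern, recursive=True))


def count_patterns_parallel(patterns):
    """Run one task per pattern in a process pool; results keep input order."""
    with mp.Pool() as pool:
        return pool.map(count_pattern_lines, patterns)


def make_demo_tree(root):
    """Create a tiny throwaway tree: 3 + 5 lines of .py, 2 lines of .yml."""
    os.makedirs(os.path.join(root, "src"))
    for rel, n in (("a.py", 3), ("src/b.py", 5), ("conf.yml", 2)):
        with open(os.path.join(root, *rel.split("/")), "wb") as out:
            out.write(b"x\n" * n)


if __name__ == "__main__":
    # the guard matters: worker processes may re-import this module
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        patterns = [os.path.join(root, "**", "*.py"),
                    os.path.join(root, "**", "*.yml")]
        print(count_patterns_parallel(patterns))
```

Two caveats apply. Counting newline bytes undercounts by one for any file whose last line has no trailing newline. And with one task per pattern, the speedup is limited by the largest pattern; splitting the work per file instead would balance the load better when one pattern dominates.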

Have a nice day!

Solution 1
0 Accepted 2017-12-06 08:15:18
