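How do I count the number of lines in an FTP file without downloading it locally, using Python?
So I need to be able to read a file on an FTP server and count its lines with Python, without downloading it to my local machine.
I know the code to connect to the server: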
import ftplib
ftp = ftplib.FTP('example.com')                 # connect to the server
ftp.login('username', 'password')               # log in
ftp.retrlines('LIST')                           # list the current directory
ftp.cwd('/parent folder/another folder/file/')  # change the working directory
I also know the basic code for counting lines (if the file has already been downloaded/stored locally):
with open('file') as f:
    count = sum(1 for line in f)
print(count)
I just need to know how to combine these two pieces of code without downloading the file to my local system.
Any help is appreciated. Thanks.
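As far as I know, FTP does not provide any way to read a file's contents without actually downloading it: the data always has to be transferred to the client. What you can avoid is writing it to disk, by processing the data in memory as it arrives, similar to the approach in "Is it possible to read FTP files without writing them using Python?" (you haven't specified which Python version you are using).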
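#!/usr/bin/env python3
from ftplib import FTP
line_count = 0
def count_lines(block):
    # retrbinary() calls this once per received block (8192 bytes by default), so
    # accumulate the number of newline bytes instead of printing a result per block.
    global line_count
    line_count += block.count(b'\n')
ftp = FTP('ftp.kernel.org')
ftp.login()  # anonymous login
ftp.retrbinary('RETR /pub/README_ABOUT_BZ2_FILES', count_lines)
ftp.quit()
print(line_count)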
Please treat this code as a reference only.
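If the file is plain text, a possibly simpler variant (a sketch, not part of the original answer) is to let ftplib.FTP.retrlines() split the stream into lines: it switches the transfer to ASCII mode and calls the callback once per line with the line ending stripped, so counting callbacks mirrors sum(1 for line in f) on a local file. The function name count_remote_lines and the host, credentials and paths in the usage comment are hypothetical placeholders.

```python
from ftplib import FTP


def count_remote_lines(ftp, path):
    """Count the lines of a remote text file without saving it to disk.

    retrlines() sends TYPE A and invokes the callback once per line (line
    ending stripped), so a final line without a trailing newline and empty
    lines are counted too, just like iterating over a local file.
    """
    count = 0

    def on_line(_line):
        nonlocal count
        count += 1

    ftp.retrlines('RETR ' + path, on_line)
    return count


# Usage (hypothetical host, credentials and path; replace with your own):
# with FTP('example.com') as ftp:
#     ftp.login('username', 'password')
#     ftp.cwd('/parent folder/another folder/')
#     print(count_remote_lines(ftp, 'file.csv'))
```

Note that retrlines() decodes each line with the FTP object's encoding and raises ftplib.Error for lines longer than FTP.maxline (8192 by default), so for binary data or very long lines the retrbinary() approach is safer.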
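There is a way: I adapted some code I wrote for processing CSV files "on the fly". It is implemented with the producer-consumer pattern. Applying this pattern lets us assign each task to its own thread (or process) and show partial results while a huge remote file is still being downloaded. You can adapt it to FTP requests.
The download stream is put into a queue and consumed "on the fly", so no extra disk space is needed, and memory use stays low as long as the consumer keeps up with the download (the queue is unbounded). Tested with Python 3.5.2 (vanilla) on Fedora Core 25 x86_64.
Here is the source, shown retrieving over HTTP; urllib.request.urlopen() also accepts ftp:// URLs, so the same code can be pointed at an FTP server: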
from threading import Thread, Event
from queue import Queue, Empty
import argparse
import time
import urllib.request
FILE_URL = 'http://cdiac.ornl.gov/ftp/ndp030/CSV-FILES/nation.1751_2010.csv'

def download_task(url, chunk_queue, event):
    # Producer: read the remote resource in 1 KiB chunks and put them in the queue.
    CHUNK = 1 * 1024
    event.clear()
    try:
        with urllib.request.urlopen(url) as response:
            print('%% - Starting Download - %%')
            print('%% - ------------------ - %%')
            while True:
                chunk = response.read(CHUNK)
                if not chunk:
                    print('%% - Download completed - %%')
                    break
                chunk_queue.put(chunk)
    finally:
        # Tell the consumer that no more chunks will arrive, even if the
        # download failed, so that count_task does not wait forever.
        event.set()

def count_task(chunk_queue, event):
    # Consumer: count the newline-terminated lines in the chunks taken from the queue.
    part = False    # True while a line is split across two chunks
    linepart = b''  # unfinished tail of the previous chunk(s)
    time.sleep(5)  # give the producer a head start
    M = 0          # lines counted so far (includes the CSV header line)
    contador = 0   # how many times the consumer found the queue empty
    # VT100 control codes.
    CURSOR_UP_ONE = '\x1b[1A'
    ERASE_LINE = '\x1b[2K'
    while True:
        try:
            # By default queue.get() blocks while the queue is empty. With
            # block=False it raises queue.Empty instead, and that moment is
            # used to print a partial result.
            chunk = chunk_queue.get(block=False)
            for line in chunk.splitlines(True):
                if line.endswith(b'\n'):
                    if part:  # complete the line split at the end of the previous chunk
                        line = linepart + line
                        part = False
                    M += 1
                else:
                    # A line without b'\n' is the last line of the chunk; it is
                    # completed by the first line of the next chunk.
                    linepart = linepart + line if part else line
                    part = True
        except Empty:
            # Queue is empty: redraw the progress lines.
            print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
            print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
            print('Downloading records ...')
            if M > 0:
                print('Partial result: Lines: %d ' % M)
            # THE END: the download has finished (event is set) and no chunks are left.
            # Checking empty() after is_set() avoids losing a chunk queued just
            # before the producer set the event.
            if event.is_set() and chunk_queue.empty():
                if part:  # count a final line that has no trailing newline
                    M += 1
                print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
                print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
                print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
                print('The consumer has waited %s times' % contador)
                print('RECORDS = ', M)
                break
            contador += 1
            time.sleep(1)  # give the producer time to load more records

def main():
    chunk_queue = Queue()
    event = Event()
    args = parse_args()
    url = args.url
    p1 = Thread(target=download_task, args=(url, chunk_queue, event))
    p1.start()
    p2 = Thread(target=count_task, args=(chunk_queue, event))
    p2.start()
    p1.join()
    p2.join()

# The user of this module can customize one parameter:
#   + the URL where the remote file can be found.
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--url', default=FILE_URL,
                        help='remote-csv-file URL')
    return parser.parse_args()

if __name__ == '__main__':
    main()
Usage
$ python ftp-data.py -u <ftp-file>
Example:
$ python ftp-data.py -u 'http://cdiac.ornl.gov/ftp/ndp030/CSV-FILES/nation.1751_2010.csv'
The consumer has waited 0 times
RECORDS = 16327
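The threads are mainly there to print partial results while the download runs. If only the final count is needed, the same "read a chunk, count it, discard it" idea fits in one function. This is a sketch rather than the author's code, and count_lines_streamed is a hypothetical name; it relies on urllib.request.urlopen(), which handles http://, https://, ftp:// and file:// URLs.

```python
import urllib.request


def count_lines_streamed(url, chunk_size=1024):
    """Count the lines of a remote file by reading it in chunks.

    Each chunk is inspected and then discarded, so nothing is written to disk
    and only one chunk is held in memory at a time.
    """
    count = 0
    last = b''
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b'\n')  # newline bytes never straddle chunks
            last = chunk[-1:]
    if last and last != b'\n':
        count += 1  # final line has no trailing newline
    return count
```

For an FTP server, the credentials go into the URL, for example ftp://username:password@example.com/parent%20folder/file.csv (hypothetical); spaces and other special characters must be percent-encoded.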
CSV version on GitHub: https://github.com/AALVAREZG/csv-data-onthefly
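To adapt the producer-consumer idea to an FTP server through ftplib directly, the producer can simply be ftp.retrbinary() with the queue's put method as its callback. The sketch below uses hypothetical names and is not the author's code; it also swaps the answer's polling (get(block=False) plus an Event and sleeps) for a bounded queue and an end-of-stream sentinel, so the consumer blocks instead of sleeping and at most maxsize blocks are buffered at once.

```python
import threading
from queue import Queue

_DONE = object()  # sentinel put on the queue when the transfer ends


def count_lines_threaded(ftp, path, blocksize=1024, maxsize=64):
    """Count lines of a remote file with a producer thread and a bounded queue.

    The producer streams the file with ftp.retrbinary() and queues each block;
    the calling thread consumes the blocks and counts newline bytes.
    """
    chunks = Queue(maxsize=maxsize)
    errors = []

    def producer():
        try:
            ftp.retrbinary('RETR ' + path, chunks.put, blocksize=blocksize)
        except Exception as exc:  # hand the error over to the consumer
            errors.append(exc)
        finally:
            chunks.put(_DONE)  # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    count = 0
    last = b''
    while True:
        chunk = chunks.get()  # blocks until a block or the sentinel arrives
        if chunk is _DONE:
            break
        if chunk:
            count += chunk.count(b'\n')
            last = chunk[-1:]
    worker.join()
    if errors:
        raise errors[0]
    if last and last != b'\n':
        count += 1  # final line without a trailing newline
    return count


# Usage (hypothetical host, credentials and path):
# from ftplib import FTP
# with FTP('example.com') as ftp:
#     ftp.login('username', 'password')
#     print(count_lines_threaded(ftp, '/parent folder/another folder/file.csv'))
```

Partial results could still be printed from the consumer loop, as in the original answer, since it sees every block as it arrives.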