Python: reading a file from a URL

Question

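What is the correct way to read a text file from the internet? For example, the text file here: https://gist.githubusercontent.com/deekayen/4148741/raw/01c6252ccc5b5fb307c1bb899c95989a8a284616/1-1000.txt

The code below works, but it produces an extra b' in front of every word.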

from urllib.request import urlopen
#url = 'https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt'
url = 'https://gist.githubusercontent.com/deekayen/4148741/raw/01c6252ccc5b5fb307c1bb899c95989a8a284616/1-1000.txt'
#data = urlopen(url)
#print('H w')

# it's a file like object and works just like a file
l = set()
data = urlopen(url)
for line in data:  # files are iterable
    word = line.strip()
    print(word)
    l.add(word)

print(l)

Answer 1

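urlopen returns the response as binary data, so each line is a bytes object, and printing it shows the b'...' prefix. You must decode each bytes object into a Unicode string (str). To do this, use the method decode('utf-8'). Here is the code: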

from urllib.request import urlopen
url = 'https://gist.githubusercontent.com/deekayen/4148741/raw/01c6252ccc5b5fb307c1bb899c95989a8a284616/1-1000.txt'

l = set()
data = urlopen(url)
for line in data:  # files are iterable
    word = line.strip().decode('utf-8') # decode the line into unicode
    print(word)
    l.add(word)

print(l)
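
The decoding step can be tried without a network connection. The following is a minimal sketch, assuming the file is UTF-8 encoded; io.BytesIO stands in for the object returned by urlopen, and the sample words and the helper name read_words are made up for illustration.

```python
import io


def read_words(binary_file, encoding="utf-8"):
    """Return the set of non-empty, decoded lines from a binary file-like object."""
    words = set()
    for raw_line in binary_file:           # each raw_line is bytes, e.g. b'the\n'
        word = raw_line.decode(encoding).strip()
        if word:                           # skip blank lines
            words.add(word)
    return words


# io.BytesIO stands in for the object returned by urlopen(url)
fake_response = io.BytesIO(b"the\nof\nto\n")

first_line = fake_response.readline()
print(repr(first_line.strip()))            # bytes: the repr carries the b prefix
print(first_line.strip().decode("utf-8"))  # str: plain text, no prefix

fake_response.seek(0)                      # rewind before reading everything
words = read_words(fake_response)
print(sorted(words))
```

With a real response, the same helper could presumably be called as read_words(urlopen(url)), since the response object is iterable line by line in the same way.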

Answer 2

This is easy with pandas. Just run:

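import pandas as pd
url = 'https://gist.githubusercontent.com/deekayen/4148741/raw/01c6252ccc5b5fb307c1bb899c95989a8a284616/1-1000.txt'
# header=None: the file has no header row, so the first word stays in the data
df = pd.read_csv(url, header=None, names=['word'])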

You're all set :)
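
One thing to watch for: by default pd.read_csv treats the first line as a header, so on a word list without a header row the first word would become the column name. Below is a small offline sketch of that behavior, with io.StringIO standing in for the URL; the sample words and the column name word are assumptions chosen for illustration.

```python
import io

import pandas as pd

sample = "the\nof\nnull\n"  # made-up sample; one word per line, no header row

# Default settings: the first line is consumed as the column header
with_header = pd.read_csv(io.StringIO(sample))
print(list(with_header.columns))  # the first word ends up here
print(len(with_header))           # one row fewer than the number of lines

# header=None keeps every line as data; names= supplies a column label.
# keep_default_na=False stops words such as "null" or "NA" from becoming NaN.
words = pd.read_csv(
    io.StringIO(sample),
    header=None,
    names=["word"],
    keep_default_na=False,
)
print(words["word"].tolist())
```

The same keyword arguments can be passed when the first argument is a URL instead of a StringIO object.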

Answer 1
-1, accepted, 2019-10-12 16:40:20

Answer 2
-1, 2019-10-12 17:02:55
