Python: reading a file from a URL
What is the proper way to read a text file from the internet? For example, the text file here: https://gist.githubusercontent.com/deekayen/4148741/raw/01c6252ccc5b5fb307c1bb899c95989a8a284616/1-1000.txt
The code below works, but it prints an extra b' in front of each word:
from urllib.request import urlopen
#url = 'https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt'
url = 'https://gist.githubusercontent.com/deekayen/4148741/raw/01c6252ccc5b5fb307c1bb899c95989a8a284616/1-1000.txt'
#data = urlopen(url)
#print('H w')
# it's a file like object and works just like a file
l = set()
data = urlopen(url)
for line in data: # files are iterable
    word = line.strip()
    print(word)
    l.add(word)
print(l)
urlopen returns the response body as bytes, so each line is a bytes object, and printing one shows the b'...' prefix. You have to decode each bytes object into a Unicode string (str). For that you can use the method decode('utf-8'). Here's the code:
from urllib.request import urlopen
url = 'https://gist.githubusercontent.com/deekayen/4148741/raw/01c6252ccc5b5fb307c1bb899c95989a8a284616/1-1000.txt'
l = set()
data = urlopen(url)
for line in data: # files are iterable
    word = line.strip().decode('utf-8') # decode the line into unicode
    print(word)
    l.add(word)
print(l)
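The same idea can be wrapped in a small helper. This is only a sketch: the names decode_words and fetch_words are made up for illustration, and it assumes the server either declares a charset in its Content-Type header or sends UTF-8. It also closes the connection with a with block and skips blank lines.

```python
from urllib.request import urlopen


def decode_words(lines, encoding="utf-8"):
    """Turn an iterable of bytes lines into a set of stripped str words."""
    words = set()
    for raw in lines:
        word = raw.decode(encoding).strip()  # bytes -> str, then drop the newline
        if word:  # ignore blank lines
            words.add(word)
    return words


def fetch_words(url, default_encoding="utf-8"):
    """Download a text file and return its words (hypothetical helper)."""
    with urlopen(url) as response:  # the response is closed when the block ends
        # Use the charset from the Content-Type header if there is one;
        # falling back to UTF-8 is an assumption, not something the server promises.
        encoding = response.headers.get_content_charset() or default_encoding
        return decode_words(response, encoding)
```

Calling fetch_words(url) with the URL from the question should then give a set of plain str values, without the b' prefix.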
It's simple using pandas. Just execute
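import pandas as pd
# header=None keeps the first word as data instead of using it as the column name
df = pd.read_csv('https://gist.githubusercontent.com/deekayen/4148741/raw/01c6252ccc5b5fb307c1bb899c95989a8a284616/1-1000.txt', header=None)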
and you are all set :)