How to read all CSV files from a web page into a pandas DataFrame
I would like to load all the csv files from the following web page into a data frame:
https://s3.amazonaws.com/tripdata/index.html
I tried with glob, as I would for loading all files from a directory, without success:
import glob
import pandas as pd

path = 'https://s3.amazonaws.com/tripdata'  # use your path
allFiles = glob.glob(path + "/*citibike-tripdata.csv.zip")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_, index_col=None, header=0)
    list_.append(df)
frame = pd.concat(list_)
Any suggestions?
The glob module is used for finding pathnames matching patterns on the same filesystem that Python is running on; it has no way to index an arbitrary file-hosting web server (which isn't even possible in general). In your case, since https://s3.amazonaws.com/tripdata/ provides the desired index, you can parse that listing to get the relevant files:
import re
import requests
import pandas as pd

url = 'https://s3.amazonaws.com/tripdata/'
t = requests.get(url).text
filenames = re.findall(r'[^>]+citibike-tripdata\.csv\.zip', t)
frame = pd.concat(pd.read_csv(url + f) for f in filenames)
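As an aside, the response from the bucket URL is an S3 ListBucketResult XML document, which can be parsed more robustly than with a regex over raw text. A minimal sketch of that approach, assuming the standard S3 listing schema (the sample listing below is invented for illustration; in practice you would fetch it with requests.get(url).text):

```python
import re
import xml.etree.ElementTree as ET

# Invented sample of the XML an S3 bucket listing returns.
listing = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>tripdata</Name>
  <Contents><Key>201306-citibike-tripdata.csv.zip</Key></Contents>
  <Contents><Key>201307-citibike-tripdata.csv.zip</Key></Contents>
  <Contents><Key>index.html</Key></Contents>
</ListBucketResult>"""

# The S3 listing schema lives in this XML namespace.
NS = {'s3': 'http://s3.amazonaws.com/doc/2006-03-01/'}

def matching_keys(xml_text, pattern=r'citibike-tripdata\.csv\.zip$'):
    """Return all object keys in an S3 listing XML that match the pattern."""
    root = ET.fromstring(xml_text)
    keys = [el.text for el in root.findall('s3:Contents/s3:Key', NS)]
    return [k for k in keys if re.search(pattern, k)]

print(matching_keys(listing))
# ['201306-citibike-tripdata.csv.zip', '201307-citibike-tripdata.csv.zip']
```

Each returned key can then be appended to the bucket URL and passed to pd.read_csv, exactly as in the snippet above; parsing the XML avoids false matches that a greedy regex over the raw page might pick up.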