python - best approach when analysing scraped data

Newbie here. I have managed to put together a script that scrapes some information from a website. It runs daily, and the data is saved to a CSV file. The content of each file looks like this:

date, ticker, company name, momentum indicator, other ratios....
2016-08-19, GSK, GlaxoSmithKline, 42, ....
2016-08-19, RDSB, Royal Dutch Shell, 98, .....
....

I have accumulated 3 months' worth of daily data, so around 80 files. (Every row in a file has the same date, followed by the different shares.) What I would like to do now is to check, on a share-by-share basis, the evolution of the momentum indicator and the other ratios.

For example, I think I should end up with a series of lists such as:

GSK_momentum_indicator = (42, 43, 38, 47,...) 
RDSB_momentum_indicator = (98, 91, 77, 79,...)

Now, as a newbie, I have 2 questions: 1) What do you think is the best approach for this? Is it using lists, dictionaries, or something else? 2) How did you decide the above? Are there guidelines for which strategy to use? Is there a good resource I can read as a newbie to learn more about this subject?

Thanks!

PS. In case it makes a difference, I'm using Python 3.5.2.

To process the data you've collected, you could use one of two Python modules, csv or pandas. The csv module reads and writes CSV files; you can then convert the rows into Python lists and dictionaries and work with those. See the csv module documentation for details.
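As a rough sketch of the csv route, the snippet below builds one dictionary that maps each ticker to its list of (date, value) pairs. The function name `collect_indicator`, the file names, and the second day's date are made up for illustration, and it assumes every file starts with the header row shown in the question.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def collect_indicator(csv_paths, column="momentum indicator"):
    """Return {ticker: [(date, value), ...]} with each list sorted by date."""
    series = defaultdict(list)
    for path in csv_paths:
        # str(path) keeps this working on Python 3.5, where open() rejects Path objects
        with open(str(path), newline="") as handle:
            # skipinitialspace drops the blank that follows each comma in the sample rows
            reader = csv.DictReader(handle, skipinitialspace=True)
            for row in reader:
                series[row["ticker"]].append((row["date"], int(row[column])))
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings
    return {ticker: sorted(points) for ticker, points in series.items()}


# Demo with two hypothetical daily files; a real run would point at the scraped folder
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "2016-08-19.csv": "date, ticker, company name, momentum indicator\n"
                          "2016-08-19, GSK, GlaxoSmithKline, 42\n"
                          "2016-08-19, RDSB, Royal Dutch Shell, 98\n",
        "2016-08-22.csv": "date, ticker, company name, momentum indicator\n"
                          "2016-08-22, GSK, GlaxoSmithKline, 43\n"
                          "2016-08-22, RDSB, Royal Dutch Shell, 91\n",
    }
    for name, text in samples.items():
        Path(folder, name).write_text(text)
    momentum = collect_indicator(sorted(Path(folder).glob("*.csv")))

gsk_values = [value for _, value in momentum["GSK"]]
print(gsk_values)
```

A dictionary keyed by ticker avoids creating one variable per share (such as `GSK_momentum_indicator`), so the same loop works however many shares appear in the files.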

But if you have a large dataset, you should go for pandas, which is a specialized tool for data analysis. The pandas.read_csv function takes the name of the CSV file as an argument and returns a DataFrame object on which you can perform various operations. See the pandas documentation for details.
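A minimal pandas sketch of the same idea follows. It reads each day's data with `pandas.read_csv`, stacks the frames, and pivots them so that each ticker becomes a column indexed by date. The in-memory strings stand in for the daily files, and the second day's date is invented for illustration; with real data you would pass the file paths to `read_csv` instead.

```python
import io

import pandas as pd

# Stand-ins for two daily files; the column names follow the header in the question
daily_files = [
    "date, ticker, company name, momentum indicator\n"
    "2016-08-19, GSK, GlaxoSmithKline, 42\n"
    "2016-08-19, RDSB, Royal Dutch Shell, 98\n",
    "date, ticker, company name, momentum indicator\n"
    "2016-08-22, GSK, GlaxoSmithKline, 43\n"
    "2016-08-22, RDSB, Royal Dutch Shell, 91\n",
]

# skipinitialspace strips the blank after each comma; parse_dates gives real dates
frames = [
    pd.read_csv(io.StringIO(text), skipinitialspace=True, parse_dates=["date"])
    for text in daily_files
]
all_days = pd.concat(frames, ignore_index=True)

# One row per date, one column per ticker
wide = all_days.pivot(index="date", columns="ticker", values="momentum indicator")

print(wide["GSK"].tolist())
```

With the data in this shape, `wide["GSK"]` is the momentum series for one share, and operations such as `wide.diff()` give the day-to-day change for every share at once.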
