python - Best approach for analysing scraped data

Question

Newbie here. I have managed to put together a script that scrapes some information from a website. It runs daily, and the data is saved to a CSV file. The content of each file looks like this:

date, ticker, company name, momentum indicator, other ratios....
2016-08-19, GSK, GlaxoSmithKline, 42, ....
2016-08-19, RDSB, Royal Dutch Shell, 98, .....
....

I have accumulated 3 months' worth of daily data, so around 80 files. (Every row in a file has the same date, followed by the different shares.) What I would like to do now is check, on a share-by-share basis, the evolution of the momentum indicator and other ratios.

For example, I think I should end up with a series of lists such as:

GSK_momentum_indicator = (42, 43, 38, 47,...) 
RDSB_momentum_indicator = (98, 91, 77, 79,...)

Now, as a newbie, I have 2 questions: 1) What do you think is the best approach for this? Is it using lists, dictionaries, or anything else? 2) How did you decide the above? Are there guidelines for which strategy to use? Is there a good resource I can read as a newbie to learn more about this subject?

Thanks!

PS. In case it makes a difference, I'm using Python 3.5.2.

Answer 1

To process the data you've collected, you could use one of two Python modules: csv or pandas. The csv module reads and writes CSV files; once the rows are read, you can convert them into Python lists and dictionaries and work with those. For details, see the csv module documentation.
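As a rough illustration of the csv route, here is a minimal sketch that builds one list per ticker, keyed in a dictionary. It assumes each file starts with the header row shown in the question and that the momentum indicator is an integer; the function name load_momentum and the data/ folder are made up for the example.

```python
import csv
from collections import defaultdict


def load_momentum(paths):
    """Return {ticker: [(date, momentum), ...]}, each list sorted by date.

    Assumes every file has a header row containing the columns
    'date', 'ticker' and 'momentum indicator' (as in the question).
    """
    series = defaultdict(list)
    for path in paths:
        with open(path, newline="") as handle:
            # skipinitialspace drops the blank that follows each comma
            reader = csv.DictReader(handle, skipinitialspace=True)
            for row in reader:
                point = (row["date"], int(row["momentum indicator"]))
                series[row["ticker"]].append(point)
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings
    for points in series.values():
        points.sort()
    return dict(series)


# Hypothetical usage, assuming the daily files live in a "data" folder:
#   import glob
#   momentum = load_momentum(glob.glob("data/*.csv"))
#   gsk_values = [value for _date, value in momentum["GSK"]]
```

A dictionary of lists fits here because the set of tickers is not known in advance and each one needs its own growing sequence; separate variables such as GSK_momentum_indicator would have to be created by hand for every share.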

But if you have a large dataset, you should go for pandas, which is a specialized tool for data analysis. The pandas.read_csv function takes the name of the CSV file as an argument and returns a DataFrame object on which you can perform various operations. For details, see the pandas documentation.
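For the pandas route, something along these lines might work: read every daily file, stack them into one DataFrame, then pivot so that each ticker becomes a column indexed by date. This is only a sketch under the same assumptions about the header row; momentum_table and the data/ folder are hypothetical names, and pivot will raise an error if the same date and ticker pair appears more than once.

```python
import pandas as pd


def momentum_table(paths):
    """Return a DataFrame with one row per date and one column per ticker.

    Assumes every file has the columns 'date', 'ticker' and
    'momentum indicator' (as in the question).
    """
    frames = [
        # skipinitialspace drops the blank that follows each comma
        pd.read_csv(path, skipinitialspace=True, parse_dates=["date"])
        for path in paths
    ]
    combined = pd.concat(frames, ignore_index=True)
    table = combined.pivot(
        index="date", columns="ticker", values="momentum indicator"
    )
    return table.sort_index()


# Hypothetical usage, assuming the daily files live in a "data" folder:
#   import glob
#   table = momentum_table(glob.glob("data/*.csv"))
#   table["GSK"]          # momentum of one share over time
#   table.diff()          # day-to-day change for every share
```

With the data in this shape, the evolution of a single share is just one column, and comparisons across shares or dates do not need explicit loops.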
