Data Scraping With ImportHTML in Apps Script & Google Sheets
Goal: I am trying to pull data from a website and use it to create a big table. I can tell that I'm very close to getting this to work, but I've hit a roadblock.
Background: I have a Google Sheet with three pages. (1) "tickers" is a list of every ticker in the S&P 500, in rows A1-A500. (2) "actionField" is just a blank page used during the script. (3) "resultField" will hold the results.
The website I am pulling from is http://www.reuters.com/finance/stocks/companyOfficers?symbol=V, though I want the script to work (with minor modification) for any data accessible through importHtml.
Script: The script I currently have is as follows:
function populateData() {
  var googleSheet = SpreadsheetApp.getActive();
  // Reading Section
  var sheet = googleSheet.getSheetByName('tickers');
  var tickerArray = sheet.getDataRange().getValues();
  var arrayLength = tickerArray.length;
  var blankSyntaxA = 'ImportHtml("http://www.reuters.com/finance/stocks/companyOfficers?symbol=';
  var blankSyntaxB = '", "table", 1)';
  // Writing Section
  for (var i = 0; i < arrayLength; i++)
  {
    var sheet = googleSheet.getSheetByName('actionField');
    var liveSyntax = blankSyntaxA+tickerArray[i][0]+blankSyntaxB;
    sheet.getRange('A1').setFormula(liveSyntax);
    Utilities.sleep(5000);
    var importedData = sheet.getDataRange().getValues();
    var sheet = googleSheet.getSheetByName('resultField');
    sheet.appendRow(importedData)
  }
}
This successfully grabs the ticker from the tickers page, calls importHtml, copies the data, and appends SOMETHING to the right page. It loops through and does this for each item in the ticker list.
However, the data being appended is as follows:
[Ljava.lang.Object;@42782e7c
[Ljava.lang.Object;@2de9f184
[Ljava.lang.Object;@4b86a4d0
That displays across many columns, for as many rows as there are iterations in the loop.
How do I successfully append the data?
(And any advice on improving this script?)
The appendRow method is not suitable here. Because it appends only one row, its argument is expected to be a 1D array of values.
What you get from getValues is always a 2D array of values, like [[a,b], [c,d]]. Even if it's just one row, getValues will return [[a,b]], and for a single-cell range it returns [[a]]; only getValue (singular) returns the bare value of a cell. It's never a 1D array.
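To make the shape difference concrete, here is a minimal sketch in plain JavaScript. The sample values and the helper name isFlatRow are made up for illustration; only the nesting matters. Passing the whole 2D array to appendRow makes each inner row a single cell value, which is presumably why every cell showed up as [Ljava.lang.Object;@... in the result sheet.

```javascript
// Hypothetical sample, shaped like the result of getValues(): an array of rows.
const importedData = [
  ['Name', 'Age', 'Since'],
  ['A. Example', 60, 2012],
];

// True when `value` is a flat (1D) array, the shape appendRow expects.
// isFlatRow is an illustrative helper, not part of Apps Script.
function isFlatRow(value) {
  return Array.isArray(value) && !value.some(Array.isArray);
}

isFlatRow(importedData);    // false: every element is itself a row
isFlatRow(importedData[0]); // true: a single row is a flat array
```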
If just one row is needed, use, e.g., appendRow(importedData[0]).
Otherwise, insert the required number of rows and assign the 2D array of values to them.
var sheet = googleSheet.getSheetByName('resultField');
var lastRow = sheet.getLastRow();
sheet.insertRowsAfter(lastRow, importedData.length);
sheet.getRange(lastRow + 1, 1, importedData.length, importedData[0].length)
    .setValues(importedData);
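The same steps can be wrapped in a small helper so the loop body stays short. This is only a sketch: appendRows is a hypothetical name, and the sheet parameter is assumed to be an Apps Script Sheet (it uses only getLastRow, insertRowsAfter, getRange and setValues, the calls already shown above). The empty-input guard is an added assumption, not part of the original snippet.

```javascript
/**
 * Appends a 2D array of values below the last used row of a sheet.
 * `sheet` is assumed to behave like an Apps Script Sheet object;
 * appendRows is a hypothetical helper name, not a built-in method.
 * Returns the number of rows written.
 */
function appendRows(sheet, values) {
  // Nothing to write, so skip the sheet calls entirely.
  if (!values || values.length === 0 || values[0].length === 0) {
    return 0;
  }
  const lastRow = sheet.getLastRow();
  // Make room below the existing data, then fill the new rows in one call.
  sheet.insertRowsAfter(lastRow, values.length);
  sheet
    .getRange(lastRow + 1, 1, values.length, values[0].length)
    .setValues(values);
  return values.length;
}
```

In the question's loop, sheet.appendRow(importedData) would then become appendRows(googleSheet.getSheetByName('resultField'), importedData).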