Ruby newb: how do I extract a substring?

I'm trying to scrape a CBS Sports page for NBA shot data. Here is the page I'm starting out with and using as a sample: http://www.cbssports.com/nba/gametracker/shotchart/NBA_20131115_MIL@IND

In the source, I found a string that contains all the data that I need. This string, in the webpage source code, is directly under var CurrentShotData = new.

What I want is to turn this string in the source into a string I can use in Ruby. However, I'm having some trouble with the syntax. Here's what I have.

require 'nokogiri'
require 'mechanize'

a = Mechanize.new
a.get('http://www.cbssports.com/nba/gametracker/shotchart/NBA_20131114_HOU@NY') do |page|
  shotdata = page.body.match(/var currentShotData = new String\(\"(.*)\"\)\; var  playerDataHomeString/m)[1].strip
  print shotdata
end

I know I must be doing this wrong... it seems needlessly complex, and on top of that it isn't working for me. Could someone enlighten me on the simple way to get this string into Ruby?

Try replacing:

shotdata = page.body.match(/var currentShotData = new String\(\"(.*)\"\)\; var  playerDataHomeString/m)[1].strip

with:

shotdata = page.body.match(/var currentShotData = new String\(\"(.*?)\"\)\; var  playerDataHomeString/m)[1].strip

Changing (.*) to (.*?) makes the quantifier lazy (non-greedy): it matches as few characters as possible before the rest of the pattern can match, which is the behavior you want.
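To see the difference without hitting the network, here is a minimal sketch that runs both quantifiers against a made-up string. SAMPLE_SOURCE and the two helper methods are hypothetical names used only for illustration, and the real page source may be laid out differently. The sketch also drops the trailing playerDataHomeString anchor and uses String#[] with a capture index, which returns nil on a miss instead of raising on match(...)[1]; both are assumptions about what is convenient here, not a claim about what the page requires.

```ruby
# Hypothetical stand-in for the page source; the real markup may differ.
SAMPLE_SOURCE = 'var currentShotData = new String("0,1,2~3,4,5"); ' \
                'var playerDataHomeString = new String("a|b");'

# Lazy version: (.*?) stops at the first `")` after the opening quote.
# Returns nil when the pattern is not found.
def extract_shot_data(body)
  body[/var currentShotData = new String\("(.*?)"\)/m, 1]
end

# Greedy version, for comparison only: (.*) runs to the last `")` in the input.
def extract_shot_data_greedy(body)
  body[/var currentShotData = new String\("(.*)"\)/m, 1]
end

puts extract_shot_data(SAMPLE_SOURCE)
# prints: 0,1,2~3,4,5

puts extract_shot_data_greedy(SAMPLE_SOURCE)
# prints: 0,1,2~3,4,5"); var playerDataHomeString = new String("a|b

p extract_shot_data('no match here')
# prints: nil
```

With Mechanize, the same helper could be called as extract_shot_data(page.body) inside the get block, checking for nil before using the result.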
