
REGEX Query in Google Sheets

I'm having a bit of a headache with a scraping scenario I'm trying in Google Sheets.

In a nutshell, we want to use Google Sheets with ImportXML to create a scraped feed of product details from clients' websites.

Here is a link to a smaller version of the doc: https://docs.google.com/a/sprt.co.za/spreadsheets/d/1dSbglYniWa_cijb6yDty576j33CTk9Cf8J38a3VXHSU/edit?usp=sharing

Currently this specific client only exposes the item price and other details in a text area in the page source. So when I use =ImportXml($C$2, "//textarea") it returns the entire text area across two cells. I only need details from the second cell, but I am pretty stuck on writing a regex for a piece of data this big.

" { ""id"": ""061013AACI9"", ""productId"": ""061013AACI9"", ""name"": ""VANS MEN'S 
PERFORATED LEATHER ERA"", ""price"": ""R 799.00"", ""oldPrice"": """", ""brand"": 
""Vans"", ""brandURL"": ""/plp/vans/_/N-1z140je"", ""defaultImages"": [ ], 
""images"": [ { ""thumb"": 
""http://tfgsrv.wigroup.co/06/Thumbnail/31460739.jpg"", ""large"": 
""http://tfgsrv.wigroup.co/06/Detail/31460739.jpg"" } , { ""thumb"": 
""http://tfgsrv.wigroup.co/06/ThumbnailAlternative/31460739_01.jpg"", 
""large"": ""http://tfgsrv.wigroup.co/06/DetailAlternative/31460739_01.jpg"" } 
, { ""thumb"": 
""http://tfgsrv.wigroup.co/06/ThumbnailAlternative/31460739_02.jpg"", 
""large"": ""http://tfgsrv.wigroup.co/06/DetailAlternative/31460739_02.jpg"" } 
, { ""thumb"": 
""http://tfgsrv.wigroup.co/06/ThumbnailAlternative/31460739_03.jpg"", 
""large"": ""http://tfgsrv.wigroup.co/06/DetailAlternative/31460739_03.jpg"" } 
], ""transientProfile"": ""true"", ""wishListId"": ""anonymous"", ""colors"": [ { 
""id"": ""31460739"", ""name"": ""White"", ""path"": 
""http://tfgsrv.wigroup.co/06/ColourSwatch/31460739_SW.jpg"", ""activeColor"" : 
true, ""available"" : true } ], ""sizes"": [ { ""id"": ""31460740_06"", ""name"": 
""6"", ""available"": false } , { ""id"": ""31460741_06"", ""name"": ""7"", 
""available"": true } , { ""id"": ""31460742_06"", ""name"": ""8"", ""available"": true 
} , { ""id"": ""31460743_06"", ""name"": ""9"", ""available"": false } , { ""id"": 
""31460744_06"", ""name"": ""10"", ""available"": true } , { ""id"": ""31460745_06"", 
""name"": ""11"", ""available"": false } ], ""productType"" : ""ColourSize"" } "

I need to pull out the R 799.00 value from that mess, so I'd appreciate it if anyone is willing to help out. Frankly, my talent and skill have run their course in trying to navigate that with RegEx.

Try this pattern; capture group 1 holds the price:

""price"":\s""([^"]+)""

Output from the demo:

MATCH 1
1.  [124-132]   `R 799.00`
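
If you want to check the pattern outside Sheets, here is a minimal Python sketch. It assumes the cell text really contains the doubled quotes shown in the question; `sample` is a shortened copy of that text, and `extract_price` is a hypothetical helper name used only for illustration.

```python
import re

# Shortened copy of the second textarea cell, with quotes doubled as in the question.
sample = (
    '{ ""name"": ""VANS MEN\'S PERFORATED LEATHER ERA"", '
    '""price"": ""R 799.00"", ""oldPrice"": """", ""brand"": ""Vans"" }'
)

# Same pattern as above: the literal ""price"": key, one whitespace character,
# then everything up to the next quote captured as group 1.
PRICE_RE = re.compile(r'""price"":\s""([^"]+)""')


def extract_price(text):
    """Return the captured price string, or None if the pattern does not match."""
    match = PRICE_RE.search(text)
    return match.group(1) if match else None


print(extract_price(sample))  # R 799.00
```

The match is case-sensitive, so the ""oldPrice"" key is not picked up by mistake. In Sheets itself the same pattern would go inside REGEXEXTRACT, with every literal quote doubled once more inside the formula string.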

