
How to get HTTP status codes from URLs using Google Refine?

I have a file that contains a long list of URLs. I want to use Google Refine to get the HTTP status code returned when each URL is opened. The URLs are stored in one column, one URL per cell. The HTTP status codes should be stored in a new column. There are three languages available in Google Refine: Clojure, Jython and GREL. I am pretty new to programming.

In Clojure, you can get a response code by opening a connection and then asking it for the response code. Here is an example that uses only the built-in java.net classes, so you won't have to include any libraries (I don't know how easy that is from within this program):

hello.core> (.. (java.net.URL. "http://google.com/index.html")
                openConnection
                getResponseCode)
200

It would be more normal for a Clojure application to use an HTTP library such as http-kit to do this more cleanly. So if you can easily include libraries, I would take that route and save a couple of lines of code.

PS: you may also want to close the connection afterwards:

hello.core> (let [connection (.openConnection (java.net.URL. "http://google.com/index.html"))]
              (try
                (.getResponseCode connection)
                ;; yep, java's strange: closing the input stream would also work,
                ;; but getInputStream throws for 4xx/5xx responses, so disconnect
                ;; instead (or use http-kit, as most people do)
                (finally (.disconnect connection))))
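
To run this over a whole column, it may help to wrap the idea in a small function that never throws, so one bad URL doesn't stop the rest. The following is only a sketch under a few assumptions: the name status-code, the HEAD request method and the 5-second timeouts are illustrative choices, not something Refine requires. It returns nil when the URL is malformed, is not an HTTP(S) URL, or the request fails.

```clojure
(import '(java.net URL HttpURLConnection))

(defn status-code
  "Returns the HTTP status code for url-string, or nil if the request fails.
  (Hypothetical helper; the name and the timeouts are illustrative.)"
  [url-string]
  (try
    ;; The type hint makes non-HTTP URLs (e.g. file:) fail with a
    ;; ClassCastException, which the outer catch turns into nil.
    (let [^HttpURLConnection conn (.openConnection (URL. url-string))]
      (try
        (.setRequestMethod conn "HEAD")   ; ask for headers only, no body
        (.setConnectTimeout conn 5000)    ; milliseconds
        (.setReadTimeout conn 5000)
        (.getResponseCode conn)
        (finally (.disconnect conn))))
    ;; Malformed URLs, DNS failures, timeouts and so on all end up here.
    (catch Exception _ nil)))

;; Example call (needs network access):
;; (status-code "http://google.com/index.html")
```

Inside Refine's Clojure expression box the current cell's content is expected to be available as value, so the body of the function could be used with value in place of url-string. That binding is an assumption worth checking against the Refine documentation. Also note that some servers answer HEAD differently from GET; if the codes look wrong, removing the setRequestMethod line falls back to GET.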


 