Scraping the web is fraught with peril. URLs die, networks get disrupted, and the best-laid plans for building a corpus from links can quickly go awry. Use this function to mitigate some of the pain of retrieving web resources.

safePOST(url = NULL, config = list(), timeout = httr::timeout(5),
  ..., body = NULL, encode = c("multipart", "form", "json", "raw"),
  handle = NULL)

Arguments

url

The URL to send the POST request to.

config

Additional configuration settings such as HTTP authentication (authenticate), additional headers (add_headers), cookies (set_cookies), etc. See config for full details and a list of helpers.

timeout

A call to httr::timeout(). The default timeout is 5 seconds.
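Because timeout takes the configuration object that httr::timeout() builds, a slow endpoint can be given more time by constructing a longer one. A minimal sketch, assuming httr is installed; the URL in the commented-out call is a placeholder, and reading options$timeout_ms relies on how httr currently stores this setting internally.

```r
library(httr)

# Build a 30-second timeout; httr stores the value in milliseconds.
long_wait <- httr::timeout(30)
print(long_wait$options$timeout_ms)

# Placeholder URL; uncomment to send a real request with the longer timeout.
# resp <- safePOST("https://example.com/slow-endpoint", timeout = long_wait)
# if (is.null(resp)) message("request failed or timed out")
```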

...

Further named parameters, such as query, path, etc., passed on to modify_url. Unnamed parameters will be combined with config.
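Since named arguments in ... are handed to modify_url, a request URL can be assembled from a base URL plus path and query parts. The sketch below calls httr::modify_url() directly, which needs no network access; the example.com URL and the page parameter are placeholders.

```r
library(httr)

# modify_url() replaces the path of the base URL and appends a query string.
target <- httr::modify_url(
  "https://example.com/api",
  path  = "v1/items",
  query = list(page = 2)
)
print(target)

# Per the argument description, the same named arguments given to safePOST()
# are forwarded to modify_url():
# resp <- safePOST("https://example.com/api", path = "v1/items", query = list(page = 2))
```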

body

One of the following:

  • FALSE: No body. This is typically not used with POST, PUT, or PATCH, but can be useful if you need to send a bodyless request (like GET) with VERB().

  • NULL: An empty body

  • "": A length 0 body

  • upload_file("path/"): The contents of a file. The mime type will be guessed from the extension, or can be supplied explicitly as the second argument to upload_file()

  • A character or raw vector: sent as is in body. Use content_type to tell the server what sort of data you are sending.

  • A named list: See details for encode.

encode

If the body is a named list, how should it be encoded? Can be one of form (application/x-www-form-urlencoded), multipart (multipart/form-data), or json (application/json).

For "multipart", list elements can be strings or objects created by upload_file. For "form", elements are coerced to strings and escaped; use I() to prevent double-escaping. For "json", parameters are automatically "unboxed" (i.e. length-1 vectors are converted to scalars); to preserve a length-1 vector as a vector, wrap it in I(). For "raw", the body should be either a character or raw vector, and you'll need to set the content_type() yourself.
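The unboxing rule for "json" is easiest to see on a concrete payload. The sketch below assumes that httr serializes json bodies with jsonlite::toJSON(auto_unbox = TRUE), and reproduces only that step offline; the field names and URL are placeholders. It shows why a length-1 field that the server expects as an array must be wrapped in I().

```r
library(jsonlite)

payload <- list(
  id   = "abc",     # length 1: unboxed to a JSON scalar
  tags = I("news")  # wrapped in I(): kept as a one-element JSON array
)

# auto_unbox = TRUE unboxes length-1 vectors, except those marked with I().
json <- jsonlite::toJSON(payload, auto_unbox = TRUE)
print(json)

# Sending the same list as a JSON body:
# resp <- safePOST("https://example.com/api", body = payload, encode = "json")
```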

handle

The handle to use with this request. If not supplied, one will be retrieved and reused from the handle_pool based on the scheme, hostname, and port of the URL. By default, httr reuses the same handle for all requests to the same scheme/host/port combination. This substantially reduces connection time and ensures that cookies are maintained over multiple requests to the same host. See handle_pool for more details.
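To share cookies between requests explicitly (for example, a login followed by a data request), a handle can be created once with httr::handle() and passed to each call. A sketch, assuming httr is installed; the URLs and form fields are placeholders, and the network calls are commented out so the snippet runs offline.

```r
library(httr)

# One handle for the host; cookies set by the server persist on it.
h <- httr::handle("https://example.com")
print(class(h))

# Both requests would reuse the same connection and cookie jar:
# login <- safePOST("https://example.com/login",
#                   body = list(user = "me", pass = "secret"),
#                   encode = "form", handle = h)
# data  <- safePOST("https://example.com/data", handle = h)
```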

Details

This is a thin wrapper around httr::POST() using purrr::safely() that returns either an httr response object or NULL if there was an error. If you need the reason for the error (e.g. "Could not resolve host..."), you should write your own wrapper.
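When the reason for a failure matters, one option is a wrapper that keeps the error object from purrr::safely() instead of discarding it. A minimal sketch, assuming purrr is installed; with_reason() and fake_fetch() are hypothetical names introduced here, and fake_fetch() stands in for a network call so the example runs offline.

```r
library(purrr)

# Hypothetical helper: wraps any function so it never throws (like safePOST),
# but returns the error message instead of discarding it.
with_reason <- function(f) {
  safe_f <- purrr::safely(f, otherwise = NULL, quiet = TRUE)
  function(...) {
    out <- safe_f(...)
    if (is.null(out$error)) {
      list(ok = TRUE, result = out$result, reason = NA_character_)
    } else {
      list(ok = FALSE, result = NULL, reason = conditionMessage(out$error))
    }
  }
}

# Hypothetical stand-in for a network call, so the example runs offline.
fake_fetch <- function(url) {
  if (!grepl("^https?://", url)) stop("Could not resolve host: ", url)
  paste("fetched", url)
}

safe_fetch <- with_reason(fake_fetch)
good <- safe_fetch("https://example.com")
bad  <- safe_fetch("not-a-url")
print(bad$reason)

# With a real request you would wrap httr::POST instead:
# safe_post <- with_reason(httr::POST)
# res <- safe_post("https://example.com/api", body = list(q = "test"))
# if (!res$ok) message("POST failed: ", res$reason)
```

Returning a small list with ok, result, and reason keeps the calling code as simple as checking for NULL, while still letting you log why a URL failed.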