Curl: Copying Webpage Contents to a Memory Buffer
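Curl can copy the contents of a webpage to a file or to standard output. In most cases, however, you will need to process the data after downloading it, and for that it is more convenient to keep it in a memory buffer. libcurl supports this through two options that work together. CURLOPT_WRITEFUNCTION sets the callback function that libcurl calls each time it receives a chunk of data; this function decides whether the chunk goes to a file, to standard output or to memory. CURLOPT_WRITEDATA sets the pointer that libcurl passes to that callback as its last argument, so it should point to wherever the data is to be stored. If no write function is set, libcurl writes the data with fwrite to the FILE * given in CURLOPT_WRITEDATA, which defaults to stdout. The callback must return the number of bytes it handled; any other value makes libcurl abort the transfer with CURLE_WRITE_ERROR.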
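To make the callback contract concrete, here is a minimal sketch of a write callback that mimics libcurl's default behaviour: it hands every chunk to fwrite on the FILE * received through CURLOPT_WRITEDATA. The name write_to_stream is hypothetical; only the four-parameter signature and the return-value rule come from libcurl.

```c
#include <stdio.h>

/* Hypothetical write callback with the signature CURLOPT_WRITEFUNCTION expects.
   userdata is the pointer set with CURLOPT_WRITEDATA; here it is assumed to be
   a FILE * such as stdout. */
size_t write_to_stream(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    FILE *out = (FILE *)userdata;

    /* Report how many bytes were written. If fwrite writes fewer bytes than it
       was given, the mismatch tells libcurl to abort with CURLE_WRITE_ERROR. */
    return fwrite(ptr, 1, size * nmemb, out);
}
```

It would be registered with curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, write_to_stream) and curl_easy_setopt(handle, CURLOPT_WRITEDATA, stdout). Replacing the FILE * with a pointer to your own structure is what the memory-buffer program in this article does.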
The following program shows how to copy the data of a webpage to memory and then print it.
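#include <curl/curl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define WEBPAGE_URL "http://localhost"

typedef struct {
    char *contents; /* downloaded data, always NUL-terminated */
    size_t size;    /* number of bytes downloaded so far */
} data;

/* libcurl calls this function for every chunk of the webpage it receives;
   the chunk is appended to the buffer passed through CURLOPT_WRITEDATA */
size_t write_data(void *ptr, size_t size, size_t nmemb, void *stream)
{
    data *curl_output = (data *)stream;
    size_t chunk_size = size * nmemb;
    char *grown = realloc(curl_output->contents, curl_output->size + chunk_size + 1);
    if (grown == NULL)
        return 0; /* out of memory: a short count makes libcurl abort */
    curl_output->contents = grown;
    /* Append after the data already stored, not at the start of the buffer */
    memcpy(curl_output->contents + curl_output->size, ptr, chunk_size);
    curl_output->size += chunk_size;
    curl_output->contents[curl_output->size] = '\0';
    return chunk_size; /* tell libcurl the whole chunk was handled */
}

int main(void)
{
    data webpage;
    webpage.contents = malloc(1); /* start with an empty string */
    if (webpage.contents == NULL)
        return 1;
    webpage.contents[0] = '\0';
    webpage.size = 0;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *handle = curl_easy_init();
    if (handle == NULL) {
        free(webpage.contents);
        curl_global_cleanup();
        return 1;
    }
    curl_easy_setopt(handle, CURLOPT_URL, WEBPAGE_URL);          /* using the http protocol */
    curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, write_data); /* function meant to copy the data */
    curl_easy_setopt(handle, CURLOPT_WRITEDATA, &webpage);       /* pointer passed to write_data */

    CURLcode res = curl_easy_perform(handle);
    if (res != CURLE_OK)
        fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
    else
        printf("Contents: %s\n", webpage.contents);

    curl_easy_cleanup(handle);
    curl_global_cleanup();
    free(webpage.contents);
    return res == CURLE_OK ? 0 : 1;
}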
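libcurl may call the write callback many times for a single page, so each chunk has to be appended after the bytes already stored, and the return value doubles as a way to stop a transfer. The sketch below assumes a hypothetical MAX_BODY_BYTES limit and hypothetical names memory_buffer and write_capped. It appends chunks into a growing buffer but returns 0 once the body would exceed the limit; because 0 differs from the chunk size, libcurl aborts and curl_easy_perform returns CURLE_WRITE_ERROR.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical size limit, chosen small for illustration only. */
#define MAX_BODY_BYTES 16

typedef struct {
    char *contents; /* NUL-terminated body received so far (NULL until the first chunk) */
    size_t size;    /* number of body bytes, excluding the NUL */
} memory_buffer;

/* Hypothetical write callback: appends each chunk, refuses to grow past the limit. */
size_t write_capped(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    memory_buffer *buf = (memory_buffer *)userdata;
    size_t chunk = size * nmemb;

    if (buf->size + chunk > MAX_BODY_BYTES)
        return 0; /* not equal to chunk: libcurl aborts with CURLE_WRITE_ERROR */

    char *grown = realloc(buf->contents, buf->size + chunk + 1);
    if (grown == NULL)
        return 0; /* out of memory: abort, the old buffer is still valid */

    buf->contents = grown;
    memcpy(buf->contents + buf->size, ptr, chunk); /* append after existing data */
    buf->size += chunk;
    buf->contents[buf->size] = '\0';
    return chunk;
}
```

It is registered the same way as any write callback, with CURLOPT_WRITEDATA pointing at a memory_buffer initialised to { NULL, 0 }. Since realloc(NULL, n) behaves like malloc, no initial allocation is needed, but contents stays NULL if the server sends an empty body, so check it before printing.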
Here the program fetches the page served at http://localhost. To download any other page, set WEBPAGE_URL to its URL.
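Editing WEBPAGE_URL means recompiling for every new address. A small alternative, sketched here with a hypothetical helper named choose_url, takes the URL from the first command-line argument and falls back to WEBPAGE_URL when none is given.

```c
#define WEBPAGE_URL "http://localhost"

/* Hypothetical helper: returns argv[1] if a non-empty URL was passed on the
   command line, otherwise the compiled-in default WEBPAGE_URL. */
const char *choose_url(int argc, char *argv[])
{
    return (argc > 1 && argv[1][0] != '\0') ? argv[1] : WEBPAGE_URL;
}
```

With main declared as int main(int argc, char *argv[]), the program would pass choose_url(argc, argv) to CURLOPT_URL instead of WEBPAGE_URL.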
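Compile the program, linking it against libcurl (for example, gcc curl_memory.c -o curl_memory -lcurl), and run it.
With a web server running on localhost and serving its default page, the output is: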
Contents: <html><body><h1>It works!</h1> <p>This is the default web page for this server.</p> <p>The web server software is running but no content has been added, yet.</p> </body></html>