Curl: Copying Webpage Contents to a Memory Buffer

Posted by Joys of Programming in C/C++, Curl, Web

Curl can copy the contents of a webpage to a file or to standard output. In most cases, though, you will want to process the data after downloading it, so it is more useful to keep it in memory. Curl supports this too. Set CURLOPT_WRITEFUNCTION to a callback function that curl calls with each chunk of data it receives, and set CURLOPT_WRITEDATA to the pointer curl should pass to that callback as its last argument, here the buffer that collects the data. The callback must return the number of bytes it handled; any other value makes curl abort the transfer.

The following program copies the contents of a webpage into memory and then prints them.


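#include <curl/curl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define WEBPAGE_URL "http://localhost"

typedef struct {
 char *contents; /*Downloaded bytes, always NUL-terminated*/
 size_t size; /*Number of bytes stored, excluding the terminator*/
} data;

/*Curl calls this function for every chunk of the webpage it receives*/
size_t write_data( void *ptr, size_t size, size_t nmeb, void *stream)
{
 data *curl_output = (data *)stream;
 size_t curl_output_size = size * nmeb;

 /*Grow the buffer; keep the old pointer so it is not lost if realloc fails*/
 char *grown = (char *) realloc(curl_output->contents, curl_output->size + curl_output_size + 1);
 if (grown == NULL)
  return 0; /*A count different from the one received makes curl abort*/
 curl_output->contents = grown;

 memcpy(curl_output->contents + curl_output->size, ptr, curl_output_size); /*Appending after the existing contents*/
 curl_output->size += curl_output_size;
 curl_output->contents[curl_output->size] = 0;
 return curl_output_size; /*Curl expects the number of bytes handled*/
}

int main()
{
 data webpage;
 webpage.contents = malloc(1);
 if (webpage.contents == NULL)
  return 1;
 webpage.contents[0] = 0; /*Start with an empty string*/
 webpage.size = 0;

 curl_global_init(CURL_GLOBAL_DEFAULT);
 CURL *handle = curl_easy_init();
 if (handle == NULL) {
  free(webpage.contents);
  curl_global_cleanup();
  return 1;
 }
 curl_easy_setopt(handle,CURLOPT_URL,WEBPAGE_URL); /*Using the http protocol*/

 curl_easy_setopt(handle,CURLOPT_WRITEFUNCTION, write_data); /*Setting up the function meant to copy data*/
 curl_easy_setopt(handle,CURLOPT_WRITEDATA, &webpage); /*The pointer passed to write_data as its last argument*/
 CURLcode res = curl_easy_perform(handle);
 curl_easy_cleanup(handle);

 if (res != CURLE_OK)
  fprintf(stderr, "curl_easy_perform failed: %s\n", curl_easy_strerror(res));
 else
  printf("Contents: %s",webpage.contents);

 free(webpage.contents);
 curl_global_cleanup();
 return res == CURLE_OK ? 0 : 1;
}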

Here we fetch the page served at localhost. You can use any other URL by setting WEBPAGE_URL accordingly.

Compile the program, linking it against libcurl (for example, with the -lcurl option of gcc), and execute it.

The output is


Contents: <html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added, yet.</p>
</body></html>
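
Curl may deliver a page in several pieces and calls the write callback once per piece, so the callback has to append each piece after the bytes already stored and report how many bytes it handled. The sketch below exercises that logic without libcurl or a network connection by handing the pieces to the callback directly. The names mem_buffer, append_chunk and feed_chunks are made up for this illustration and are not part of the curl API; only the parameter list of append_chunk follows what CURLOPT_WRITEFUNCTION expects, and feed_chunks is just an assumed stand-in for curl's transfer loop.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type for this sketch (not part of libcurl). */
typedef struct {
    char *contents;  /* bytes collected so far, NUL-terminated */
    size_t size;     /* number of bytes, excluding the terminator */
} mem_buffer;

/* Same parameter list as a CURLOPT_WRITEFUNCTION callback. */
size_t append_chunk(void *ptr, size_t size, size_t nmemb, void *userdata)
{
    mem_buffer *buf = (mem_buffer *)userdata;
    size_t chunk_size = size * nmemb;

    /* Room for the old bytes, the new chunk and the terminator. */
    char *grown = realloc(buf->contents, buf->size + chunk_size + 1);
    if (grown == NULL)
        return 0;  /* report failure; the old buffer is still valid */
    buf->contents = grown;

    /* Copy to the END of what is already stored, not to the start. */
    memcpy(buf->contents + buf->size, ptr, chunk_size);
    buf->size += chunk_size;
    buf->contents[buf->size] = '\0';
    return chunk_size;  /* number of bytes handled */
}

/* Assumed stand-in for the transfer loop: hands each chunk to the
   callback and stops as soon as the callback reports a different
   byte count. Returns 0 on success, -1 on a callback error. */
int feed_chunks(const char **chunks, size_t count, mem_buffer *buf)
{
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(chunks[i]);
        if (append_chunk((void *)chunks[i], 1, len, buf) != len)
            return -1;
    }
    return 0;
}
```

Starting from an empty buffer (contents set to NULL, size set to 0) and feeding the two chunks "<html><body>" and "<h1>It works!</h1>" should leave contents holding both pieces joined in order. The caller still has to free contents when done.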



Comments:

4 Comments

  • Thanks for this wonderful code, now I have something to start on after 2 days of net browsing.

  • Joys of Programming says:

    @Peter Thanks

  • Prasanth Madhavan says:

    That really helped a lot. Thanks. Can you give sample code in C++ using classes that does the same thing?

  • Alex says:

    A useful example, but you need to return the number of bytes written in the write_data function or you will only get one chunk.

Copyright © 2009-2012 Joys of Programming All rights reserved.