Last Updated: February 25, 2016
·
3.709K
· trakout

Parse info from an unlimited amount of Tumblr blogs - PHP

Let's say, by pure example, that we would want a front end to reflect that of a Tumblr theme. Only for whatever variety of reasons, we'd want to grab content from a dynamic and unlimited amount of Tumblr accounts.

Everything done here is fairly standard as far as XML parsing and MySQL connections go. Essentially we're grabbing each Tumblr account URL, accessing and storing the info from the entire account 20 posts at a time, and moving onto the next url. There's some pretty snazzy loop action going on here.

Only photo posts are stored in this example, which includes the main image, 400x400 thumbnail, caption, and associated tags. Also before each db insert, there is a check to prevent duplicate rows.

The code is a tad long - I would recommend C&P'ing into your favorite text editor to take a look. I haven't bothered to break it up as I personally appreciate code samples in their entirety, and it's already internally commented.

PHP

<?php
/*    
db info
===============================================
make sure the following tablenames exist with the following columns:
tablename - columnname[type], ...

tumblr - id[int11, primary, autoincrement], thumbnail[varchar255], image[varchar255], caption[varchar255], tags[varchar255]
===============================================
*/

$hostname = 'localhost';
$dbname = 'name';
$user = 'user';
$pass = 'pass';

/*
tumblr urls
===============================================
tumblr urls must go here using urls[x] array convention (in order from 0-x),
along with the following post vars:

urls[x] = http://username.tumblr.com/
===============================================
*/

$urls[0] = "http://username1.tumblr.com/";
$urls[1] = "http://username2.tumblr.com";
$urls[2] = "http://username3.tumblr.com/";

/*
===============================================
Let's begin
===============================================
*/


ini_set("user_agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
mysql_connect($hostname,$user,$pass);
@mysql_select_db($dbname) or die("Unable to select table");

$count = 0;
// resulting append looks like this: '/api/read/?num=20&type=photo'
$api = '/api/read/';
$num = '?num=20';
$phototype = '&type=photo';
$start = '&start=';

$startvar = 0; 
$realstart = $startvar;
$finish = 0;

//iterate through all of the tumblr url's given
foreach ($urls as $url) {

    //if '/' is at the end of url, clean it off
    if ((strripos($url, '/')) == ((strlen(utf8_decode($url))) - 1)) { $url = substr_replace($url ,"",-1); }

    while ($finish == 0) {
        $urly = $url . $api . $num . $phototype . $start . $startvar;
        $xml[$count] = simplexml_load_file($urly);

        //array of all posts
        $posts = $xml[$count]->posts;
        $thumbnail = $posts->post[1]->{'photo-url'}[2];

        //for each post
        for ($i = 0; $i < count($posts->post); $i++) {
            $thumbnail = $posts->post[$i]->{'photo-url'}[2];
            $image = $posts->post[$i]->{'photo-url'}[0];
            $caption = $posts->post[$i]->{'photo-caption'};

            $tag_temp = $posts->post[$i]->{'tag'};
            $tags = NULL;
            foreach ($tag_temp as $tag) {
                $tags .= $tag . ', ';
            }

            //We can't have any duplicates in our db, can we
            $duplicate = mysql_num_rows(mysql_query("SELECT * FROM tumblr WHERE thumbnail='$thumbnail'"));

            if ($duplicate == 0) {
                mysql_query("INSERT INTO tumblr VALUES ('','$thumbnail','$image','$caption','$tags')");
            }
        }

        //if the thumbnail is empty, we must be finished. Let's exit the while loop and move onto the next url!
        if ($thumbnail == '') {
            $finish = 1;
        }

        //this will help us grab 20 posts at a time.
        $startvar = $startvar + 20;
    }
    //reset things up
    $finish = 0;
    $startvar = $realstart;
    $count++;
}

mysql_close();

?>

DEMO
http://codepad.viper-7.com/ZEsoMO

Keep in mind that since this is a demo, it's modified to not use MySQL (It just prints everything out for you to see). Also it may take up to 20 seconds to run, as there is quite a bit of parsing action going on there.

If anyone has questions (and critiques), let me know!