A while back, we looked at Diffbot, the machine learning AI for processing web pages, as a means to extract SitePoint author portfolios. That tutorial used only the Diffbot UI, so consuming the resulting API meant pinging its endpoint manually. Since then, the design of the pages we processed has also changed, and the API no longer works reliably.

In this tutorial, apart from rebuilding the API so that it works again, we’ll use the official Diffbot client to build custom entities that correspond to the data we seek (author portfolios).

Bootstrapping

We’ll be using Homestead Improved as usual. The following few commands will bootstrap the Vagrant box, create the project folder, and install the Diffbot client.

git clone https://github.com/swader/homestead_improved hi_diffbot_authorfolio; cd hi_diffbot_authorfolio
./bin/folderfix.sh
vagrant up; vagrant ssh
mkdir -p Code/Laravel/public; cd Code/Laravel; touch public/index.php
composer require swader/diffbot-php-client

Additionally, we can install Symfony’s VarDumper component as a development requirement, just to get prettier debug output.

composer require symfony/var-dumper --dev

If we now give public/index.php the following content, and provided we've added homestead.app to our host machine’s /etc/hosts file, we should see “Hello World” when we visit http://homestead.app in our browser:
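For the hosts file step, a minimal sketch of a helper that could be run on the host machine (not inside the Vagrant box) follows. The function name hosts_entry is hypothetical, and the IP 192.168.10.10 used in the usage line is an assumption based on the usual Homestead default; check the ip setting in your Homestead.yaml before using it.

```shell
# Hypothetical helper: print a hosts-file line for the box,
# unless the hostname is already mapped in the given file.
# Arguments: IP, hostname, hosts file to check (defaults to /etc/hosts).
hosts_entry() {
    ip="$1"
    host="$2"
    file="${3:-/etc/hosts}"
    # -F: treat the hostname as a literal string, -w: match whole words only
    if ! grep -qFw "$host" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$host"
    fi
}
```

Piping the output into the hosts file then appends the entry only when it is missing:

hosts_entry 192.168.10.10 homestead.app | sudo tee -a /etc/hosts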

<?php
// index.php

require '../vendor/autoload.php';

echo "Hello World";

Diffbot Initialization

Note that to follow along, you’ll need a free Diffbot token – get one here.

define('TOKEN', 'token'); // replace 'token' with your own Diffbot token
use Swader\Diffbot\Diffbot;

$d = new Diffbot(TOKEN);

This is all we need to init Diffbot. Let’s test it on a sample article.

Source: SitePoint