[Drupal] How to display feeds from a given website programmatically?


In this article we will explain how to fetch a site's feeds programmatically and display them on a Drupal site, given only the site's URL. The advantage of this approach is that we can fetch feeds from several sites at once simply by passing their URLs. You can also refer to our previous article, "How to create Quick Tab Programmatically?", for displaying the feeds in custom blocks.

The process has three steps:

  • Get the feed URL for a given website URL.
  • Check whether the content at the feed URL is valid XML.
  • Get the RSS feed items.

The first step is to get the feed URL for a given website URL. Since we display the feeds in a custom block, we start with the module's implementation of hook_block() (the Drupal 6 block hook).



function custom_module_block($op = 'list', $delta = 0, $edit = array()) {
  if ($op == 'list') {
    $blocks = array();
    $blocks[0]['info'] = t('Feeds');
    // BLOCK_NO_CACHE: the block is rebuilt (and the remote site fetched)
    // on every page view.
    $blocks[0]['cache'] = BLOCK_NO_CACHE;
    return $blocks;
  }
  elseif ($op == 'view') {
    $block = array();
    $location = 'http://www.zyxware.com/home';
    $feedurl = FALSE;
    if ($location != '') {
      // Fetching a remote URL with file_get_contents() requires allow_url_fopen.
      $html = file_get_contents($location);
      $feedurl = custom_module_get_feed_url($html, $location);
    }
    switch ($delta) {
      case 0:
        // The @url placeholder escapes the URL before it is printed.
        $block['subject'] = t('Feeds from @url', array('@url' => $location));
        if ($feedurl) {
          $block['content'] = custom_module_get_feeds($feedurl);
        }
        break;
    }
    return $block;
  }
}
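Because the block uses BLOCK_NO_CACHE, both remote requests (the web page and its feed) run on every page view, and a bare file_get_contents() call waits for a slow site for up to PHP's default_socket_timeout (60 seconds by default). A minimal sketch of a fetch helper with a shorter read timeout, assuming plain PHP streams; feed_sketch_fetch() and its 5-second default are hypothetical, not part of the original module:

```php
<?php
// Hypothetical helper (not part of the original module): fetch a URL with a
// read timeout so that one slow site cannot stall the page for long.
// Fetching http:// or https:// URLs this way still requires allow_url_fopen.
function feed_sketch_fetch($url, $timeout = 5) {
  $context = stream_context_create(array(
    'http' => array(
      // Read timeout in seconds; without it PHP uses default_socket_timeout.
      'timeout' => $timeout,
    ),
  ));
  // @ hides the PHP warning on failure; callers check for FALSE instead.
  $body = @file_get_contents($url, FALSE, $context);
  // Treat an empty response like a failed one.
  return ($body === FALSE || $body === '') ? FALSE : $body;
}
```

In the block's 'view' branch, `$html = feed_sketch_fetch($location);` could replace the bare file_get_contents() call, since custom_module_get_feed_url() already returns FALSE when it receives no HTML.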

This code calls two helper functions:

  • custom_module_get_feed_url($html, $location) - to get the feed url
  • custom_module_get_feeds($feedurl) - to get the feeds

Let's start with the function that gets the feed URL. It takes two arguments: $location, the website URL, and $html, the contents of that web page. It scans the page's <link> tags for one with rel="alternate" and a feed type (application/rss+xml or text/xml), and returns that link as an absolute feed URL, or FALSE if no feed link is found.


// Function to get the feed URL from a given website URL.
function custom_module_get_feed_url($html, $location) {
  if (!$html || !$location) {
    return FALSE;
  }
  // Search through the HTML, save all <link> tags
  // and store each link's attributes in an associative array.
  preg_match_all('/<link\s+(.*?)\s*\/?>/si', $html, $matches);
  $links = $matches[1];
  $final_links = array();
  $link_count = count($links);
  for ($n = 0; $n < $link_count; $n++) {
    // Start empty so attributes do not leak from one link into the next.
    $final_link = array();
    $attributes = preg_split('/\s+/s', $links[$n]);
    foreach ($attributes as $attribute) {
      $att = preg_split('/\s*=\s*/s', $attribute, 2);
      if (isset($att[1])) {
        // Strip surrounding quotes from the attribute value.
        $att[1] = preg_replace('/([\'"]?)(.*)\1/', '$2', $att[1]);
        $final_link[strtolower($att[0])] = $att[1];
      }
    }
    $final_links[$n] = $final_link;
  }
  // Now figure out which one points to the RSS file.
  for ($n = 0; $n < $link_count; $n++) {
    $link = $final_links[$n];
    if (!isset($link['rel'], $link['type'], $link['href']) || $link['href'] === ''
      || strtolower($link['rel']) != 'alternate') {
      continue;
    }
    $type = strtolower($link['type']);
    if ($type != 'application/rss+xml' && $type != 'text/xml') {
      continue;
    }
    $href = $link['href'];
    if (preg_match('#^https?://#i', $href)) {
      // The link is already absolute (http:// or https://).
      return $href;
    }
    // Otherwise, 'absolutize' it with the scheme, host and port of $location.
    $url_parts = parse_url($location);
    $scheme = isset($url_parts['scheme']) ? $url_parts['scheme'] : 'http';
    $full_url = $scheme . '://' . $url_parts['host'];
    if (isset($url_parts['port'])) {
      $full_url .= ':' . $url_parts['port'];
    }
    // $href[0] replaces the $href{0} syntax, which PHP 8 no longer accepts.
    if ($href[0] != '/') {
      // Relative link on the domain: keep the page path up to its last '/'.
      $path = isset($url_parts['path']) ? $url_parts['path'] : '/';
      $full_url .= substr($path, 0, strrpos($path, '/') + 1);
    }
    return $full_url . $href;
  }
  return FALSE;
}
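The regular-expression approach above splits a tag's attributes on whitespace, so attribute values containing spaces are mangled and attributes written with spaces around the = sign are missed. A sketch of the same discovery step using PHP's DOMDocument HTML parser instead; feed_sketch_discover() and feed_sketch_absolutize() are hypothetical names, and, like the original, the sketch accepts only application/rss+xml and text/xml links. It does not normalize "./" or "../" path segments.

```php
<?php
// Hypothetical helpers (not part of the original module): find the first RSS
// <link rel="alternate"> in an HTML page and return its absolute URL.

// Resolve $href against the page URL; FALSE if the page URL has no host.
function feed_sketch_absolutize($href, $page_url) {
  if (preg_match('#^https?://#i', $href)) {
    return $href;  // Already absolute.
  }
  $parts = parse_url($page_url);
  if ($parts === FALSE || !isset($parts['host'])) {
    return FALSE;
  }
  $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
  if (substr($href, 0, 2) === '//') {
    return $scheme . ':' . $href;  // Protocol-relative: reuse the page's scheme.
  }
  $base = $scheme . '://' . $parts['host'];
  if (isset($parts['port'])) {
    $base .= ':' . $parts['port'];
  }
  if (substr($href, 0, 1) === '/') {
    return $base . $href;  // Root-relative.
  }
  // Path-relative: keep the page path up to and including its last '/'.
  $path = isset($parts['path']) ? $parts['path'] : '/';
  return $base . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

// Return the first feed link's absolute URL, or FALSE if there is none.
function feed_sketch_discover($html, $page_url) {
  if (!is_string($html) || $html === '' || $page_url === '') {
    return FALSE;
  }
  $doc = new DOMDocument();
  // Real-world HTML is rarely well-formed: buffer the parser's complaints.
  $use_errors = libxml_use_internal_errors(TRUE);
  $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($use_errors);

  // The same two feed types the original function accepts.
  $feed_types = array('application/rss+xml', 'text/xml');
  foreach ($doc->getElementsByTagName('link') as $link) {
    // rel may hold several space-separated tokens, e.g. "alternate home".
    $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
    $type = strtolower(trim($link->getAttribute('type')));
    $href = trim($link->getAttribute('href'));
    if ($href !== '' && in_array('alternate', $rels, TRUE)
      && in_array($type, $feed_types, TRUE)) {
      return feed_sketch_absolutize($href, $page_url);
    }
  }
  return FALSE;
}
```

feed_sketch_discover($html, $location) takes the same arguments as custom_module_get_feed_url() and likewise returns FALSE when no feed link is found, so it could be tried as a drop-in replacement.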

Next, we pass the feed URL to custom_module_get_feeds(). This function fetches the feed with file_get_contents() and passes the result to custom_module_checkurl_validXML(), which returns TRUE if the content can be parsed as XML. Only in that case does custom_module_get_feeds() build its output: the titles of the feed items, returned as an HTML list in a string.


// Function to display feeds.
function custom_module_get_feeds($feed_url) {
  $content = file_get_contents($feed_url);
  if ($content === FALSE || $content === '' || !custom_module_checkurl_validXML($content)) {
    return '';
  }
  $x = new SimpleXMLElement($content);
  if (!isset($x->channel->item)) {
    // Not an RSS 2.0 feed, or a feed without items.
    return '';
  }
  $output = '<ul>';
  foreach ($x->channel->item as $entry) {
    // Feed titles come from a remote site, so escape them before output.
    $output .= '<li>' . check_plain((string) $entry->title) . '</li>';
  }
  $output .= '</ul>';
  return $output;
}

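// Function to check whether the fetched content is valid XML.
function custom_module_checkurl_validXML($xmlContent) {
  if (!is_string($xmlContent) || $xmlContent === '') {
    return FALSE;
  }
  // Collect libxml errors in a buffer instead of emitting PHP warnings.
  $use_errors = libxml_use_internal_errors(TRUE);
  $doc = new DOMDocument('1.0', 'utf-8');
  // loadXML() returns FALSE only when the document is not well-formed;
  // warnings and recoverable errors are tolerated.
  $loaded = $doc->loadXML($xmlContent);
  $errors = libxml_get_errors();
  // Empty the buffer so errors from this feed do not affect later checks.
  libxml_clear_errors();
  libxml_use_internal_errors($use_errors);
  if ($loaded) {
    return TRUE;
  }
  if (!empty($errors)) {
    // Log the first error together with the offending line of the feed.
    $error = $errors[0];
    $lines = explode("\n", $xmlContent);
    $line = isset($lines[$error->line - 1]) ? $lines[$error->line - 1] : '';
    watchdog('custom_module', 'Invalid feed XML: @message at line @line: @text', array(
      '@message' => trim($error->message),
      '@line' => $error->line,
      '@text' => $line,
    ), WATCHDOG_WARNING);
  }
  return FALSE;
}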
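For experimenting outside Drupal, the validation and rendering steps can be combined into one plain-PHP function. A minimal sketch, assuming an RSS 2.0 feed: feed_sketch_render_titles() is a hypothetical name, simplexml_load_string() replaces the separate DOMDocument check (it returns FALSE when the XML is not well-formed), and htmlspecialchars() stands in for Drupal's check_plain() to escape the untrusted titles.

```php
<?php
// Hypothetical helper (not part of the original module): parse an RSS 2.0
// document and return its item titles as an HTML list, or '' when the XML is
// not well-formed or contains no items.
function feed_sketch_render_titles($xml) {
  if (!is_string($xml) || trim($xml) === '') {
    return '';
  }
  // Buffer libxml errors instead of printing PHP warnings.
  $use_errors = libxml_use_internal_errors(TRUE);
  // Returns FALSE if the document is not well-formed.
  $rss = simplexml_load_string($xml);
  libxml_clear_errors();
  libxml_use_internal_errors($use_errors);
  if ($rss === FALSE || !isset($rss->channel->item)) {
    return '';
  }
  $output = '<ul>';
  foreach ($rss->channel->item as $item) {
    // Titles are remote input: escape them before building HTML.
    $title = htmlspecialchars((string) $item->title, ENT_QUOTES, 'UTF-8');
    $output .= '<li>' . $title . '</li>';
  }
  $output .= '</ul>';
  return $output;
}
```

The escaping step matters inside the module too: feed titles are remote input, and concatenating them into the block's HTML unescaped would let the remote feed inject markup into the page.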

Once this is done, we can enable the block at admin/build/block/list and place it in the required region. We have enabled this feature on our website Top Drupal Sites.
