Last Updated: February 25, 2016
· blazeeboy

Scrape the UK Top 40 singles chart from the BBC

This is a small but useful script. It could be used as a plugin for Sublime Text, a CMS, or a news website to get the top 40 UK singles from the BBC website.
The script fetches the printable format of the chart, extracts the data, and builds an array of Hashes that is easy to iterate over and filter.
The original full chart is here: http://www.bbc.co.uk/radio1/chart/singles and there is a printable version here: http://www.bbc.co.uk/radio1/chart/singles/print which was easier to scrape.
Have fun using it.
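
To give an idea of what "easy to iterate over and filter" looks like in practice, here is a small sketch that works on hand-written rows in the same shape the script produces. The key names (`position`, `artist`, `title`) and all values are made up for illustration; the real keys come from whatever table headers the BBC page uses.

```ruby
# Hypothetical rows in the shape the script produces; the key names
# depend on the table headers of the real page and are assumed here.
chart = [
  { 'position' => '1', 'artist' => 'Artist A', 'title' => 'Song A' },
  { 'position' => '2', 'artist' => 'Artist B', 'title' => 'Song B' },
  { 'position' => '3', 'artist' => 'Artist A', 'title' => 'Song C' }
]

# Scraped values are strings, so convert before comparing numerically.
top_two = chart.select { |entry| entry['position'].to_i <= 2 }

# Group the song titles by artist.
by_artist = chart.group_by { |entry| entry['artist'] }
                 .transform_values { |entries| entries.map { |e| e['title'] } }

top_two.each { |entry| puts "#{entry['position']}: #{entry['title']}" }
puts by_artist.inspect
```

Because every row is a plain Hash, the usual Enumerable methods (`select`, `group_by`, `sort_by`, and so on) are enough; no custom classes are needed.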

Gist : https://github.com/blazeeboy/RubyScripts/tree/master/2014-4-30

#!/usr/bin/env ruby
# Author : Emad Elsaid (https://github.com/blazeeboy)

# it turns out that the BBC website has a printable version of
# the top 40 UK singles chart, which made me jump for joy :D
require 'open-uri'

# get the printable version of the BBC singles chart
# (URI.open is used because the bare open no longer accepts URLs in Ruby 3.0+)
page = URI.open('http://www.bbc.co.uk/radio1/chart/singles/print').read
# the result holds the data as a table, so we extract the keys from the TH tags
# (non-greedy match, in case several tags share one line)
keys = page.scan(/<th>(.+?)<\/th>/).map{ |k| k.first.downcase }
# extract the cells from the TD tags
cells = page.scan(/<td>(.*?)<\/td>/).map{ |c| c.first }
# split the cells into rows, each as long as the list of keys
rows = cells.each_slice keys.size

# container that collects one Hash per row
data = []
# now iterate over each row and pair the keys with their
# respective values, then convert the pairs into a Hash
rows.each do |row|
  data << Hash[ [keys, row].transpose ] # this is a good trick ;)
end

# show us what you got sir.
puts data
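
For trying the extraction steps without hitting the BBC site, the same logic can be sketched as a function that takes an HTML string. This is only an illustration: the sample markup and the helper name `parse_chart` are assumptions, not taken from the real page or the original script, and `keys.zip(row).to_h` stands in for `Hash[ [keys, row].transpose ]` (the two give the same result for complete rows).

```ruby
# Hypothetical sample markup; the real BBC page may be laid out differently.
SAMPLE_HTML = <<~HTML
  <table>
    <tr>
      <th>Position</th>
      <th>Artist</th>
      <th>Title</th>
    </tr>
    <tr>
      <td>1</td>
      <td>Artist A</td>
      <td>Song A</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Artist B</td>
      <td>Song B</td>
    </tr>
  </table>
HTML

# parse_chart is an illustrative helper name, not part of the original script.
def parse_chart(html)
  # header cells become the lower-cased keys
  keys  = html.scan(/<th>(.+?)<\/th>/).flatten.map(&:downcase)
  # data cells, in document order
  cells = html.scan(/<td>(.*?)<\/td>/).flatten
  # each_slice(0) would raise, so bail out when no headers were found
  return [] if keys.empty?

  # cut the flat cell list into rows and pair each row with the keys
  cells.each_slice(keys.size).map { |row| keys.zip(row).to_h }
end

chart = parse_chart(SAMPLE_HTML)
chart.each do |entry|
  puts "#{entry['position']}. #{entry['artist']} - #{entry['title']}"
end
```

Keeping the parsing separate from the download makes it possible to check the regexes against a saved copy of the page before pointing the script at the live URL.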

2 Responses

Perhaps you want to avoid dependencies, but you might want to consider Nokogiri instead of regex. At the bottom you can also just use map instead of a side-effecting each:

data = rows.map do |row|
  Hash[ [keys, row].transpose ]
end
over 1 year ago ·

Well, about Nokogiri: I wanted to avoid it because on some systems it needs an XML library and some additional installation.
About the last block, yes, you are right, I think this would be even better. Thanks for this ^_^

over 1 year ago ·