Last Updated: May 29, 2016 · tirkarthi

Fast Parallel Downloads in Golang with Accept-Ranges and Goroutines

Downloading large files is always tedious because of latency, the sequential nature of the program doing the download, and broken downloads. HTTP has the 'Accept-Ranges' header. If the server enables it (typically as 'Accept-Ranges: bytes'), we can download just a given range of bytes. Say the server holds a file of 10000 bytes. We first make a HEAD request and check the 'Accept-Ranges' header. The HEAD response also carries the 'Content-Length', so we learn the size without downloading the file. We can then download bytes 900 to 1000 by sending the header 'Range: bytes=900-1000' with a GET request. The server answers with '206 Partial Content' and only that data. Offsets are zero-based and both ends are inclusive, so this is 101 bytes.

So we split the file into ranges, download each range in a separate goroutine, and write each part to a temporary file. For example, suppose we spawn 5 goroutines to download a 414-byte file, each downloading up to 100 bytes. The work is split as follows:

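  • bytes 1-100 (Range: bytes=0-99) - goroutine 1
  • bytes 101-200 (Range: bytes=100-199) - goroutine 2
  • bytes 201-300 (Range: bytes=200-299) - goroutine 3
  • bytes 301-400 (Range: bytes=300-399) - goroutine 4
  • bytes 401-414 (Range: bytes=400-413) - goroutine 5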

Since each part is downloaded into its own file, we finally read the part files in order and append them to the desired output file to reconstruct the original. This turns out considerably faster than wget for some numbers of goroutines; see the timings at the end of the listing below.


package main

import (
    "io"
    "log"
    "net/http"
    "os"
    "strconv"
    "sync"
)

const fileURL = "http://localhost/rand.txt" // 187 MB file with one random number per line

func main() {
    res, err := http.Head(fileURL) // HEAD returns the headers without the body
    if err != nil {
        log.Fatal(err)
    }
    res.Body.Close()
    if res.Header.Get("Accept-Ranges") != "bytes" {
        log.Fatal("server does not accept byte ranges")
    }
    length, err := strconv.Atoi(res.Header.Get("Content-Length")) // file size from the HEAD response
    if err != nil {
        log.Fatal(err)
    }

    limit := 10              // 10 goroutines, so each downloads about 18.7 MB
    lenSub := length / limit // bytes for each goroutine
    diff := length % limit   // remaining bytes, added to the last request

    client := &http.Client{} // one client shared by all goroutines; it is safe for concurrent use
    var wg sync.WaitGroup
    for i := 0; i < limit; i++ {
        min := lenSub * i       // first byte of this part
        max := lenSub * (i + 1) // one past the last byte of this part

        if i == limit-1 {
            max += diff // the last part also takes the remaining bytes
        }

        wg.Add(1)
        go func(min int, max int, i int) {
            defer wg.Done()
            req, err := http.NewRequest("GET", fileURL, nil)
            if err != nil {
                log.Fatal(err)
            }
            // Range offsets are zero-based and inclusive, e.g. "bytes=0-99", hence max-1.
            rangeHeader := "bytes=" + strconv.Itoa(min) + "-" + strconv.Itoa(max-1)
            req.Header.Add("Range", rangeHeader)
            resp, err := client.Do(req)
            if err != nil {
                log.Fatal(err)
            }
            defer resp.Body.Close()
            if resp.StatusCode != http.StatusPartialContent { // a range request should return 206
                log.Fatalf("part %d: unexpected status %s", i, resp.Status)
            }
            data, err := io.ReadAll(resp.Body)
            if err != nil {
                log.Fatal(err)
            }
            // Write the part to a temporary file named after its index: 0, 1, 2, ...
            if err := os.WriteFile(strconv.Itoa(i), data, 0644); err != nil {
                log.Fatal(err)
            }
        }(min, max, i)
    }
    wg.Wait()
}

/*

alias combine="perl -E 'say for 0..9' | xargs cat > output.txt"
alias clean-dir="ls -1 | perl -ne 'print if /^\d+$/' | xargs rm"
alias verify="diff /var/www/rand.txt output.txt"

combine   - concatenate the part files 0..9 (one per goroutine), in order, into output.txt
clean-dir - remove the temporary part files, i.e. every file in the directory whose name is a number
verify    - diff the original file served from /var/www against the reassembled output.txt
*/

/*

Results:

Parallel:

10 goroutines:

real    0m4.349s
user    0m0.484s
sys     0m0.356s

60 goroutines:

real    0m0.891s
user    0m0.484s
sys     0m0.432s

wget:

real    0m19.536s
user    0m4.652s
sys     0m0.580s

Combining the 60 parts: perl -E 'say for 0..59' | xargs cat > output.txt

real    0m1.532s
user    0m0.000s
sys     0m0.244s

*/
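Dividing wget's 'real' (wall-clock) time by the parallel times above gives a rough sense of the speedup. The 60-goroutine time covers only downloading into part files, so the last line also adds the 1.532 s combine step measured for the 60 parts. No combine time was listed for the 10-goroutine run. All runs fetched from http://localhost, so the ratios may not carry over to downloads from remote servers.

```latex
\begin{aligned}
\text{10 goroutines:}\quad & \frac{19.536\ \text{s}}{4.349\ \text{s}} \approx 4.5\times \\
\text{60 goroutines:}\quad & \frac{19.536\ \text{s}}{0.891\ \text{s}} \approx 21.9\times \\
\text{60 goroutines + combine:}\quad & \frac{19.536\ \text{s}}{0.891\ \text{s} + 1.532\ \text{s}} = \frac{19.536\ \text{s}}{2.423\ \text{s}} \approx 8.1\times
\end{aligned}
```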

1 Response

27679

This is a nice one. However, with the line

range_header := "bytes=" + strconv.Itoa(min) + "-" + strconv.Itoa(max-1)

isn't it going to skip the last byte? Shouldn't it be like

if i != (threads - 1) {
    range_header = "bytes=" + strconv.Itoa(min) + "-" + strconv.Itoa(max-1) // Add the data for the Range header of the form "bytes=0-100"
} else {
    range_header = "bytes=" + strconv.Itoa(min) + "-" + strconv.Itoa(max) // Add the data for the Range header of the form "bytes=0-100"
}

over 1 year ago