This page demonstrates Python tips and tricks that I use in my everyday programming as an atmospheric science graduate student.
-Brian Blaylock

Friday, March 24, 2017

Multiprocessing vs Multithreading

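I need to download files fast, and I want to saturate my network connection with as many downloads as it can handle. In the past I've used multiprocessing because it was easy to split jobs across several processors. For downloads, though, multithreading is more efficient: the work is mostly waiting on the network, and threads are much lighter to start than separate processes.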

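Multithreading runs everything in one process (in CPython only one thread executes Python code at a time), but a thread that is waiting on the network lets the others run. With a queue, each thread picks up the next job as soon as it finishes the last one, so the connection stays as busy as it can handle.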
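
To see why threads help when the work is mostly waiting, here is a minimal sketch that uses `time.sleep` to stand in for the network wait. The name `fake_download`, the five jobs, and the 0.2 second delay are assumptions for illustration only, not real downloads.

```python
import time
from threading import Thread

def fake_download(delay):
    # Hypothetical stand-in for a download: sleeping mimics waiting on the network
    time.sleep(delay)

def run_sequential(n, delay):
    # Run n fake downloads one after another and return the elapsed seconds
    start = time.perf_counter()
    for _ in range(n):
        fake_download(delay)
    return time.perf_counter() - start

def run_threaded(n, delay):
    # Run n fake downloads in n threads and return the elapsed seconds
    start = time.perf_counter()
    threads = [Thread(target=fake_download, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential_time = run_sequential(5, 0.2)
threaded_time = run_threaded(5, 0.2)
print("sequential: %.2f s, threaded: %.2f s" % (sequential_time, threaded_time))
```

The sequential run has to wait for each delay in turn, while the threaded run waits for all of them at the same time.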

As an example, here is some code...

# Brian Blaylock
# March 24, 2017                           Yesterday Salt Lake had lots of rain

"""
Fast download of HRRR grib2 files with threading
"""

from queue import Queue
from threading import Thread
from datetime import datetime, timedelta
import numpy as np
import urllib.request   # Python 3 replacement for urllib2
import re
import os

var = 'TMP:2 m'

def download(URL):
    # Code to download something based on a URL
    # ...
    pass

def worker():
    # Each thread loops forever, pulling the next URL off the queue
    while True:
        item = q.get()
        print("number:", item)
        download(item)
        q.task_done()

num_of_threads = 10

q = Queue()
for i in range(num_of_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

# List of URLs to download from
URL_list = ['...','...','...']

timer1 = datetime.now()
for item in URL_list:
    q.put(item)

q.join()       # block until all tasks are done
print("Elapsed time: %s" % (datetime.now() - timer1))
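
The body of `download` is left out above. One possible version, assuming each URL points directly at a file and that the file should be saved under its own name in a local directory, might look like the sketch below. The name `download_to` and the `out_dir` parameter are hypothetical; `urllib.request.urlretrieve` is the Python 3 standard-library call.

```python
import os
import urllib.request

def download_to(URL, out_dir="."):
    # Hypothetical helper: save the file at URL into out_dir, keeping its file name.
    # Assumes the URL ends in the file name (no query string).
    filename = os.path.basename(URL)
    destination = os.path.join(out_dir, filename)
    urllib.request.urlretrieve(URL, destination)
    return destination
```

With something like this in place, `download(URL)` could simply call `download_to(URL, out_dir)` for whatever output directory you choose.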


Downloading does speed up, and the network connection is saturated with about 10 threads (or about 10 processes when using multiprocessing).
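
The same ten-worker pattern can also be written with `concurrent.futures.ThreadPoolExecutor` from the Python 3 standard library, which manages the queue and the threads for you. This is an alternative sketch, not the code I timed; `fetch` is a hypothetical placeholder for a real download function, and the example.com URLs are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(URL):
    # Hypothetical placeholder: a real version would download URL here
    return len(URL)

# Made-up URLs for illustration
URL_list = ["http://example.com/a", "http://example.com/bb", "http://example.com/ccc"]

# At most 10 jobs run at once; map() returns the results in input order
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, URL_list))

print(results)
```

Leaving the `with` block waits for every job to finish, which plays the same role as `q.join()` in the queue version.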
Here is a visual of what I learned: