Makina Blog

Python tutorial: Understanding Python threading


Like many other languages, Python provides a simple yet capable library for working with threads. It includes all the objects and functions you need to run tasks concurrently and to manage concurrent data access between threads.

This article explains how to use Python threads through a simple, common example based on a bank account. It gives you the basic understanding of Python threads needed to approach the Python Global Interpreter Lock issue, which is covered in another article.

Bank Account example

In this tutorial we discover threads through a bank account use case: we will write a BankAccount class in charge of managing your money (deposits only!).

Then we will create customers who make deposits on your account using threads, all at the same time.
We should then quickly run into concurrent data access issues, which we will demonstrate and then solve using locks.

Bank Account class

The first version of our bank account class is the following:

class BankAccount():
  def __init__(self, initial_money=0, owner='Anonymous'):
    self.money = initial_money
    self.owner = owner
    # We keep each write access to money in a history file
    # in order to understand what Python does with your money
    self.history_file = open('/tmp/%s' % (owner,), 'w')

  def execute_deposit(self, amount, by='A customer'):

    self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )
    self.money += amount
    self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )

  def __del__(self):
    self.history_file.close()

 

my_account = BankAccount(1000, "WorldCompanyBigBoss")
my_account.execute_deposit(100, "First customer")
my_account.execute_deposit(200, "Second customer")

Then run it to check that it works:

$ python3 bank_account.py
$ cat /tmp/WorldCompanyBigBoss
Customer First customer is adding 100 to bank account of WorldCompanyBigBoss containing 1000
Account money after First customer deposit: 1100
Customer Second customer is adding 200 to bank account of WorldCompanyBigBoss containing 1100
Account money after Second customer deposit: 1300

Making deposit using threads

Now that our bank account is working, let's make deposits in parallel, since we have many customers all around the world and our services run 24 hours a day, non-stop.
To do this we will use threads, by importing the threading module.

import threading

To create a thread we just have to instantiate threading.Thread.

The Thread class accepts several arguments. The target argument is the function or callable object to execute when the thread is started. In our case this will be the execute_deposit method of our BankAccount class.
The Thread class also accepts args and kwargs arguments, containing the positional and keyword arguments to pass to the target when the thread runs.
Finally, the Thread.start() method starts the thread.
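As a minimal sketch of the arguments described above, the snippet below starts one thread with positional args and another with kwargs. The greet function and the results list are hypothetical names used only for this illustration; join(), which waits for a thread to finish, is covered later in this article.

```python
import threading

results = []

def greet(name, punctuation="!"):
    # Hypothetical target function: records a greeting
    results.append("Hello %s%s" % (name, punctuation))

# Positional arguments go in the args tuple
t1 = threading.Thread(target=greet, args=("Alice",))
# Keyword arguments go in the kwargs dictionary
t2 = threading.Thread(target=greet, kwargs={"name": "Bob", "punctuation": "?"})

t1.start()
t2.start()
t1.join()
t2.join()

print(sorted(results))
```

Note that the args value must be a tuple, hence the trailing comma in ("Alice",).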

Below is the code to start 10 threads, each making a deposit:

for num_thread in range(1, 11):
  t = threading.Thread(target=BankAccount.execute_deposit, args=(my_account, 5000,'Customer %d' % (num_thread,)))
  t.start()

print("All threads are started")

Then we can have a look at the corresponding bank account history:

$ python3 bank_account.py
All threads are started
$ cat /tmp/WorldCompanyBigBoss
Customer First customer is adding 100 to bank account of WorldCompanyBigBoss containing 1000
Account money after First customer deposit: 1100
Customer Second customer is adding 200 to bank account of WorldCompanyBigBoss containing 1100
Account money after Second customer deposit: 1300
Customer Customer 1 is adding 5000 to bank account of WorldCompanyBigBoss containing 1300
Account money after Customer 1 deposit: 6300
Customer Customer 2 is adding 5000 to bank account of WorldCompanyBigBoss containing 6300
Account money after Customer 2 deposit: 11300
Customer Customer 3 is adding 5000 to bank account of WorldCompanyBigBoss containing 11300
Account money after Customer 3 deposit: 16300
Customer Customer 4 is adding 5000 to bank account of WorldCompanyBigBoss containing 16300
Account money after Customer 4 deposit: 21300
Customer Customer 5 is adding 5000 to bank account of WorldCompanyBigBoss containing 21300
Account money after Customer 5 deposit: 26300
Customer Customer 6 is adding 5000 to bank account of WorldCompanyBigBoss containing 26300
Account money after Customer 6 deposit: 31300
Customer Customer 7 is adding 5000 to bank account of WorldCompanyBigBoss containing 31300
Account money after Customer 7 deposit: 36300
Customer Customer 8 is adding 5000 to bank account of WorldCompanyBigBoss containing 36300
Account money after Customer 8 deposit: 41300
Customer Customer 9 is adding 5000 to bank account of WorldCompanyBigBoss containing 41300
Account money after Customer 9 deposit: 46300
Customer Customer 10 is adding 5000 to bank account of WorldCompanyBigBoss containing 46300
Account money after Customer 10 deposit: 51300

By the way, you could also create the thread using the method bound to the instance instead of the function accessed through the class, like this:

t = threading.Thread(target=my_account.execute_deposit, args=(5000,'Customer %d' % (num_thread,)))

This will provide exactly the same result.

Waiting for a thread to stop

If you want to wait until a thread has finished its task, just write this:

my_thread.join() # Will wait for a thread until it finishes its task.

You can also pass a timeout in seconds (floating-point values are accepted) to the join() method. This avoids waiting too long, or blocking your program, if a thread is stuck in an undefined state.
So to wait until all the threads have finished their deposit, we could update our code like this:

list_threads = []

for num_thread in range(1, 11):
  #t = threading.Thread(target=BankAccount.execute_deposit, args=(my_account, 5000,'Customer %d' % (num_thread,)))
  # This syntax will do the same job as the line just above
  t = threading.Thread(target=my_account.execute_deposit, args=(5000,'Customer %d' % (num_thread,)))
  list_threads.append(t)
  t.start()

print("All threads are started")

for t in list_threads:
  t.join() # Wait until thread terminates its task

# Or write [t.join() for t in list_threads]

print("All threads completed")
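The timeout parameter of join() mentioned earlier can be sketched as follows. This is only an illustration: slow_task and the durations are arbitrary assumptions. Since join() always returns None, the way to know whether the timeout expired is to call is_alive() afterwards.

```python
import threading
import time

def slow_task():
    # Hypothetical task that takes half a second
    time.sleep(0.5)

t = threading.Thread(target=slow_task)
t.start()

# Give up waiting after about 0.05 second; the thread itself keeps running
t.join(timeout=0.05)
still_running = t.is_alive()
print("Still running after the timeout:", still_running)

# Wait without a timeout until the thread really finishes
t.join()
finished = not t.is_alive()
print("Finished:", finished)
```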

Concurrent access issue

When several threads access and update the same data at the same time, we may see concurrent access issues.
In our case we could imagine that our system is fast enough, or slow enough, for two freshly created threads to access my_account.money at the same time in order to update it.
So if my_account.money contains 1000 at that moment, each thread will read 1000 and add 5000 to it, giving a final result of 6000 instead of 11000!

That is a concurrent access issue.
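The arithmetic of this lost update can be replayed by hand, without any thread, by writing the unlucky ordering of reads and writes explicitly. The variable names below are hypothetical and only stand in for what each thread would hold locally.

```python
money = 1000

# Both "threads" read the balance before either one writes it back
read_by_thread_a = money
read_by_thread_b = money

# Each one then writes its own result
money = read_by_thread_a + 5000
money = read_by_thread_b + 5000  # overwrites the update made by thread A

print(money)  # 6000: one deposit of 5000 has been lost
```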

Before seeing how to solve and prevent such issues, we are going to try to create one.
When we run our program, the deposit history shows that the threads access my_account in the order of their creation. It could have been in any order. This means that our system is so fast, or the task so short, that each thread finishes its work before the next thread is created and started, or before Python switches to another thread.

So we could try to overload the system by running 100 or 1000 deposits at the same time:

for num_thread in range(1, 1001):

But no side effect appears on my computer. The task is certainly too short.
So we can try to slow down the deposit method by incrementing my_account.money one unit at a time until the final amount is reached:

def execute_deposit(self, amount, by='A customer'):

  self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )

  for ind in range (0, amount):
    self.money += 1

  self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )

But once again, on my computer: no side effect. This does not seem to be enough to make several threads access my_account.money at the same time and produce a wrong update by overwriting each other's data.

In fact, as we will see in more detail in an upcoming article on the Python Global Interpreter Lock, Python mostly switches from one thread to another on input/output operations (read/write). So writing a log line to our history file should encourage Python to switch to another thread while the current one is in the middle of updating my_account.money, thus producing a concurrent data access issue.
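A related detail, given here as general CPython 3 behaviour rather than something taken from this article: besides I/O operations, the interpreter also asks the running thread to give way after a fixed time slice, called the switch interval. The short sketch below only reads that value; what it prints depends on your interpreter settings.

```python
import sys

# Time slice, in seconds, after which CPython asks the running thread
# to let another thread run (sys.setswitchinterval() can change it)
interval = sys.getswitchinterval()
print("Switch interval: %s seconds" % interval)
```

This means a long-running loop can be interrupted even without any I/O, which is one more reason not to rely on timing to keep shared data consistent.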

So let's update the deposit method like this:

def execute_deposit(self, amount, by='A customer'):

  self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )

  for ind in range (0, amount):
    self.money += 1
    self.history_file.write('Customer %s as added 1 more to account having now value of: %s\n' % (by, self.money) )

  self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )

And run it again!
This time it is clearly slower, and in the history log we see that the thread executions are interleaved:

$ cat /tmp/WorldCompanyBigBoss
...
Customer Customer 7 as added 1 more to account having now value of: 51112
Account money after Customer 7 deposit: 51112
...
Customer Customer 9 as added 1 more to account having now value of: 51129
...
Customer Customer 6 as added 1 more to account having now value of: 51219
Account money after Customer 6 deposit: 51219
Customer Customer 9 as added 1 more to account having now value of: 51220
...
Customer Customer 5 as added 1 more to account having now value of: 51300
Account money after Customer 5 deposit: 51300

But once again the final result is correct, with a value of 51300 after the last thread has run.
So this may not be enough. The money += 1 statement is probably too short and too fast for a thread switch to land in the middle of it and overwrite data. So let's make this increment a bit more complicated.

def execute_deposit(self, amount, by='A customer'):

  self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )

  for ind in range (0, amount):
    old_money = self.money
    self.history_file.write('Customer %s is about to add 1 more to account for a new value of: %s\n' % (by, old_money + 1) )
    self.money = old_money + 1

  self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )

And run it!

Yes! This time we have data being overwritten:

$ cat /tmp/WorldCompanyBigBoss
...
Customer Customer 10 is about to add 1 more to account for a new value of: 6703
Account money after Customer 10 deposit: 6703

The final amount should have been 51300 instead of 6703!

This may be the first time I have worked so hard to make a program produce a bug!
But we've got it.

To check that data has been overwritten, we can run grep on our file. It shows that 6 customers read the value 6499 and all wanted to update the account to 6500, making 5 units of our precious money disappear along the way!

$ cat /tmp/WorldCompanyBigBoss | grep 6500
Customer Customer 5 is about to add 1 more to account for a new value of: 6500
Customer Customer 4 is about to add 1 more to account for a new value of: 6500
Customer Customer 9 is about to add 1 more to account for a new value of: 6500
Customer Customer 10 is about to add 1 more to account for a new value of: 6500
Customer Customer 7 is about to add 1 more to account for a new value of: 6500
Customer Customer 8 is about to add 1 more to account for a new value of: 6500

If it had been a withdrawal, we might have decided after a short consultation to classify the bug as non-blocking and to fix it some day or other, preferably other.
But it is about deposits, so we need to solve this terrible issue with the highest priority!

But before solving it, let's try to understand what happens here: in the last version of execute_deposit(), each thread running this method retrieves the money currently in the bank account, then writes the value to the history file. At this point Python sees an I/O access and decides that it is a good time to switch to another thread. So it does. And in less than the time required to write to the disk, all the threads have read the current account amount, all with the same value. Then they all update this amount by one.
Thus we get 10 threads trying to update the account money from 10 to 11, instead of from 10 to 11, then 12, then 13, and so on up to 20! 9 money units lost!
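The scenario just described (10 threads all reading 10, then all writing 11) can be reproduced on purpose. The sketch below is an assumption-laden illustration, not the article's code: it uses a threading.Barrier, a tool not presented in this tutorial, to force every thread to finish reading before any thread writes. The add_one function and the module-level money variable are hypothetical.

```python
import threading

NUM_THREADS = 10
money = 10
# The barrier releases the threads only once all of them have reached it
barrier = threading.Barrier(NUM_THREADS)

def add_one():
    global money
    old_money = money      # every thread reads 10
    barrier.wait()         # wait until all threads have read the value
    money = old_money + 1  # every thread writes 11

threads = [threading.Thread(target=add_one) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(money)  # 11 instead of 20: 9 units lost
```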

Preventing lost money by stopping concurrent data access

To prevent such issues, languages provide useful tools like locks and semaphores.
We will use the threading.Lock class to lock access to the account while it is incremented by one.

A lock is an object that can be acquired and released. Once it is acquired, no other code can acquire it until it is released. So any other code attempting to acquire an already acquired lock is forced to wait on the lock.acquire() call until the lock is released by the code holding it.
Now, update our bank account class like this to use a lock in our deposit method:

import threading

class BankAccount():
  def __init__(self, initial_money=0, owner='Anonymous'):

    self.money = initial_money
    self.owner = owner
    # We will keep each write access to money in a history file
    # In order to understand what Python does with our money
    self.history_file = open('/tmp/%s' % (owner,), 'w')
    # Lock protecting each read-then-write of self.money
    self.money_lock = threading.Lock()

  def execute_deposit(self, amount, by='A customer'):
    self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )
    for ind in range (0, amount):
      # Only one thread at a time can run the code between acquire() and release()
      self.money_lock.acquire()
      old_money = self.money
      self.history_file.write('Customer %s is about to add 1 more to account for a new value of: %s\n' % (by, old_money + 1) )
      self.money = old_money + 1
      self.money_lock.release()

    self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )

In this last version of the code, the lock is acquired before the bank account amount is read and released after it has been incremented by one. So this section of code cannot be executed by 2 distinct threads at the same time, which prevents our precious money from being overwritten.
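A lock can also be used as a context manager with the with statement, which releases it automatically even if the protected code raises an exception. The sketch below is a hypothetical, stripped-down variant of the account (the SafeAccount name is an assumption, and the history file is left out) that checks the final balance after 10 concurrent deposits of 5000 on an initial 1300.

```python
import threading

class SafeAccount:
    """Hypothetical reduced account: only the balance and its lock."""

    def __init__(self, initial_money=0):
        self.money = initial_money
        self.money_lock = threading.Lock()

    def execute_deposit(self, amount):
        for _ in range(amount):
            # Acquired on entry, released on exit of the with block
            with self.money_lock:
                old_money = self.money
                self.money = old_money + 1

account = SafeAccount(1300)
threads = [threading.Thread(target=account.execute_deposit, args=(5000,))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account.money)  # 51300
```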

Going further with python threads

 

To go further with Python threading, you can read the Python threading documentation and have a look at the Semaphore class, which is not presented in this tutorial.
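As a quick taste of the Semaphore class, here is a minimal sketch that lets at most two threads enter a section at the same time. The worker function, the counters and the sleep duration are hypothetical choices made for this illustration.

```python
import threading
import time

MAX_CONCURRENT = 2
semaphore = threading.Semaphore(MAX_CONCURRENT)
counter_lock = threading.Lock()
active = 0      # threads currently inside the limited section
max_active = 0  # highest value ever reached by active

def worker():
    global active, max_active
    with semaphore:  # at most MAX_CONCURRENT threads get past this line
        with counter_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.01)  # simulate some work
        with counter_lock:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Maximum number of threads in the section:", max_active)
```

Where a lock admits a single thread, a semaphore admits up to a fixed number of them, which is handy to limit access to a scarce resource.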

Finally, if you intend to manage a lot of threads and want to control their execution and make them safely exchange information, have a look at the Python queue library.
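To give an idea of what the queue library offers, here is a minimal sketch under stated assumptions: the customer threads no longer touch the balance, they only put their deposit in a queue.Queue, and a single cashier thread applies the deposits one by one. The cashier and customer functions, and the None sentinel used to stop the cashier, are hypothetical design choices.

```python
import queue
import threading

deposits = queue.Queue()
balance = 1300

def cashier():
    global balance
    while True:
        amount = deposits.get()  # blocks until a deposit is available
        if amount is None:       # sentinel: no more deposits will come
            break
        balance += amount        # only this thread ever modifies balance

def customer(amount):
    deposits.put(amount)         # Queue.put() is safe to call from any thread

cashier_thread = threading.Thread(target=cashier)
cashier_thread.start()

customers = [threading.Thread(target=customer, args=(5000,)) for _ in range(10)]
for t in customers:
    t.start()
for t in customers:
    t.join()

deposits.put(None)  # every deposit is queued: tell the cashier to stop
cashier_thread.join()

print(balance)  # 51300
```

Because a single thread owns the balance, no explicit lock is needed here; the queue does the synchronization.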

Or you can enroll in our Python training!
