Makina Blog
Python tutorial: Understanding Python threading
Like many other languages, Python provides a simple and capable library for working with threads. This library includes all the objects and functions you need to do concurrent programming and to manage concurrent data access between threads.
This article explains how to use Python threads with a simple and common example based on a bank account. It gives you a basic understanding of Python threads, as an introduction to the Python Global Interpreter Lock issue covered in another article.
Bank Account example
In this tutorial we discover threads through a bank account use case: we provide a BankAccount class in charge of managing your money (deposits only!).
Then we create customers who make deposits on your account using threads, all at the same time.
We should then quickly run into concurrent data access issues, which we will demonstrate and then solve using locks.
Bank Account class
Here is the first version of our bank account class:
class BankAccount():
    def __init__(self, initial_money=0, owner='Anonymous'):
        self.money = initial_money
        self.owner = owner
        # We will keep each write access to money in an history file
        # In order to understand what Python does with your money
        self.history_file = open('/tmp/%s' % (owner,), 'w')

    def execute_deposit(self, amount, by='A customer'):
        self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )
        self.money += amount
        self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )

    def __del__(self):
        self.history_file.close()

my_account = BankAccount(1000, "WorldCompanyBigBoss")
my_account.execute_deposit(100, "First customer")
my_account.execute_deposit(200, "Second customer")
Then run it to check that it works:
$ python3 bank_account.py
$ cat /tmp/WorldCompanyBigBoss
Customer First customer is adding 100 to bank account of WorldCompanyBigBoss containing 1000
Account money after First customer deposit: 1100
Customer Second customer is adding 200 to bank account of WorldCompanyBigBoss containing 1100
Account money after Second customer deposit: 1300
Making deposits using threads
Now that our bank account works, let's make deposits in parallel, since we have many customers all around the world and our services run around the clock.
To do this we will use threads by importing the threading module.
import threading
To create a thread, we just have to instantiate a threading.Thread object.
The Thread class accepts several arguments. The target argument is the function or callable object to execute when the thread starts. In our case this is the execute_deposit method of our BankAccount class.
The Thread class also accepts args and kwargs arguments, containing the positional and keyword arguments to pass to the target when the thread runs.
Finally, the Thread.start() method starts the thread.
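As a minimal sketch of these arguments, the snippet below passes one positional argument through args and one keyword argument through kwargs. The deposit function and the results list are hypothetical stand-ins for illustration, not part of the BankAccount class.

```python
import threading

results = []

def deposit(amount, by="A customer"):
    # Hypothetical stand-in for BankAccount.execute_deposit
    results.append((by, amount))

# Positional arguments go in `args` (a tuple), keyword arguments in `kwargs` (a dict)
t = threading.Thread(target=deposit, args=(5000,), kwargs={"by": "Customer 1"})
t.start()  # the target starts running in a new thread
t.join()   # wait for the thread to finish before reading the result

print(results)  # [('Customer 1', 5000)]
```

Note the trailing comma in args=(5000,): without it, Python would see a plain integer rather than a one-element tuple.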
Below is the code that starts 10 threads, each making a deposit:
for num_thread in range(1, 11):
    t = threading.Thread(target=BankAccount.execute_deposit, args=(my_account, 5000, 'Customer %d' % (num_thread,)))
    t.start()
print("All threads are started")
Then we can have a look at the corresponding bank account history:
$ python3 bank_account.py
All threads are started
$ cat /tmp/WorldCompanyBigBoss
Customer First customer is adding 100 to bank account of WorldCompanyBigBoss containing 1000
Account money after First customer deposit: 1100
Customer Second customer is adding 200 to bank account of WorldCompanyBigBoss containing 1100
Account money after Second customer deposit: 1300
Customer Customer 1 is adding 5000 to bank account of WorldCompanyBigBoss containing 1300
Account money after Customer 1 deposit: 6300
Customer Customer 2 is adding 5000 to bank account of WorldCompanyBigBoss containing 6300
Account money after Customer 2 deposit: 11300
Customer Customer 3 is adding 5000 to bank account of WorldCompanyBigBoss containing 11300
Account money after Customer 3 deposit: 16300
Customer Customer 4 is adding 5000 to bank account of WorldCompanyBigBoss containing 16300
Account money after Customer 4 deposit: 21300
Customer Customer 5 is adding 5000 to bank account of WorldCompanyBigBoss containing 21300
Account money after Customer 5 deposit: 26300
Customer Customer 6 is adding 5000 to bank account of WorldCompanyBigBoss containing 26300
Account money after Customer 6 deposit: 31300
Customer Customer 7 is adding 5000 to bank account of WorldCompanyBigBoss containing 31300
Account money after Customer 7 deposit: 36300
Customer Customer 8 is adding 5000 to bank account of WorldCompanyBigBoss containing 36300
Account money after Customer 8 deposit: 41300
Customer Customer 9 is adding 5000 to bank account of WorldCompanyBigBoss containing 41300
Account money after Customer 9 deposit: 46300
Customer Customer 10 is adding 5000 to bank account of WorldCompanyBigBoss containing 46300
Account money after Customer 10 deposit: 51300
By the way, you could also create the thread with the method bound to the instance, instead of the function accessed through the class, like this:
t = threading.Thread(target=my_account.execute_deposit, args=(5000,'Customer %d' % (num_thread,)))
This will provide exactly the same result.
Waiting for a thread to stop
If you want to wait until a thread finishes its task, just write this:
my_thread.join() # Will wait for a thread until it finishes its task.
You can also pass a timeout in seconds (floating-point values are accepted) to the join() method. This avoids waiting too long, or blocking your program, if a thread is stuck in an undefined state.
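The following sketch illustrates the timeout. Since join() always returns None, the example calls is_alive() afterwards to find out whether the thread actually finished or the timeout expired. The slow_task function and the durations are assumptions chosen for the illustration.

```python
import threading
import time

def slow_task():
    time.sleep(0.5)  # simulate a task that takes a while

worker = threading.Thread(target=slow_task)
worker.start()

worker.join(timeout=0.05)          # stop waiting after 50 ms
still_running = worker.is_alive()  # True: join() returned because of the timeout

worker.join()                      # now wait without a timeout
finished = not worker.is_alive()   # True: the thread has completed

print(still_running, finished)
```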
So, to wait until all the threads have finished their deposits, we can update our code like this:
list_threads = []
for num_thread in range(1, 11):
    #t = threading.Thread(target=BankAccount.execute_deposit, args=(my_account, 5000,'Customer %d' % (num_thread,)))
    # This syntax will do the same job as the line just above
    t = threading.Thread(target=my_account.execute_deposit, args=(5000,'Customer %d' % (num_thread,)))
    list_threads.append(t)
    t.start()
print("All threads are started")
for t in list_threads:
    t.join() # Wait until thread terminates its task
# Or write [t.join() for t in list_threads]
print("All threads completed")
Concurrent access issue
When many threads access and update the same data at the same time, we may see concurrent access issues.
In our case, we can imagine that, with the right timing, two freshly created threads read my_account.money at the same time in order to update it.
So if my_account.money contains 1000 at that moment, each thread reads 1000 and adds 5000 to it, giving a final result of 6000 instead of 11000!
That is a concurrent access issue.
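The arithmetic of this lost update can be replayed by hand, without any thread. The sketch below only simulates the unlucky interleaving described above; the names read_by_a and read_by_b are hypothetical and stand for the values two threads would each have read.

```python
money = 1000

# Both "threads" read the balance before either one writes it back
read_by_a = money
read_by_b = money

money = read_by_a + 5000  # thread A writes 6000
money = read_by_b + 5000  # thread B overwrites it with 6000 again

print(money)  # 6000, not the expected 11000
```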
Before seeing how to solve and prevent such issues, we are going to try to create one.
When running our program, the deposit history shows that the threads access my_account in the order of their creation. It could have been in any order. This means that our system is so fast, or the task so short, that each thread finishes before the next one is created and started, or before Python switches to another thread.
So we could try to overload the system by running 100 or 1000 deposits at the same time:
for num_thread in range(1, 1001):
But no side effect appears on my computer. The task is certainly too fast.
So we can try to slow down the deposit method by incrementing my_account.money by one until the final amount is reached:
def execute_deposit(self, amount, by='A customer'):
    self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )
    for ind in range (0, amount):
        self.money += 1
    self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )
But, once again, on my computer: no side effect. This does not seem to be enough to make several threads access my_account.money at the same time and produce a wrong update by overwriting each other's data.
In fact, as we will see in more detail in a later article on the Python Global Interpreter Lock, Python mostly switches from one thread to another on input/output operations (read/write). So writing a log line to our history file should encourage Python to switch to another thread while the first one is in the middle of updating my_account.money, thus producing a concurrent access issue.
So let's update the deposit method like this:
def execute_deposit(self, amount, by='A customer'):
    self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )
    for ind in range (0, amount):
        self.money += 1
        self.history_file.write('Customer %s as added 1 more to account having now value of: %s\n' % (by, self.money) )
    self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )
And run it again!
This time it is clearly slower, and in the history log we see that the thread executions are interleaved:
$ cat /tmp/WorldCompanyBigBoss
...
Customer Customer 7 as added 1 more to account having now value of: 51112
Account money after Customer 7 deposit: 51112
...
Customer Customer 9 as added 1 more to account having now value of: 51129
...
Customer Customer 6 as added 1 more to account having now value of: 51219
Account money after Customer 6 deposit: 51219
Customer Customer 9 as added 1 more to account having now value of: 51220
...
Customer Customer 5 as added 1 more to account having now value of: 51300
Account money after Customer 5 deposit: 51300
But, once again, the final result is correct, with a value of 51300 after the last thread execution.
So this may not be enough. The statement self.money += 1 is probably too short and too fast for a thread switch to land in the middle of it. So let's make this increment a bit more complicated.
def execute_deposit(self, amount, by='A customer'):
    self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )
    for ind in range (0, amount):
        old_money = self.money
        self.history_file.write('Customer %s is about to add 1 more to account for a new value of: %s\n' % (by, old_money + 1) )
        self.money = old_money + 1
    self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )
And run it!
Yes! This time the data is overwritten:
$ cat /tmp/WorldCompanyBigBoss
...
Customer Customer 10 is about to add 1 more to account for a new value of: 6703
Account money after Customer 10 deposit: 6703
The final amount should have been 51300, not 6703!
This may be the first time I have worked so hard to make a program produce a bug!
But we've got it.
To check that data really was overwritten, we can run grep on our file. It shows that 6 customers read the value 6499 and all wanted to update the account to 6500, making 5 units of our precious money disappear along the way!
$ cat /tmp/WorldCompanyBigBoss | grep 6500
Customer Customer 5 is about to add 1 more to account for a new value of: 6500
Customer Customer 4 is about to add 1 more to account for a new value of: 6500
Customer Customer 9 is about to add 1 more to account for a new value of: 6500
Customer Customer 10 is about to add 1 more to account for a new value of: 6500
Customer Customer 7 is about to add 1 more to account for a new value of: 6500
Customer Customer 8 is about to add 1 more to account for a new value of: 6500
If it had been a withdrawal, we might have decided, after a short discussion, to classify the bug as non-blocking and to fix it some day or other, preferably other.
But it is about deposits, so we need to solve this terrible issue with the highest priority!
Before solving it, let's try to understand what happens here. In the last version of execute_deposit(), each thread running this method retrieves the money currently in the bank account, then writes the value to the history file. At this moment Python sees an I/O access and decides that it is a good time to switch to another thread. So it does. And in less than the time required to write to the disk, all the threads have read the current account amount, all with the same value. Then they all increase this amount by one.
For example, with a balance of 10, we get 10 threads all trying to update the account from 10 to 11, instead of taking it from 10 to 11, then 12, then 13, and so on up to 20. Nine units of money are lost!
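This scenario can be forced to happen every time. The sketch below is an illustration under assumptions, not the article's program: it uses a hypothetical minimal Account class without the history file, and a threading.Barrier (not used elsewhere in this tutorial) to make sure every thread has read the balance before any thread writes it back.

```python
import threading

class Account:
    """Hypothetical minimal account, without the history file."""
    def __init__(self, money):
        self.money = money

NB_THREADS = 10
account = Account(10)
all_have_read = threading.Barrier(NB_THREADS)

def unsafe_add_one():
    old_money = account.money      # every thread reads 10
    all_have_read.wait()           # blocks until all 10 threads have read
    account.money = old_money + 1  # every thread writes 11

threads = [threading.Thread(target=unsafe_add_one) for _ in range(NB_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account.money)  # 11 instead of 20
```

The barrier replaces the history file write as the point where threads yield to each other, which makes the lost update reproducible instead of depending on timing.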
Preventing lost money by stopping concurrent data access
To prevent such issues, languages provide useful tools like locks and semaphores.
We will use the threading.Lock class to lock account access while it is incremented by one.
A lock is an object that can be acquired and released. Once it is acquired, no other code can acquire it until it is released. So any other code attempting to acquire an already acquired lock is forced to wait on the lock.acquire() call until the lock is released by the code that holds it.
Now, let's update our bank account class to use a lock in our deposit method:
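These rules can be observed from a single thread with a non-blocking acquire. This is only a small sketch of the threading.Lock behaviour; the variable names are chosen for the illustration.

```python
import threading

lock = threading.Lock()

first = lock.acquire()                 # True: the lock was free
second = lock.acquire(blocking=False)  # False: already held, and we asked not to wait
held = lock.locked()                   # True: the lock is currently held

lock.release()
third = lock.acquire(blocking=False)   # True: the lock was free again
lock.release()

print(first, second, held, third)  # True False True True
```

With the default blocking=True, the second acquire() would have waited forever here, because the only thread able to release the lock is the one that is waiting.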
import threading

class BankAccount():
    def __init__(self, initial_money=0, owner='Anonymous'):
        self.money = initial_money
        self.owner = owner
        # We will keep each write access to money in an history file
        # In order to understand what Python does with our money
        self.history_file = open('/tmp/%s' % (owner,), 'w')
        self.money_lock = threading.Lock()

    def execute_deposit(self, amount, by='A customer'):
        self.history_file.write('Customer %s is adding %s to bank account of %s containing %s\n' % (by, amount, self.owner, self.money) )
        for ind in range (0, amount):
            self.money_lock.acquire()
            old_money = self.money
            self.history_file.write('Customer %s is about to add 1 more to account for a new value of: %s\n' % (by, old_money + 1) )
            self.money = old_money + 1
            self.money_lock.release()
        self.history_file.write('Account money after %s deposit: %s\n' % (by, self.money) )
In this last version of the code, the lock is acquired before reading the bank account amount and released after the amount has been incremented by one. So this section of code cannot be executed at the same time by two distinct threads, which prevents our precious money from being overwritten.
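A lock can also be used with the with statement, which acquires it on entry and releases it on exit, even if an exception is raised in between. The sketch below applies this to a hypothetical SafeAccount class, simplified from BankAccount by dropping the history file so that it runs anywhere.

```python
import threading

class SafeAccount:
    """Hypothetical simplified account: no history file, only the balance."""
    def __init__(self, money=0):
        self.money = money
        self.money_lock = threading.Lock()

    def execute_deposit(self, amount):
        for _ in range(amount):
            # Acquire the lock, run the block, then release the lock
            with self.money_lock:
                old_money = self.money
                self.money = old_money + 1

account = SafeAccount(1300)
threads = [threading.Thread(target=account.execute_deposit, args=(5000,))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account.money)  # 51300
```

Compared with explicit acquire() and release() calls, this form cannot leave the lock held by mistake if the code between them fails.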
Going further with python threads
To go further with Python threading, you can read the Python threading documentation and have a look at the Semaphore class, which is not presented in this tutorial.
Finally, if you intend to manage a lot of threads and want to control their execution and make them safely exchange information, have a look at the Python queue library.
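As a rough idea of what this can look like, the sketch below lets customer threads put their deposits into a queue.Queue, while a single cashier thread applies them to the balance. The customer and cashier functions, the STOP sentinel and the balance dictionary are all hypothetical names chosen for this illustration.

```python
import queue
import threading

deposits = queue.Queue()   # thread-safe: no explicit lock needed
balance = {"money": 1300}  # only the cashier thread modifies this
STOP = None                # sentinel telling the cashier to stop

def customer(num):
    deposits.put(("Customer %d" % num, 5000))

def cashier():
    while True:
        item = deposits.get()  # blocks until an item is available
        if item is STOP:
            break
        by, amount = item
        balance["money"] += amount

cashier_thread = threading.Thread(target=cashier)
cashier_thread.start()

customers = [threading.Thread(target=customer, args=(n,)) for n in range(1, 11)]
for t in customers:
    t.start()
for t in customers:
    t.join()

deposits.put(STOP)  # every deposit is queued, so the cashier can stop after them
cashier_thread.join()

print(balance["money"])  # 51300
```

Because only one thread ever updates the balance, the concurrent access issue disappears by design rather than by locking.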
Or you can enroll in our Python training!