Category Archives: Software development

A simple plot with Python and Bokeh

Python is a simple but powerful language, and comes with a wealth of libraries. The chart above took just 9 lines of Python. All the hard work is done by the Bokeh library. It shows the chart in your browser, where you can zoom in and move around the chart.

Here is the annotated code. You can find the raw code at the end of this post or in the GitHub repository.

Before installing Bokeh, to keep your Python version(s) clean, you may want to set up a virtual environment first.
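For example, on Mac OSX or Linux you could create and activate one like this (bokeh-env is just an example name; on Windows, run the activate script in the Scripts folder instead of bin):

python3 -m venv bokeh-env
source bokeh-env/bin/activate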

To install Bokeh: pip install bokeh

1. from bokeh.plotting import figure, show
Import part of bokeh, so we can create and show a figure

2. import math
We’ll use the math module to generate the points on the charts

3. x_values = range(0, 720)
The x axis contains the numbers from 0 to 719 (Python stops just before 720)

4. y_values = [math.sin(math.radians(x)) for x in x_values]
For each of the x values, the y value is sine of x. Python’s sin function expects the angle in radians rather than degrees. math.radians converts from degrees to radians.

We use something called ‘list comprehension’ here, to build up the list of y axis values. It creates a new list, which consists of the sine of each x (converted from degrees to radians) in the original list.
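If list comprehensions are new to you, line 4 is shorthand for building the same list with a plain for loop, something like this:

y_values = []
for x in x_values:
    y_values.append(math.sin(math.radians(x)))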

5. p = figure(title='10 Sine waves', x_axis_label='x (degrees)', y_axis_label='y = sin(x)', plot_width=850, plot_height=350)
Create an empty Bokeh figure, and set the title, labels, width and height. (In recent versions of Bokeh, plot_width and plot_height have been renamed to width and height.)

6. for i in range(10):
We’re drawing the same sine curve 10 times, at 100%, 90%, … 10% of the full height

For i = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, do the following:

7.     factor = 1 - i/10
Calculate the scaling factor: (1 - 0/10) = 1, (1 - 1/10) = 0.9, (1 - 2/10) = 0.8, then 0.7, … 0.1

8.     p.line(x_values, [y * factor for y in y_values])
Add a line to the figure, using the original list of x_values, but scale down the y_values by the current factor

9. show(p)
Ask Bokeh to show the result in your browser

1. from bokeh.plotting import figure, show
2. import math
3. x_values = range(0, 720)
4. y_values = [math.sin(math.radians(x)) for x in x_values]
5. p = figure(title='10 Sine waves', x_axis_label='x (degrees)', y_axis_label='y = sin(x)', plot_width=850, plot_height=350)
6. for i in range(10):
7.     factor = 1 - i/10
8.     p.line(x_values, [y * factor for y in y_values])
9. show(p)

Use Python to update a spreadsheet

How would you like to grab a share price daily and store it in a spreadsheet? Or add a new column to dozens of spreadsheets – automatically?

Python is a simple but powerful language, and comes with a wealth of libraries. Its openpyxl library lets you easily open a spreadsheet and make some changes.

Here is an example which adds a new column (“Next age”) to all spreadsheets in the source_files folder. The left side of the image above shows an original spreadsheet. The Python script opens this, adds a new column (Next age), then saves it to the target_files folder. The right side of the image shows the result.

Here is the annotated code. You can find the raw code in the GitHub repository.

Before installing openpyxl, to keep your Python version(s) clean, you may want to set up a virtual environment first.

To install openpyxl: pip install openpyxl

1. import openpyxl
2. import os
3. for name in os.listdir('source_files'):
4.     workbook = openpyxl.load_workbook(filename='source_files/' + name)
5.     sheet = workbook['Sheet1']
6.     sheet['C1'].value = 'Next age'
7.     for row in range(2, 100):
8.         if sheet[f'B{row}'].value:
9.             sheet[f'C{row}'].value = sheet[f'B{row}'].value + 1
10.     workbook.save(filename='target_files/' + name)

1. import openpyxl
Load the openpyxl library.

2. import os
Load the os library. We will use this to list the files in a folder

3. for name in os.listdir('source_files'):
For each file in our 'source_files' folder. Note that this includes all files, whether they are spreadsheets or not

4.     workbook = openpyxl.load_workbook(filename='source_files/' + name)
Open the workbook

5.     sheet = workbook['Sheet1']
Take the worksheet called 'Sheet1'

6.     sheet['C1'].value = 'Next age'
Enter the new column heading, 'Next age', in cell C1

7.     for row in range(2, 100):
For rows 2 to 99 (Python stops just before reaching 100), do the following:

8.         if sheet[f'B{row}'].value:
If cell B2, B3, B4, etc. is not empty, do the following:

9.             sheet[f'C{row}'].value = sheet[f'B{row}'].value + 1
Take the age from column B, add one to it and store it in the cell to the right, i.e. in column C

10.     workbook.save(filename='target_files/' + name)
Save the updated workbook to the target_files folder, using the same name

1/2 + 1/3 = 1/6

Fractions in Python

When you ask your spreadsheet to calculate 1/2 + 1/3 you get something like 0.833333333.
This is obviously an approximation: the 3’s after the decimal point repeat indefinitely.
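You can see the same approximation in plain Python, where ordinary division produces floating point numbers:

>>> 1/2 + 1/3
0.8333333333333333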

The correct answer is:

  • 1/2 = 3/6
  • 1/3 = 2/6
  • 1/2 + 1/3 = 3/6 + 2/6 = 5/6

Python is a simple but powerful language, and comes with a wealth of libraries. Its fractions module gives you the exact answer in a couple of lines.

Here is the annotated code. You can find the raw code in the GitHub repository.

1. from fractions import Fraction
Import the Fraction class from the fractions module

2. half = Fraction('1/2')
3. third = Fraction('1/3')
Create the two fractions

4. total = half + third
Add them up

5. print(half, '+', third, '=', total)
Show the result.
The more modern way is to use an “f-string”, introduced in Python 3.6 (December 2016). This is often more readable, although arguably not here. It would look like this:
print(f'{half} + {third} = {total}')

Sample chart

Retrieve and display a data set

(First part of the “Practical Python in 10 lines or less” series)

Python is a simple but powerful language, and comes with a wealth of libraries. The chart above took just 10 lines of Python. All the hard work is done by the Pandas and Matplotlib libraries.

The code

import pandas, matplotlib
data = pandas.read_csv('http://www.compassmentis.com/wp-content/uploads/2019/04/cereal.csv')
data = data.set_index('name')
data = data.calories.sort_values()[-10:]
ax = data.plot(kind='barh')
ax.set_xlabel('Calories per serving')
ax.set_ylabel('Cereal')
ax.set_title('Top 10 cereals by calories')
matplotlib.pyplot.subplots_adjust(left=0.45)
matplotlib.pyplot.show()

How it works

You will need Python and the Pandas and Matplotlib libraries. See the installation instructions.

Get started

1. import pandas, matplotlib
Grab the libraries we need to load, clean up and display the data.
The recommended approach (PEP 8) is to have the two import statements on separate lines. To stay within 10 lines and still make the chart look good, I have combined them in this example.

2. data = pandas.read_csv('http://www.compassmentis.com/wp-content/uploads/2019/04/cereal.csv')
Load the csv data from a website. This gives us a pandas DataFrame, a two-dimensional data structure similar to a page in a spreadsheet.
I downloaded the data from https://www.kaggle.com/crawford/80-cereals/version/2, under Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) [https://creativecommons.org/licenses/by-sa/3.0/]

3. data = data.set_index('name')
Set the row names (index) to the 'name' column. When we plot the data, these become the data labels.

4. data = data.calories.sort_values()[-10:]
Take the ‘calories’ column, sort it and limit to the last 10 values. This gives us the 10 cereals with the highest calories per serving
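If the slice looks unfamiliar, here is the same pattern on a small made-up Series: sort_values() sorts in ascending order, so the [-3:] slice keeps the three largest values:

import pandas
s = pandas.Series([5, 2, 9, 7, 1])
print(s.sort_values()[-3:])  # prints the three largest values: 5, 7, 9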

5. ax = data.plot(kind='barh')
Plot the data as a horizontal bar chart

6. ax.set_xlabel('Calories per serving')
7. ax.set_ylabel('Cereal')
8. ax.set_title('Top 10 cereals by calories')

Set the label for the x axis, the label for the y axis and the title

9. matplotlib.pyplot.subplots_adjust(left=0.45)
Set the left margin (from the left of the image to the left of the chart area) to 45% to give enough space for the cereal names.

10. matplotlib.pyplot.show()
Show the chart

Getting started with Python for Scientific Computing

So you’d like to do some data analysis or other scientific computing with Python. How do you start?

The Anaconda distribution

A Python ‘distribution’ is a bundle of Python goodies, typically Python itself, a set of Python libraries and possibly an integrated development environment.

Anaconda is a Python distribution specifically for data science. It includes the most popular data science and machine learning Python packages, Jupyter for quick exploratory data analysis and Spyder for creating and running Python scripts.

For more information and to install Anaconda, go to the Anaconda Distribution page.

Jupyter Notebook

A Jupyter notebook lets you try out different Python commands and create a story which shows your steps and the results. For instance:
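A couple of cells might look like this (a made-up example; each statement goes in its own cell, and Jupyter shows the output directly below it):

import math
math.sqrt(2)                 # the cell shows 1.4142135623730951
[x * x for x in range(6)]    # the cell shows [0, 1, 4, 9, 16, 25]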

Once you have installed Anaconda, or otherwise installed Jupyter:

  1. Open a Terminal or Command Prompt
  2. Enter: jupyter notebook
  3. Jupyter will open in your browser
  4. Click on the ‘New’ button (right hand side), and select ‘Python 3’
  5. Start typing
  6. To execute a cell, hit Ctrl-Enter
  7. Jupyter automatically saves the notebook. Click on the title (top left hand corner, next to Jupyter logo) to give it a sensible name

Getting started with Python

So you’d like to give Python a go. How do you start?

(If you are going to be using Python for Scientific Computing, including Data Analysis, have a look at this article instead)

Installing Python

Make sure you install Python 3, which is the modern version of Python. There is also a legacy version of Python, Python 2.7, but this is being phased out and should not be used for new projects.

You can find installation files for Windows and Mac OSX at https://www.python.org/downloads/. When you start the installation on Windows there will be an option to add Python to the system path. I recommend you select this option, as it makes it easier to run your Python scripts. I have not tried this on Mac OSX; it may have the same option.

For Linux you can use your software package manager, such as aptitude, yum or zypper, to install the ‘python3’ package. This will give you Python 3.

Running Python – REPL/Console

For trying out some simple Python commands you can use the Python Console, also called the REPL (Read-Eval-Print Loop). To start the Python Console, just run python. This will give you something like this:

Have a little play with this. For instance:
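Something along these lines (any simple expressions will do; the >>> prompt means Python is waiting for your input):

>>> 2 + 3
5
>>> name = 'Python'
>>> 'Hello ' + name
'Hello Python'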

When you are done, press ^Z then Enter (Windows) or ^D (Mac OSX and Linux). Or enter ‘exit()’.

Running Python – IDLE editor

The console is great for quick experiments. For anything more permanent it is better to create a script, a text file which contains Python code. When you installed Python it came with IDLE, a very simple integrated development environment.

Start IDLE from your operating system’s menu. You will see something like this:

Now select File, New File. Enter some Python commands, like:
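For example, any few lines of Python will do:

greeting = 'Hello from IDLE'
for count in range(3):
    print(count, greeting)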

Hit ‘F5’ to run the program. You will be prompted to save the file first, so give it a name and save it. You will see the result of your script in the original (shell) window:

Running a Python script from the command line

Say you’ve written a Python script, or someone else has given you a script. How do you run it?

  1. Start a Terminal or (as Windows calls it) a Command Prompt.
  2. Use the ‘cd <path to folder>’ command to go to the folder which contains the script
  3. Enter: ‘python <scriptname>.py’. For instance: ‘python test.py’

Other editors

IDLE is great for getting you started quickly, but for any serious Python development I suggest you use a professional text editor or IDE (Integrated Development Environment). Both let you create and edit text files; an IDE can also run, debug and test your code, and more. For instance:

  • PyCharm. My favourite IDE. It gives you so much power to write, run, debug and test your scripts, I don’t know where to start. Just check it out at …. Start with the free Community edition.
  • Visual Studio Code. I hear good things about this IDE, and it recently became more popular than PyCharm, so it must be doing something right.
  • Sublime Text. An excellent text editor

Data Analysis with Python

Python is a very popular tool for data extraction, clean up, analysis and visualisation. I’ve recently done some work in this area, and would love to do some more. I particularly enjoy using my maths background and creating pretty, clear and helpful visualisations.

  • Short client project, analysing sensor data. I took readings from two accelerometers and rotated the readings to get the relative movement between them (see the sketch after this list). Using NumPy, Pandas and Matplotlib, I created a number of different charts, looking for a correlation between the equipment’s setting and the movement. Unfortunately the sensors weren’t sensitive enough to return usable information. Whilst not the outcome they were hoping for, the client told me “You’ve been really helpful and I’ve learned a lot”
  • At PyCon UK (Cardiff, September 2018) I attended 14 data analysis sessions. It was fascinating to see the range of tools and applications in Python data analytics. At a Bristol PyData MeetUp I summarised the sessions in a 5 minute lightning talk. This made me pay extra attention and keep useful notes during the conference
  • Short client project, researching the best way to import a large data set, followed by implementation. The client regularly accesses large datasets, using a folder hierarchy to structure that data. They were looking to replace this with a professional database, namely PostgreSQL. I analysed their requirements, researched the different storage methods in PostgreSQL, reported my findings and created an import script.
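Here is the rotation sketch mentioned above: a minimal example of rotating 2D sensor readings by a known angle, using made-up numbers and a hypothetical mounting angle (the real project used the sensors’ actual geometry):

import numpy as np

angle = np.radians(30)                            # hypothetical mounting angle
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])

readings_a = np.array([[0.1, 0.2], [0.0, 0.3]])   # made-up x, y samples
readings_b = np.array([[0.2, 0.1], [0.1, 0.1]])

aligned_b = readings_b @ rotation.T               # rotate B into A's frame
relative = aligned_b - readings_a                 # movement of B relative to A
print(relative)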

Django Rest Framework API Microservice

I recently completed a small project for Zenstores. They simplify the shipping process for ecommerce sites. Their online service lets online businesses use multiple shipping companies for deliveries.

Each shipping company offers its own API for booking shipments and so on. My client uses a separate microservice for each shipping company. These microservices listen to requests from the main system and translate them to the shipping company’s standard.

My client asked me to use Django Rest Framework to create a microservice which supports a new shipping company. DRF is a popular and powerful library to create RESTful APIs using Django.
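As a rough illustration (not the client’s actual code; the fields, view and responses here are hypothetical), a minimal DRF booking endpoint might look like this:

from rest_framework import serializers, status
from rest_framework.response import Response
from rest_framework.views import APIView


class ShipmentSerializer(serializers.Serializer):
    # Hypothetical fields; the real service maps many more
    order_id = serializers.CharField()
    weight_kg = serializers.FloatField()
    destination_postcode = serializers.CharField()


class BookShipmentView(APIView):
    def post(self, request):
        serializer = ShipmentSerializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        # Here the microservice would translate the validated data to the
        # shipping company's API and forward the booking request
        return Response({'status': 'booked'}, status=status.HTTP_201_CREATED)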

The supplier provided me with a sandbox API and extensive documentation. The documentation was somewhat incomplete and out of date, but fortunately their support contact was very helpful all along.

I used Test Driven Development for complex functions where I understood the functionality well. For the rest I used a more experimental approach and added unit tests afterwards. Test coverage was over 90%.
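As a small illustration of the test-first style (a made-up example, not the project’s actual code; run it with pytest):

def parcel_size_band(weight_kg):
    # Implementation written after the test below, in TDD style
    if weight_kg <= 2:
        return 'small'
    if weight_kg <= 20:
        return 'medium'
    return 'large'


def test_parcel_size_band():
    assert parcel_size_band(1.5) == 'small'
    assert parcel_size_band(10) == 'medium'
    assert parcel_size_band(25) == 'large'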

The client has integrated the microservice within their system and the first test shipments have gone through.

A couple of Python coding dojos

As Joe Wright puts it:

"A Coding Dojo is a programming session based around a simple coding challenge. Programmers of different skill levels are invited to engage in deliberate practice as equals. The goal is to learn, teach and improve with fellow software developers in a non-competitive setting."

There is something quite satisfying about having a brief period to create something, by yourself or with others. So I recently went to a couple of coding dojos.

PyCon UK 2018, Cardiff, September 2018

On the third evening of the conference, about 60 people took on the challenge of using Pygame Zero to create something on the theme of “Four seasons”

We hit on the idea of combining the four seasons of the year with a pizza quattro stagioni (four seasons pizza). This became an infinite scrolling background of the four seasons and a ‘rolling’ four seasons pizza in the foreground

We used a pair programming approach, to simplify code sharing. As it was quite a simple concept to implement, we didn’t need to code in parallel. So, despite being one of the more experienced developers on the team, I sourced and prepared the assets (i.e. the pictures), whilst supporting my team mate who was behind the keyboard.

The end result was quite well received

You can find all the submissions at https://github.com/PyconUK/dojo18. Ours is under “shaunsfinger”.

CodeHub Python Coding Dojo MeetUp, October 2018

About 15 developers got together for this meetup, and took on the challenge of creating a “TypeRacer”

As far as I could tell, this meant typing as fast as possible, and probably referred to the TypeRacer website. I had not seen that before, but I did know something similar: the space-shooting typing game ZType

I imagined our game as a car which moves when you type the next correct character. After a brief discussion, we agreed to use PyGame. I have used it for a number of personal projects, and my two fellow team mates were interested in trying it out

We roughly divided the tasks between us, and my team mate set up a shared GitHub repo. I quickly found an image of a racing track as the background and a couple of cool looking racing cars. Starting from some simple sample PyGame code, I created the first version – showing the background image, and a car which moved a little on every tick of the game loop. In the meantime, my team mates showed the text and responded to the keyboard.
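That first version looked roughly like this (a reconstruction from memory, with hypothetical asset file names, not the actual repo code):

import pygame

pygame.init()
screen = pygame.display.set_mode((800, 600))
background = pygame.image.load('track.png')   # hypothetical asset names
car = pygame.image.load('car.png')
car_x = 0
clock = pygame.time.Clock()

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    car_x += 1                                # move the car a little every tick
    screen.blit(background, (0, 0))
    screen.blit(car, (car_x, 400))
    pygame.display.flip()
    clock.tick(60)                            # limit to 60 frames per second

pygame.quit()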

We brought this all together, did a bit more polishing, and finished just in time

Our game worked very well, and was exactly as I’d envisaged it. Our fellow Code-dojo-ers seemed to like it too

As it was for an informal coding exercise, not for public consumption or publication, and because of the time constraints, I decided to use copyrighted images. I have now replaced these with copyright-free images, from CraftPix and OpenGameArt

The final result is currently in a private repo. I have asked my team mate to make it public, and will update this post once this is done

With thanks to Katja Durrani and Eleni Lixourioti for organising this. It was well organised, with plenty of snacks and drinks, and a friendly atmosphere. And thanks to my team mates Andrew Chan and Eleni Lixourioti. It was a pleasure working with both of them

Grafana, InfluxDB and Python, simple sample

I recently came across an interesting contract position which uses Grafana and InfluxDB. I’d had a play with ElasticSearch before, and done some work with KairosDB, so I was already familiar with time series and JSON-based database connections. Having previously created a dashboard by hand, I found Grafana rather interesting. So I thought I’d do a quick trial: generate some random data, store it in InfluxDB and show it with Grafana.

Starting with a clean virtual machine:

InfluxDB

  1. Set up InfluxDB
    1. I followed InfluxDB’s installation instructions, which worked first time without any problems
    2. Start it
      sudo /etc/init.d/influxdb start
      
  2. Test InfluxDB
    influx
    > create database mydb
    > show databases
    name: databases
    ---------------
    name
    _internal
    mydb
    
    > use mydb
    > INSERT cpu,host=serverA,region=us_west value=0.64
    > SELECT host, region, value FROM cpu
    name: cpu
    ---------
    time            host    region  value
    1466603916401121705 serverA us_west 0.64
    
  3. Set up and test influxdb-python, so we can access InfluxDB using Python
    sudo apt-get install python-pip
    pip install influxdb
    python
    >>> import influxdb
    >>>
    
  4. Run through this example of writing and reading some InfluxDB data using Python
    >>> from influxdb import InfluxDBClient
    >>> json_body = [
    ...     {
    ...         "measurement": "cpu_load_short",
    ...         "tags": {
    ...             "host": "server01",
    ...             "region": "us-west"
    ...         },
    ...         "time": "2009-11-10T23:00:00Z",
    ...         "fields": {
    ...             "value": 0.64
    ...         }
    ...     }
    ... ]
    >>> client = InfluxDBClient('localhost', 8086, 'root', 'root', 'example')
    >>> client.switch_database('mydb')
    >>> client.write_points(json_body)
    True
    >>> print client.query('select value from cpu_load_short;')
    ResultSet({'(u'cpu_load_short', None)': [{u'value': 0.64, u'time': u'2009-11-10T23:00:00Z'}]})
    
  5. Create some more data, using a slimmed down version of this tutorial script
    import argparse
    
    from influxdb import InfluxDBClient
    from influxdb.client import InfluxDBClientError
    import datetime
    import random
    import time
    
    
    USER = 'root'
    PASSWORD = 'root'
    DBNAME = 'mydb'
    
    
    def main():
        host='localhost'
        port=8086
    
        nb_day = 15  # number of day to generate time series
        timeinterval_min = 5  # create an event every x minutes
        total_minutes = 1440 * nb_day
        total_records = int(total_minutes / timeinterval_min)
        now = datetime.datetime.today()
        metric = "server_data.cpu_idle"
        series = []
    
        for i in range(0, total_records):
            past_date = now - datetime.timedelta(minutes=i * timeinterval_min)
            value = random.randint(0, 200)
            hostName = "server-{d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}d" {d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb} random.randint(1, 5)
            # pointValues = [int(past_date.strftime('{d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}s')), value, hostName]
            pointValues = {
                    "time": past_date.strftime ("{d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}Y-{d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}m-{d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}d {d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}H:{d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}M:{d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}S"),
                    # "time": int(past_date.strftime('{d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}s')),
                    "measurement": metric,
                    'fields':  {
                        'value': value,
                    },
                    'tags': {
                        "hostName": hostName,
                    },
                }
            series.append(pointValues)
        print(series)
    
        client = InfluxDBClient(host, port, USER, PASSWORD, DBNAME)
    
        print("Create a retention policy")
        retention_policy = 'awesome_policy'
        client.create_retention_policy(retention_policy, '3d', 3, default=True)
    
        print("Write points #: {0}".format(total_records))
        client.write_points(series, retention_policy=retention_policy)
    
        time.sleep(2)
    
        query = 'SELECT MEAN(value) FROM "%s" WHERE time > now() - 10d GROUP BY time(500m);' % (metric)
        result = client.query(query, database=DBNAME)
        print(result)
        print("Result: {0}".format(result))
    
    if __name__ == '__main__':
        main()
    
  6. Save as create_sample_data.py, run and test it
    python create_sample_data.py
    ......
    influx
    Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
    Connected to http://localhost:8086 version 0.13.0
    InfluxDB shell version: 0.13.0
    > use mydb
    > SELECT MEAN(value) FROM "server_data.cpu_idle" WHERE time > now() - 10d GROUP BY time(500m)
    time			mean
    1466280000000000000	94.03846153846153
    1466310000000000000	98.47
    1466340000000000000	95.43
    1466370000000000000	104.3
    1466400000000000000	104.01
    1466430000000000000	114.18
    1466460000000000000	106.19
    1466490000000000000	96.67
    1466520000000000000	107.77
    1466550000000000000	103.08
    1466580000000000000	100.53
    1466610000000000000	94
    

Grafana

  1. Install Grafana using the installation instructions:
    $ wget https://grafanarel.s3.amazonaws.com/builds/grafana_3.0.4-1464167696_amd64.deb
    $ sudo apt-get install -y adduser libfontconfig
    $ sudo dpkg -i grafana_3.0.4-1464167696_amd64.deb
    
  2. Start the server and automatically start the server on boot up
    sudo service grafana-server start
    sudo systemctl enable grafana-server.service
    
  3. Test
    1. In your browser, go to localhost:3000
    2. Log in as (user) admin, (password) admin
  4. Connect to the InfluxDB database
    1. I followed the instructions at http://docs.grafana.org/datasources/influxdb/
    2. Click on the Grafana icon
    3. Select “Data Sources”
    4. Click on “+ Add data source”
      1. Name: demo data
      2. Type: InfluxDB
      3. URL: http://localhost:8086
      4. Database: mydb
      5. User: root
      6. Password: root
      7. Click on “Save and Test”
    5. Create a new Dashboard
      1. Click on the Grafana icon
      2. Select “Dashboards”
      3. Click on “New”
    6. Define a metric (graph)
      1. Click on the row menu, i.e. the green icon (vertical bar) to the left of the row
      2. Select “Add Panel”
      3. Select “Graph”
      4. On the Metrics tab (selected by default)
        1. Click on the row just below the tab, starting with “> A”
        2. Click on “select measurement” and select “server_data.cpu_idle”
          1. You should now see a chart
        3. Close this, by clicking on the cross, top right hand corner of the Metrics panel
    7. Save the dashboard
      1. Click on the save icon (top of the screen)
      2. Click on the yellow star, next to the dashboard name (“New dashboard”)
    8. Test it
      1. In a new browser tab or window, go to http://localhost:3000/
      2. Log in (admin, admin)
      3. The “New dashboard” will now show up in the list of starred dashboards (and probably also under “Recently viewed dashboards”)
      4. Click on “New dashboard” to see the chart

You should now see something like this:
