Category Archives: PostgreSQL

Data Analysis with Python

Python is a very popular tool for data extraction, clean up, analysis and visualisation. I’ve recently done some work in this area, and would love to do some more. I particularly enjoy using my maths background and creating pretty, clear and helpful visualisations

  • Short client project, analysing sensor data. I took readings from two accelerometers and rotated the readings to get the relative movement between them. Using NumPy, Pandas and MatplotLib, I created a number of different charts, looking for a correlation between the equipment’s setting and the movement. Unfortunately the sensors aren’t sensitive enough to return usable information. Whilst not the outcome they were hoping for, the client told me “You’ve been really helpful and I’ve learned a lot”
  • At PyCon UK (Cardiff, September 2018) I attended 14 data analysis sessions. It was fascinating to see the range of tools and applications in Python data analytics. At a Bristol PyData MeetUp I summarised the sessions in a 5 minute lightening talk. This made me pay extra attention and keep useful notes during the conference
  • Short client project, researching best way to import a large data set, followed by implementation. The client regularly accesses large datasets using a folder hierarchy\to structure that data. They were looking to replace this with a professional database, i.e. PostgreSQL. I analysed their requirements, researched the different storage methods in PostgreSQL, reported my findings and created an import script.

Namepy step 7 – Bringing it all together

(This is part of the namepy project. Start at Namepy – on the shoulders of giants)

Time to show some real results on a web page.

  1. Extend the API to show the letter scoring tables, no pagination, in add:
    manager.create_api(models.Set, methods=['GET'], results_per_page=0) 
  2. Rename helloworld.html to index.html
  3. At the end of, update the template name to index.html, and stop passing in ‘names’ since this is now done through the API, and rename the endpoint function to ‘index’:
    def index(): 
        return render_template('index.html') 

That’s it for the changes to the back end. The rest of the changes will all be in the front end, in index.html

  1. Rename the app from HelloWorldApp to NamePyApp
  2. Rename the controller from HelloWorldController to NamePyController
  3. Load the letter scoring table, and simplify it for faster lookup
    $scope.sets = [];
    angular.forEach(, function(set, index) {
        scores = {};
        angular.forEach(set.scores, function(score, index) {
            scores[score.letter] = score.score;
        $scope.sets.push({ name:, scores: scores});
  4. Calculate the score for each of the sets
    angular.forEach($scope.sets, function(set, index) {
        var total = 0;
        var error = false;
        angular.forEach(name.split(''), function(character, index2) {
            if (character in set.scores) {
                total += set.scores[character];
            } else {
                error = true;
        if (error == false) {
            result.push([, total]);
        $scope.sort_on_element(result, 1);
        $scope.scores = result;
  5. Show the result on the page, using Highcharts. For the code see the source code, function “showLetterScores”

Show baby name distribution

  1. Get data for entered name
    var filters = [{ name: 'name', 
        op: 'ilike', 
        val: $scope.visitor_name}];
        method: 'GET',
        url: 'api/name',
        params: {"q": JSON.stringify({"filters": filters})}
            function(response) {            
  2. Restructure the results for Highcharts
    var boy_frequency = [];
    var girl_frequency = [];
    var boys_found = false;
    var girls_found = false;
        function(frequency) {
                Date.UTC(frequency.year, 1, 1), 
                Date.UTC(frequency.year, 1, 1), 
            if (frequency.boys_count) boys_found = true;
            if (frequency.girls_count) girls_found = true;
    $scope.sort_on_element(boy_frequency, 0);
    $scope.sort_on_element(girl_frequency, 0);
  3. Show the results using Highcharts. See the source code, function “show_name_distribution”


Done Done

This is the final blog post for this little project. I hope you found it useful.

Namepy step 6 – Load the data into the database

(This is part of the namepy project. Start at Namepy – on the shoulders of giants)

We will need the following data in the database:

  • Name frequencies – baby names by year
  • Scrabble™ letter values, by (country) Scrabble™ set

Name frequencies

  1. Download the data from
  2. Unzip it in <project_root>/raw_data/yob1880.txt, etc
  3. You may want to add raw_data to .gitignore, so it doesn’t get stored in your git repo
  4. Create some code to read files and store in PostgreSQL –
    def read_frequencies_from_file(filename, names):
        year = int(filename[3:7])
        year_frequencies = {}
        for name in names:
            year_frequencies[name] = {'F': 0, 'M': 0}
        with open('raw_data/%s' % filename) as file:
            for line in file.readlines():
                    name_text, sex, count = line.split(",")
                    print("Couldn't parse line")
                if name_text not in names:
                    name = Name(name=name_text)
                    names[name_text] =
                    year_frequencies[name_text] = {'F': 0, 'M': 0}
                year_frequencies[name_text][sex] = int(count)
            for name, name_frequency in year_frequencies.iteritems():
                if name_frequency['F'] + name_frequency['M']:
                    name_id = names[name]
                    frequency_record = NameFrequency(name_id=name_id,
    def read_name_frequencies():
        # Start with an empty list
        print("Deleting any previous data")
        names = {}
        # Get file list
        for filename in listdir('raw_data'):
            if filename[:3] == 'yob':
                read_frequencies_from_file(filename, names)
  5. Run the code. Note that this may take a while to run. On my development machine it took about 8 minutes
  6. Check this in the database, for instance with phpPgAdmin or pgAdmin

Scrabble™ letter values

For a list of Scrabble™ letter values by Scrabble™ set see this Wikipedia entry. The following code will grab this page, extract the letter values and save this in the database

  1. Add the new tables to
    class Set(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String())
        scores = db.relationship('LetterScore', backref='set', lazy='dynamic')
    class LetterScore(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        set_id = db.Column(db.Integer, db.ForeignKey(''))
        score = db.Column(db.Integer)
        letter = db.Column(db.String(1))
  2. Write some code to parse this page and store the results in the database. See the source code in my GitHub repo
  3. Run the code
  4. Check the results in the database. See above (end of name frequencies section) for some suggested tools



Time to pull it all together and show some real charts

Continue to Step 7 – Bringing it all together

Namepy step 5 – Flask-Restless

(This is part of the namepy project. Start at Namepy – on the shoulders of giants)

We will need Angular to use an Ajax call to request the data from Flask, using a REST-style request, and show it in Highcharts.

The two main Flask libraries for creating a REST API are Flask-Restful and Flask-Restless. We will be using Flask-Restless, because it is particularly suited for what we’re trying to do: “Flask-Restless provides simple generation of ReSTful APIs for database models defined using SQLAlchemy (or Flask-SQLAlchemy)” – from the Flask-Restless documentation.

Create and test the REST API

  1. Install flask-restless
    (virtualenv) pip install flask-restless
  2. import at at the start of __init__py:
    import flask.ext.restless
  3. Create the API endpoint, add following to end of
    manager = flask.ext.restless.APIManager(app, flask_sqlalchemy_db=db) 
    manager.create_api(models.Name, methods=['GET']) 
  4. Test this – python; point your browser to, this should show a JSON structure with the names and frequencies

Use Angular to request and process the REST data from the back end system

  1. Create a new function which takes a response object, extracts the json data, formats it, and passes it to Highcharts
                        $scope.showChart = function(response_data) {
                            chart_data = []
                            angular.forEach(response_data.objects, function(name_object, key) {
                                boys_count = []
                                angular.forEach(name_object.frequencies, function(frequency, key) {
                                chart_data.push({ name:, data: boys_count });
                                chart: {
                                    type: 'column'
                                title: {
                                    text: 'Name frequencies'
                                series: chart_data

    Note that this doesn’t quite make sense, for instance the year isn’t being shown in the chart. We’ll fix all of that later. For now the aim is to get the infrastructure set up – database, REST API, Angular, etc.

  2. Use Angular’s $http.get() function to call the api, and pass the response object to the showChart function upon completion
                        	.then(function(response) { 
  3. Test: Make sure the Flask app is running and go to You should still see the name frequencies chart



That completes the technical set up, for now. We’re ready to do some real coding, starting with getting the data into the database.

Continue to Step 6 – Load the data into the database

Namepy step 4 – PostgreSQL, SQLAlchemy and Flask-SQLAlchemy

(This is part of the namepy project. Start at Namepy – on the shoulders of giants)

I’ve used PostgreSQL for a number of different projects. It is the database of choice for Django developers, with MySQL a close second. It is fast, stable, able to hand large databases and open source.

For a small project like this PostgreSQL is overkill, and SQLite would be easier to set up. However, I wanted to show how to integrate the key elements of a serious Flask/etc project, which has to include a production-grade database.

  1. Create a new database
    • For instructions on creating a new database in PostgreSQL you’ll find many helpful posts online, including my own blogpost
    • Make a note of the database name, user name and password
  2. Grab the required libraries (psycopg2 is needed for using flask-sqlalchemy with PostgreSQL, but not automatically installed with it)
    (virtualenv) pip install flask-sqlalchemy
    (virtualenv) pip install psycopg2
  3. Import flask-sqlalchemy
    from flask_sqlalchemy import SQLAlchemy
  4. Configure it. Replace <user name>, etc, in the text below with the actual user name, etc
    app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://<user name>:<password>@localhost/<database>'
  5. Define the models in <project_root>/
    from hello import db
    class Name(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String())
        frequencies = db.relationship('NameFrequency', backref='name', lazy='dynamic')
    class NameFrequency(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name_id = db.Column(db.Integer, db.ForeignKey(''))
        year = db.Column(db.Integer)
        boys_count = db.Column(db.Integer)
        girls_count = db.Column(db.Integer)
  6. Test the database and create the tables
    >>> from hello import db
    >>> db.create_all()

    As long as you see no error message this has worked

  7. Create a script for adding some test data. In add:
    def create_test_data():
        for name in ('Fred', 'Sue'):
            new_name = Name(name=name)
            for year in range(1990, 1996):
                new_frequency = NameFrequency(
                    boys_count=random.randint(50, 100),
                    girls_count=random.randint(50, 100))
  8. And run the script
    >>> from hello import create_test_data
    >>> create_test_data()
  9. Test this:
    >>> from models import Name
    >>> names = Name.query.all()
    >>> names[0].name
    >>> names[0].frequencies[0].boys_count

    Or some other number between 50 and 100

Whilst this works so far, there are a couple of problems. The database details, including password, are in the main source code. And when you try running it as a Flask script (python you get a circular import error. Let’s tidy this up

  1. Create a new file, <project_root>\, containing the configuration (with the correct login and database details):
    app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://&lt;username&gt;:&lt;password&gt;@localhost/&lt;database name&gt;'
  2. If you’re using git, make sure this isn’t under version control. Create/update .gitignore to include the following:
  3. Create a new file,, containing create_test_data from, plus necessary imports, and the code to run the create_test_data function when called from the command line
  4. Create
    import os
    import sys
    CURRENT_FILE = os.path.abspath(__file__)
    CURRENT_DIR = os.path.dirname(CURRENT_FILE)
    PROJECT_DIR = os.path.dirname(CURRENT_DIR)
    sys.path.append(PROJECT_DIR + '/src/')
    sys.path.append(PROJECT_DIR + '/virtualenv/lib/python3.5/site-packages/')
    from app import app as application
    if __name__ == '__main__':
  5. Create an “app” folder in the project_root folder
  6. In <project_root>/app create
    from flask import Flask
    from flask.ext.sqlalchemy import SQLAlchemy
    app = Flask(__name__)
    db = SQLAlchemy(app)
    from app import views, models
  7. Move into the app folder and change the first line to:
    from app import db
  8. In the app folder, create, containing @app.route(“/”) and hello() function from, plus necessary imports
  9. Test this. You should get

    Welcome, today is June 7th 2016

Let’s show some data

  1. In, add:
    from models import Name
  2. In render_template, pass in the Name objects:
    return render_template('helloworld.html', names=Name.query.all())
  3. In the helloworld.html template, get the series data from the Name objects:
                        series: [
                            {% for name in names %}
                                    name: '{{ }}',
                                    data: [
                                        {% for frequency in name.frequencies %}
                                            {{ frequency.boys_count }},
                                        {% endfor %}
                            {% endfor %}
  4. The chart title no longer makes sense. Change this to “Name frequencies”
  5. Test this. You should see another chart



Continue to Step 5 – Flask-Restless

Namepy – on the shoulders of giants

Whilst my core skill/tool is Python, I’m always learning new things, either inside or outside the Python ecosystem. I recently had the pleasure of working with Angular and Python/Flask. Here is a playful application based on these, plus Highcharts.

Going through “Python for Data Analysis”, some of the examples use a database of frequency of (US) baby names since 1880. I thought I’d combine this with a bit of Scrabble™.

In the Python world it’s common to add “py” to a word when making up names, so I’m calling this project “namepy”.

Since I’ll be using various frameworks and libraries, all created by others, I’ve subtitled this “On the shoulders of giants”.

Taking small steps often results in faster progress, so that’s what I’m be doing here.

Technical set up

The source code is at, with one branch per step.

Many production sites Content Delivery Networks for serving Javascript frameworks and libraries, usually minified, which helps to take the load of the server and may speed up first page load. To keep things simple and stable over time, I’m using full-sized, downloaded, copies.

I’m using WebFaction (affiliate link) as the host, since they make it easy to create Flask, Django and similar projects. And, as a popular host for developers, you’ll find lots of helpful documentation for developers online.

Getting started

Create a project folder

mkdir namepy
cd namepy

At the start of each of the steps

cd (my folder for personal projects)
cd namepy
git clone -b step1 step1

Note: “-b step1” specifies the name of the branch to clone. The second “step1” is the target folder, i.e. namepy/step1.


Continue to Step 1 – Angular “Hello World”

Investment Tracking System – Django/Python

My client, a start up with a lot of experience in their field, had identified an important gap in the market. Large sums of money were being invested, with very long payback periods, without access to effective performance tracking tools.

They designed a tool to cover the gap and asked me to create a demonstration system in preparation for generating interest and raising capital.

I developed the system in Django, Python, PostgreSQL and Javascript. The front end uses a dashboard template based on Bootstrap and jQuery. Graphs are created using the excellent Highcharts charting library.

The resulting system imports the base data and generates monthly cost and revenue forecasts, taking into account seasonal variations, tax allowances and more.


The main management screen gives quick access to some key performance indicators.


Constraints can be defined, and potential investments can be checked against them.


Actual results can be compared against the projections.


Different heat maps show absolute or relative performance by state or county.


This was an eight month intensive project, resulting in a demo site which generated a lot of interest in the industry and allowed the client to achieve their first round of funding.

Flask and Angular on Heroku

I am working my way through this excellent tutorial, covering Python3, Flask, Angular, Heroku, SQLAlchemy, Alembic, requests, Beautiful Soup, NLTK, Redis and D3. Here are some extra notes

  • To stop me from blindly copying/pasting the code, I printed off the tutorial and worked from the paper version.
  • I had some problems installing Virtualenvwrapper (on Linux Mint 17.2), until I followed these instructions
  • I had some clashes with Anaconda
    • Virtualenvwrapper’s deactivate clashed with Anaconda’s deactivate. Prompted by these instructions I renamed ~/anaconda/bin/activate and ~/anaconda/bin/deactivate
    • “pip install psycopg2” resulted in:
      Error: “setuptools must be installed to install from a source distribution”
      After much experimentation I guessed that this might be due to Anaconda. I created a new virtual machine (without Anaconda) and re-started the tutorial. This fixed the psycopg2 problem

Part 1, set up Heroku

  • I used a free Heroku account. Between a dedicated server, a WebFaction account and a HotDrupal account I’m already paying enough for hosting
  • “heroku create wordcounts-pro” gave me an error “Name is already taken”. According to this Heroku page,  app names are in the global namespace, so I guess I’m not the first one to follow this tutorial. To work around this, I prepended the app name with my initials, i.e. “heroku create cdg-wordcounts-pro”, etc
  • So I can push the changes to heroku I set up public key access
  • Before running “git push stage/pro master”, make sure to check in the changes to git (git add, git commit)

Part 2, set up databases

  • To create the Postgres database:
    • sudo su — postgres
    • psql
      • # CREATE DATABASE wordcount_dev;
      • # CREATE USER ‘<your user name>’
      • # GRANT ALL PRIVILEGES ON wordcount_dev TO <your user name>;
  • After running “heroku run python db upgrade …” I got the error message:
    No such file or directory: ‘/app/migrations/versions’

    • Locally I had an empty directory <app folder>/migrations/versions. However, git ignores empty directories. This is why I could run “.. db upgrade” locally but not on heroku
    • Oops, I’d forgotten to run
      python db migrate
      Now it worked fine
    • If you make the same mistake, remember to propagate the changes to heroku and then re-run db migrate on heroku

Part 3, requests, Beautiful Soup and NLTK

  • At one stage I got a server error. To sort this I looked at the heroku log:
    heroku logs –app <heroku app name>
  • When I ran the nltk downloader I didn’t get the usual gui but a “tui” (text user interface). It was fairly simple to navigate, but I didn’t bother to specify the location of the tokenizers. Instead I used the default (~/nltk_data) and then moved nltk_data into my app folder
  • The links to Bootstrap and jQuery didn’t work, either because I mistyped them or because they are out of date. The Bootstrap and jQuery websites give you up-to-date CDN links, so use those instead

Part 4, Redis task queue

  • I used these instructions to install Redis on Linux Mint
  • Apart from the inevitable few typing mistakes, everything worked remarkably smoothly. Nothing else to add

Part 5, Adding in Angular

  • It all went well, until I added the getWordCount function to the controller. I’d put the function inside the main.js file, but outside of the controller. When poller got called, none of the dependencies were included, so it couldn’t find $http (first line of poller function)
    • The error was: $http not defined
    • Despite comparing my version with the author’s GitHub one, I couldn’t see the difference. In the end I used the author’s version (of main.js) instead of mine. That worked fine. It took another line by line comparison to find the problem
  • The word/frequency list is no longer sorted. jsonify loses the order

Part 6, Staging the changes, including Redis

  • So far I’ve been using a free account. When I tried to add on Redis, heroku tells me: Please verify your account to install this add-on plan (please enter a credit card)
    • If I understand it correctly, it is still free (but don’t take my word for it – and don’t come back to me if you end up getting charged)
    • I entered my credit card details for my Heroku again. Now I can add Redis
  • “heroku addons:add redistogo –app” gave a warning to say that “addons:add” has been deprecated.
    • I used “addons:create” instead


Using PostgreSQL with Django

Here is how go set up Django to work with PostgreSQL

  1. Install necessary libraries, etc
    1. sudo apt-get install libpq-dev
    2. sudo apt-get install python-dev
    3. sudo apt-get install postgresql-contrib
  2. Create a new database and user
    1. sudo su – postgres
    2. createdb djangodev
    3. createuser -P djangodev
      1. Enter password, twice
    4. psql
      1. postgres=#  GRANT ALL PRIVILEGES ON DATABASE djangodev TO djangodev;
      2. \q
  3. Make sure you have your virtual environment activated
  4. pip install psycopg2
  5. Open up the project’s and change the DATABASES to:
    ‘default’: {
    ‘ENGINE’: ‘django.db.backends.postgresql_psycopg2’, # Add ‘postgresql_psycopg2’, ‘mysql’, ‘sqlite3’ or ‘oracle’.
    ‘NAME’: ‘djangodev’, # Or path to database file if using sqlite3.
    ‘USER’: ‘djangodev’,
    ‘PASSWORD’: ‘904ojioe_=3D’,
    ‘HOST’: ‘localhost’, # Empty for localhost through domain sockets or ‘’ for localhost through TCP.
    ‘PORT’: ”, # Set to empty string for default.
  6. python syncdb

Installing PostgreSQL

PostgreSQL seems to be the most popular DBMS (database management system) with Django developers, although MySQL is also used a lot

To install PostgreSQL, I used Linux Mint’s Software Manager (search for “postgresql”)

I also installed pgAdmin III, “a database design and management application for use with PostgreSQL” using the same method

I used to work on a system which used PostgreSQL, but that is a long time ago, so I had to ask the Internet to remind me how to get it going. Here is how to get it started

  1.  Using the Software Manager, install PostgreSQL and pgadmin3
  2. Set the PostgreSQL password, for the postgres user:
    1. sudo -u postgres psql
    2. postgres=#      \password postgress
    3. (set the password)
    4. \q
  3. Start Programming -> pgAdmin III
    1. Click on “Server Groups”
    2. Click on the plug icon (top left hand corner)
    3. Name: Local DBMS (or whatever you want to call it)
    4. Host: localhost
    5. Port, Service, Maintenance DB: leave as is
    6. Username: postgres
    7. Password: the password you set in the step above
    8. Click on “Ok”

You should now be able to view your PostgreSQL server in pgAdmin, and use it to manage users, databases, etc