Category Archives: PostgreSQL

Data Analysis with Python

Python is a very popular tool for data extraction, clean-up, analysis and visualisation. I’ve recently done some work in this area, and would love to do some more. I particularly enjoy using my maths background and creating pretty, clear and helpful visualisations.

  • Short client project, analysing sensor data. I took readings from two accelerometers and rotated the readings to get the relative movement between them. Using NumPy, pandas and Matplotlib, I created a number of different charts, looking for a correlation between the equipment’s setting and the movement. Unfortunately the sensors weren’t sensitive enough to return usable information. Whilst not the outcome they were hoping for, the client told me “You’ve been really helpful and I’ve learned a lot”
  • At PyCon UK (Cardiff, September 2018) I attended 14 data analysis sessions. It was fascinating to see the range of tools and applications in Python data analytics. At a Bristol PyData MeetUp I summarised the sessions in a 5-minute lightning talk. Knowing I would give this talk made me pay extra attention and keep useful notes during the conference
  • Short client project, researching the best way to import a large data set, followed by implementation. The client regularly accesses large datasets, using a folder hierarchy to structure that data. They were looking to replace this with a professional database, i.e. PostgreSQL. I analysed their requirements, researched the different storage methods in PostgreSQL, reported my findings and created an import script.

Namepy – on the shoulders of giants

Whilst my core skill/tool is Python, I’m always learning new things, either inside or outside the Python ecosystem. I recently had the pleasure of working with Angular and Python/Flask. Here is a playful application based on these, plus Highcharts.

Going through “Python for Data Analysis”, some of the examples use a database of frequency of (US) baby names since 1880. I thought I’d combine this with a bit of Scrabble™.

In the Python world it’s common to add “py” to a word when making up names, so I’m calling this project “namepy”.

Since I’ll be using various frameworks and libraries, all created by others, I’ve subtitled this “On the shoulders of giants”.

Taking small steps often results in faster progress, so that’s what I’ll be doing here.

Technical set up

The source code is at https://github.com/CoachCoen/namepy, with one branch per step.

Many production sites use Content Delivery Networks to serve JavaScript frameworks and libraries, usually minified, which helps to take the load off the server and may speed up first page load. To keep things simple and stable over time, I’m using full-sized, downloaded copies.

I’m using WebFaction (affiliate link) as the host, since they make it easy to create Flask, Django and similar projects. And, as they are a popular host for developers, you’ll find lots of helpful documentation online.

Getting started

Create a project folder

mkdir namepy
cd namepy

At the start of each of the steps

cd (my folder for personal projects)
cd namepy
git clone https://github.com/CoachCoen/namepy.git -b step1 step1

Note: “-b step1” specifies the name of the branch to clone. The second “step1” is the target folder, i.e. namepy/step1.
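
Since every step follows the same pattern, the clone command can be generated rather than typed. The sketch below only builds and prints the command. The helper name clone_command and the assumption that every branch is called stepN are mine, not something the repository promises.

```python
REPO_URL = "https://github.com/CoachCoen/namepy.git"


def clone_command(step):
    """Build the git clone command for one step.

    Assumes (hypothetically) that branches are named step1, step2, ...
    """
    branch = "step{}".format(step)
    # -b selects the branch; the final argument is the target folder
    return ["git", "clone", REPO_URL, "-b", branch, branch]


if __name__ == "__main__":
    # The list could also be passed to subprocess.run() from inside namepy/
    print(" ".join(clone_command(1)))
```

Returning a list rather than a string keeps the arguments separate, which is the form subprocess.run expects.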

Next

Continue to Step 1 – Angular “Hello World”

Investment Tracking System – Django/Python

My client, a start up with a lot of experience in their field, had identified an important gap in the market. Large sums of money were being invested, with very long payback periods, without access to effective performance tracking tools.

They designed a tool to cover the gap and asked me to create a demonstration system in preparation for generating interest and raising capital.

I developed the system in Django, Python, PostgreSQL and Javascript. The front end uses a dashboard template based on Bootstrap and jQuery. Graphs are created using the excellent Highcharts charting library.

The resulting system imports the base data and generates monthly cost and revenue forecasts, taking into account seasonal variations, tax allowances and more.

The main management screen gives quick access to some key performance indicators.

Constraints can be defined, and potential investments can be checked against them.

Actual results can be compared against the projections.

Different heat maps show absolute or relative performance by state or county.

This was an eight month intensive project, resulting in a demo site which generated a lot of interest in the industry and allowed the client to achieve their first round of funding.

Flask and Angular on Heroku

I am working my way through this excellent tutorial, covering Python 3, Flask, Angular, Heroku, SQLAlchemy, Alembic, requests, Beautiful Soup, NLTK, Redis and D3. Here are some extra notes:

  • To stop me from blindly copying/pasting the code, I printed off the tutorial and worked from the paper version.
  • I had some problems installing Virtualenvwrapper (on Linux Mint 17.2), until I followed these instructions
  • I had some clashes with Anaconda
    • Virtualenvwrapper’s deactivate clashed with Anaconda’s deactivate. Prompted by these instructions I renamed ~/anaconda/bin/activate and ~/anaconda/bin/deactivate
    • “pip install psycopg2” resulted in:
      Error: “setuptools must be installed to install from a source distribution”
      After much experimentation I guessed that this might be due to Anaconda. I created a new virtual machine (without Anaconda) and re-started the tutorial. This fixed the psycopg2 problem

Part 1, set up Heroku

  • I used a free Heroku account. Between a dedicated server, a WebFaction account and a HotDrupal account I’m already paying enough for hosting
  • “heroku create wordcounts-pro” gave me the error “Name is already taken”. According to this Heroku page, app names share a global namespace, so I guess I’m not the first one to follow this tutorial. To work around this, I prefixed the app name with my initials, i.e. “heroku create cdg-wordcounts-pro”, etc
  • So that I could push changes to Heroku, I set up public key access
  • Before running “git push stage/pro master”, make sure to check in the changes to git (git add, git commit)

Part 2, set up databases

  • To create the Postgres database:
    • sudo su - postgres
    • psql
      • # CREATE DATABASE wordcount_dev;
      • # CREATE USER <your user name>;
      • # GRANT ALL PRIVILEGES ON DATABASE wordcount_dev TO <your user name>;
  • After running “heroku run python manage.py db upgrade …” I got the error message:
    No such file or directory: ‘/app/migrations/versions’

    • Locally I had an empty directory <app folder>/migrations/versions. However, git ignores empty directories. This is why I could run “.. manage.py db upgrade” locally but not on heroku
    • Oops, I’d forgotten to run
      python manage.py db migrate
      Now it worked fine
    • If you make the same mistake, remember to propagate the changes to heroku and then re-run db migrate on heroku
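
Because git does not track empty directories, a common workaround (not the fix used above, which was simply to run db migrate) is to put a placeholder file in the directory so that it reaches Heroku. A minimal sketch, assuming the conventional but entirely optional file name .gitkeep; the helper name is hypothetical.

```python
import os
import tempfile


def keep_directory(path, marker=".gitkeep"):
    """Create path if needed and add an empty marker file so git tracks it."""
    os.makedirs(path, exist_ok=True)
    marker_path = os.path.join(path, marker)
    # Append mode creates the file without truncating one that already exists
    with open(marker_path, "a"):
        pass
    return marker_path


if __name__ == "__main__":
    # Demonstrated in a throwaway folder; in the real project the path
    # would be <app folder>/migrations/versions
    with tempfile.TemporaryDirectory() as root:
        created = keep_directory(os.path.join(root, "migrations", "versions"))
        print(os.path.relpath(created, root))
```

The marker file still has to be added and committed before pushing to Heroku.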

Part 3, requests, Beautiful Soup and NLTK

  • At one stage I got a server error. To sort this I looked at the heroku log:
    heroku logs --app <heroku app name>
  • When I ran the nltk downloader I didn’t get the usual gui but a “tui” (text user interface). It was fairly simple to navigate, but I didn’t bother to specify the location of the tokenizers. Instead I used the default (~/nltk_data) and then moved nltk_data into my app folder
  • The links to Bootstrap and jQuery didn’t work, either because I mistyped them or because they are out of date. The Bootstrap and jQuery websites give you up-to-date CDN links, so use those instead
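
If nltk_data is moved into the app folder as described above, NLTK has to be told to look there. A hedged sketch: nltk.data.path is NLTK's list of search directories, while the folder layout and the helper names below are my assumptions. The import is guarded so the snippet still runs where NLTK is not installed.

```python
import os


def local_nltk_data_dir(app_root):
    """Return the nltk_data folder inside the app folder (assumed layout)."""
    return os.path.join(os.path.abspath(app_root), "nltk_data")


def register_nltk_data(app_root):
    """Add the app's nltk_data folder to NLTK's search path, if NLTK is installed."""
    data_dir = local_nltk_data_dir(app_root)
    try:
        import nltk
    except ImportError:
        return data_dir  # NLTK missing: nothing to register
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
    return data_dir


if __name__ == "__main__":
    print(register_nltk_data("."))
```

Setting the NLTK_DATA environment variable to the same folder is an alternative that needs no code change.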

Part 4, Redis task queue

  • I used these instructions to install Redis on Linux Mint
  • Apart from the inevitable few typing mistakes, everything worked remarkably smoothly. Nothing else to add

Part 5, Adding in Angular

  • It all went well, until I added the getWordCount function to the controller. I’d put the function inside the main.js file, but outside of the controller. When poller got called, none of the dependencies were included, so it couldn’t find $http (first line of poller function)
    • The error was: $http not defined
    • Despite comparing my version with the author’s GitHub one, I couldn’t see the difference. In the end I used the author’s version (of main.js) instead of mine. That worked fine. It took another line by line comparison to find the problem
  • The word/frequency list is no longer sorted. jsonify loses the order
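
JSON objects carry no guaranteed key order, and Flask's jsonify has traditionally sorted keys by default, so a dictionary ordered by frequency can arrive re-ordered. One way round this, sketched below with the standard library's json module rather than Flask, is to send a list of [word, count] pairs, since JSON arrays keep their order. The sample data and the function name are made up for illustration.

```python
import json


def ordered_word_counts(counts):
    """Return (word, count) pairs sorted by count, highest first, then by word."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = {"flask": 3, "angular": 7, "redis": 3}  # made-up data
    # Tuples are serialised as JSON arrays, which preserve their order
    print(json.dumps(ordered_word_counts(sample)))
    # prints [["angular", 7], ["flask", 3], ["redis", 3]]
```

On the Angular side the template would then iterate over the pairs instead of over an object's keys.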

Part 6, Staging the changes, including Redis

  • So far I’ve been using a free account. When I tried to add Redis, Heroku told me: Please verify your account to install this add-on plan (please enter a credit card)
    • If I understand it correctly, it is still free (but don’t take my word for it – and don’t come back to me if you end up getting charged)
    • I entered my credit card details into my Heroku account. Now I can add Redis
  • “heroku addons:add redistogo --app” gave a warning saying that “addons:add” has been deprecated.
    • I used “addons:create” instead

 

Using PostgreSQL with Django

Here is how to set up Django to work with PostgreSQL

  1. Install necessary libraries, etc
    1. sudo apt-get install libpq-dev
    2. sudo apt-get install python-dev
    3. sudo apt-get install postgresql-contrib
  2. Create a new database and user
    1. sudo su - postgres
    2. createdb djangodev
    3. createuser -P djangodev
      1. Enter password, twice
    4. psql
      1. postgres=#  GRANT ALL PRIVILEGES ON DATABASE djangodev TO djangodev;
      2. \q
  3. Make sure you have your virtual environment activated
  4. pip install psycopg2
  5. Open up the project’s settings.py and change the DATABASES to:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',  # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
            'NAME': 'djangodev',  # Or path to database file if using sqlite3.
            'USER': 'djangodev',
            'PASSWORD': '904ojioe_=3D',
            'HOST': 'localhost',  # Empty for localhost through domain sockets or '127.0.0.1' for localhost through TCP.
            'PORT': '',  # Set to empty string for default.
        }
    }
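  6. python manage.py syncdb (on Django 1.7 and later, use “python manage.py migrate” instead; syncdb was removed in Django 1.9)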
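
Rather than committing a real password to settings.py, the same dictionary can be built from environment variables. A minimal sketch: the variable names (DJANGO_DB_NAME and so on) and the helper function are hypothetical, and the engine string is the one used in the settings above.

```python
import os


def database_settings(env=None):
    """Build Django's DATABASES dict from environment variables.

    The variable names are hypothetical; pick whatever suits the project.
    """
    env = os.environ if env is None else env
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql_psycopg2",
            "NAME": env.get("DJANGO_DB_NAME", "djangodev"),
            "USER": env.get("DJANGO_DB_USER", "djangodev"),
            # No default: raise KeyError if the password has not been provided
            "PASSWORD": env["DJANGO_DB_PASSWORD"],
            "HOST": env.get("DJANGO_DB_HOST", "localhost"),
            "PORT": env.get("DJANGO_DB_PORT", ""),
        }
    }


if __name__ == "__main__":
    # In settings.py this would be: DATABASES = database_settings()
    demo = database_settings({"DJANGO_DB_PASSWORD": "example-only"})
    print(demo["default"]["NAME"])
```

Leaving the password without a default means a missing variable fails at start-up instead of at the first query.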

Installing PostgreSQL

PostgreSQL seems to be the most popular DBMS (database management system) with Django developers, although MySQL is also used a lot

To install PostgreSQL, I used Linux Mint’s Software Manager (search for “postgresql”)

I also installed pgAdmin III, “a database design and management application for use with PostgreSQL” using the same method

I used to work on a system which used PostgreSQL, but that was a long time ago, so I had to ask the Internet to remind me how to get it going. Here is how to get it started

  1. Using the Software Manager, install PostgreSQL and pgadmin3
  2. Set the PostgreSQL password, for the postgres user:
    1. sudo -u postgres psql
    2. postgres=# \password postgres
    3. (set the password)
    4. \q
  3. Start -> Programming -> pgAdmin III
    1. Click on “Server Groups”
    2. Click on the plug icon (top left hand corner)
    3. Name: Local DBMS (or whatever you want to call it)
    4. Host: localhost
    5. Port, Service, Maintenance DB: leave as is
    6. Username: postgres
    7. Password: the password you set in the step above
    8. Click on “Ok”

You should now be able to view your PostgreSQL server in pgAdmin, and use it to manage users, databases, etc
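
If pgAdmin cannot connect, it helps to check whether anything is listening on the server's port at all. A small sketch using only the standard library; it assumes PostgreSQL's default port 5432 (the "leave as is" value above), and the function name is made up. It only tests that the TCP port accepts connections, not that the username and password are right.

```python
import socket


def port_is_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


if __name__ == "__main__":
    print("PostgreSQL port reachable:", port_is_open())
```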