Category Archives: Data Analysis

Data Analysis with Python

Python is a very popular tool for data extraction, clean up, analysis and visualisation. I’ve recently done some work in this area, and would love to do some more. I particularly enjoy using my maths background and creating pretty, clear and helpful visualisations

  • Short client project, analysing sensor data. I took readings from two accelerometers and rotated the readings to get the relative movement between them. Using NumPy, Pandas and MatplotLib, I created a number of different charts, looking for a correlation between the equipment’s setting and the movement. Unfortunately the sensors aren’t sensitive enough to return usable information. Whilst not the outcome they were hoping for, the client told me “You’ve been really helpful and I’ve learned a lot”
  • At PyCon UK (Cardiff, September 2018) I attended 14 data analysis sessions. It was fascinating to see the range of tools and applications in Python data analytics. At a Bristol PyData MeetUp I summarised the sessions in a 5 minute lightening talk. This made me pay extra attention and keep useful notes during the conference
  • Short client project, researching best way to import a large data set, followed by implementation. The client regularly accesses large datasets using a folder hierarchy\to structure that data. They were looking to replace this with a professional database, i.e. PostgreSQL. I analysed their requirements, researched the different storage methods in PostgreSQL, reported my findings and created an import script.

Python for Data Analysis and Natural Language Processing

As I’m making my way through Natural Language Processing with Python and Data Science from Scratch: First Principles with Python, the first step is to set up the development environment.

My first attempt was to install numpy, python, nltk, matplotlib, IPython, etc, one at a time. However, I hit a few clashes between Python versions, so switched to Anaconda instead:

  1. Download Anaconda
  2. From the download folder, execute
    1. sh <name of downloaded file>
    2. Accept the defaults, but answer yes to preprend to PATH
  3. To check it works, start Python
    1. import numpy
    2. import pandas
    3. import matplotlib
  4. Download the nltk assets. Start Python and enter:
    1. import nltk
    2. nltk.download()
    3. use GUI to download “all”