Book Review: Learning Python Testing

Learning Python Testing is a book about how one can go about testing
Python code. It focuses on white-box testing using Python, though it
also provides a brief overview of the various types of testing.

The book starts off with an overview of the various types of testing
(Chapter 1), and the next chapter (Chapter 2) provides exhaustive
coverage of testing using doctest and how this module can be used to
test Python code when you are just starting with development.
Chapter 3 explains how one can get started with unit tests using the
doctest module.

After introducing unit testing using doctest, the next chapter
(Chapter 4) provides a gentle introduction to the concept of mocking
and how it can be achieved in Python using the mock module. Chapter 5
provides an introduction to unittest, and Chapter 6 provides an
introduction to Nose. Chapter 7 unifies the testing concepts
introduced in the earlier chapters by providing a step-by-step guide
to building a test suite for a sample project.

Chapter 8 provides an approach one can use to get started with
integration testing, explaining how the unit-testing approach, with
actual calls made instead of mocks, can be used for integration
testing.

Chapter 9 provides an overview of other tools and techniques related
to testing, like measuring code coverage and writing hooks for version
control systems to automatically trigger test runs.

This book is a good introduction to unit testing using Python, and is
definitely of help if one is not already familiar with concepts like
unit testing, Test Driven Development (TDD) and Python modules like
unittest, doctest, and mock.

One main grouse I have about the book is its use of, and focus on, the
doctest module for introducing unit testing concepts. I think it would
have been better if, after an introductory chapter on doctest,
coverage of unit testing had been done using the unittest module. In
my experience usage of unittest is more prevalent than doctest. (The
subsequent chapters in the book do provide fair coverage of unittest.)
It would also have been better if there were some details on
additional Python test frameworks like pytest, and some coverage of
Python test automation frameworks like Robot Framework. These would
have made the book's coverage of Python testing more comprehensive.

(Full Disclosure: I had requested and received a complimentary copy of the book Learning Python Testing for review.)


Paginated display of table data in PyQt

In GUI applications one often comes across situations where there is a
need to display a lot of items in a tabular or list format (for
example, displaying a large number of rows in a table). One way to
increase GUI responsiveness is to load a few items when the screen is
displayed and defer loading the rest of the items until the user asks
for them. Qt provides a solution to address this requirement of
loading data on demand.

The QAbstractItemModel class in Qt is one of the Model/View classes
and is part of Qt’s model/view framework. It defines a standard
interface that its subclasses must override to interact with other
components in the model/view framework. It provides the methods
canFetchMore and fetchMore, which can be implemented in model
subclasses to achieve the goal of loading data on demand in the views.

To generate a tabular display in Qt it is quite common to create an
instance of the QTableView class and specify a model for it, which is
an instance of QAbstractItemModel.

Consider the sample code below, which displays the name of a person
along with their city:

#!/usr/bin/env python
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *

class Person(object):
    """Name of the person along with his city"""
    def __init__(self,name,city):
        self.name = name
        self.city = city

class PersonDisplay(QMainWindow):
    def __init__(self,parent=None):
        super(PersonDisplay,self).__init__(parent)
        self.setWindowTitle('Person City')
        view = QTableView()
        tableData = PersonTableModel()
        view.setModel(tableData)
        self.setCentralWidget(view)

        tableData.addPerson(Person('Ramesh', 'Delhi'))
        tableData.addPerson(Person('Suresh', 'Chennai'))
        tableData.addPerson(Person('Kiran', 'Bangalore'))

class PersonTableModel(QAbstractTableModel):
    """Model class that drives the population of tabular display"""
    def __init__(self):
        super(PersonTableModel,self).__init__()
        self.headers = ['Name','City']
        self.persons  = []

    def rowCount(self,index=QModelIndex()):
        return len(self.persons)

    def addPerson(self,person):
        self.beginResetModel()
        self.persons.append(person)
        self.endResetModel()

    def columnCount(self,index=QModelIndex()):
        return len(self.headers)

    def data(self,index,role=Qt.DisplayRole):
        col = index.column()
        person = self.persons[index.row()]
        if role == Qt.DisplayRole:
            if col == 0:
                return QVariant(person.name)
            elif col == 1:
                return QVariant(person.city)
            return QVariant()

    def headerData(self,section,orientation,role=Qt.DisplayRole):
        if role != Qt.DisplayRole:
            return QVariant()

        if orientation == Qt.Horizontal:
            return QVariant(self.headers[section])
        return QVariant(int(section + 1))

def start():
    app  = QApplication(sys.argv)
    appWin = PersonDisplay()
    appWin.show()
    app.exec_()

if __name__ == "__main__":
    start()

When the number of persons whose details need to be displayed is small
and there is not a lot of processing to be done to populate the cell
values, the details are displayed without a significant delay. But
when there are a lot of rows, or when some amount of computation is
needed to display each cell, it makes sense to load only a few rows
into the table and load the remaining rows based on user action (such
as when the user scrolls down). Paginating the data makes the
corresponding GUI screen load faster, providing a better user
experience.

To paginate the table data only the code in the model class
(PersonTableModel in our example) needs to be modified, which
demonstrates the benefit of the clear separation between model and
view classes that Qt provides. The methods canFetchMore and
fetchMore need to be overridden in the subclass PersonTableModel to
paginate the table data.

Below is the example code, showing only the modified table model
class; the rest of the code remains the same as in the example above:

class PersonTableModel(QAbstractTableModel):

    ROW_BATCH_COUNT = 15

    def __init__(self):
        super(PersonTableModel,self).__init__()
        self.headers = ['Name','City']
        self.persons  = []
        self.rowsLoaded = PersonTableModel.ROW_BATCH_COUNT

    def rowCount(self,index=QModelIndex()):
        if not self.persons:
            return 0

        if len(self.persons) <= self.rowsLoaded:
            return len(self.persons)
        else:
            return self.rowsLoaded

    def canFetchMore(self,index=QModelIndex()):
        if len(self.persons) > self.rowsLoaded:
            return True
        else:
            return False

    def fetchMore(self,index=QModelIndex()):
        remainder = len(self.persons) - self.rowsLoaded
        itemsToFetch = min(remainder, PersonTableModel.ROW_BATCH_COUNT)
        self.beginInsertRows(QModelIndex(),self.rowsLoaded,self.rowsLoaded+itemsToFetch-1)
        self.rowsLoaded += itemsToFetch
        self.endInsertRows()

    def addPerson(self,person):
        self.beginResetModel()
        self.persons.append(person)
        self.endResetModel()

    def columnCount(self,index=QModelIndex()):
        return len(self.headers)

    def data(self,index,role=Qt.DisplayRole):
        col = index.column()
        person = self.persons[index.row()]
        if role == Qt.DisplayRole:
            if col == 0:
                return QVariant(person.name)
            elif col == 1:
                return QVariant(person.city)
            return QVariant()

    def headerData(self,section,orientation,role=Qt.DisplayRole):
        if role != Qt.DisplayRole:
            return QVariant()

        if orientation == Qt.Horizontal:
            return QVariant(self.headers[section])
        return QVariant(int(section + 1))

In the above code the variable ROW_BATCH_COUNT specifies the number of
rows for the initial display of the table and the batch size for
subsequent view refreshes. The variable self.rowsLoaded is initialized
with ROW_BATCH_COUNT and is incremented when a user action triggers
the display of more rows in the table.

The method rowCount is adapted so that it returns the number of rows
that have currently been loaded.

The method canFetchMore returns True if the number
of rows loaded is less than the number of person instances, indicating
that there is more data that needs to be displayed.

The method fetchMore is triggered when canFetchMore returns True, and
in this method the number of rows to be loaded is increased by up to
ROW_BATCH_COUNT.

As shown in the example code, with the clear separation between model
and view, pagination can be achieved in Qt with ease by overriding
just two methods.


Skipping tests in unittest

While writing unit tests one might come across situations where a test
method takes a considerable amount of time (say you want to parse a
huge XML file to ensure that your code can handle huge files). At the
same time you don’t want to run this particular time-consuming method
during every invocation of your unit tests. In a nutshell, you want to
selectively enable/disable running some of the unit tests. Starting
from Python 2.7, the unittest module provides the following decorators
to skip tests:

  1. @unittest.skip
  2. @unittest.skipIf
  3. @unittest.skipUnless

While decorator (1) enables skipping a test method unconditionally,
decorators (2) and (3) enable skipping a test method based on a
pre-condition. The unit test sample code below demonstrates usage of
these decorators:

import sys
import unittest

class TestSample(unittest.TestCase):
    @unittest.skip('Unconditional skipping of the test method')
    def test_skip_unconditional(self):
        self.assertTrue(True)

    @unittest.skipUnless('-smoke' in sys.argv,\
                         'Skipping the smoke test')
    def test_skip_unless_conditional(self):
        self.assertTrue(True)

    @unittest.skipIf('-quick' in sys.argv,\
                     'Skipping a time consuming test')
    def test_skip_if_conditional(self):
        self.assertTrue(True)

if __name__ == '__main__':
    # don't pass the args to unittest module
    try:
        sys.argv.remove('-quick')
    except ValueError:
        pass

    try:
        sys.argv.remove('-smoke')
    except ValueError:
        pass

    unittest.main()

The test test_skip_unconditional is skipped every time. The test
test_skip_unless_conditional is run only if the unit test is invoked
with the -smoke option, while the test test_skip_if_conditional is
skipped if the unit test is invoked with the -quick option.

Here are results of running the above unittest module:

$ python sample_tests.py -v
test_skip_if_conditional (__main__.TestSample) ... ok
test_skip_unconditional (__main__.TestSample) ...
    skipped 'Unconditional skipping of the test method'
test_skip_unless_conditional (__main__.TestSample) ...
    skipped 'Skipping the smoke test'

-------------------------------------------------------------
Ran 3 tests in 0.000s

OK (skipped=2)

$ python sample_tests.py -v -quick -smoke
test_skip_if_conditional (__main__.TestSample) ...
    skipped 'Skipping a time consuming test'
test_skip_unconditional (__main__.TestSample) ...
    skipped 'Unconditional skipping of the test method'
test_skip_unless_conditional (__main__.TestSample) ... ok

----------------------------------------------------------------
Ran 3 tests in 0.000s

OK (skipped=2)
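The skip conditions need not come from sys.argv; any boolean expression works, such as platform or interpreter-version checks. A minimal sketch (the class and method names here are made up for illustration):

```python
import sys
import unittest

class TestPlatformSpecific(unittest.TestCase):
    # Skipped everywhere except Linux.
    @unittest.skipUnless(sys.platform.startswith('linux'),
                         'requires Linux')
    def test_linux_only(self):
        self.assertTrue(True)

    # Skipped on Python 2 interpreters.
    @unittest.skipIf(sys.version_info < (3, 0),
                     'this check only applies on Python 3')
    def test_python3_behaviour(self):
        self.assertTrue(True)

if __name__ == '__main__':
    unittest.main()
```

Skipped tests still show up in the test run summary, so the suite's total test count stays stable across platforms.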

The unittest module also provides a decorator, @unittest.expectedFailure,
which can be used to mark a test as an expected failure. This is handy
when you identify a bug in your code but have decided not to fix it in
your current release (say). At the same time you want to have a record of
the failure (so that it is easier to re-create the issue at a later time,
and you don’t lose track of the bug you identified). In such cases you can
mark the method with the decorator @unittest.expectedFailure. This ensures
that though the test is run it is not counted as a failure (thus keeping
both you and the build master happy). Here is sample code using the
decorator @unittest.expectedFailure:

import unittest

class TestSample(unittest.TestCase):
    @unittest.expectedFailure
    def test_to_fail(self):
        val = 100
        self.assertTrue(val == 42)

if __name__ == '__main__':
    unittest.main()

The resulting output obtained by running the above test is as below:

$ python sample_tests.py
x
-----------------------------------------------------------------
Ran 1 test in 0.000s

OK (expected failures=1)

Benchmarking approaches to find unique elements

Consider the scenario where we are receiving a stream of strings and
want to store only the unique strings. One approach to achieving this
in Python would be as below:

# Assuming values() returns a constant stream of strings
unique_vals = []
for val in values():
    if val not in unique_vals:
        unique_vals.append(val)

However, in Python strings are immutable and hashable. Assuming we are
not interested in ordering, wouldn’t it be better to use a set instead
of a list as the container to gather the unique strings? The above
code snippet can then be written as:

unique_vals = set()
for val in values():
    unique_vals.add(val)

But which of these is faster? I think the version using the set would
be faster, since membership tests would make use of the hash values of
the set elements. One way this can be verified is by running some
benchmark tests using Python’s timeit module.

Here are some of the tests I ran and the results I got.

>>> from timeit import timeit
>>> timeit(stmt='x = range(100); 99 in x')
4.287418931735737
>>> timeit(stmt='x = set(range(100)); 99 in x')
7.882405166019703

In the above example I create a list of numbers up to 99 and check if
the number 99 exists in the container (list and set). I chose 99
because the membership test in a list is a linear search, and choosing
a value of 99 ensures that we are taking into account the worst-case
scenario for benchmarking. Surprisingly, the version involving the set
takes more time (close to 2x) than the one involving the list. But a
closer look reveals a problem in the benchmarking code: the benchmark
tests compare not only the lookups but also the time it takes to
initialize the sequence (set or list).

So I re-ran the benchmark test, this time ignoring the time involved
in sequence initialization.

>>> timeit(setup='x=range(100)',stmt='99 in x')
2.739381960556443
>>> timeit(setup='x=set(range(100))',stmt='99 in x')
0.08809443658344662

As the results above indicate, the snippet that uses the set
outperforms the one that uses the list.

Moving further and running benchmarks that are closer to the problem I
started with, here are the results I get:

>>> timeit(setup='x=[]',stmt="""for i in range(10)+range(15):
	if i not in x:
	    x.append(i)""")
7.740467014662272
>>> timeit(setup='x=set()',stmt="""for i in range(10)+range(15):
	    x.add(i)""")
5.565552325784438

As the above results indicate, to find unique elements in a sequence of
hashables it is efficient to use a set instead of a list (assuming we are not
interested in preserving the ordering).
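Pulling the snippets together, the comparison can be reproduced as a small self-contained script (written in Python 3 syntax, unlike the Python 2 examples above; the function names are mine):

```python
from timeit import timeit

# A small stream with duplicates, similar to range(10)+range(15) above.
values = list(range(10)) + list(range(15))

def unique_with_list():
    unique_vals = []
    for val in values:
        if val not in unique_vals:   # linear membership test
            unique_vals.append(val)
    return unique_vals

def unique_with_set():
    unique_vals = set()
    for val in values:
        unique_vals.add(val)         # hash-based membership test
    return unique_vals

# timeit also accepts a callable directly, avoiding the setup/stmt strings.
list_time = timeit(unique_with_list, number=10000)
set_time = timeit(unique_with_set, number=10000)
print('list: %.4fs  set: %.4fs' % (list_time, set_time))
```

The set version should come out faster, consistent with the timings above, though the exact numbers will vary by machine.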


Are debuggers crutches?

As every programmer knows, debuggers are useful tools for stepping
through program execution.

Some of the scenarios where debuggers are quite helpful are:

  • While tracing a bug or error, they are useful for pinpointing the
    exact location of the problem in the source code.
  • They are helpful when trying to understand an unfamiliar code base
    one is just getting started with.
  • When one is learning a programming language, debuggers are handy
    for understanding the program flow. One can set breakpoints in the
    code where results don’t match expectations and understand how the
    source code actually handles the data.

Given the value debuggers provide to a developer, why would
experienced developers have mixed opinions about them?

Here is what some of the experts say about debuggers:

“As a personal choice, we tend to not use debuggers beyond getting a
stack trace or the value of a variable or two. One reason is that it
is easy to get lost in details of complicated data structures and
control flow; we find stepping through a program less productive than
thinking harder and adding output statements and self-checking code at
critical places. Clicking over statements takes longer than scanning
the output of judiciously-placed displays. It takes less time to
decide where to put print statements than to single-step to the
critical section of code, even assuming we know where that is. More important,
debugging statements stay with the program; debugger sessions are transient.

Blind probing with a debugger is not likely to be productive. It is
more helpful to use the debugger to discover the state of the program
when it fails, then think about how the failure could have happened.”

Kernighan and Pike, The Practice of Programming

“Given the enormous power offered by modern debuggers, you might be
surprised that anyone would criticize them. But some of the most
respected people in computer science recommend not using them. They
recommend using your brain and avoiding debugging tools
altogether. Their argument is that debugging tools are a crutch and
that you find problems faster and more accurately by thinking about
them than by relying on tools. They argue that you, rather than the
debugger, should mentally execute the program to flush out defects.”

Steve McConnell, Code Complete

“An interactive debugger is an outstanding example of what is not
needed — it encourages trial-and-error hacking rather than systematic
design, and also hides marginal people barely qualified for precision
programming”

Harlan Mills (Quote from Code Complete)

Like many programmers I also used the debugger very frequently. While
programming in Perl I used ptkdb, and while working with Java I used
Eclipse, which has an excellent visual debugger. But things changed
when I started programming in Python. I got bitten while debugging
some Python code, which made heavy use of decorators, using the visual
debugger that came with the IDE (it was not Eclipse). After wasting a
considerable amount of time I realized that the code flow shown by the
debugger (probably due to a bug in it) was different from how the
program behaved when it was executed. With that experience I dropped
the IDE, switched back to the world’s most customizable editor 🙂 and
stopped depending on the debugger altogether.

Ever since then I have used the debugger very sparingly. At the same
time there has been no dearth of issues I have faced in the code I
have worked on and developed.

These are some of the lessons I have learned since I stopped being
over-dependent on debuggers:

  • Frequent usage of debuggers prevents one from stopping and
    thinking about the code one is developing (or having an issue
    with). When the debugger was handy, my first reaction when faced
    with an issue was to fire up the debugger and trace the code with
    it. I have now become more disciplined: I pay more attention to
    the stack trace, try to trace issues by adding print statements,
    and spend more time thinking about the possible code flows that
    would result in the problem.
  • As anyone who has programmed and faced issues will tell you, a
    reasonable amount of problem solving also happens when one is not
    sitting in front of the monitor and actively coding. When one is
    deeply involved with a problem, thoughts about it linger even when
    one is away from the monitor (say during a commute or in the
    shower). Quite often a solution, or a hint about a possible
    solution, comes to mind in those away-from-the-monitor moments.
    What I have realized is that when I don’t use the debugger I have
    more context in my mind than I would if I were using it. This
    helps me think of better approaches to solving the problem and of
    more possible ways in which something could have gone wrong.
  • Not using the debugger has made me trace the root causes of an
    issue far more quickly.

Does this mean that one should never use a debugger, as they prevent
one from getting the bigger picture? I don’t think so, and I don’t go
as far as to suggest that one should avoid the debugger at all costs.
I think using the debugger in moderation, not reaching for it as one’s
first line of action when faced with a problem, and always trying to
view the problem one is facing in the right context (with or without
the debugger) are better approaches to debugging.

If you are frequently using the debugger, stop reaching for it and see
if that makes you a better developer.

Also see an interesting discussion on Stack Overflow about debuggers:
Debugger mother of all evils


Moving to git from Clearcase

Git is a distributed version control system (DVCS), whereas Clearcase
is a centralized source control system. Clearcase provides access to
data in the repository via a virtual file system called MVFS
(Multiversion File System). Conceptually, Git and Clearcase are very
different in the way they manage data and maintain versions.

Here are some of the points about git that differ from Clearcase and
took some time for me to absorb. The list is limited to operational
differences I noticed while using git for day-to-day work. Gaining a
good understanding of the Git object model and its internals provides
good insight into how git works and helps one use git better.

  • Each git clone is a replica of the central repository
    There is no central repository sitting and holding all the code
    and all the related metadata. When a repository is cloned, all the
    data that is available at the server is available on the user's
    machine as well. To put it in Clearcase terminology, each clone is
    a replica. Most of the operations that a user does on his machine
    update the local replica, and these changes need to be explicitly
    pushed to the server (say for sharing changes with other users,
    data protection in case of a user machine crash, etc.)
  • Git has no checkin command, and the checkout command does not
    mean what it means in Clearcase

    In Clearcase, to make a file (say) version controlled the user has
    to use the mkelem and checkin commands. Once a file is version
    controlled it is owned by Clearcase, and for any further
    modifications to a version-controlled file the user has to check
    it out (using checkout), make the modifications and check it in
    (using checkin).
  • In git a file (say) can be version controlled by using the add
    and commit commands. The user need not do anything similar to a
    Clearcase checkout to modify the file. Once the file is modified
    it needs to be added (using add) and committed (using commit) to
    create a newer version of the file.

    A warning to the Clearcase user working with git: doing a checkout
    of a file in git gets the version of the file that was part of the
    previous commit, thus overwriting any local changes made to the
    file.

  • There are no views and no config specs
    In Clearcase, what is visible in a working directory is ruled by
    the configuration specification (cspec) of the user's view. The
    cspec rules determine which version of a file is made visible,
    from which branch a file is picked, etc. If the user needs to view
    a file (element) in a different branch he needs to edit his cspec
    or use the version-extended path name of the element to access the
    version of the element in the different branch. This at its best
    is cumbersome.
  • In git there is no such thing as a config spec or a view. Any
    directory to which a git repository has been cloned, or in which a
    repository has been created, is the location where the repository
    contents are available. A user can switch branches by just
    checking out the appropriate branch.
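The add/commit workflow, the checkout warning and branch switching described above can be sketched in a throwaway repository (the file names and commit messages are hypothetical; the git config lines are only needed in a fresh environment):

```shell
# A throwaway demo repository to illustrate the points above.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo User"

echo "first version" > notes.txt
git add notes.txt                         # no mkelem/checkout step as in Clearcase
git commit -q -m "version control notes.txt"

echo "second version" >> notes.txt        # modify directly: no checkout needed
git add notes.txt
git commit -q -m "newer version of notes.txt"

echo "scratch edit" >> notes.txt          # an uncommitted local change...
git checkout -- notes.txt                 # ...is silently discarded (the warning above)

git checkout -q -b feature                # switch branches instead of editing a cspec
```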



Installing latex packages in Cygwin

  • Obtain the package you need to install from CTAN (http://www.ctan.org/).
  • Extract the downloaded package into a directory (say latex_source).
  • Run latex against the .ins file available as part of the package.
  • Copy the generated .sty and .cfg files to the directory where latex looks for packages (in Cygwin, one of the directories searched for latex packages is /usr/share/texmf).
  • Update the tex filename database by running mktexlsr (or texhash, which is an alias for mktexlsr).

Below is an example that uses the above steps to install the listings package. The listings package is a source code printer and is useful for typesetting code snippets in documents.

Download listings package from CTAN

%unzip listings.zip
%cd listings
%ls
Makefile  README  listings.dtx  listings.ind  listings.ins  listings.pdf  lstdrvrs.dtx
%latex listings.ins
[…]
**********************************************************
* This program converts documented macro-files into fast *
* loadable files by stripping off (nearly) all comments! *
**********************************************************
      * No Configuration file found, using default settings. *
********************************************************
Generating file(s) ./listings.sty ./lstmisc.sty ./lstdoc.sty ./lstdrvrs.ins ./l
istings.cfg
[…]

* You probably need to move all created `.sty' and `.cfg'
* files into a directory searched by TeX.
*
* And don't forget to refresh your filename database
* if your TeX distribution uses such a database.
% mkdir /usr/share/texmf/tex/latex/listings
%mv *.sty /usr/share/texmf/tex/latex/listings
%mv *.cfg /usr/share/texmf/tex/latex/listings
%
$ mktexlsr
mktexlsr: Updating /home/…./.texmf/var/ls-R...
mktexlsr: Updating /usr/share/texmf/ls-R...
mktexlsr: Updating /var/cache/fonts/ls-R...
mktexlsr: Updating /var/lib/texmf/ls-R...
mktexlsr: Done.


Beautiful Code

I have dabbled quite a few times with mindmaps, but I am mostly a linear-mode person, writing line after line or making bullet points when I want to make a note of anything worth noting. Now I am in a phase where I am dabbling with mindmaps again, and I came across this write-up about what constitutes beautiful code. The write-up is by Yukihiro Matsumoto, creator of the programming language Ruby, and is part of the book Beautiful Code. In the chapter the author writes about what beautiful code means to him and uses Ruby to explain his viewpoint. But the chapter is a short one and doesn’t have much code in it. I made a summary of the chapter using a mindmap and here is the result:

Beautiful Code

The mindmap was created using FreeMind.
