Overview

Last time, in part two, we built and tested our set container class, ModSet(), with a variety of methods that will prove useful when we construct the power set. Methods included copying, nesting/unnesting, filtering to keep only unique members as well as set-union and set-subtraction operations. You will see that most of these will be used, either implicitly or explicity, in both Binary Mask and Recursive methods of generating power sets.

Preliminaries

When we are ready to build our power set, we'll measure the time it takes each of the two routines to complete, so we can compare their performance. As a preliminary, let's import the time module so it stands at ready.

In [21]:
import time

Of course we will need an example source set to operate on. Let's define it now and be done with it.

In [65]:
clumsyPrez = ModSet(set([ 6.1, 2021, "fight like hell", "we love you",('impeachment','#2')]))

Binary Mask Power set Generator

In this iterative-based approach, we will populate the power set by applying binary masks to extract out member subsets, one-by-one, from the source set in ModSet().val. If you recall, a verbal description of power set generation using a binary mask we discussed way back in part one; you can find it here. Below, we first provide an overview of our binary mask power set generator routine, followed by the routine itself, with heavy commenting, concluded with a detailed walk-through after that.

The method, named .powerSet_bin(), contains 19 lines and calls .pushDown() and .union() sibling methods. Though we could use NumPy here to make mask generation and member extraction more compact, we did not want to confound the comparision (to the recursive approach to follow) by incorporating functionality of an external module. Already, the code has been somewhat compressed by use of generator expressions.

The routine below can be divided into two main blocks: variable initialization, and the run-time loop that builds each binary mask for extracting the corresponding subset on each iteration. The loop populates the powerset member-by-member and terminates when all $2^n$ members have been added. Because indexing is crucial in this approach, you'll notice heavy use of range() and list() datatypes. As you'll see later, the recursive approach requires no such crutch.

In [ ]:
class ModSet():
    .
    .
    .
    # Generate powerset via direct, binary mask approach
    def powerSet_bin(self):
        
        ## Initialize local variables ##
        S = list(self.val)         # convert to list for indexing
        setSize = len(self.val)    # count number of members in source set
        psetSize = pow(2, setSize) # calculate the number of elements in the power set
        lastIndex = setSize - 1    # index value of last member
        setIndices = range(0, setSize)  # make indices list for source set
        psetIndices = range(0, psetSize) # make indices list for power set to be built
        bMasks = [[False for s in setIndices] for p in psetIndices] # Initialize binary mask       
        pSet = ModSet(set())  # initialize power set as empty ModSet() instance
        pSet.pushDown(1)  # and nest it down one level for later joining
        
        ## Populate powerset with each subset, one at a time ##
        for i in psetIndices: # loop through each member of power set
            
            ## Generate binary mask for subset i of power set ##
            diff = i  # assign current pSet index as initial bit-flip threshold
            for j in setIndices: # loop through each digit of mask
                
                if (diff >= pow(2, lastIndex - j)): # if threshold >= highest digit, then
                    bMasks[i][lastIndex - j] = True  # set corresp. mask digit to "true"
                    diff -= pow(2, lastIndex - j)   # reduce threshold for next mask digit
                    
            ## Form subset i of power set ##
            # Use generator expression for compactness
            dummySet = ModSet(set([S[k] for k in setIndices if bMasks[i][k] == True]))
            dummySet.pushDown(1)  # nest ModSet instance down one level for union join
            pSet.union(dummySet)  # include new subset in power set
            
        return pSet, bMasks  # return complete power set and binary masks as output

With the binary mask routine complete, we're ready to build our power set. Below we've sandwiched the call within two time.time() reads, so we can measure its runtime duration. Let's examine bMasks output first.

In [66]:
tStart = time.time() # clock start timestamp
pSet_bin, bMasks = clumsyPrez.powerSet_bin() # call our binary mask power set generator
duration = time.time() - tStart # calc run duration by subtracting tStart from current time.
bMasks # show list of binary masks
Out[66]:
[[False, False, False, False, False],
 [True, False, False, False, False],
 [False, True, False, False, False],
 [True, True, False, False, False],
 [False, False, True, False, False],
 [True, False, True, False, False],
 [False, True, True, False, False],
 [True, True, True, False, False],
 [False, False, False, True, False],
 [True, False, False, True, False],
 [False, True, False, True, False],
 [True, True, False, True, False],
 [False, False, True, True, False],
 [True, False, True, True, False],
 [False, True, True, True, False],
 [True, True, True, True, False],
 [False, False, False, False, True],
 [True, False, False, False, True],
 [False, True, False, False, True],
 [True, True, False, False, True],
 [False, False, True, False, True],
 [True, False, True, False, True],
 [False, True, True, False, True],
 [True, True, True, False, True],
 [False, False, False, True, True],
 [True, False, False, True, True],
 [False, True, False, True, True],
 [True, True, False, True, True],
 [False, False, True, True, True],
 [True, False, True, True, True],
 [False, True, True, True, True],
 [True, True, True, True, True]]

Above, you can notice that masks progress from all false to all true in a logical pattern. This ordering would be a great feature if we cared about how members are ordered in our power set. But sets, strictly defined, are not distinguished by the ordering of their members2. That is, set $\{a, b, c\}$ is equivalent to set $\{c, a, b\}$ and $\{b, a, c\}$, and so on and so forth.

So, after taking a look at our power set in pSet_bin, we see that the nice ordering was all for naught.

In [83]:
# Report duration of binary mask power set generation
print('The Binary Mask approach took %0.6f seconds to complete.'%(duration))
print('The power set contains %i subset elements'%(len(pSet_bin.val)))
pSet_bin.val # show power set output; remember, this is a set!
The Binary Mask approach took 0.007008 seconds to complete.
The power set contains 32 subset elements
Out[83]:
{set(),
 {'fight like hell', ('impeachment', '#2'), 2021, 6.1},
 {'fight like hell', ('impeachment', '#2'), 2021},
 {'fight like hell', 2021, 6.1},
 {'fight like hell', 2021},
 {'fight like hell', 6.1},
 {'fight like hell'},
 {'we love you', 'fight like hell', 6.1},
 {'we love you', 'fight like hell'},
 {'we love you', ('impeachment', '#2'), 'fight like hell', 6.1},
 {'we love you', ('impeachment', '#2'), 'fight like hell'},
 {'we love you', ('impeachment', '#2'), 2021, 'fight like hell'},
 {'we love you', ('impeachment', '#2'), 2021, 6.1},
 {'we love you', ('impeachment', '#2'), 2021},
 {'we love you', ('impeachment', '#2'), 6.1},
 {'we love you', ('impeachment', '#2')},
 {'we love you', 2021, 'fight like hell'},
 {'we love you', 2021, 6.1, 'fight like hell'},
 {'we love you', 2021, 6.1},
 {'we love you', 2021},
 {'we love you', 6.1},
 {'we love you'},
 {('impeachment', '#2'), 'fight like hell', 6.1},
 {('impeachment', '#2'), 'fight like hell'},
 {('impeachment', '#2'), 2021, 6.1},
 {('impeachment', '#2'), 2021},
 {('impeachment', '#2'), 6.1},
 {('impeachment', '#2')},
 {2021, 6.1, 'we love you', ('impeachment', '#2'), 'fight like hell'},
 {2021, 6.1},
 {2021},
 {6.1}}

From the power set output string above, it looks like we have many sub-set instances as members of one big super-set. But, if you recall from all the way back in part one, this is not the case; our power set pSet_bin is actually a ModSet() object, whose .val attribute is a set, each element of which is another ModSet() instance that contains one unique sub-set of the original source set (as its .val attribute). We defined the ModSet.__repr__() method to return the code representation of its .val attribute; that's the reason why the return string from calling pSet_bin.val appears in this set-resembling format.

Recursive Power set generator

In part one, we went over the recursive algorithm that we used to design our recursive power set generator below. You can find it here if you would like a refresher.

Our recursion-based method for generating powersets contains just 11 lines of code but calls four sibling methods, as well as itself, to run. Though .powerSet_rec() has fewer lines than .powerSet_bin(), it relies more heavily upon behavior of sibling methods in the ModSet() class, i.e. .__copy__(), .pushDown(), .diffFunc() and .union() to perform the respective obligatory processing tasks of duplication, nesting, extraction, joining and filtering. Unlike the Binary Mask approach above, indexing is not needed here, and hence, not employed. Through recursion, we can generate the identical power set using thoughtfully-placed set-subtraction, -union and -filtering operations.

In [ ]:
class ModSet():
    .
    .
    .
    # Generate power set recursively.
    def powerSet_rec(self):
        
        pSet = self.__copy__()      # Preserve self instance; its copy, pSet
                                    # will be altered
        pSet.pushDown(1)            # Nest pSet for later joining.
              
        if len(self.val) > 0:       # Recursion termination condition
            for elSet in self.val:  # Iterate through members of set self.val
                # Generate subset that remains after removing current
                # element, elSet, from set self.val
                dummySet = self.diffFunc(ModSet(set([elSet]))) 
                # To current powerset, append the powerset of the
                # subset from previous step
                pSet.union(dummySet.powerSet_rec())  # Self-call power set method,
                                                     # union join power set of 
                                                     # dummySet with pSet
            return pSet             # Return power set at current 
                                    # level of recursion
        else:
            dummySet = ModSet(set())  # Generate instance of ModSet of empty set
            dummySet.pushDown(1)      # Nest empty set down one level so it can
            return dummySet           # be union-joined in the recursion level
                                      # above (that called this current run).

To summarize the program flow: first a duplicate of the calling instance is made to serve as the (local) power set within our routine. This instance is promptly pushed down one level so it can be joined by union later, if necessary. Next, the number of elements in the calling instance are counted. If empty, an empty ModSet() instance is returned on exit of the routine. This case path is the exit condition for the recursive flow; eventually all calling instances will dwindle down to empty as members are stripped from them in the alternative, non-empty, case that we'll describe next.

If the calling instance is not empty, .powerSet_rec() iterates over the elements therein, each time subtracting out the current element, elSet, and calling itself again to build the power set from elements that remain. Output from this recursive call is then joined by union with pSet. Remember that every time pSet.union() runs, it calls the filtering method removeDuplicates() to retain only unique members in pSet as it is assembled.

When the loop completes, and all subsets have been included at the present level, the local instance of pSet is returned so that it can be union-joined with the pSet instance inside the calling-function one level above. The method continues in this fashion until pSet at the top-most level of recursion is fully populated, at which point output is returned in response to the initial method call.

[Phew..!]

Let's give it a whirl and gauge its efficiency.

In [84]:
tStart = time.time()
pSet_rec = clumsyPrez.powerSet_rec()
duration = time.time() - tStart
print('The Recursive approach took %0.6f seconds to complete.'%(duration))
print('The power set contains %i subset elements'%(len(pSet_rec.val)))
pSet_rec.val
The Recursive approach took 0.006541 seconds to complete.
The power set contains 32 subset elements
Out[84]:
{set(),
 {'fight like hell', ('impeachment', '#2'), 2021, 6.1},
 {'fight like hell', ('impeachment', '#2'), 2021},
 {'fight like hell', 2021, 6.1},
 {'fight like hell', 2021},
 {'fight like hell', 6.1},
 {'fight like hell'},
 {'we love you', 'fight like hell', 6.1},
 {'we love you', 'fight like hell'},
 {'we love you', ('impeachment', '#2'), 'fight like hell', 6.1},
 {'we love you', ('impeachment', '#2'), 'fight like hell'},
 {'we love you', ('impeachment', '#2'), 2021, 'fight like hell'},
 {'we love you', ('impeachment', '#2'), 2021, 6.1},
 {'we love you', ('impeachment', '#2'), 2021},
 {'we love you', ('impeachment', '#2'), 6.1},
 {'we love you', ('impeachment', '#2')},
 {'we love you', 2021, 'fight like hell'},
 {'we love you', 2021, 6.1, 'fight like hell'},
 {'we love you', 2021, 6.1},
 {'we love you', 2021},
 {'we love you', 6.1},
 {'we love you'},
 {('impeachment', '#2'), 'fight like hell', 6.1},
 {('impeachment', '#2'), 'fight like hell'},
 {('impeachment', '#2'), 2021, 6.1},
 {('impeachment', '#2'), 2021},
 {('impeachment', '#2'), 6.1},
 {('impeachment', '#2')},
 {2021, 6.1, 'we love you', ('impeachment', '#2'), 'fight like hell'},
 {2021, 6.1},
 {2021},
 {6.1}}

As promised, we see the same power set as we saw from the Binary Mask approach. But unlike that approach, there is no mask output for us to examine this time.

Quick-and-dirty efficiency comparison

For a given programming objective, recursive algorithms tend to be less efficient than their iterative equivalents when used in imperative-based languages, like Python, where iteration is preferred3. Below we hobbled together a quick script to measure, and statistically compare, processing times of the two methods of power set generation of clumsyPrez.

In [91]:
import numpy as np
import matplotlib.pyplot as plt

# routine to repeatedly collect run-time durations
# of function genFunc
def genDurationsDist(genFunc, nReps):
    durations = np.zeros(nReps)
    for i in np.arange(0, nReps):
        tStart = time.time()
        genFunc()
        durations[i] = time.time() - tStart
    return durations
durations_bin = genDurationsDist(clumsyPrez.powerSet_bin, 500) # generate dist. of run-times
                                                               # of Binary Mask ps generator 
durations_rec = genDurationsDist(clumsyPrez.powerSet_rec, 500) # generate dist. of run-times
                                                               # of Recursive ps generator
# Plot overlapping histograms of the two distributions
plt.hist([durations_bin, durations_rec], label=['Bin. Mask', 'Recursive'])
plt.title('Distribs. of clumsyPrez power set computation times')
plt.xlabel('duration (sec.)')
plt.ylabel('counts')
plt.legend()
Out[91]:
<matplotlib.legend.Legend at 0x7fba1b76b860>

The two distributions certainly do not appear to be normally distributed. Below, we import and run the Mann-Whitney non-parametric comparison test to assess if the difference between the two reaches statistical significance.

In [99]:
from scipy.stats import mannwhitneyu as mwu
bin_med = np.median(durations_bin) # compute median duration of Bin. Mask ps durations
rec_med = np.median(durations_rec) # compute median duration of Recursive ps durations
print('Median Bin. Mask duration: %0.5f sec., Median Recursive duration: %0.5f sec.'\
      %(bin_med, rec_med))
# Run non-parametric test to determine if differences between 
# distributions are statistically significant
stat, pval = mwu(durations_bin, durations_rec)
print('Mann-Whitney U statistic: %0.2f, p-value: %0.2e'%(stat, pval))
Median Bin. Mask duration: 0.00205 sec., Median Recursive duration: 0.00461 sec.
Mann-Whitney U statistic: 851.00, p-value: 4.75e-163

Measuring the time elapsed for each of the two approaches, we find that, on average, our recursive algorithm does indeed take about twice the time to complete on our machine as the binary-mask method (461 ms versus 205 ms, U = 851., p < 0.01). Of course, ours is certainly not a very controlled comparison; so please take the result with "a grain of salt" if you will.

Summary

In part one we discussed power sets and their relation to the binary theorem. After some proding, we discovered that Python's set() class could not accomodate our need to include sets within sets. To solve the problem, we introduced a custom container class with a set-valued attribute, whose instances are "hashable" and thus, eligible for membership in set objects.

Starting with the container strategy from part one, in part two we expanded our custom class, ModSet(), to include methods that perform copying, nesting, uniqueness-filtering, set-union and set-difference operations as well as others. Several of these additional methods would prove necessary for both methods of power set generation to follow.

In part three, we realized our goal of generating power sets. First, we detailed program flow of both the Binary Mask and Recursive approaches of power set generation. The mask routine used list indexing and took more lines of code than recursive method, but called fewer class methods. The Recursive approach was shorter in terms of lines of code because it incorporated more sibling methods of the ModSet() class. Explaining its program flow was "interesting" to the least; with effort on the part of the reader, it might strengthen understanding of recursive algorithms. Without effort, it has potential to confuse. Finally, we compared relative run times of the two approaches. It turned out that, as expected, the Binary Mask routine took less time to run than its recursive counterpart.

Creative Commons License
The ModSet() class by nullexit.org is licensed under a Creative Commons Attribution 4.0 International License.
In [64]:
class ModSet():
    
    # Instance constructor
    def __init__(self, setElement):
        self.val = setElement  # set arg saved in .val attribute
    
    # String eval representation of self instance 
    def __repr__(self):
        return self.val.__repr__() # Return string eval represent.
                                   # of string object in .val
    
    # Method to make a copy of self instance
    def __copy__(self):
        return ModSet(self.val)
    
    # Modify .val to contain the set of itself, of itself, ...
    # nesting .val "depth" number of levels.
    def pushDown(self, depth):
        while depth > 0:
            self.val = set([ModSet(self.val)])
            depth -= 1     
    
    # Remove one nesting level from set in self.val.
    # If un-nested, ignore.
    def pullUpOneLevel(self):
        listSet = list(self.val)
        if len(listSet) == 1:
            self.val = listSet[0].val
        else:
            pass
    
    # Remove "height" nesting levels from set in
    # self.val by repeatedly calling above method
    def pullUp(self, height):
        while height > 0:
            self.pullUpOneLevel()
            height -= 1
    
    # Within a single set, multiple ModSets with 
    # equivalent .val attributes can exist.  
    # This can occur because each instance of ModSet
    # allocated its own location in memory. For sets
    # to have unique members in terms of ModSet.val
    # values, we define the following filtering method.
    def removeDuplicates(self):
        uniqueSet = set()  # initialize as empty the unique set to be built
        for s in self.val:  # s is a member of the ModSet().val set
            inUniqueSet = False  # initialize match detetion flag as false
            sTest = s  # default conditional testing value for s
            for us in uniqueSet: # us is a member of the uniqueSet set
                usTest = us  # default conditional testing value for us
                if isinstance(us, ModSet): # if member us is a ModSet
                    usTest = us.val        # change testing value to its attribute
                if isinstance(s, ModSet):  # if member s is a ModSet
                    sTest = s.val          # change testing value to its attribute
                if usTest == sTest:  # compare us and s testing values on this run
                    inUniqueSet = True  # if match, set existence flag to true
            if not inUniqueSet:    # only add member s to uniqueSet if
                uniqueSet.add(s)   # match is NOT detected
        self.val = uniqueSet    # set .val to the uniqueSet from above
    
    # union join multiple items, enforce that ModSet members be unique
    def union(self, *modSets):
        for modSet in modSets:
            self.val = self.val.union(modSet.val)  # this is union method from set class
            self.removeDuplicates()  # removes duplicate-valued instances of ModSet members
    
    # set intersect multiple items
    def intersection(self, *modSets):
        for modSet in modSets:
            self.val = self.val.intersection(modSet.val)
    
    # set difference multiple items. note: arg order matters here! 
    def difference(self, *modSets):
        for modSet in modSets:
            self.val = self.val.difference(modSet.val)
    
    # version of above method that returns a new instance,
    # only takes two arguments.
    def diffFunc(modSet1, modSet2):
        return ModSet(set.difference(modSet1.val, modSet2.val))
    
    # Generate powerset via direct, binary mask approach
    def powerSet_bin(self):
        
        ## Initialize local variables ##
        S = list(self.val)         # convert to list for indexing
        setSize = len(self.val)    # count number of members in source set
        psetSize = pow(2, setSize) # calculate the number of elements in the power set
        lastIndex = setSize - 1    # index value of last member
        setIndices = range(0, setSize)  # make indices list for source set
        psetIndices = range(0, psetSize) # make indices list for power set to be built
        bMasks = [[False for s in setIndices] for p in psetIndices] # Initialize binary mask       
        pSet = ModSet(set())  # initialize power set as empty ModSet() instance
        pSet.pushDown(1)  # and nest it down one level for later joining
        
        ## Populate powerset with each subset, one at a time ##
        for i in psetIndices: # loop through each member of power set
            
            ## Generate binary mask for subset i of power set ##
            diff = i  # assign current pSet index as initial bit-flip threshold
            for j in setIndices: # loop through each digit of mask
                
                if (diff >= pow(2, lastIndex - j)): # if threshold >= highest digit, then
                    bMasks[i][lastIndex - j] = True  # set corresp. mask digit to "true"
                    diff -= pow(2, lastIndex - j)   # reduce threshold for next mask digit
                    
            ## Form subset i of power set ##
            # Use generator expression for compactness
            dummySet = ModSet(set([S[k] for k in setIndices if bMasks[i][k] == True]))
            dummySet.pushDown(1)  # nest ModSet instance down one level for union join
            pSet.union(dummySet)  # include new subset in power set
            
        return pSet, bMasks  # return complete power set and binary masks as output
    
    # Generate powerset recursively.
    def powerSet_rec(self):
        
        pSet = self.__copy__()      # Preserve self instance; its copy, pSet
                                    # will be altered
        pSet.pushDown(1)            # Nest pSet for later joining.
              
        if len(self.val) > 0:       # Recursion termination condition
            for elSet in self.val:  # Iterate through members of set self.val
                # Generate subset that remains after removing current
                # element, elSet, from set self.val
                dummySet = self.diffFunc(ModSet(set([elSet]))) 
                # To current powerset, append the powerset of the
                # subset from previous step
                pSet.union(dummySet.powerSet_rec())  # Self-call powerset method,
                                                     # union join powerset of 
                                                     # dummySet with pSet
            return pSet             # Return powerset at current 
                                    # level of recursion
        else:
            dummySet = ModSet(set())  # Generate instance of ModSet of empty set
            dummySet.pushDown(1)      # Nest empty set down one level so it can
            return dummySet           # be union joined in the recursion level
                                      # above that called this run-through.