8. Appendices

Rules for Math
On Units
Who uses Bitmath
Related Projects

8.1. Rules for Math

This section describes what we need to know to effectively use bitmath for arithmetic. Because bitmath allows the use of instances as operands on either side of the operator, it is especially important to understand their behavior. Just as in normal everyday math, not all operations yield the same result if the operands are switched. For example, 1 - 2 = -1 whereas 2 - 1 = 1.

This section includes discussions of the results for each supported mixed math operation. For mixed math operations (i.e., an operation with a bitmath instance and a number type), implicit coercion may happen. That is to say, a bitmath instance will be converted to a number type.

When coercion happens is determined by the following conditions and rules:

Precedence and Associativity of Operators in Python[1]
Situational semantics – some operations, though mathematically valid, do not make logical sense in context.

8.1.1. Terminology

The following definitions describe some of the terminology used throughout this section.

Coercion

The act of converting operands into a common type to support arithmetic operations. Somewhat similar to how adding two fractions requires coercing each operand into having a common denominator.

Specific to the bitmath domain, this concept means using an instance’s prefix value for mixed-math.

Operand

The object(s) of a mathematical operation. That is to say, given 1 + 2, the operands would be 1 and 2.

Operator

The mathematical operation to evaluate. Given 1 + 2, the operation would be addition, +.

LHS

Left-hand side. In discussion this specifically refers to the operand on the left-hand side of the operator.

RHS

Right-hand side. In discussion this specifically refers to the operand on the right-hand side of the operator.

8.1.2. Two bitmath operands

This section describes what happens when two bitmath instances are used as operands. There are three possible results from this type of operation.

Addition and subtraction: The result will be of the type of the LHS.
Multiplication: Supported, but yields strange results.

In [10]: first = MiB(5)

In [11]: second = kB(2)

In [12]: first * second
Out[12]: MiB(10000.0)

In [13]: (first * second).best_prefix()
Out[13]: GiB(9.765625)

As we can see in Out[12] and Out[13], multiplying even two relatively small quantities together (MiB(5) and kB(2)) yields quite large results.

Internally, this is implemented as:

\[ \begin{aligned}(5 \cdot 2^{20}) \cdot (2 \cdot 10^{3}) &= 10{,}485{,}760{,}000 \text{ B}\\ 10{,}485{,}760{,}000 \text{ B} \cdot \dfrac{1 \text{ MiB}}{1{,}048{,}576 \text{ B}} &= 10{,}000 \text{ MiB}\end{aligned} \]

Division: The result will be a number type due to unit cancellation.
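The numbers above can be sanity-checked with plain Python, without bitmath installed. This is only a sketch of the arithmetic, not bitmath's implementation: the variable names are illustrative, and it assumes that division between two instances is a ratio of their byte counts.

```python
# Byte counts for the two operands, using the prefix definitions
# (MiB = 2**20 bytes, kB = 10**3 bytes).
first_bytes = 5 * 2**20     # MiB(5)
second_bytes = 2 * 10**3    # kB(2)

# Multiplication: multiply the byte counts, then express the product
# in the unit of the left-hand side (MiB) and in GiB.
product_bytes = first_bytes * second_bytes
product_mib = product_bytes / float(2**20)
product_gib = product_bytes / float(2**30)

# Division (assumed here to be a ratio of byte counts): the units
# cancel, leaving a plain number.
ratio = first_bytes / float(second_bytes)

print(product_bytes)  # 10485760000
print(product_mib)    # 10000.0
print(product_gib)    # 9.765625
print(ratio)          # 2621.44
```

The explicit float() calls keep the division results identical on Python 2 and Python 3.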

8.1.3. Mixed Types: Addition and Subtraction

This describes the behavior of addition and subtraction operations where one operand is a bitmath type and the other is a number type.

Mixed-math addition and subtraction always return a type from the numbers family (integer, float, long, etc…). This rule is true regardless of the placement of the operands, with respect to the operator.

Discussion: Why do 100 - KiB(90) and KiB(100) - 90 both yield a result of 10.0 and not another bitmath instance, such as KiB(10.0)?

When implementing the math part of the object datamodel customizations[2] there were two choices available:

Offer no support at all. Instead, return NotImplemented so that Python raises a TypeError.
Consistently apply coercion to the bitmath operand to produce a useful result (useful if you know the rules of the library).

In the end it became a philosophical decision guided by scientific precedent.

Put simply, bitmath uses the significance of the least significant operand, specifically the number-type operand, because it lacks semantic significance. In application this means that we drop the semantic significance of the bitmath operand. That is to say, given an input like GiB(13.37) (equivalent to 13.37 * 2³⁰ bytes), the only property used in calculations is the prefix value, 13.37.

Numbers carry mathematical significance, in the form of precision, but what they lack is semantic (contextual) significance. A number by itself is just a measurement of an arbitrary quantity of stuff. In mixed-type math, bitmath effectively treats numbers as mathematical constants.

A bitmath instance also has mathematical significance in that an instance is a measurement of a quantity (bits in this case) and that quantity has a measurable precision. But a bitmath instance is more than just a measurement, it is a specialized representation of a count of bits. This gives bitmath instances semantic significance as well.

And so, in deciding how to handle mixed-type (really what we should say is mixed-significance) operations, we chose to model the behavior off of an already established set of rules. Those rules are the Rules of Significance Arithmetic[3].

Let’s look at an example of this in action:

In [8]: num = 42

In [9]: bm = PiB(24)

In [10]: print num + bm
66.0

Equivalently, divorcing the bitmath instance from its value (this is coercion):

In [12]: bm_value = bm.value

In [13]: print num + bm_value
66.0

What it all boils down to is this: if we don’t provide a unit then bitmath won’t give us one back. There is no way for bitmath to guess what unit the operand was intended to carry. Therefore, the behavior of bitmath is conservative. It will meet us half way and do the math, but it will not return a unit in the result.
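The coercion rule can be modelled in a few lines of Python. The sketch below uses a hypothetical ToyUnit class as a stand-in for a bitmath type; it is an illustration of the rule described in this section, not bitmath's source code, and the class and attribute names are assumptions made for the example.

```python
import numbers


class ToyUnit(object):
    """Hypothetical stand-in for a bitmath type (illustration only)."""

    def __init__(self, value):
        # The prefix value, e.g. 90.0 for "90 KiB".
        self.value = float(value)

    def __add__(self, other):
        # unit + number: coerce self to its prefix value, return a number.
        if isinstance(other, numbers.Number):
            return self.value + other
        return NotImplemented

    # number + unit gives the same result because addition commutes.
    __radd__ = __add__

    def __sub__(self, other):
        # unit - number: again a plain number comes back.
        if isinstance(other, numbers.Number):
            return self.value - other
        return NotImplemented

    def __rsub__(self, other):
        # number - unit: operand order matters, so this needs its own method.
        if isinstance(other, numbers.Number):
            return other - self.value
        return NotImplemented


print(100 - ToyUnit(90))   # 10.0
print(ToyUnit(100) - 90)   # 10.0
print(42 + ToyUnit(24))    # 66.0
```

In every case the result is a float, never another ToyUnit, which mirrors the "no unit in, no unit out" behavior described above.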

8.1.4. Mixed Types: Multiplication and Division

Multiplication is commutative: the ordering of the operands does not affect the result. Because of this, bitmath allows arbitrary placement of the operands, treating the numeric operand as a constant. Here’s an example demonstrating this.

In [2]: 10 * KiB(43)
Out[2]: KiB(430.0)

In [3]: KiB(43) * 10
Out[3]: KiB(430.0)

Division, however, is not commutative: the placement of the operands is significant. Additionally, there is a semantic difference in division. Dividing a quantity (e.g. MiB(100)) by a constant (10) makes complete sense. Conceptually (in the domain of bitmath), the intention of MiB(100) / 10 is to separate MiB(100) into 10 equal-sized parts.

In [4]: KiB(43) / 10
Out[4]: KiB(4.2998046875)

The reverse operation does not maintain semantic validity. Stated differently, it does not make logical sense to divide a constant by a measured quantity of stuff. If you’re still not clear on this, ask yourself what you would expect to get if you did this:

\[\dfrac{100}{kB(33)} = x\]
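The asymmetry between multiplication and division can be sketched with a small toy class. ToyKiB below is hypothetical and is not bitmath's implementation. In particular, the sketch simply declines the reverse division (constant divided by quantity); that is a choice made for the illustration and says nothing about what bitmath itself returns in that case.

```python
import numbers


class ToyKiB(object):
    """Hypothetical stand-in for a bitmath type (illustration only)."""

    def __init__(self, value):
        self.value = float(value)

    def __repr__(self):
        return "ToyKiB(%s)" % self.value

    def __mul__(self, other):
        # quantity * constant: scale the quantity, keep the unit.
        if isinstance(other, numbers.Number):
            return ToyKiB(self.value * other)
        return NotImplemented

    # constant * quantity: multiplication commutes, so reuse __mul__.
    __rmul__ = __mul__

    def __truediv__(self, other):
        # quantity / constant: split the quantity into equal parts.
        if isinstance(other, numbers.Number):
            return ToyKiB(self.value / other)
        return NotImplemented

    # Python 2 spelling of the same operator.
    __div__ = __truediv__

    # No reverse-division method is defined, so in this sketch
    # constant / quantity raises TypeError (a choice of the sketch only).


print(10 * ToyKiB(43))    # ToyKiB(430.0)
print(ToyKiB(43) * 10)    # ToyKiB(430.0)
print(ToyKiB(100) / 10)   # ToyKiB(10.0)
```

Both multiplication orders return a unit-carrying result, and so does quantity divided by constant; only the semantically questionable direction is left unsupported.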

8.1.5. Footnotes

[1]	For a less technical review of precedence and associativity, see Programiz: Precedence and Associativity of Operators in Python

[2]	Python Datamodel Customization Methods

[3]	https://en.wikipedia.org/wiki/Significance_arithmetic

8.2. On Units

As previously stated, in this module you will find two very similar sets of classes available. These are the NIST and SI prefixes. The NIST prefixes are all base 2 and have an ‘i’ character in the middle. The SI prefixes are base 10 and have no ‘i’ character.

For smaller values, these two systems of unit prefixes are roughly equivalent. The round() operations below show the ratio of one “unit” of SI to the corresponding “unit” of NIST; a ratio of 0.98 means the SI unit is 98% the size of the NIST unit.

In [15]: one_kilo = 1 * 10**3

In [16]: one_kibi = 1 * 2**10

In [17]: round(one_kilo / float(one_kibi), 2)

Out[17]: 0.98

In [18]: one_tera = 1 * 10**12

In [19]: one_tebi = 1 * 2**40

In [20]: round(one_tera / float(one_tebi), 2)

Out[20]: 0.91

In [21]: one_exa = 1 * 10**18

In [22]: one_exbi = 1 * 2**60

In [23]: round(one_exa / float(one_exbi), 2)

Out[23]: 0.87

They begin as roughly equivalent; however, as the results show (Out[17], Out[20], and Out[23]), they diverge significantly for higher values.
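The same comparison can be run over every prefix from kilo to exa in one loop. This is a minimal sketch in plain Python; the list of prefix names and the ratios dictionary are just illustrative helpers and are not part of bitmath.

```python
# Pairs of (SI name, NIST name). Each step up the list is another
# factor of 10**3 for SI and 2**10 for NIST.
prefixes = [
    ("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"),
    ("tera", "tebi"), ("peta", "pebi"), ("exa", "exbi"),
]

ratios = {}
for step, (si_name, nist_name) in enumerate(prefixes, start=1):
    si_unit = 10 ** (3 * step)
    nist_unit = 2 ** (10 * step)
    # Ratio of one SI unit to the corresponding NIST unit.
    ratios[si_name] = round(si_unit / float(nist_unit), 2)
    print("%s / %s = %.2f" % (si_name, nist_name, ratios[si_name]))
```

The printed ratio shrinks with every step, which is the divergence described above.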

Why two unit systems? Why take the time to point this difference out? Why should you care? The Linux Documentation Project comments on that:

Before these binary prefixes were introduced, it was fairly common to use k=1000 and K=1024, just like b=bit, B=byte. Unfortunately, the M is capital already, and cannot be capitalized to indicate binary-ness.

At first that didn’t matter too much, since memory modules and disks came in sizes that were powers of two, so everyone knew that in such contexts “kilobyte” and “megabyte” meant 1024 and 1048576 bytes, respectively. What originally was a sloppy use of the prefixes “kilo” and “mega” started to become regarded as the “real true meaning” when computers were involved. But then disk technology changed, and disk sizes became arbitrary numbers. After a period of uncertainty all disk manufacturers settled on the standard, namely k=1000, M=1000k, G=1000M.

The situation was messy: in the 14k4 modems, k=1000; in the 1.44MB diskettes, M=1024000; etc. In 1998 the IEC approved the standard that defines the binary prefixes given above, enabling people to be precise and unambiguous.

Thus, today, MB = 1000000B and MiB = 1048576B.

In the free software world programs are slowly being changed to conform. When the Linux kernel boots and says:
hda: 120064896 sectors (61473 MB) w/2048KiB Cache
the MB are megabytes and the KiB are kibibytes.

Source: man 7 units - http://man7.org/linux/man-pages/man7/units.7.html

Furthermore, to quote the National Institute of Standards and Technology (NIST):

“Once upon a time, computer professionals noticed that 2¹⁰ was very nearly equal to 1000 and started using the SI prefix “kilo” to mean 1024. That worked well enough for a decade or two because everybody who talked kilobytes knew that the term implied 1024 bytes. But, almost overnight a much more numerous “everybody” bought computers, and the trade computer professionals needed to talk to physicists and engineers and even to ordinary people, most of whom know that a kilometer is 1000 meters and a kilogram is 1000 grams.

“Then data storage for gigabytes, and even terabytes, became practical, and the storage devices were not constructed on binary trees, which meant that, for many practical purposes, binary arithmetic was less convenient than decimal arithmetic. The result is that today “everybody” does not “know” what a megabyte is. When discussing computer memory, most manufacturers use megabyte to mean 2²⁰ = 1 048 576 bytes, but the manufacturers of computer storage devices usually use the term to mean 1 000 000 bytes. Some designers of local area networks have used megabit per second to mean 1 048 576 bit/s, but all telecommunications engineers use it to mean 10⁶ bit/s. And if two definitions of the megabyte are not enough, a third megabyte of 1 024 000 bytes is the megabyte used to format the familiar 90 mm (3 1/2 inch), “1.44 MB” diskette. The confusion is real, as is the potential for incompatibility in standards and in implemented systems.

“Faced with this reality, the IEEE Standards Board decided that IEEE standards will use the conventional, internationally adopted, definitions of the SI prefixes. Mega will mean 1 000 000, except that the base-two definition may be used (if such usage is explicitly pointed out on a case-by-case basis) until such time that prefixes for binary multiples are adopted by an appropriate standards body.”
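To make the three competing "megabytes" from the quote concrete, the sketch below expresses one diskette's capacity under each definition. It assumes the "1.44 MB" diskette holds 1440 KiB of formatted capacity; the constant names are illustrative only.

```python
# Three definitions of "megabyte" mentioned in the quote, in bytes.
MEGABYTE_SI = 10**6                # 1 000 000
MEGABYTE_BINARY = 2**20            # 1 048 576
MEGABYTE_DISKETTE = 1000 * 2**10   # 1 024 000

# Assumption: a "1.44 MB" diskette holds 1440 KiB of formatted capacity.
diskette_bytes = 1440 * 2**10

definitions = [
    ("SI", MEGABYTE_SI),
    ("binary", MEGABYTE_BINARY),
    ("diskette", MEGABYTE_DISKETTE),
]
for label, size in definitions:
    # Same byte count, three different "megabyte" figures.
    print("%-8s %.5f" % (label, diskette_bytes / float(size)))
```

Only the mixed 1 024 000-byte definition yields the familiar 1.44 figure.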

8.3. Who uses Bitmath

Shout-outs to all of the bitmath adopters out there I was able to identify:

ClusterHQ’s “Flocker”. A data volume manager for Dockerized applications
VMware’s vsphere flocker storage driver
EMC’s scaleio flocker storage driver
Dell Storage’s storage center block device flocker driver
TravelCRM - Free CRM for travel companies - Bitbucket
direscraw by Brian Mikolajczyk for recovering lost files
sizer by Calle Liljeholm. Calculating useable capacity in a flocker cluster

8.4. Related Projects

Bitmath is not the first project to tackle a challenge of this nature, handling units in a sane OOP approach. Several other Python libraries exist which provide similar functionality to bitmath. It only seems fair that we should point out these other libraries in case bitmath isn’t the best fit for you.

8.4.1. Magnitude

Magnitude implements efficient computation with physical quantities. It allows you to do mathematical operations with them as if they were numbers, taking care of the units behind the scenes.

Magnitude, from Juan Reyero, is a very extensible library for working with a large variety of units (e.g., mile = one mile), as well as derived units (e.g., mile/hour). Scaling, such as indicating one megabyte (1 MB), is also programmable with Magnitude. Juan is also kind enough to include a similar “related projects” section in his documentation.

8.4.2. hurry.filesize

hurry.filesize a simple Python library that can take a number of bytes and returns a human-readable string with the size in it, in kilobytes (K), megabytes (M), etc.

hurry.filesize is very limited in functionality when compared to the other alternatives. However, it is an extremely simple and lightweight module. If you’re looking for a library just for turning counts of bytes into human-readable strings, then hurry.filesize will be great for you.

If you need any more functionality, such as greater control over output formatting, or arithmetic calculations, then you will find hurry.filesize lacking. This project has not been updated since 2009, so I would not expect to see updates any time soon.

PyPi Homepage & Download

8.4.3. SymPy - Units

This module provides around 200 predefined units that are commonly used in the sciences. Additionally, it provides the ``Unit`` class which allows you to define your own units.

The Units module from the SymPy library is another option. Like Magnitude, the Units library is very extensible and includes around 200 built-in units by default. While technically it supports handling quantities such as 1337 PiB, this support must be configured by the user.

In contrast, the bitmath module includes classes representing the full spectrum of byte and bit based units, out of the box. No conversion or derivation code required of the user.

Units Homepage & Docs
Download available through pip, or your distribution’s package system

8.4.4. Unum

Unum stands for ‘unit-numbers’. It is a Python module that allows to define and manipulate true quantities, i.e. numbers with units such as 60 seconds, […], 30 dollars etc. The module validates unit consistency in arithmetic expressions; it provides also automatic conversion and output formatting. Unum is designed to be reliable, easy-to-use, customizable and open to any unit definition.

Unum, by Pierre X. Denis, is another extensible library for unit manipulation. The module does not appear to have seen any activity in quite some time. Looking over the docs gives me the impression that it also has a tendency to pollute your namespace with objects like M and anything else it pre-defines.