8. Appendices
8.1. Rules for Math
This section describes what we need to know to effectively use bitmath
for arithmetic. Because bitmath allows the use of instances as
operands on either side of the operator it is especially important to
understand their behavior. Just as in normal every-day math, not all
operations yield the same result if the operands are switched. E.g., 1 - 2 = -1, whereas 2 - 1 = 1.
This section includes discussions of the results for each supported mixed math operation. For mixed math operations (i.e., an operation with a bitmath instance and a number type), implicit coercion may happen. That is to say, a bitmath instance will be converted to a number type.
When coercion happens is determined by the following conditions and rules:
- Precedence and Associativity of Operators in Python[1]
- Situational semantics – some operations, though mathematically valid, do not make logical sense in the context in which they are applied.
8.1.1. Terminology
These definitions describe some of the terminology used throughout this section.
- Coercion
  The act of converting operands into a common type to support arithmetic operations. Somewhat similar to how adding two fractions requires coercing each operand into having a common denominator.
  Specific to the bitmath domain, this concept means using an instance’s prefix value for mixed-math.
- Operand
  The object(s) of a mathematical operation. That is to say, given 1 + 2, the operands would be 1 and 2.
- Operator
  The mathematical operation to evaluate. Given 1 + 2, the operation would be addition, +.
- LHS
  Left-hand side. In discussion this specifically refers to the operand on the left-hand side of the operator.
- RHS
  Right-hand side. In discussion this specifically refers to the operand on the right-hand side of the operator.
8.1.2. Two bitmath operands
This section describes what happens when two bitmath instances are used as operands. There are three possible results from this type of operation.
- Addition and subtraction
- The result will be of the type of the LHS.
- Multiplication
- Supported, but yields strange results.
In [10]: first = MiB(5)
In [11]: second = kB(2)
In [12]: first * second
Out[12]: MiB(10000.0)
In [13]: (first * second).best_prefix()
Out[13]: GiB(9.765625)
As we can see in Out[12] and Out[13], multiplying even two relatively small quantities together (MiB(5) and kB(2)) yields quite large results.
Internally, this is implemented as:
- Division
- The result will be a number type due to unit cancellation.
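The three outcomes above can be checked with plain byte arithmetic. The following is a minimal sketch that does not import bitmath; the constant names are illustrative, and the multiplication rule used here (LHS prefix value times RHS byte count, kept in the LHS unit) is an assumption inferred from the Out[12] value shown above, not a statement about the library's source.

```python
# Plain-Python walk-through of MiB(5) and kB(2) as operands.
# Constant names are illustrative; no bitmath import is needed.
MIB = 2 ** 20   # bytes in one mebibyte (NIST, base 2)
GIB = 2 ** 30   # bytes in one gibibyte (NIST, base 2)
KB = 10 ** 3    # bytes in one kilobyte (SI, base 10)

first_value, second_value = 5, 2
first_bytes = first_value * MIB     # MiB(5) expressed in bytes
second_bytes = second_value * KB    # kB(2) expressed in bytes

# Addition: the sum is expressed in the unit of the LHS (MiB).
sum_in_mib = (first_bytes + second_bytes) / MIB

# Multiplication (ASSUMPTION, inferred from the documented output):
# LHS prefix value times RHS byte count, reported in the LHS unit.
product_in_mib = first_value * second_bytes
product_in_gib = product_in_mib * MIB / GIB

# Division: the units cancel, leaving a plain ratio.
ratio = first_bytes / second_bytes

print(sum_in_mib)      # 5.0019073486328125
print(product_in_mib)  # 10000
print(product_in_gib)  # 9.765625
print(ratio)
```

The product values match MiB(10000.0) and GiB(9.765625) from the session above, which is why the assumed rule is a plausible reading of the output.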
8.1.3. Mixed Types: Addition and Subtraction
This describes the behavior of addition and subtraction operations where one operand is a bitmath type and the other is a number type.
Mixed-math addition and subtraction always return a type from the numbers family (integer, float, long, etc.). This rule is true regardless of the placement of the operands with respect to the operator.
Discussion: Why do 100 - KiB(90) and KiB(100) - 90 both yield a result of 10.0 and not another bitmath instance, such as KiB(10.0)?
When implementing the math part of the object datamodel customizations[2] there were two choices available:
- Offer no support at all. Instead raise a NotImplemented exception.
- Consistently apply coercion to the bitmath operand to produce a useful result (useful if you know the rules of the library).
In the end it became a philosophical decision guided by scientific precedent.
Put simply, bitmath uses the significance of the least significant operand, specifically the number-type operand, because it lacks semantic significance. In application this means that we drop the semantic significance of the bitmath operand. That is to say, given an input like GiB(13.37) (equivalent to 13.37 * 2^30 bytes), the only property used in calculations is the prefix value, 13.37.
Numbers carry mathematical significance, in the form of precision, but what they lack is semantic (contextual) significance. A number by itself is just a measurement of an arbitrary quantity of stuff. In mixed-type math, bitmath effectively treats numbers as mathematical constants.
A bitmath instance also has mathematical significance in that an instance is a measurement of a quantity (bits in this case) and that quantity has a measurable precision. But a bitmath instance is more than just a measurement, it is a specialized representation of a count of bits. This gives bitmath instances semantic significance as well.
And so, in deciding how to handle mixed-type (really what we should say is mixed-significance) operations, we chose to model the behavior off of an already established set of rules. Those rules are the Rules of Significance Arithmetic[3].
Let’s look at an example of this in action:
In [8]: num = 42
In [9]: bm = PiB(24)
In [10]: print num + bm
66.0
Equivalently, divorcing the bitmath instance from its value (this is coercion):
In [12]: bm_value = bm.value
In [13]: print num + bm_value
66.0
What it all boils down to is this: if we don’t provide a unit then bitmath won’t give us one back. There is no way for bitmath to guess what unit the operand was intended to carry. Therefore, the behavior of bitmath is conservative. It will meet us half way and do the math, but it will not return a unit in the result.
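The coercion rule can be modeled with a small stand-in class. This is a minimal sketch, not bitmath's code: ToyKiB is a hypothetical class invented for illustration, and it only shows how the addition and subtraction hooks of the Python data model can return a plain number built from the prefix value.

```python
import numbers


class ToyKiB:
    """Hypothetical stand-in for a bitmath type (not the library's code)."""

    def __init__(self, value):
        self.value = float(value)  # the prefix value, e.g. 90 for "90 KiB"

    def __add__(self, other):
        # Mixed math: drop the unit and work with the prefix value only.
        if isinstance(other, numbers.Number):
            return self.value + other
        return NotImplemented

    __radd__ = __add__  # addition is commutative, so reuse the same rule

    def __sub__(self, other):
        if isinstance(other, numbers.Number):
            return self.value - other
        return NotImplemented

    def __rsub__(self, other):
        # Called for `number - instance`; operand order matters here.
        if isinstance(other, numbers.Number):
            return other - self.value
        return NotImplemented


print(100 - ToyKiB(90))   # 10.0
print(ToyKiB(100) - 90)   # 10.0
print(42 + ToyKiB(24))    # 66.0
```

In every case the result is a float, never a ToyKiB, which mirrors the "no unit in, no unit out" behavior described above.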
8.1.4. Mixed Types: Multiplication and Division
Multiplication has commutative properties. This means that the ordering of the operands is not significant. Because of this fact bitmath allows arbitrary placement of the operands, treating the numeric operand as a constant. Here’s an example demonstrating this.
In [2]: 10 * KiB(43)
Out[2]: KiB(430.0)
In [3]: KiB(43) * 10
Out[3]: KiB(430.0)
Division, however, does not have this commutative property. I.e., the placement of the operands is significant. Additionally, there is a semantic difference in division. Dividing a quantity (e.g. MiB(100)) by a constant (10) makes complete sense. Conceptually (in the domain of bitmath), the intention of MiB(100) / 10 is to separate MiB(100) into 10 equal-sized parts.
In [4]: KiB(43) / 10
Out[4]: KiB(4.2998046875)
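The two rules so far (a numeric operand acts as a constant on either side of a multiplication, and a quantity divided by a constant keeps its unit) can be sketched with a stand-in class. ToyKiB is hypothetical and is not bitmath's implementation. The last lines are an inference from the output above rather than a documented fact: the value 4.2998046875 is what results when the byte count of 43 KiB is divided by 10 with integer (floor) division, as Python 2 does for two integers, and then converted back to KiB.

```python
KIB = 2 ** 10  # bytes in one kibibyte


class ToyKiB:
    """Hypothetical stand-in for a bitmath type (not the library's code)."""

    def __init__(self, value):
        self.value = float(value)

    def __repr__(self):
        return "ToyKiB(%r)" % self.value

    def __mul__(self, constant):
        # The number is treated as a constant; the unit is preserved.
        return ToyKiB(self.value * constant)

    __rmul__ = __mul__  # commutative: 10 * x behaves like x * 10

    def __truediv__(self, constant):
        # Quantity divided by a constant: still a quantity.
        return ToyKiB(self.value / constant)


print(10 * ToyKiB(43))   # ToyKiB(430.0)
print(ToyKiB(43) * 10)   # ToyKiB(430.0)
print(ToyKiB(43) / 10)   # ToyKiB(4.3)

# Byte-level arithmetic with floor division (an inference, see above).
byte_count = 43 * KIB        # 44032 bytes
floored = byte_count // 10   # 4403 bytes
print(floored / KIB)         # 4.2998046875
```

With true division, as in the sketch's `__truediv__`, the quotient is 4.3; the small difference from the documented output comes entirely from the floored byte count.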
The reverse operation does not maintain semantic validity. Stated differently, it does not make logical sense to divide a constant by a measured quantity of stuff. If you’re still not clear on this, ask yourself what you would expect to get if you did this:
8.1.5. Footnotes
[1] For a less technical review of precedence and associativity, see Programiz: Precedence and Associativity of Operators in Python
[2] Python Datamodel Customization Methods
[3] https://en.wikipedia.org/wiki/Significance_arithmetic
8.2. On Units
As previously stated, in this module you will find two very similar sets of classes available. These are the NIST and SI prefixes. The NIST prefixes are all base 2 and have an ‘i’ character in the middle. The SI prefixes are base 10 and have no ‘i’ character.
For smaller values, these two systems of unit prefixes are roughly equivalent. The round() operations below show, as a ratio, how close one “unit” of SI is to the corresponding “unit” of NIST.
In [15]: one_kilo = 1 * 10**3
In [16]: one_kibi = 1 * 2**10
In [17]: round(one_kilo / float(one_kibi), 2)
Out[17]: 0.98
In [18]: one_tera = 1 * 10**12
In [19]: one_tebi = 1 * 2**40
In [20]: round(one_tera / float(one_tebi), 2)
Out[20]: 0.91
In [21]: one_exa = 1 * 10**18
In [22]: one_exbi = 1 * 2**60
In [23]: round(one_exa / float(one_exbi), 2)
Out[23]: 0.87
They begin as roughly equivalent; however, as you can see (Out[17], Out[20], and Out[23]), they diverge significantly for higher values.
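The same comparison can be run across every prefix pair from kilo/kibi to exa/exbi. This is a small sketch in Python 3 (where `/` is true division, so no float() call is needed); the `prefix_pairs` list and `ratios` dictionary are names chosen for this example.

```python
# SI prefix, NIST prefix, and the step n, where SI = 10**(3n) and NIST = 2**(10n).
prefix_pairs = [
    ("kilo", "kibi", 1),
    ("mega", "mebi", 2),
    ("giga", "gibi", 3),
    ("tera", "tebi", 4),
    ("peta", "pebi", 5),
    ("exa", "exbi", 6),
]

ratios = {}
for si_name, nist_name, n in prefix_pairs:
    si_unit = 10 ** (3 * n)
    nist_unit = 2 ** (10 * n)
    # Ratio of one SI unit to one NIST unit, rounded as in the session above.
    ratios[si_name] = round(si_unit / nist_unit, 2)
    print(si_name, nist_name, ratios[si_name])
```

The ratio shrinks at every step, from 0.98 at kilo/kibi to 0.87 at exa/exbi, so the gap between the two systems grows with each prefix.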
Why two unit systems? Why take the time to point this difference out? Why should you care? The Linux Documentation Project comments on that:
Before these binary prefixes were introduced, it was fairly common to use k=1000 and K=1024, just like b=bit, B=byte. Unfortunately, the M is capital already, and cannot be capitalized to indicate binary-ness.
At first that didn’t matter too much, since memory modules and disks came in sizes that were powers of two, so everyone knew that in such contexts “kilobyte” and “megabyte” meant 1024 and 1048576 bytes, respectively. What originally was a sloppy use of the prefixes “kilo” and “mega” started to become regarded as the “real true meaning” when computers were involved. But then disk technology changed, and disk sizes became arbitrary numbers. After a period of uncertainty all disk manufacturers settled on the standard, namely k=1000, M=1000k, G=1000M.
The situation was messy: in the 14k4 modems, k=1000; in the 1.44MB diskettes, M=1024000; etc. In 1998 the IEC approved the standard that defines the binary prefixes given above, enabling people to be precise and unambiguous.
Thus, today, MB = 1000000B and MiB = 1048576B.
In the free software world programs are slowly being changed to conform. When the Linux kernel boots and says:
hda: 120064896 sectors (61473 MB) w/2048KiB Cache
the MB are megabytes and the KiB are kibibytes.
- Source: man 7 units - http://man7.org/linux/man-pages/man7/units.7.html
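The kernel message quoted above can be checked by hand. This sketch assumes 512-byte sectors, which the quote does not state; with that assumption the sector count reproduces the 61473 MB figure and shows what the same disk would be in MiB.

```python
SECTOR_BYTES = 512  # ASSUMPTION: sector size is not given in the quoted message
sectors = 120064896

total_bytes = sectors * SECTOR_BYTES

decimal_mb = total_bytes / 10 ** 6   # SI megabytes (MB)
binary_mib = total_bytes / 2 ** 20   # NIST mebibytes (MiB)

print(total_bytes)       # 61473226752
print(int(decimal_mb))   # 61473
print(binary_mib)        # 58625.4375
```

The same disk is 61473 MB or roughly 58625 MiB, a difference of almost 5 percent that depends only on which prefix system is meant.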
Furthermore, to quote the National Institute of Standards and Technology (NIST):
“Once upon a time, computer professionals noticed that 2^10 was very nearly equal to 1000 and started using the SI prefix “kilo” to mean 1024. That worked well enough for a decade or two because everybody who talked kilobytes knew that the term implied 1024 bytes. But, almost overnight a much more numerous “everybody” bought computers, and the trade computer professionals needed to talk to physicists and engineers and even to ordinary people, most of whom know that a kilometer is 1000 meters and a kilogram is 1000 grams.
“Then data storage for gigabytes, and even terabytes, became practical, and the storage devices were not constructed on binary trees, which meant that, for many practical purposes, binary arithmetic was less convenient than decimal arithmetic. The result is that today “everybody” does not “know” what a megabyte is. When discussing computer memory, most manufacturers use megabyte to mean 2^20 = 1 048 576 bytes, but the manufacturers of computer storage devices usually use the term to mean 1 000 000 bytes. Some designers of local area networks have used megabit per second to mean 1 048 576 bit/s, but all telecommunications engineers use it to mean 10^6 bit/s. And if two definitions of the megabyte are not enough, a third megabyte of 1 024 000 bytes is the megabyte used to format the familiar 90 mm (3 1/2 inch), “1.44 MB” diskette. The confusion is real, as is the potential for incompatibility in standards and in implemented systems.
“Faced with this reality, the IEEE Standards Board decided that IEEE standards will use the conventional, internationally adopted, definitions of the SI prefixes. Mega will mean 1 000 000, except that the base-two definition may be used (if such usage is explicitly pointed out on a case-by-case basis) until such time that prefixes for binary multiples are adopted by an appropriate standards body.”
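The three "megabytes" mentioned in the NIST quote can be compared directly. This is a short arithmetic sketch; the dictionary keys are labels chosen for this example, and the diskette capacity is written as 1440 * 1024 bytes, which equals 1.44 times the 1 024 000-byte "megabyte".

```python
# The three competing definitions of a "megabyte", in bytes.
megabyte_definitions = {
    "SI (storage devices)": 10 ** 6,        # 1 000 000
    "binary (computer memory)": 2 ** 20,    # 1 048 576
    "diskette (mixed)": 1000 * 1024,        # 1 024 000
}

# A "1.44 MB" diskette: 1.44 * 1 024 000 bytes, written with integers.
floppy_bytes = 1440 * 1024

floppy_in_si_mb = floppy_bytes / 10 ** 6
floppy_in_mib = floppy_bytes / 2 ** 20

print(floppy_bytes)      # 1474560
print(floppy_in_si_mb)   # 1.47456
print(floppy_in_mib)     # 1.40625
```

So the "1.44 MB" diskette holds 1.47456 MB in SI terms and 1.40625 MiB in NIST terms; 1.44 is correct under neither standard definition.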
8.3. Who uses Bitmath
Shout-outs to all of the bitmath adopters out there I was able to identify:
- ClusterHQ’s “Flocker”. A data volume manager for Dockerized applications
- VMware’s vsphere flocker storage driver
- EMC’s scaleio flocker storage driver
- Dell Storage’s storage center block device flocker driver
- TravelCRM - Free CRM for travel companies - Bitbucket
- direscraw by Brian Mikolajczyk for recovering lost files
- sizer by Calle Liljeholm. Calculating useable capacity in a flocker cluster