9. Appendices
9.1. Rules for Math
This section describes what we need to know to effectively use bitmath
for arithmetic. Because bitmath allows the use of instances as
operands on either side of the operator it is especially important to
understand their behavior. Just as in normal every-day math, not all
operations yield the same result if the operands are switched. E.g.,
1 - 2 = -1 whereas 2 - 1 = 1.
This section includes discussions of the results for each supported mixed math operation. For mixed math operations (i.e., an operation with a bitmath instance and a number type), implicit coercion may happen. That is to say, a bitmath instance will be converted to a number type.
When coercion happens is determined by the following conditions and rules:
Situational semantics – some operations, though mathematically valid, do not make logical sense in context.
9.1.1. Terminology
The definitions below describe some of the terminology used throughout this section.
- Coercion
The act of converting operands into a common type to support arithmetic operations. Somewhat similar to how adding two fractions requires coercing each operand into having a common denominator.
Specific to the bitmath domain, this concept means using an instance’s prefix value for mixed-math.
- Operand
The object(s) of a mathematical operation. That is to say, given 1 + 2, the operands would be 1 and 2.
- Operator
The mathematical operation to evaluate. Given 1 + 2, the operation would be addition, +.
- LHS
Left-hand side. In discussion this specifically refers to the operand on the left-hand side of the operator.
- RHS
Right-hand side. In discussion this specifically refers to the operand on the right-hand side of the operator.
9.1.2. Two bitmath operands
This section describes what happens when two bitmath instances are used as operands. There are three possible results from this type of operation.
- Addition and subtraction
The result will be of the type of the LHS.
- Multiplication
Supported, but yields results which may not be intuitive. The math is performed at the byte-level (ignoring prefix units). The result is in the unit of the LHS. Technically speaking, if the LHS of the equation is a byte unit, then the result should be squared. You are advised to call the best_prefix() method on the result to get a useful value back.
>>> bitmath.kB(3).bytes, bitmath.MiB(5).bytes
(3000.0, 5242880.0)
>>> bitmath.best_prefix(3000 * 5242880)
GiB(14.6484375)
>>> bitmath.Byte(3000) * bitmath.MiB(5)
B(15728640000.0)
>>> (bitmath.Byte(3000) * bitmath.MiB(5)).best_prefix()
GiB(14.6484375)
The final result represents how many “GiB-sized” chunks of bytes are in that total. It’s weird, but it works.
If the LHS is larger than a byte prefix unit then you get a result matching the unit of the LHS of the equation.
>>> bitmath.MiB(5) * bitmath.kB(3)
MiB(15000.0)
# LHS was MiB, result is MiB
# And if we wrap it with best_prefix() we get the earlier result back
>>> (bitmath.MiB(5) * bitmath.kB(3)).best_prefix()
GiB(14.6484375)
- Division
The result will be a number type due to unit cancellation.
>>> bitmath.kB(3) / bitmath.MiB(5)
0.00057220458984375
>>> bitmath.kB(3).bytes / bitmath.MiB(5).bytes
0.00057220458984375
Above you can see that the math is performed on the bytes of each operand. As noted above, the units cancel out. This is the opposite of the multiplication case where the units square together if you do not coerce them into a larger prefix unit.
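The byte-level arithmetic behind the multiplication and division examples can be reproduced with plain Python numbers. The sketch below does not import bitmath at all; KB, MIB, and GIB are names chosen here for illustration, defined from the SI and NIST prefix sizes.

```python
# Plain-Python sketch of the byte-level math (no bitmath import needed).
# KB, MIB and GIB are illustrative constants, not bitmath names.
KB = 10**3    # bytes in one kB (SI)
MIB = 2**20   # bytes in one MiB (NIST)
GIB = 2**30   # bytes in one GiB (NIST)

lhs_bytes = 3 * KB    # kB(3)  -> 3000 bytes
rhs_bytes = 5 * MIB   # MiB(5) -> 5242880 bytes

# Multiplication: the product of the byte counts, re-expressed in GiB
product_in_gib = (lhs_bytes * rhs_bytes) / GIB

# Division: the byte counts divide and the units cancel
ratio = lhs_bytes / rhs_bytes

print(product_in_gib)  # 14.6484375
print(ratio)           # 0.00057220458984375
```

Both printed values match the bitmath results shown above, which is what we would expect if the operands are reduced to bytes before the operator is applied.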
9.1.3. Mixed Types: Addition and Subtraction
This describes the behavior of addition and subtraction operations where one operand is a bitmath type and the other is a number type.
Mixed-math addition and subtraction return a type from the
numbers family (integer, float, etc…) regardless of the
placement of the operands, with one exception: when the left operand
is exactly 0, the result is the bitmath instance itself.
This exception exists so that Python’s built-in sum()
function works correctly with iterables of bitmath objects, since
sum() starts accumulation from 0 by default:
>>> import bitmath
>>> sum([bitmath.Byte(1), bitmath.MiB(1), bitmath.GiB(1)])
Byte(1074790401.0)
For every non-zero numeric operand, the default behavior (returning a number) applies.
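To see why the zero exception is enough to make sum() work, consider the following minimal sketch. ToySize is a hypothetical single-unit class written for this illustration; it is not bitmath's source code and only mimics the rule described above.

```python
# Hypothetical toy class, NOT bitmath's source code. It only mimics the
# documented rule: number + instance yields a number, except 0 + instance.
class ToySize:
    def __init__(self, value):
        self.value = float(value)

    def __add__(self, other):
        if isinstance(other, ToySize):
            # Two instances: the result stays an instance
            return ToySize(self.value + other.value)
        # Mixed math: coerce to a plain number
        return self.value + other

    def __radd__(self, other):
        # Called for `number + instance`, including sum()'s initial 0
        if other == 0:
            return self
        return other + self.value

    def __repr__(self):
        return f"ToySize({self.value})"


total = sum([ToySize(1), ToySize(2), ToySize(3)])
print(total)           # ToySize(6.0)
print(5 + ToySize(1))  # 6.0
```

The first step of sum() evaluates 0 + ToySize(1), which falls through to __radd__ and returns the instance unchanged. Every later step adds two instances, so the running total never degrades to a bare number.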
Discussion: Why do 100 - KiB(90) and KiB(100) - 90 both
yield a result of 10.0 and not another bitmath instance, such as
KiB(10.0)?
When implementing the math part of the object datamodel customizations[2] there were two choices available:
- Offer no support at all. Instead raise a NotImplemented exception.
- Consistently apply coercion to the bitmath operand to produce a useful result (useful if you know the rules of the library).
In the end it became a philosophical decision guided by scientific precedent.
Put simply, bitmath uses the significance of the least significant
operand, specifically the number-type operand because it lacks
semantic significance. In application this means that we drop the
semantic significance of the bitmath operand. That is to say, given an
input like GiB(13.37) (equivalent to 13.37 * 2**30 bytes), the
only property used in calculations is the prefix value, 13.37.
Numbers carry mathematical significance, in the form of precision, but what they lack is semantic (contextual) significance. A number by itself is just a measurement of an arbitrary quantity of stuff. In mixed-type math, bitmath effectively treats numbers as mathematical constants.
A bitmath instance also has mathematical significance in that an instance is a measurement of a quantity (bits in this case) and that quantity has a measurable precision. But a bitmath instance is more than just a measurement, it is a specialized representation of a count of bits. This gives bitmath instances semantic significance as well.
And so, in deciding how to handle mixed-type (really what we should say is mixed-significance) operations, we chose to model the behavior off of an already established set of rules. Those rules are the Rules of Significance Arithmetic[3].
Let’s look at an example of this in action:
In [8]: num = 42
In [9]: bm = PiB(24)
In [10]: print(num + bm)
66.0
Equivalently, divorcing the bitmath instance from its value (this is coercion):
In [12]: bm_value = bm.value
In [13]: print(num + bm_value)
66.0
What it all boils down to is this: if we don’t provide a unit then bitmath won’t give us one back. There is no way for bitmath to guess what unit the operand was intended to carry. Therefore, the behavior of bitmath is conservative. It will meet us half way and do the math, but it will not return a unit in the result.
Keeping the result as a bitmath type
If the intent is to add or subtract a quantity of the same unit —
for example, incrementing Byte(1) by one more byte — use an
explicit bitmath operand on both sides:
>>> Byte(1) + Byte(1)
Byte(2.0)
>>> KiB(10) - KiB(3)
KiB(7.0)
This makes the unit explicit rather than relying on implicit
conversion, which eliminates ambiguity — KiB(10) - 3 could mean
“subtract 3 KiB” or “subtract the number 3 from the prefix value.”
bitmath does not guess; using a bitmath operand on both sides states
the intent clearly.
9.1.4. Mixed Types: Multiplication and Division
Multiplication has commutative properties. This means that the ordering of the operands is not significant. Because of this fact bitmath allows arbitrary placement of the operands, treating the numeric operand as a constant. Here’s an example demonstrating this.
In [2]: 10 * KiB(43)
Out[2]: KiB(430.0)
In [3]: KiB(43) * 10
Out[3]: KiB(430.0)
Division, however, does not have this commutative
property. I.e., the placement of the operands is
significant. Additionally, there is a semantic difference in
division. Dividing a quantity (e.g. MiB(100)) by a constant
(10) makes complete sense. Conceptually (in the domain of
bitmath), the intention of MiB(100) / 10 is to separate
MiB(100) into 10 equal-sized parts.
In [4]: KiB(43) / 10
Out[4]: KiB(4.3)
The reverse operation does not maintain semantic validity. Stated differently, it does not make logical sense to divide a constant by a measured quantity of stuff. If you’re still not clear on this, ask yourself what you would expect to get if you divided the number 100 by kB(33).
Unless you’re representing rates that doesn’t mean much at all. The units of the operands when expressing rates are context sensitive and can be very non-intuitive without additional knowledge. For example:
>>> 100/bitmath.kB(33)
3.0303030303030303
This is functionally equivalent to writing 100 / 33: only the prefix value of the bitmath operand takes part in the calculation.
This might mean something to you, but we can’t express that as a prefix unit. We let you do it, but it is up to you to determine the significance of the result.
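The placement rules in this section can be summarized with a small sketch. ToyKiB is a hypothetical class invented for this illustration, handling only a numeric second operand; it is not how bitmath is implemented, it only reproduces the documented outcomes.

```python
# Hypothetical toy class, NOT bitmath's implementation. It mirrors the
# documented mixed-type rules for a single-unit quantity and a number.
class ToyKiB:
    def __init__(self, value):
        self.value = float(value)

    def __mul__(self, number):
        # quantity * constant keeps the unit
        return ToyKiB(self.value * number)

    # constant * quantity behaves the same (multiplication commutes)
    __rmul__ = __mul__

    def __truediv__(self, number):
        # quantity / constant splits the quantity into equal parts
        return ToyKiB(self.value / number)

    def __rtruediv__(self, number):
        # constant / quantity has no meaningful unit: return a bare number
        return number / self.value

    def __repr__(self):
        return f"ToyKiB({self.value})"


print(10 * ToyKiB(43))   # ToyKiB(430.0)
print(ToyKiB(43) * 10)   # ToyKiB(430.0)
print(ToyKiB(43) / 10)   # ToyKiB(4.3)
print(100 / ToyKiB(33))  # 3.0303030303030303
```

Only the last operation drops the unit, because a constant divided by a quantity has no sensible prefix unit to carry.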
9.1.5. Design Philosophy: Floating-Point Measurements
bitmath represents sizes as floating-point measurements, not as
discrete counts of hardware bits. This is an intentional design choice.
Every constructor (by unit value, by bytes=, or by bits=)
normalizes its input to a float, so the bytes and bits
properties always return floating-point values.
A file reported as 1.7 GiB is a measurement — the same way
2.3 miles or 1.7 liters are measurements. Physical storage is
discrete (you cannot store half a bit), but the measurement of
storage is legitimately continuous. Fractional values appear naturally
in division, unit conversion chains, and proportional calculations:
>>> KiB(1) / 3
KiB(0.3333333333333333)
>>> MiB(1).to_Bit()
Bit(8388608.0)
>>> KiB(1/3).to_Bit()
Bit(2730.6666666666665)
The last example is not a bug. The fractional bit count is the faithful
representation of a fractional byte input. If you need integer results,
Python’s built-in math.floor(), math.ceil(), and
round() all work on bitmath instances and return an instance
of the same type:
>>> import math
>>> math.floor(KiB(1) / 3)
KiB(0)
>>> math.ceil(KiB(1) / 3)
KiB(1)
>>> round(MiB(1.75))
MiB(2)
Warning
Rounding intermediate results is a lossy operation.
math.floor(GiB(10) / 3) * 3 yields GiB(9), not
GiB(10). Only round at the final output step.
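The effect described in the warning is ordinary arithmetic and can be seen without bitmath. In this sketch plain floats stand in for the prefix value of GiB(10); the variable names are illustrative only.

```python
import math

# Plain floats stand in for the prefix value of GiB(10)
deferred = 10 / 3 * 3             # no rounding until the very end
lossy = math.floor(10 / 3) * 3    # rounded in the middle of the calculation

print(round(deferred))  # 10
print(lossy)            # 9
```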
Floating-point accumulation: Because bitmath uses IEEE 754 64-bit
floats internally, arithmetic across many operations may accumulate
small rounding errors — identical to ordinary Python float arithmetic.
For the file-size domain (values up to exabyte scale), 64-bit float
provides approximately 15 significant decimal digits of precision,
which is sufficient for all practical purposes. If exact integer
semantics are required at the byte level, use int(instance.bytes)
to work in raw integers.
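The precision limits mentioned above are properties of Python floats in general, so they can be checked with the standard library alone. The following sketch uses plain numbers rather than bitmath instances; the name exbibyte is chosen here for illustration.

```python
import sys

# Decimal digits a 64-bit float is guaranteed to preserve
print(sys.float_info.dig)  # 15

# Repeated float additions accumulate a small rounding error
total = sum([0.1] * 10)
print(total == 1.0)  # False

# Integers are exact as floats only up to 2**53
exbibyte = 2**60  # bytes in one EiB
print(float(exbibyte) + 1 == float(exbibyte))  # True: the extra byte is lost
print(exbibyte + 1 == exbibyte)                # False: ints stay exact
```

At exbibyte scale a single byte is below the resolution of a float, which is why working in raw integers is the safer choice when exact byte counts matter.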
See also
Rounding and Integer Conversion — instance methods for rounding and integer conversion.
9.1.6. Footnotes
9.2. On Units
As previously stated, in this module you will find two very similar sets of classes available. These are the NIST and SI prefixes. The NIST prefixes are all base 2 and have an ‘i’ character in the middle. The SI prefixes are base 10 and have no ‘i’ character.
For smaller values, these two systems of unit prefixes are roughly
equivalent. The round() operations below show the ratio of
one “unit” of SI to one “unit” of NIST.
 1 In [15]: one_kilo = 1 * 10**3
 2
 3 In [16]: one_kibi = 1 * 2**10
 4
 5 In [17]: round(one_kilo / one_kibi, 2)
 6
 7 Out[17]: 0.98
 8
 9 In [18]: one_tera = 1 * 10**12
10
11 In [19]: one_tebi = 1 * 2**40
12
13 In [20]: round(one_tera / one_tebi, 2)
14
15 Out[20]: 0.91
16
17 In [21]: one_exa = 1 * 10**18
18
19 In [22]: one_exbi = 1 * 2**60
20
21 In [23]: round(one_exa / one_exbi, 2)
22
23 Out[23]: 0.87
They begin roughly equivalent; however, as you can see (lines 7, 15, and 23), they diverge significantly for higher values.
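The same comparison can be extended to every prefix pair from kilo/kibi to exa/exbi. This is a small sketch using plain integers; the labels in the prefixes list are written out here for readability and are not bitmath identifiers.

```python
# Ratio of each SI prefix to its NIST counterpart
prefixes = ["kilo/kibi", "mega/mebi", "giga/gibi",
            "tera/tebi", "peta/pebi", "exa/exbi"]

ratios = {}
for power, name in enumerate(prefixes, start=1):
    si = 10 ** (3 * power)     # SI: powers of 1000
    nist = 2 ** (10 * power)   # NIST: powers of 1024
    ratios[name] = round(si / nist, 2)
    print(f"{name:>9}: {ratios[name]}")
```

Each step up the prefix ladder multiplies the ratio by another 1000/1024, so the gap widens steadily from about 2% at the kilo level to about 13% at the exa level.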
Why two unit systems? Why take the time to point this difference out? Why should you care? The Linux Documentation Project comments on that:
Before these binary prefixes were introduced, it was fairly common to use k=1000 and K=1024, just like b=bit, B=byte. Unfortunately, the M is capital already, and cannot be capitalized to indicate binary-ness.
At first that didn’t matter too much, since memory modules and disks came in sizes that were powers of two, so everyone knew that in such contexts “kilobyte” and “megabyte” meant 1024 and 1048576 bytes, respectively. What originally was a sloppy use of the prefixes “kilo” and “mega” started to become regarded as the “real true meaning” when computers were involved. But then disk technology changed, and disk sizes became arbitrary numbers. After a period of uncertainty all disk manufacturers settled on the standard, namely k=1000, M=1000k, G=1000M.
The situation was messy: in the 14k4 modems, k=1000; in the 1.44MB diskettes, M=1024000; etc. In 1998 the IEC approved the standard that defines the binary prefixes given above, enabling people to be precise and unambiguous.
Thus, today, MB = 1000000B and MiB = 1048576B.
In the free software world programs are slowly being changed to conform. When the Linux kernel boots and says:
hda: 120064896 sectors (61473 MB) w/2048KiB Cache
the MB are megabytes and the KiB are kibibytes.
Source: man 7 units - http://man7.org/linux/man-pages/man7/units.7.html
Furthermore, to quote the National Institute of Standards and Technology (NIST):
“Once upon a time, computer professionals noticed that 2^10 was very nearly equal to 1000 and started using the SI prefix “kilo” to mean 1024. That worked well enough for a decade or two because everybody who talked kilobytes knew that the term implied 1024 bytes. But, almost overnight a much more numerous “everybody” bought computers, and the trade computer professionals needed to talk to physicists and engineers and even to ordinary people, most of whom know that a kilometer is 1000 meters and a kilogram is 1000 grams.
“Then data storage for gigabytes, and even terabytes, became practical, and the storage devices were not constructed on binary trees, which meant that, for many practical purposes, binary arithmetic was less convenient than decimal arithmetic. The result is that today “everybody” does not “know” what a megabyte is. When discussing computer memory, most manufacturers use megabyte to mean 2^20 = 1 048 576 bytes, but the manufacturers of computer storage devices usually use the term to mean 1 000 000 bytes. Some designers of local area networks have used megabit per second to mean 1 048 576 bit/s, but all telecommunications engineers use it to mean 10^6 bit/s. And if two definitions of the megabyte are not enough, a third megabyte of 1 024 000 bytes is the megabyte used to format the familiar 90 mm (3 1/2 inch), “1.44 MB” diskette. The confusion is real, as is the potential for incompatibility in standards and in implemented systems.
“Faced with this reality, the IEEE Standards Board decided that IEEE standards will use the conventional, internationally adopted, definitions of the SI prefixes. Mega will mean 1 000 000, except that the base-two definition may be used (if such usage is explicitly pointed out on a case-by-case basis) until such time that prefixes for binary multiples are adopted by an appropriate standards body.”
9.3. Who uses Bitmath
Shout-outs to all of the bitmath adopters out there I was able to identify:
ClusterHQ’s “Flocker”. A data volume manager for Dockerized applications
VMware’s vsphere flocker storage driver
EMC’s scaleio flocker storage driver
Dell Storage’s storage center block device flocker driver
TravelCRM - Free CRM for travel companies
direscraw by Brian Mikolajczyk for recovering lost files
sizer by Calle Liljeholm. Calculating useable capacity in a flocker cluster