Did you ever wonder why all files have approximately the same number of zeroes and ones?

Let's elaborate on this statistical reality.

We shall look at combinations, that is, the mathematical definition of a combination.

Combination deals with the total size of a set and the size of a subset. It calculates the total number of unique ways the subset can be placed within the set, and that count is the output of the combination algorithm.

For example, the 'set' can be eight chairs and the subset can be three people, where each person is to occupy exactly one seat. Therefore exactly three seats are being sat on and five seats are empty.

Combination deals with the total number of seatings that are possible given the three people and eight chairs, whereby each seating differs (at least slightly) in which seats are taken.

Note that only which seats are sat on makes a seating different; two people swapping seats with each other is not considered a different seating.
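The seating count can be checked directly by listing every way of choosing occupied chairs. A minimal Python sketch (the seat numbering 0..7 is just for illustration):

```python
import math
from itertools import combinations

seats = range(8)  # the 'set': eight chairs, numbered 0..7
# Every way of picking 3 occupied chairs out of 8.
# Each seating is just the set of occupied chairs; who sits where is
# ignored, so swapping two people does not produce a new seating.
seatings = list(combinations(seats, 3))

print(len(seatings))    # 56
print(math.comb(8, 3))  # 56, the same count from the formula
```

Listing the seatings and applying the formula agree on the same total.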

The algorithm used to calculate this is written as follows.

'C(n,r)' stands for 'Combination(n,r)', where n is the size of the 'set' and r is the size of the 'subset'.

If you are not used to reading functions: 'C(n,r)' is a declaration of a function.

Given that we are talking about Combination, you should automatically associate the C in 'C(n,r)' with Combination. You are familiar with numbers, but probably not with 'letters as numbers', otherwise called variables.

Variables are numbers, but their value can vary. The reason for this will be explored later.

Suffice it to say, when you look at 'C(n,r)', the C is supposed to represent Combination. Combination requires two inputs to calculate the total number of unique seatings: 'n' and 'r'. Note that such a declaration of a function does not contain its implementation, merely what it does and what input it takes.
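The declaration/implementation distinction can be sketched in Python: the function header plays the role of the declaration 'C(n,r)', and the body plays the role of the implementation (the factorial-based body below is just one possible implementation):

```python
import math

def C(n, r):
    """Combination: how many r-sized subsets an n-sized set has."""
    # The header above (name and parameters) is the declaration;
    # the line below is the implementation.
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(C(8, 3))  # 56
```

A caller only needs the declaration to use the function; the body could be swapped for any other correct implementation.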

'C(n,r)' is presented as equivalent to 'n!/(r!(n-r)!)'.

'C(n,r)', again, is a meaningful description, in that the C represents Combination, but this meaningful description does not include the implementation.

'n!/(r!(n-r)!)' is the implementation.

The n and r are simply to be understood as whatever n and r are in 'C(n,r)' for a specific calculation, such as C(8,3).

For C(8,3), the algorithm specifically looks like this: 8!/(3!(8-3)!).
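Working the substitution through step by step, with the intermediate values shown as comments, in a small Python sketch:

```python
from math import factorial

# C(8,3) = 8! / (3! * (8-3)!)
numerator = factorial(8)                       # 8! = 40320
denominator = factorial(3) * factorial(8 - 3)  # 3! * 5! = 6 * 120 = 720
print(numerator // denominator)                # 40320 / 720 = 56
```

So there are 56 unique seatings of three people on eight chairs.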

So, we learned about numbers, such as 8 and 3.

We learned about brackets, (), which describe the order of calculation: the most deeply nested brackets are calculated first.

We learned about variables. On the top level of an algorithm, both numbers and variables can exist: numbers represent the parts of the algorithm that are static, and variables the parts where the numbers may differ from one use of the algorithm to another.

On the bottom of the two levels, every variable has received a number, so the calculation can actually be done. These two levels exist for every algorithm that includes variables.

We learned about different representations of functions.

Now then: information theory. That is, let's answer the question of why all bit-sets have approximately the same number of zeroes and ones.

Quickly: we had the example above of 8 chairs and, let's say, 3 people each sitting on one chair. C(8,3), in other words.

But for finding out why all files have approximately the same number of zeroes and ones, chairs and people are a bad example. We will use the correct terminology.

The variable n will be called the number of bits and the variable r will be called the number of ones.

Again, just as we could deduce the number of empty seats from the number of chairs and the number of people sitting on them, from the number of bits and the number of those bits whose value is one, we can deduce the number of bits whose value is zero.

Now then, the finale. If all the different states of an 8-bit bit-set are divided up based on the number of ones they contain,

4 ones gets the largest number of unique states. This is followed by a tie between 3 ones and 5 ones.

Note that this is a law. You could go from an 8-bit bit-set to 1MB, or in other words an 8,000,000-bit bit-set: the 'r' value containing the largest number of unique states is 4,000,000, followed by a tie between 3,999,999 and 4,000,001.
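For the 8-bit case this distribution can be tabulated outright, since there are only 256 states. A small Python sketch that enumerates them all and compares against C(8,r):

```python
import math
from collections import Counter

# For every 8-bit state 0..255, count how many ones its binary form contains.
counts = Counter(bin(state).count("1") for state in range(256))

for r in range(9):
    print(r, counts[r], math.comb(8, r))  # the two columns agree
# r=4 is the peak with 70 states; r=3 and r=5 tie at 56 each.
```

The enumeration and the formula give the same table, peaking at r = 4.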

So what am I saying? I present it as a theory that when writing software, creating graphics or typing a document, every state within the bit-set has an equal chance of ending up being the one that represents that software, those graphics or that document.

So: given fact #1, that the more equal the ratio of zeroes and ones within a bit-set, the larger the number of unique states contained within that ratio,

and theory #1, that whatever the file being represented, the chance of representation is the same across all states within the bit-set, the states whose ratio of zeroes and ones is closer to equal will be the more likely ones to represent the file.

A ratio of zeroes and ones that is close to equal is another way of saying the file has approximately the same number of zeroes and ones, so the above elaboration explains why this is so.
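The concentration around the half-and-half ratio becomes extreme as the bit-set grows. A sketch for a hypothetical 100-bit bit-set, measuring what fraction of all its states have a number of ones within 10 of n/2:

```python
import math

n = 100
total = 2 ** n  # total number of distinct 100-bit states

# Sum the states whose count of ones lies between 40 and 60 inclusive.
near_half = sum(math.comb(n, r) for r in range(40, 61))

print(near_half / total)  # well over 0.95
```

So even a modest band around the equal ratio already covers the overwhelming majority of states, which is why a randomly chosen state almost always has approximately equal zeroes and ones.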