proper (standardized) file size symbols?

JackFoo · Post by *JackFoo » 2003-06-03, 10:27 UTC

I must say it again: This is wrong. M is 1024*1024 in TC. Furthermore such an irregular scheme as you insist on would not be like TC/Christian at all. We are just talking about a notation. So cluster size and stuff like that is completely irrelevant here.

Well I'm not insisting, however several points do bother me:
1. If k, M, G are all converted by the same scale why are they in different case?
2. If they are all converted by the same scale, why does Christian only point out that k = 1024 bytes, but not say the same for M and G?
3. As I said earlier a floppy is called 1.44MB but using TC's conversion you'll end up with approx. 1.406MB which IS the correct way but a little non-standard (de-facto standard, that is).

I think that to end this I would just make some files on my puter (such that cluster size wouldn't matter) and check it out.

BTW I didn't say anything about rounding or truncating, TC's help:

Additionally, the file size is rounded to the cluster size of the source and destination directory, to get real space required.

Hmmm... maybe it only rounds when using [Calculate space]?

Cheers.

Maxwish · Post by *Maxwish » 2003-06-03, 11:06 UTC

JackFoo wrote:1. If k, M, G are all converted by the same scale why are they in different case?

According to the SI standard, all prefixes used when a size is bigger than the base unit are in capital letters (M, G, T, etc) and all prefixes used when a size is smaller than the base unit are in small letters (c, m, µ, n).
Exceptions: k (small) for kilo, because K (capital) could be confused with K for kelvin. The same goes for h (hecto) and da (deca), but I can't remember what units they could be confused with...

Maxwish · Post by *Maxwish » 2003-06-03, 11:17 UTC

b, B
Use little b for bit, big B for Byte. Spell these out where necessary to avoid ambiguity.

k
Little k is the standard SI prefix for 10^3 (1000). It is not often used in computing.

K
Use big K for the multiplier 2^10 (1024) common in computing. Do not write or pronounce big K as kilo; to do so invites confusion with little k, 1000. Simply write it as upper-case K and pronounce it "kay".

baud
The term baud does not apply to data rate, but to symbol rate. When you see the unit baud used in computing, the unit b/s (bit per second) is nearly always meant.

mega, giga
When applied to a base unit other than bit, byte or pixel, M (mega) and G (giga) refer to the SI power-of-ten multipliers 10^6 and 10^9. Standard data communication rates are based on powers of ten and use the SI multipliers, not power-of-two multipliers: 1.544 Mb/s denotes 1 544 000 bits per second; 19 200 bits per second is properly written 19.2 kb/s (not 19.2 Kb/s).

disk storage
When applied to bytes of disk storage capacity:
M (mega) denotes 10^3 * 2^10 (1000 K)
G (giga) denotes 10^6 * 2^10 (1 000 000 K).

bits, bytes or pixels
When applied to raw bits, bytes or pixels:
M (mega) denotes 2^20 (1024 K)
G (giga) denotes 2^30

In computing, M (mega) and G (giga) are ambiguous. M could denote 1 000 000, 1 024 000, or 1 048 576. G could denote 1 000 000 000, 1 024 000 000, 1 048 576 000, or 1 073 741 824. The value of the giga prefix in computing varies more than 7 percent depending on its context. If an exact value is important, write out the whole number!
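To make the spread concrete, here is a small Python sketch that just tabulates the readings listed above and computes how far the largest is from the smallest. The dictionary names and the `spread_percent` helper are made up for illustration.

```python
# Candidate meanings of the prefixes, as enumerated in the text above.
MEGA = {
    "10^6": 10**6,
    "10^3 * 2^10": 10**3 * 2**10,
    "2^20": 2**20,
}
GIGA = {
    "10^9": 10**9,
    "10^6 * 2^10": 10**6 * 2**10,
    "10^3 * 2^20": 10**3 * 2**20,
    "2^30": 2**30,
}


def spread_percent(values):
    """Percentage by which the largest reading exceeds the smallest."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100


for name, table in (("M", MEGA), ("G", GIGA)):
    for label, value in table.items():
        print(f"{name} = {label:<12} = {value:>13,}")
    print(f"{name} spread: {spread_percent(table.values()):.2f}%")
```

The giga spread comes out a little above 7 percent, which matches the figure quoted above.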

3. As I said earlier a floppy is called 1.44MB but using TC's conversion you'll end up with approx. 1.406MB which IS the correct way but a little non-standard (de-facto standard, that is).

a floppy can hold 1 457 664 Bytes / 1024 = 1 423.5 KB = 1.42 MB ???
Now I don't get it, why is it called 1.44 MB ???
Idea: is the last 0.02 MB used to give the floppy structure during the formatting ?

JohnFredC · Post by *JohnFredC » 2003-06-03, 13:07 UTC

because it is a FACT (not just an opinion) that rounding is more accurate than truncating generally.

That would depend on the rounding method. For instance, years ago (I haven't checked lately) the rounding methods used by both Lotus 123 and Excel were statistically inaccurate.

IF the rounding algorithm gives results that, when accumulated (summed) over a large number of instances, cause that sum to converge toward the true sum of the original, unrounded, values, then I say use it...

Otherwise, it seems six of one and half a dozen of the other, to me. I have hundreds of gigabytes (however that is defined) of disk space. The exact number of bytes in any particular file is rarely of importance to me. The closest I would ever care about would be an accuracy to one sector size.

"Back in the days", well, I felt differently. Not now.

Though I suppose I do care more about file sizes on my Pocket PC, so in that case these issues have bearing.

Valentino · Post by *Valentino » 2003-06-03, 13:39 UTC

jb wrote:Although not explicitly expressed in the post somewhere (where and when was it again? In the Internet 2 years ago? ) someone agrees that rounding is preferable because it is a FACT (not just an opinion) that rounding is more accurate than truncating generally.

So, we came to agreement!

JackFoo · Post by *JackFoo » 2003-06-03, 14:19 UTC

That would depend on the rounding method. For instance, years ago (I haven't checked lately) the rounding methods used by both Lotus 123 and Excel were statistically inaccurate.

Hmm, unless MS and Lotus made some dorky implementation of a simple function, I can't see it happening. Mathematically speaking, rounding is the best choice. For example, given a fair bit that can be 0 or 1 with equal probability (1/2), the best choice IS the mean value, i.e. 0.5; the same applies to rounding.

About the 1.44MB: a floppy has 1,474,560 bytes unformatted and 1,457,664 bytes when formatted in FAT. The calculation is done using either 1440/1024 ~ 1.40 or 1457664/(1024)^2 ~ 1.39, which is the real space.
Take a look here:

Taken from obscure DOS documentation

PSS ID Number: Q121839
Article last modified on 03-06-1996
 
3.x 4.x 5.x 6.00 6.20 6.21 6.22 | 3.10 3.11 95
 
MS-DOS                          | WINDOWS
 

---------------------------------------------------------------------
The information in this article applies to:
 
 - Microsoft MS-DOS operating system versions 3.x, 4.x, 5.x, 6.0,
   6.2, 6.21, 6.22
 - Microsoft Windows operating system versions 3.1, 3.11
 - Microsoft Windows for Workgroups versions 3.1, 3.11
 - Microsoft Windows 95
---------------------------------------------------------------------
 
SUMMARY
=======
 
The 1.44-megabyte (MB) value associated with the 3.5-inch disk format does
not represent the actual size or free space of these disks. Although its
size has been popularly called 1.44 MB, the correct size is actually 1.40
MB.
 
MORE INFORMATION
================
 
The correct size is determined by multiplying the number of tracks, sides,
sectors per track, and 512 bytes per sector, then subtracting the bytes
required to format the disk, and then dividing this figure by 1024. For a
"1.44-MB" 3.5-inch floppy disk, there are
 
   80 tracks
   18 sectors per track
   512 bytes per sector
   2 sides
 
Multiplying the above gives you 1,474,560 bytes. This is the unformatted
size.
 
To determine the number of bytes formatting requires, you need to know how
many bytes are used for the boot sector, file allocation table (FAT), and
root directory.
 
There is 1 sector used for the boot sector, which is 512 bytes; 18
sectors for the two FATs (9 sectors each), which is 9216 bytes
(512 * 18 = 9216); and 14 sectors for the root directory, which is
7168 bytes.
 
NOTE: There are two ways to arrive at the 7168 number:
 
   224 entries * 32 bytes per entry = 7168 bytes
 
   -or-
 
   512 bytes per sector (14 * 512 = 7168 bytes)
 
Adding these figures gives you 16,896 bytes.
 
Subtracting the amount used for formatting from the total unformatted size
gives you 1,457,664. (1,474,560 - 16,896 = 1,457,664 bytes)
 
Dividing the above figure by 1024 bytes generates 1440.
(1,474,560 / 1024 = 1440 KB)
 
To convert to megabytes, divide by 1024. (1440 KB / 1024 = 1.406 MB)
 
This formula works for 1.2-MB disks as well. The only variable is the
number of sectors, which is 15, for the calculations with 1.2-MB disks.
 
From the calculations shown above, we can see that the 3.5-inch disk
considered to have 1.44 MB free disk space actually has 1.40 MB, and the
5.25-inch disk considered to have 1.2 MB actually has 1.17 MB.
 
The misunderstanding comes from the incorrect calculation below:
 
   1440 KB / 1000 = 1.44 MB
 
The calculation should be:
 
   1440 KB / 1024 = 1.40 MB
 
There are 1024 bytes in a kilobyte, not 1000.
 
Note that in Windows 95, the properties for a blank, formatted 3.5-inch
1.44-MB disk show that there are 1.38 MB of free disk space.

Cheers.
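The arithmetic in the quoted article can be retraced with a few lines of Python. This is only a sketch of the numbers given there; `unformatted_bytes` and the variable names are made up for illustration.

```python
BYTES_PER_SECTOR = 512


def unformatted_bytes(tracks, sides, sectors_per_track):
    """Raw capacity from the disk geometry."""
    return tracks * sides * sectors_per_track * BYTES_PER_SECTOR


# "1.44 MB" 3.5-inch disk: 80 tracks, 2 sides, 18 sectors per track.
raw = unformatted_bytes(tracks=80, sides=2, sectors_per_track=18)

# FAT overhead from the article: 1 boot sector, 2 FATs of 9 sectors, 14 root directory sectors.
overhead = (1 + 2 * 9 + 14) * BYTES_PER_SECTOR
free = raw - overhead
kb = raw // 1024

print(raw, overhead, free, kb)  # 1474560 16896 1457664 1440
print(kb / 1000)                # 1.44     (1 MB = 1000 KB)
print(kb / 1024)                # 1.40625  (1 MB = 1024 KB)
print(free / 1024**2)           # about 1.39, formatted free space

# The 1.2 MB 5.25-inch disk differs only in sectors per track (15).
print(unformatted_bytes(80, 2, 15) / 1024**2)  # 1.171875
```

So the "1.44" label only appears when 1440 KB is divided by 1000, while the 1.40 and 1.39 figures both use 1024 at each step.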

Maxwish · Post by *Maxwish » 2003-06-03, 14:38 UTC

So my idea was right: some space is used for the formatting.
So 1 floppy = 1.44 MB = 1.44 * 1000 * 1024 = 1 474 560 Bytes unformatted; minus 16 896 Bytes (space for formatting), that leaves 1 457 664 Bytes free
So 1 MB = 1000 KB

The misunderstanding comes from the incorrect calculation below:
1440 KB / 1000 = 1.44 MB
The calculation should be:
1440 KB / 1024 = 1.40 MB
There are 1024 bytes in a kilobyte, not 1000

Good article but note the last bold sentence:
I agree there are 1024 Bytes in a KB, but this still doesn't mean there are 1024 kilobytes in a MB.
So the above calculation 1440 KB /1000 = 1.44 MB is still correct IMO

Note that in Windows 95, the properties for a blank, formatted 3.5-inch
1.44-MB disk show that there are 1.38 MB of free disk space

TC shows 1 457 644 B free when footer display is set to bytes (correct)

and 1 423 KB free when the display set to kilobytes:
1 457 644 /1024 = 1 423.48 KB ~ 1 423 KB (correct)

when using the dynamic display (x.x k/M/G) TC shows 1.3 MB ??
so this can only be explained if 1 457 644 / 1024 = 1 423.48 KB /1024 = 1.390 MB ~ 1.3 MB...
So Christian does use 1024 KB = 1 MB and truncates the values
now I'm really confused....
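The observation above (1.3 rather than 1.4) is what you get if the value is divided by 1024 * 1024 and the extra digits are cut off instead of rounded. A minimal Python sketch of that guess follows; how TC actually formats the number is not known here, and `mb_truncated` and `mb_rounded` are hypothetical names.

```python
def mb_truncated(size_bytes, decimals=1):
    """Size in MB (1 MB = 1024 * 1024 bytes) with extra digits cut off."""
    scale = 10 ** decimals
    # Integer floor division drops everything past the kept digits.
    return (size_bytes * scale // 1024**2) / scale


def mb_rounded(size_bytes, decimals=1):
    """Same conversion, but rounded to the nearest kept digit."""
    return round(size_bytes / 1024**2, decimals)


free = 1457644  # the byte count reported in the post above

print(free // 1024)        # 1423
print(mb_truncated(free))  # 1.3
print(mb_rounded(free))    # 1.4
```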

JackFoo · Post by *JackFoo » 2003-06-03, 15:02 UTC

A lot of people have already noted that TC truncates the result; this would explain your result.

BTW how can someone confuse k - kelvin in computers?????

@jb : it seems you're correct after all, my sincere apologies.

Cheers.

JohnFredC · Post by *JohnFredC » 2003-06-03, 15:07 UTC

If I remember correctly, I discovered that Lotus and Excel always rounded up when faced with an "up or down" decision.

For instance, consider the following numbers:

1.5
2.5
3.5
---
7.5

Lotus and Excel always rounded these to:

2
3
4
---
9

Whereas the better method:

1
3
3
---
7

... produces a more accurate result.

So does

2
2
4
---
8

The Excel and Lotus results were from 10 or more years ago and their methods only were "bad" for numbers that ended in 5 and were to be rounded to the left by one position. I haven't checked lately.
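The effect of the tie rule on a sum is easy to try out with Python's `decimal` module, which offers both "always round a tie up" (`ROUND_HALF_UP`) and "round a tie to the even neighbour" (`ROUND_HALF_EVEN`). This is just an illustration with the three numbers above, not a reconstruction of what Lotus or Excel did internally; `rounded_sum` is a made-up helper.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

values = [Decimal("1.5"), Decimal("2.5"), Decimal("3.5")]


def rounded_sum(mode):
    """Round every value to a whole number with the given tie rule, then add."""
    return sum(v.quantize(Decimal("1"), rounding=mode) for v in values)


print(sum(values))                   # 7.5, the true sum
print(rounded_sum(ROUND_HALF_UP))    # 9  (2 + 3 + 4)
print(rounded_sum(ROUND_HALF_EVEN))  # 8  (2 + 2 + 4)
```

Always rounding ties up overshoots the true sum by 1.5 here, while a parity-based tie rule stays within 0.5 of it.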

Maxwish · Post by *Maxwish » 2003-06-03, 15:21 UTC

2 JFC
I find it more than logical that 2.5 is rounded up to 3. You can't have a spreadsheet round 2.5 to 2 or 3 at random as in your examples (except if you're a bookkeeper for ENRON ...)

The only correct way to do it is to add up the numbers as decimals and round the answer: 1.5+2.5+3.5= 7.5 ~ 8

JohnFredC · Post by *JohnFredC » 2003-06-03, 15:55 UTC

Hi MaxWish...

You can't have a spreadsheet round 2.5 to 2 or 3 at random as in your examples (except if you're a bookkeeper for ENRON ...)

That's not at random. The rule is to inspect whether the digit to the left is even or odd, then round either up or down accordingly... it doesn't matter which direction, as long as even is rounded one way and odd is rounded the other.

My first "improved" example was "round up" on even and the second example was "round down" on even.

Since there are the same number of evens and odds in the number system, you can see that such a rounding approach is statistically "correct".

The spreadsheet programs have embedded Round functions that provide interfaces to internal schemes. So, no problem for the big vendors to implement in code such an algorithm as the one I describe.

I've done it myself using div and mod.

JackFoo · Post by *JackFoo » 2003-06-03, 16:37 UTC

Since there are the same number of evens and odds in the number system, you can see that such a rounding approach is statistically "correct".

Yes, there is a lemma in probability theory about it, but it requires quite a large number of values to work correctly, and it also requires the values to be randomly chosen. In short, with a small number of values probability will not ensure anything, nor will it if the values aren't random.

So this doesn't solve the problem, the only solution in this case is to make a random round (viva La Enron) or round only the sum and not the intermediate values.

Cheers.

Spaghettificus · Post by *Spaghettificus » 2003-06-03, 19:41 UTC

I've noticed that a lot of sharing programs store file sizes like :
(unrounded KB) I mean 1024 bytes.

It's too bad more programs don't round KB sizes correctly. I'm not sure if TC should round byte sizes or not.

JohnFredC · Post by *JohnFredC » 2003-06-03, 21:42 UTC

Hey JackFoo

requires quite a large number of values to work

It really doesn't. Use a random number generator to make a list of numbers, select an arbitrary rounding digit and try it! Then sum your rounds, sum the original values, and compare.

One version of the algorithm is:

Rule 1: If the digit_to_be_rounded<5, then replace it and all digits to the right with 0.

Rule 2: If the digit_to_be_rounded>5, then replace it and all digits to the right with 0 AND increment the digit to its left by 1

Rule 3: If the digit=5, then do:
if the digit to the left is odd, then apply Rule 1
else
apply Rule 2.
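The three rules can be sketched with div and mod on integers. One assumption is added here that the rules do not spell out: Rule 3 is applied only on an exact tie, i.e. when the dropped part is exactly half a unit (a 5 followed by nothing but zeros). The function name `round_tie_by_parity` is made up.

```python
def round_tie_by_parity(n, unit=10):
    """Round a non-negative integer n to a multiple of unit.

    Ties go down when the kept digit is odd and up when it is even,
    following Rule 3 as posted.
    """
    kept, dropped = divmod(n, unit)
    if 2 * dropped < unit:       # Rule 1: below half, round down
        return kept * unit
    if 2 * dropped > unit:       # Rule 2: above half, round up
        return (kept + 1) * unit
    if kept % 2 == 1:            # Rule 3, odd digit to the left: round down
        return kept * unit
    return (kept + 1) * unit     # Rule 3, even digit to the left: round up


# 1.5, 2.5 and 3.5 expressed in tenths.
print([round_tie_by_parity(n) for n in (15, 25, 35)])  # [10, 30, 30]
```

Working in tenths keeps everything in exact integers, so no floating-point representation issues can disturb the tie test.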

JackFoo · Post by *JackFoo » 2003-06-03, 21:53 UTC

It really doesn't. Use a random number generator to make a list of numbers, select an arbitrary rounding digit and try it! Then sum your rounds, sum the original values, and compare.

Cute but no dice. If this rounding method is for convenience only and isn't applied to crucial data, then you're OK. BUT assuming you'll get an equal number of odds and evens with a small (less than at least a couple of K's) number of values is plain wrong, and not just mathematically speaking. Using random values will help, but once your numbers have bias (and they usually do) you'll start getting "weird" results; this is probability and is provable.

Cheers.
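Both positions can be checked on small hand-picked inputs. The sketch below uses the parity tie rule from the posted algorithm (ties down after an odd digit, up after an even one) on values given in tenths. The two sample lists are invented for illustration: one with balanced parities, one where every kept digit is odd.

```python
def round_tie_by_parity(tenths):
    """Round a value given in tenths to a whole number.

    Ties go down when the kept digit is odd and up when it is even.
    """
    whole, rest = divmod(tenths, 10)
    if rest > 5 or (rest == 5 and whole % 2 == 0):
        whole += 1
    return whole


def error_of_rounded_sum(values_in_tenths):
    """Sum of the rounded values minus the true sum, in whole units."""
    rounded = sum(round_tie_by_parity(v) for v in values_in_tenths)
    return rounded - sum(values_in_tenths) / 10


balanced = [15, 25, 35, 45]  # kept digits 1, 2, 3, 4
biased = [15, 35, 55, 75]    # kept digits all odd, so every tie goes down

print(error_of_rounded_sum(balanced))  # 0.0
print(error_of_rounded_sum(biased))    # -2.0
```

With balanced parities the errors cancel; with a one-sided input they add up, which is why rounding only the final sum is the safer choice when the total matters.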