Rounding Percentages
Posted on: September 7th, 2005 5:53 AM GMT
Topic: long tail, programming, algorithms, percentages, rounding, logarithms, math, tech
Maybe someday somebody will find this through Google and it will be useful. My explanations might be inadequate, but hopefully the examples will make sense, and hopefully my make-believe code translates well into your language of choice. So here goes.
While doing percentage calculations, I realized that the long tail of my data was being swept unceremoniously under the rug by my ham-fisted approach to rounding. In oversimplified make-believe code, here's what I was doing:
println dataset.name
foreach group in dataset
  # round(number, precision)
  # round to tenths (1 place to right of decimal)
  println group.name,
    round((group.count / dataset.count) * 100, 1)
end
Which output something like this:
Item Types
itemA %48.8
itemB %14.4
itemC %12.8
itemD %10.4
itemE % 8
itemF % 3.6
itemG % 1.1
itemH % 0.7
itemI % 0.2
itemJ % 0.1
itemK % 0
itemL % 0
...
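In Python, that first attempt might look something like the sketch below. The counts, the total, and the name naive_percent are all made up for illustration; they are not the real dataset.

```python
# Hypothetical counts, chosen only to illustrate a long tail.
counts = {"itemA": 4880, "itemB": 1440, "itemK": 4, "itemL": 2}
total = 10_000  # assumed dataset size (more groups exist than are listed here)


def naive_percent(count, total, places=1):
    """Percentage share rounded to a fixed number of decimal places."""
    return round(count / total * 100, places)


for name, count in counts.items():
    print(name, naive_percent(count, total))
```

With one fixed decimal place, any share below 0.05% comes out as 0.0, which is exactly how the tail disappears.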
There was data behind those 0% numbers at the end, and I realized that to see it I needed to round to a number of significant digits rather than to a fixed decimal place. So I came up with a special function:
function sig_round(num, sig_digits)
  # floor(0 - log10(num)) locates the leading digit; adding sig_digits
  # gives the decimal places to keep (assumes num > 0)
  return round(num, floor(0 - log10(num)) + sig_digits)
end
println dataset.name
foreach group in dataset
  # round to three significant digits
  println group.name,
    sig_round((group.count / dataset.count) * 100, 3)
end
Which gave me much more meaningful results:
Item Types
itemA %48.8
itemB %14.4
itemC %12.8
itemD %10.4
itemE % 8.01
itemF % 3.59
itemG % 1.14
itemH % 0.732
itemI % 0.165
itemJ % 0.0965
itemK % 0.0478
itemL % 0.0231
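Here is a minimal Python sketch of the same idea, in case a runnable version helps. The guard for zero or negative input is my assumption (the make-believe code has none), and the sample values are invented.

```python
import math


def sig_round(num, sig_digits):
    """Round a positive number to sig_digits significant digits."""
    if num <= 0:
        # Assumption: pass zero/negative input through unchanged,
        # because log10 is undefined there.
        return num
    # floor(-log10(num)) locates the leading digit; adding sig_digits
    # gives the number of decimal places to keep.
    decimals = math.floor(-math.log10(num)) + sig_digits
    return round(num, decimals)


for value in (48.76, 8.0123, 0.04783):
    print(value, "->", sig_round(value, 3))
```

The number of decimal places now grows as the value shrinks, so a small share keeps its three digits instead of collapsing to zero.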
But I realized I wasn't out of the woods yet. With percentages, a number approaching 100 implies a remainder approaching zero. So for percentages over 50, the algorithm needed to keep its significant digits in that remainder rather than in the number itself. Otherwise I could end up with results like this:
Item Types
itemA %100
itemB % 0.0231
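To see why this happens, it may help to look at how many decimal places the formula picks for a large share versus its remainder. This is only a sketch: decimals_for is a hypothetical helper, and 99.9769 is an assumed value (simply 100 minus the 0.0231 tail).

```python
import math


def decimals_for(value, sig_digits):
    """Decimal places that give `value` sig_digits significant digits."""
    return math.floor(-math.log10(value)) + sig_digits


share = 99.9769          # assumed: the complement of the 0.0231 tail
remainder = 100 - share  # roughly 0.0231

print(decimals_for(share, 3))      # 1
print(decimals_for(remainder, 3))  # 4
print(round(share, decimals_for(share, 3)))      # 100.0
print(round(share, decimals_for(remainder, 3)))  # 99.9769
```

Three significant digits of 99.9769 means one decimal place, which rounds it up to 100.0. Using the remainder's precision instead keeps four decimal places, and the tail survives.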
Hence the final code:
function sig_perc_round(num, sig_digits)
  # base the precision on whichever is smaller: num or its remainder from 100
  # (assumes 0 < num < 100; log10 is undefined at 0)
  lognum = (num < 50) ? num : 100 - num
  return round(num, floor(0 - log10(lognum)) + sig_digits)
end
println dataset.name
foreach group in dataset
  # round the percentage to three significant digits
  println group.name,
    sig_perc_round((group.count / dataset.count) * 100, 3)
end
Which gives me what I wanted in both cases:
Item Types
itemA %48.8
itemB %14.4
itemC %12.8
itemD %10.4
itemE % 8.01
itemF % 3.59
itemG % 1.14
itemH % 0.732
itemI % 0.165
itemJ % 0.0965
itemK % 0.0478
itemL % 0.0231
And also:
Item Types
itemA %99.977
itemB % 0.0231
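And a rough Python translation of the final version, for anyone who wants something to paste and try. The handling of exactly 0 and exactly 100 is my assumption, since the make-believe code would hit log10(0) there, and the sample values are invented rather than taken from the real data.

```python
import math


def sig_perc_round(num, sig_digits):
    """Round a percentage (0..100), keeping sig_digits significant digits
    in whichever is smaller: the value or its remainder from 100."""
    lognum = num if num < 50 else 100 - num
    if lognum <= 0:
        # Assumption: exactly 0 or 100 is returned unchanged,
        # because log10(0) is undefined.
        return num
    return round(num, math.floor(-math.log10(lognum)) + sig_digits)


for value in (48.76, 0.0231, 99.9769, 100):
    print(value, "->", sig_perc_round(value, 3))
```

Below 50 this behaves like plain significant-digit rounding; above 50 the precision follows the remainder, so a dominant group no longer rounds up to a flat 100.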
So there you have it. A way to round percentages that doesn't step on your tail(s).