Rounding Percentages

Posted on: September 7th, 2005 5:53 AM GMT

By: Greg Reimer (Code Monkey Extraordinaire)

Topic: long tail, programming, algorithms, percentages, rounding, logarithms, math, tech

Maybe someday somebody will find this through Google and it will be useful. My explanations might be inadequate, but hopefully the examples make sense, and hopefully my make-believe code translates well into your language of choice. So here goes.

While doing percentage calculations, I realized that the long tail of the data was being swept unceremoniously under the rug by my ham-fisted approach to rounding. In oversimplified make-believe code, here's what I was doing:

println dataset.name

foreach group in dataset
  # round(number, precision)
  # round to tenths (1 place to the right of the decimal)
  println group.name, 
    round((group.count / dataset.count) * 100, 1)
end

Which output something like this:

Item Types
itemA %48.8
itemB %14.4
itemC %12.8
itemD %10.4
itemE % 8
itemF % 3.6
itemG % 1.1
itemH % 0.7
itemI % 0.2
itemJ % 0.1
itemK % 0
itemL % 0
...
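
For anyone who wants to reproduce the problem, here's a rough Python sketch of the same fixed-precision approach. The counts, the total, and the name `naive_percent` are made up for illustration; they are not the data behind the listing above.

```python
# Hypothetical counts, invented only to illustrate a long tail.
counts = {"itemA": 4880, "itemB": 1440, "itemK": 4, "itemL": 2}
total = 10_000  # assumed dataset size; the groups above are only part of it


def naive_percent(count, total, places=1):
    """Percentage of total, rounded to a fixed number of decimal places."""
    return round(count / total * 100, places)


for name, count in counts.items():
    print(name, naive_percent(count, total))
```

The two tail items both come out as 0.0, even though there is data behind them. Note that `/` is true division in Python 3; in a language where dividing two integers truncates, convert to floating point first or every group will show as zero.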

There was data behind those 0% numbers at the end, and of course I needed to take significant digits into account in order to see it. So I came up with a special function:

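function sig_round(num, sig_digits)
  # log10 is undefined at zero, so hand zero straight back
  if num == 0 then return 0
  # decimal places needed to keep sig_digits significant digits
  return round(num, floor(0 - log10(num)) + sig_digits)
end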

println dataset.name

foreach group in dataset
  # round to three significant digits
  println group.name, 
    sig_round((group.count / dataset.count) * 100, 3)
end

Which gave me much more meaningful results:

Item Types
itemA %48.8
itemB %14.4
itemC %12.8
itemD %10.4
itemE % 8.01
itemF % 3.59
itemG % 1.14
itemH % 0.732
itemI % 0.165
itemJ % 0.0965
itemK % 0.0478
itemL % 0.0231
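
Here's how the same idea might look in Python, as a minimal sketch rather than a definitive implementation. It leans on the standard `math.log10` and `math.floor` plus the built-in `round`, which accepts a negative number of places for large values. The zero check is there because `log10(0)` is undefined, and `abs` is an assumption on my part so that negative inputs don't blow up.

```python
import math


def sig_round(num, sig_digits):
    """Round num to roughly sig_digits significant digits."""
    if num == 0:
        return 0.0  # log10(0) is undefined; zero stays zero
    # Smaller magnitudes get more decimal places.
    places = math.floor(-math.log10(abs(num))) + sig_digits
    return round(num, places)


print(sig_round(48.8123, 3))  # 48.8
print(sig_round(0.04782, 3))  # 0.0478
```

One quirk worth knowing: for an exact power of ten such as 10 or 1, this formula asks for one more decimal place than three significant digits strictly need. That doesn't change the rounded value, only how many digits you might choose to display.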

But I realized I wasn't out of the woods yet. For percentages, a number approaching 100 implies a remainder approaching zero, so the algorithm needs to respect that remainder for percentages over 50. Otherwise I could end up with results like this:

Item Types
itemA %100
itemB %  0.0231

Hence the final code:

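function sig_perc_round(num, sig_digits)
  # exactly 0 or 100 needs no rounding, and log10(0) is undefined
  if num <= 0 or num >= 100 then return num
  # set the precision from the distance to the nearer end, 0 or 100
  lognum = (num < 50) ? num : 100 - num
  return round(num, floor(0 - log10(lognum)) + sig_digits)
end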

println dataset.name

foreach group in dataset
  # round the percentage to three significant digits
  println group.name, 
    sig_perc_round((group.count / dataset.count) * 100, 3)
end

Which gives me what I wanted in both cases:

Item Types
itemA %48.8
itemB %14.4
itemC %12.8
itemD %10.4
itemE % 8.01
itemF % 3.59
itemG % 1.14
itemH % 0.732
itemI % 0.165
itemJ % 0.0965
itemK % 0.0478
itemL % 0.0231

And also:

Item Types
itemA %99.9769
itemB % 0.0231
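
A Python sketch of the percentage-aware version, under the same caveats as before: it's an illustration, not a definitive implementation. The early return for values at or beyond 0 and 100 is my own guard, since the remainder would be zero there and `log10(0)` is undefined.

```python
import math


def sig_perc_round(num, sig_digits):
    """Round a percentage so that whichever is smaller, the value or
    its remainder from 100, keeps sig_digits significant digits."""
    if num <= 0 or num >= 100:
        return float(num)  # guard: nothing meaningful to take log10 of
    # Use the distance to the nearer end of the 0..100 range.
    lognum = num if num < 50 else 100 - num
    places = math.floor(-math.log10(lognum)) + sig_digits
    return round(num, places)


print(round(99.9769, 1))           # 100.0, the remainder is lost
print(sig_perc_round(99.9769, 3))  # 99.9769
print(sig_perc_round(0.0231, 3))   # 0.0231
```

For 99.9769 the remainder is about 0.0231, so the formula works out to four decimal places, the same precision the small item gets. That keeps the two numbers consistent with each other: they still add up to 100.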

So there you have it. A way to round percentages that doesn't step on your tail(s).
