by Douglas W. Hubbard

## Outline

Measuring is not counting: anything lowering the level of uncertainty can be counted as measurement.

• The first step of measurement is establishing what needs to be measured. What is valuable is usually not easy to measure; its worth can be calculated through the value of information. If something does not seem measurable, it can often be decomposed into more measurable pieces.
• The second step is to determine what precision is required: sometimes a rough range already gives most of the information.
• The next step is to choose a measuring instrument and use it. The book presents plenty, including calibrated estimates, Monte Carlo simulations, sampling, Bayesian statistics, a few human-based measurements (willingness to pay, risk tolerance, subjective trade-offs, Rasch models, the Lens model and linear models), and a few new-technology methods.

## Brilliant ideas

• To have value, estimates must be calibrated: we must find out how good we are at estimating by measuring our own track record. (See calibration exercise)
• Information has value. The Expected Value of Information (EVI) is equivalent to the reduction in Expected Opportunity Loss (EOL): `EVI = EOL before info - EOL after info`, where `EOL = chance of being wrong x cost of being wrong`.
• The vast majority of variables have an information value of zero: the current level of uncertainty about them is acceptable, so no further measurement is justified. Measuring more is usually worthwhile only if the new measurement gives more confidence than we already have relative to the decision threshold.
• Start by researching: others have most probably already thought about and worked on measuring it.
• When sampling, beware of observer bias (Heisenberg and Hawthorne effects): observing causes changes in behavior. Expectancy and selection biases also lower the quality of results.
• When sampling, if the samples are completely random and drawn from a homogeneous population, even a very small sample is informative: with 5 samples there is a 93.75% chance that the population median lies between the smallest and the largest value (simple statistics). Samples also help a lot with calibrated estimates (e.g. if asked to estimate the average jelly bean weight, knowing that the first two weigh 1 gram each is good information).
• If the information needed to confirm or refute a hypothesis does not exist, create it by experimentation.
• Scores and weighted scorecards are usually very bad at measuring. They are often used where actual quantitative measurements could be made and would be much better suited. One way to improve them is to calculate the std-dev of each criterion and adjust each score so that it is better calibrated.
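The EVI and EOL formulas above can be sketched in a few lines of Python. This is only an illustration: the function names, the probabilities and the cost of 500,000 are made-up assumptions, not figures from the book.

```python
def expected_opportunity_loss(chance_wrong: float, cost_wrong: float) -> float:
    """EOL = chance of being wrong x cost of being wrong."""
    if not 0.0 <= chance_wrong <= 1.0:
        raise ValueError("chance_wrong must be a probability between 0 and 1")
    return chance_wrong * cost_wrong


def expected_value_of_information(eol_before: float, eol_after: float) -> float:
    """EVI = EOL before info - EOL after info."""
    return eol_before - eol_after


# Hypothetical decision: being wrong would cost 500,000.
eol_before = expected_opportunity_loss(0.40, 500_000)  # assumed 40% chance of being wrong today
eol_after = expected_opportunity_loss(0.10, 500_000)   # assumed 10% after a measurement
evi = expected_value_of_information(eol_before, eol_after)

print(f"EOL before: {eol_before:,.0f}")
print(f"EOL after:  {eol_after:,.0f}")
print(f"EVI:        {evi:,.0f}")
```

Under these assumptions, any measurement that costs less than the EVI would be worth paying for; one that costs more would not.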

## Other Ideas

• Determine whether to measure by asking: is there any measurement method that can reduce the uncertainty enough to justify the cost of measurement?
• Calibration exercise: give a 90% confidence range on a set of questions and check how often the true value falls inside it; the gap between 90% and the actual hit rate shows how well calibrated we are. Look at each bound independently and ask: am I 95% sure the value is over (or under) that bound? The same works with binary tests: judge whether each statement is true and state your confidence that you are right. Comparing the average stated confidence with the actual share of correct answers tells whether you are under- or overconfident.
• Risk paradox: if an organization uses risk analysis, it is usually for the routine operational decisions, rarely for risky decisions
• Don’t start a brand new topic with Google; start with Wikipedia.
• Observation: does the information we’re searching for leave a trail of any kind? If not, can we observe it directly? If not, can we observe it through proxies? If not, can we “force it” through experiments?
• Release-recatch is how animal populations are estimated: catch 1000 birds, tag them, release them, wait some time for them to blend in again, then catch 1000 again. If 50 of those carry tags, the 1000 tagged birds are about 5% of the population, so there are about 20,000 birds. Move to a range by calculating a simple variance, then combine it with the t-statistic (1.645 for a 90% CI on a large sample) → `20,000 + 20,000 * variance x 1.645` and `20,000 - 20,000 * variance x 1.645`
• Correlation in data is expressed by a number between -1 and 1. 1 means the variables increase in direct relationship (there is a transfer function from one to the other), 0 means there is no linear relationship whatsoever, and -1 means one variable decreases as the other increases. When plotted on a graph, correlations can be easy to spot.
• Rasch - to be completed
• New methods include using markets or large portions of the internet to gather opinions.
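The release-recatch example above can be worked through in Python. The point estimate follows the note directly; the range is only one possible sketch, assuming a normal approximation on the share of tagged birds in the second catch (this is an assumption and may differ from the variance calculation the book uses). The function names are hypothetical.

```python
import math


def population_estimate(tagged: int, second_catch: int, recaptured: int) -> float:
    """Point estimate: tagged birds divided by the tagged share of the second catch."""
    if recaptured <= 0:
        raise ValueError("need at least one recaptured animal")
    return tagged * second_catch / recaptured


def population_range(tagged: int, second_catch: int, recaptured: int, z: float = 1.645):
    """Rough 90% range, assuming a normal approximation on the tagged share."""
    share = recaptured / second_catch
    std_err = math.sqrt(share * (1 - share) / second_catch)
    share_low, share_high = share - z * std_err, share + z * std_err
    if share_low <= 0:
        raise ValueError("too few recaptures for this approximation")
    # A higher tagged share means a smaller population, so the bounds swap.
    return tagged / share_high, tagged / share_low


estimate = population_estimate(1000, 1000, 50)
low, high = population_range(1000, 1000, 50)
print(f"Estimate: {estimate:,.0f} birds")
print(f"Rough 90% range: {low:,.0f} to {high:,.0f} birds")
```

The range is not symmetric around the estimate, because the uncertainty sits on the tagged share, which appears in the denominator.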

### Errors in measurement

• Systematic error: tendency to have a consistent error in one direction
• Random error: error that is not predictable, not consistent
• Accuracy: characteristic of a measurement having low systematic error
• Precision: characteristic of a measurement having low random error

### Estimates

Estimates usually have value only if the estimator is calibrated (she actually knows what 90% confidence means and has verified that she is right 90% of the time).

1) Estimates can be given as a range with a confidence level: 90% confidence means that you think the actual value has a 95% chance of being higher than the lower bound and a 95% chance of being lower than the upper bound.
2) Binary estimates can be given as a statement plus a confidence indicator (CI).
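A calibration check for both kinds of estimate can be sketched as follows. The questions, ranges and answers are invented for illustration, and the function names are hypothetical; a real calibration test would use many more questions.

```python
def interval_hit_rate(ranges, actuals):
    """Share of true values that fall inside the stated (low, high) ranges."""
    hits = sum(low <= actual <= high for (low, high), actual in zip(ranges, actuals))
    return hits / len(actuals)


def binary_calibration(confidences, correct):
    """Return (average stated confidence, actual share of correct answers)."""
    avg_confidence = sum(confidences) / len(confidences)
    share_correct = sum(correct) / len(correct)
    return avg_confidence, share_correct


# Hypothetical 90% ranges and the true values they were meant to contain.
ranges = [(10, 50), (1900, 1950), (100, 300), (2, 8), (1000, 5000)]
actuals = [42, 1969, 250, 5, 800]
hit_rate = interval_hit_rate(ranges, actuals)
print(f"Stated confidence: 90%, actual hit rate: {hit_rate:.0%}")

# Hypothetical true/false answers with the confidence stated for each.
confidences = [0.9, 0.8, 1.0, 0.7, 0.6]
correct = [True, False, True, True, False]
avg_confidence, share_correct = binary_calibration(confidences, correct)
verdict = "overconfident" if avg_confidence > share_correct else "not overconfident"
print(f"Average confidence {avg_confidence:.0%} vs {share_correct:.0%} correct: {verdict}")
```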

### Simple Monte-Carlo

• Decompose the value to measure into smaller variables
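• Generate random normally-distributed values for each variable using Excel: `=norminv(rand(), average, (upper-lower) / 3.29)`, where `upper` and `lower` are the bounds of the 90% confidence range and 3.29 is the number of standard deviations that range spans (2 x 1.645).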
• Observe results and look at the distribution
• Conclude
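The same steps can be sketched in Python instead of Excel. The model below (profit = units sold x margin per unit - fixed cost) and all its ranges are hypothetical; the sketch assumes each range is a calibrated 90% confidence range, so its width spans about 3.29 standard deviations.

```python
import random
import statistics

Z_SPAN_90 = 3.29  # a 90% range spans 2 x 1.645 standard deviations


def draw(lower: float, upper: float, rng: random.Random) -> float:
    """One normally distributed draw from an assumed 90% confidence range."""
    mean = (lower + upper) / 2
    stddev = (upper - lower) / Z_SPAN_90
    return rng.gauss(mean, stddev)


def simulate(trials: int = 10_000, seed: int = 42):
    """Run the hypothetical profit model many times and return all outcomes."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        units = draw(1_000, 2_000, rng)    # hypothetical: units sold
        margin = draw(5.0, 15.0, rng)      # hypothetical: margin per unit
        fixed = draw(8_000, 12_000, rng)   # hypothetical: fixed cost
        outcomes.append(units * margin - fixed)
    return outcomes


outcomes = sorted(simulate())
mean_profit = statistics.mean(outcomes)
p5 = outcomes[int(0.05 * len(outcomes))]
p95 = outcomes[int(0.95 * len(outcomes))]
loss_share = sum(o < 0 for o in outcomes) / len(outcomes)

print(f"Mean profit: {mean_profit:,.0f}")
print(f"90% of outcomes between {p5:,.0f} and {p95:,.0f}")
print(f"Share of scenarios with a loss: {loss_share:.1%}")
```

Looking at the spread of outcomes and the share of losing scenarios, rather than a single average, is what allows a conclusion about risk.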

### Assemble an instrument

If the right instrument is not obvious:

• Imagine the consequences by pushing the reasoning to the absurd
• How would others do it?
• Iterate
• Just do it.

### T-statistics and samples

Samples are great at reducing uncertainty when it is very large, and usually help narrow down calibrated estimates.
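To illustrate how much a tiny random sample can tell: with five random samples, the population median is missed only when all five fall on the same side of it, which gives a chance of 1 - 2 x 0.5^5 = 93.75% that the median lies between the smallest and largest sample. The sketch below checks this by simulation on a made-up, skewed population (the population and function names are assumptions for illustration).

```python
import random


def median_in_range_exact(n: int = 5) -> float:
    """Chance that the population median lies between the sample min and max."""
    # The median is missed only if all n samples land on the same side of it.
    return 1 - 2 * 0.5 ** n


def median_in_range_simulated(n: int = 5, trials: int = 20_000, seed: int = 1) -> float:
    """Estimate the same chance by repeated random sampling."""
    rng = random.Random(seed)
    # Hypothetical skewed population; the result should not depend on its shape.
    population = [rng.lognormvariate(0, 1) for _ in range(10_001)]
    median = sorted(population)[len(population) // 2]
    hits = 0
    for _ in range(trials):
        sample = [rng.choice(population) for _ in range(n)]
        if min(sample) <= median <= max(sample):
            hits += 1
    return hits / trials


exact = median_in_range_exact()
simulated = median_in_range_simulated()
print(f"Exact:     {exact:.4f}")
print(f"Simulated: {simulated:.4f}")
```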

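To calculate the error margin for a small sample set, multiply the standard error of the mean (the sample std-dev divided by the square root of the sample size) by the corresponding t-score for a 90% CI:

| Sample size | t-score |
|---|---|
| 2 | 6.31 |
| 3 | 2.92 |
| 4 | 2.35 |
| 5 | 2.13 |
| 6 | 2.02 |
| 8 | 1.89 |
| 12 | 1.80 |
| 16 | 1.75 |
| 28 | 1.70 |
| More | 1.645 |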
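A small-sample 90% range can be sketched in Python as follows. The sketch assumes the margin is the t-score times the standard error of the mean (sample std-dev divided by the square root of the sample size), uses standard rounded t-table values, and falls back to the nearest smaller listed sample size when the exact size is not listed. The measurements are made up.

```python
import math
import statistics

# Rounded two-sided 90% t-scores, keyed by sample size.
T_SCORES_90 = {2: 6.31, 3: 2.92, 4: 2.35, 5: 2.13, 6: 2.02,
               8: 1.89, 12: 1.80, 16: 1.75, 28: 1.70}


def t_score_90(n: int) -> float:
    """Look up the 90% t-score for a sample of size n."""
    if n < 2:
        raise ValueError("need at least 2 samples")
    if n > 28:
        return 1.645
    # Nearest smaller listed size: a slightly wider, more cautious range.
    size = max(k for k in T_SCORES_90 if k <= n)
    return T_SCORES_90[size]


def confidence_range_90(samples):
    """Return (low, high) bounds of the 90% range for the population mean."""
    n = len(samples)
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    margin = t_score_90(n) * std_err
    return mean - margin, mean + margin


samples = [12.0, 15.0, 11.0, 14.0, 13.0]  # hypothetical measurements
low, high = confidence_range_90(samples)
print(f"90% range for the mean: {low:.2f} to {high:.2f}")
```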

### Bayesian stats

`P(A|B) = P(A) x P(B|A) / P(B)`. This can be used with calibrated estimates: give an initial estimate, then refine it with what is known about the topic, on both A and B → refined estimate. This can be read as an “inverted-bayesian” question: knowing the chance of seeing X if the truth is Y, Bayes gives the chance that the truth is Y if I see X. The two chances are related but generally not equal.
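A minimal sketch of the formula in Python, with `P(B)` expanded through the law of total probability. The prior and the two conditional probabilities are made-up calibrated estimates, and the function name is hypothetical.

```python
def bayes(p_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) = P(A) x P(B|A) / P(B), with P(B) from total probability."""
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
    if p_b == 0:
        raise ValueError("B has zero probability under these inputs")
    return p_a * p_b_given_a / p_b


# Hypothetical calibrated estimates:
prior = 0.30            # P(A): initial estimate that A is true
p_b_given_a = 0.80      # P(B|A): chance of observing B if A is true
p_b_given_not_a = 0.20  # P(B|not A): chance of observing B if A is false

posterior = bayes(prior, p_b_given_a, p_b_given_not_a)
print(f"P(B|A) = {p_b_given_a:.2f}")
print(f"P(A|B) = {posterior:.2f}")
```

With these numbers, observing B raises the estimate for A from the prior, yet `P(A|B)` stays below `P(B|A)`: the inversion depends on the prior.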

### Willingness to pay

Measuring happiness can be achieved by establishing correlations between income, life events, and happiness. (A. Oswald did that and calculated that a healthy marriage is equivalent in happiness to an additional \$100k per year.)

Measurements just help in making enlightened decisions; they don’t necessarily call for making the most cost-effective decision → e.g. you may prefer giving money to a local business that is more expensive than a big competitor. That is the art-buying problem.

It is possible to measure trade-offs and acceptable thresholds by measuring several variables using different predetermined setups. The goal is to find points on the “boundary”. It is possible to take additional parameters into account by determining several boundaries (e.g. for \$100k investments, then \$120k investments, …). Points on the same boundary are equally valuable. This can be very useful when comparing options with different strong points, such as performance comparisons across people.

### Using human judges

A good way to get estimates is gathering people with know-how who can naturally estimate things through their experience. Beware of biases:

• anchoring (being influenced by other unrelated numbers)
• halo effect (if somebody favours a solution, she might interpret every new piece of information about it in a positive light); the horns effect is its negative counterpart
• bandwagon effect - we are influenced by other people’s opinions
• emerging preferences - post-rationalizing judgement criteria because we like a solution