Efficiently estimating recall
tl;dr Factorize recall measurement into a cheaper precision measurement problem and profit. Measuring precision tells you where your model made a mistake, but measuring recall tells you where your model can improve. Estimating precision directly is relatively easy, but estimating recall directly is quintessentially hard on “open-world” domains because you don’t know what you don’t know. As a result, recall-oriented annotation can cost an order of magnitude more than the analogous precision-oriented annotation. By combining cheaper precision-oriented annotations on several models’ predictions with an importance-reweighted estimator, you can triple the data efficiency of getting an unbiased estimate of true recall.

If you’re trying to detect or identify an event with a machine learning system, the metrics you really care about are precision and recall: if you think of the problem as finding needles in a haystack, precision tells you how often your system mistakes a straw of hay for a needle, while recall helps you measure how often your system misses needles entirely. Both of these metrics are important in complementary ways. The way I like to think about it is that measuring precision tells you where your model made a mistake, while measuring recall tells you where your model can improve. ...
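To make the reweighting idea concrete, here is a minimal sketch of what such an estimator could look like. This is an assumption-laden illustration, not the post's actual method: `ht_recall` and all variable names are hypothetical, it uses Horvitz-Thompson (inverse-propensity) weighting over a pool of candidates formed from several models' predictions, and the recall it estimates is relative to that pool covering the true positives.

```python
import numpy as np

def ht_recall(pred_by_model, labels, sampled, q):
    """Hypothetical sketch: estimate one model's recall from
    precision-style annotations over a pooled candidate set.

    pred_by_model: 1 if this model predicted the pooled candidate positive
    labels:        human label (1 = true positive); only used where sampled
    sampled:       1 if the candidate was drawn for annotation
    q:             inclusion probability used when drawing the sample
    """
    pred_by_model, labels, sampled, q = map(
        np.asarray, (pred_by_model, labels, sampled, q)
    )
    # Inverse-propensity (Horvitz-Thompson) weights: annotated items
    # stand in for the unannotated ones they were sampled to represent.
    w = sampled / q
    total_pos = np.sum(w * labels)                  # est. true positives in the pool
    model_pos = np.sum(w * labels * pred_by_model)  # ...that this model also found
    return model_pos / total_pos
```

If every pooled candidate is annotated (`sampled = 1`, `q = 1`), this reduces to plain recall over the pool; sampling only a fraction of candidates and reweighting by `1/q` is what lets cheaper precision-oriented annotation budgets yield an unbiased recall estimate.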