Analytics Mindset
Spend just 10 minutes on Twitter to catch up with Covid-19 news, and you’ll run into updated numbers and loud (sometimes angry) arguments about what all the data we’re collecting means. It’s proving difficult to pin down how infectious the virus is, what its mortality rate is, how effective different mitigation efforts are, and why different regions are seeing such different patterns of infection, mortality, and recurrence.
That lack of certainty is not at all surprising; after all, it’s a new disease that we’re learning about in real time, under horribly high-pressure conditions. Moreover, different regions have vastly different testing capacity and healthcare systems — those factors alone can explain much of the variability we’re witnessing.
That said, epidemiologists and other experts are running into many of the same issues that come up in any data-analytics problem. The truth is that collecting and analyzing data is rarely straightforward; at every stage, you need to make difficult judgment calls. The decisions you make about three factors — whom to include in your data set, how much relative weight to give different factors when you investigate causal chains, and how to report the results — will have a significant impact on your findings. Making the right calls will save lives in the current healthcare crisis, and improve performance in less drastic business settings.
Who should be tested?
In the case of an unknown disease, it is easiest to test only very sick people, or even those who have already died. (In areas without enough testing kits, there may not be any choice in the matter.) Unfortunately, while this approach is easiest, it inflates the perceived mortality rate. Let’s say 10 people are very sick and 1 of them dies. We would then record a 10% mortality rate. But if 100 people were actually infected, and 90 of them had mild symptoms (or no symptoms at all), then the actual mortality rate would be 1%, and you wouldn’t know that unless you tested more widely. The lesson: looking only at the most obvious cases makes the virus look deadlier than it is. Statisticians call this selection bias.
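The arithmetic in the example above can be checked in a few lines. This is only an illustrative sketch: the function name `fatality_rate` and the variable names are made up for this example, and the counts are the hypothetical ones from the paragraph, not real Covid-19 data.

```python
def fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Share of confirmed cases that ended in death."""
    if confirmed_cases <= 0:
        raise ValueError("need at least one confirmed case")
    return deaths / confirmed_cases


# Hypothetical counts taken from the example in the text.
deaths = 1
severe_cases = 10   # the very sick people who get tested first
mild_cases = 90     # mild or symptom-free infections, missed by narrow testing

# Narrow testing: only the severe cases are counted as infections.
narrow = fatality_rate(deaths, severe_cases)

# Wide testing: everyone who was infected is counted.
wide = fatality_rate(deaths, severe_cases + mild_cases)

print(f"Testing only severe cases: {narrow:.0%}")
print(f"Testing everyone infected: {wide:.0%}")
```

The number of deaths is the same in both calculations. Only the denominator changes, and the denominator is set by the decision about whom to test.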