How much data is enough?

Feb 6, 2020 | ICT4D

‘Go big or go home.’ ‘Take when you can.’ ‘You don’t know if you’ll be able to come back tomorrow, so do it all now.’ There are many variations of these sayings. Most of them come from a scarcity perspective: resources are limited while demands are endless, so hoard what you can, because hoarding generates wealth and power.

We see this play out in the data space. Collecting information is easy, so we collect lots of it because we can. Eventually, though, we realise that huge datasets are a liability, because we have to protect them. This approach is the opposite of data minimisation, which means collecting only the data you need for the work you are doing, not data you speculate might be useful one day.

Being clear on why you need the information can be difficult. Whatever our purpose, or whatever problem we are trying to solve, we should start by asking: what is the minimum data required to solve it?

If our goal is a unique ID for each person we work with so that we have the ability to de-duplicate our lists, we likely need only 5-7 data fields, not 30. The other data fields might be interesting and even useful for possible future projects, but aren’t needed to solve the stated problem. And when we need more information in the future, we can always add it to the dataset about the person.
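
To make this concrete, here is a minimal sketch in Python of de-duplicating a list using only a handful of fields. The six field names in ID_FIELDS, and the helper names normalise, minimise, dedup_key and deduplicate, are all assumptions made up for illustration; the right fields will depend on your own context. The idea is simply that anything outside the chosen fields is never stored.

```python
import hashlib
import unicodedata

# Hypothetical minimal field set (an assumption for this sketch).
# Which 5-7 fields you actually need depends on your programme.
ID_FIELDS = ("given_name", "family_name", "date_of_birth",
             "sex", "village", "phone")


def normalise(value):
    """Lower-case, strip accents and collapse whitespace so that
    trivial spelling variations produce the same text."""
    text = unicodedata.normalize("NFKD", str(value or ""))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.lower().split())


def minimise(record):
    """Keep only the fields needed for the stated purpose."""
    return {field: record.get(field, "") for field in ID_FIELDS}


def dedup_key(record):
    """Build a repeatable ID from the normalised minimal fields."""
    joined = "|".join(normalise(record.get(field)) for field in ID_FIELDS)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Return one minimised record per unique ID, keeping the first seen."""
    unique = {}
    for record in records:
        unique.setdefault(dedup_key(record), minimise(record))
    return list(unique.values())
```

Two records that differ only in capitalisation, accents or spacing get the same key and collapse into one entry, and extra fields such as income never reach the de-duplicated list. Exact matching on normalised text is a deliberately simple choice; it will not catch misspellings, and a key built this way is still personal data that needs protecting.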

Scarcity thinking leads to hoarding, building walls and moats. This might work in the short term, but rarely in the long term. Perhaps it would be helpful for us to begin asking ‘how much data is enough?’

Photo by davisco

