The Value of “Small Data” in ML and AI

This is a comment from LinkedIn.

I wish we paid more attention to “small data”. Models built from small data aren’t necessarily bad – it depends on the data generating process you’re trying to model. More data doesn’t necessarily imply better models, especially if the veracity of the data is questionable. Data-centric AI is one discussion currently being had in this context. However, when you don’t need large-scale ML models and are (prudently) content to build statistical tests and simple models, these small data problems become important.

What decision makers shouldn’t forget is that the essential nature of decision making won’t change just because of the size of the data – ultimately, it is the insight that models provide (based on many factors) that is the commodity we consume as decision makers. Consequently, there should be no aversion to “small data” problems, but rather a healthy curiosity. Like all efficiency movements that came before, small data paradigms are innately attractive – if I can verifiably build better models by doing less work, that should logically be a point of value.