Good news! There has certainly been a tendency in ML & AI to collect and train on ever-larger datasets.
"... A new algorithmic method developed by MIT researchers could help. Their mathematical framework provably identifies the smallest dataset that guarantees finding the optimal solution to a problem, often requiring fewer measurements than traditional approaches suggest. ...
The researchers first developed a precise geometric and mathematical characterization of what it means for a dataset to be sufficient. Every possible set of costs (travel times, construction expenses, energy prices) makes some particular decision optimal. These “optimality regions” partition the space of possible costs. A dataset is sufficient if it can determine which region contains the true cost. ..."
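To make the "optimality regions" concrete, here is a minimal sketch (my own illustration, not the researchers' algorithm). For a tiny 2-D linear program, each vertex of the feasible polygon is optimal for a whole cone of cost vectors, so sweeping cost directions around the circle shows the cost space being carved into regions, one per optimal decision. The toy constraints and the 72-direction grid below are invented for illustration.

# Minimal sketch (my own illustration, not the MIT method): for a tiny 2-D
# linear program, each vertex of the feasible polygon is optimal for a whole
# cone of cost vectors, so the space of possible costs is partitioned into
# regions, one per optimal decision.
import numpy as np
from scipy.optimize import linprog

# Invented toy constraints: x1 + x2 <= 4, x1 <= 3, x2 <= 3, x >= 0.
A_ub = np.array([[1.0, 1.0],
                 [1.0, 0.0],
                 [0.0, 1.0]])
b_ub = np.array([4.0, 3.0, 3.0])
bounds = [(0, None), (0, None)]

counts = {}  # optimal vertex -> number of sampled cost directions it wins
for theta in np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False):
    c = np.array([np.cos(theta), np.sin(theta)])        # one candidate cost vector
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    vertex = tuple(np.round(res.x, 6))                   # optimal decision for this c
    counts[vertex] = counts.get(vertex, 0) + 1

for vertex, n in sorted(counts.items()):
    print(f"decision {vertex} is optimal for {n} of 72 sampled cost directions")

Each printed decision corresponds to one optimality region; knowing which region the true cost lies in is exactly what a sufficient dataset has to pin down.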
From the abstract:
"We study the fundamental question of how informative a dataset is for solving a given decision-making task. In our setting, the dataset provides partial information about unknown parameters that influence task outcomes.
Focusing on linear programs, we characterize when a dataset is sufficient to recover an optimal decision, given an uncertainty set on the cost vector.
Our main contribution is a sharp geometric characterization that identifies the directions of the cost vector that matter for optimality, relative to the task constraints and uncertainty set.
We further develop a practical algorithm that, for a given task, constructs a minimal or least-costly sufficient dataset. Our results reveal that small, well-chosen datasets can often fully determine optimal decisions -- offering a principled foundation for task-aware data selection."
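A back-of-the-envelope way to see what "sufficient" means here (a sampling heuristic I am adding for illustration, not the paper's exact geometric test or its dataset-construction algorithm): treat the data as whatever shrinks the uncertainty set on the cost vector, and call it sufficient if every cost still consistent with the data leads to the same optimal decision. The LP, the uncertainty boxes, and the sample count below are invented.

# Hedged sketch, not the paper's algorithm: approximate the sufficiency test by
# sampling the remaining uncertainty box on the cost vector c and checking
# whether all samples share one optimal decision.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Same invented toy LP as above.
A_ub = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b_ub = np.array([4.0, 3.0, 3.0])
bounds = [(0, None), (0, None)]

def optimal_decision(c):
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return tuple(np.round(res.x, 6))

def looks_sufficient(c_lo, c_hi, n_samples=300):
    # Sample cost vectors from the box [c_lo, c_hi]; the data "looks sufficient"
    # if every sampled cost yields the same optimal decision.
    samples = rng.uniform(c_lo, c_hi, size=(n_samples, len(c_lo)))
    decisions = {optimal_decision(c) for c in samples}
    return len(decisions) == 1, decisions

# Wide uncertainty about c: several decisions can be optimal -> not sufficient.
print(looks_sufficient(np.array([-2.0, -2.0]), np.array([2.0, 2.0])))
# Measurements that pin c near (-1.0, -0.1): one decision -> looks sufficient.
print(looks_sufficient(np.array([-1.1, -0.2]), np.array([-0.9, -0.05])))

In the paper the check is exact and geometric rather than sampled, and the proposed algorithm goes further: it constructs a minimal or least-costly dataset that makes the check pass.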