Increasingly, real world problems require multiple predictions while traditional supervised learning techniques focus on making a single best prediction. For instance in advertisement placement on the web, a list of advertisements is placed on a page with the objective of maximizing click-through rate on that list. In this work, we build an efficient framework for making sets or lists of predictions where the objective is to optimize any utility function which is (monotone) submodular over a list of predictions. Other examples of tasks where multiple predictions are important include: grasp selection in robotic manipulation where the robot arm must evaluate a list of grasps with the aim of finding a sucessful grasp, as early on in the list as possible and trajectory selection for mobile ground robots where given the computational time limits, the task is to select a list of trajectories from a much larger set of feasible trajectories for minimizing expected cost of traversal. In computer vision tasks like frame-to-frame target tracking in video, multiple hypotheses about the target location and pose must be considered by the tracking algorithm. For each of these cases, we optimize for the content and order of the list of predictions. Crucially– and in contrast with existing work on list prediction – our approach to pre- dicting lists is based on very simple reductions of the problem of predicting lists to a series of simple classification/regression tasks. This provides powerful flexibility to use any existing prediction method while ensuring rigorous guarantees on prediction performance. We analyze these meta-algorithms for list prediction in both the online, no-regret and generalization settings. Furthermore we extend the methods to make multiple predictions in structured output domains where even a single prediction is a combinatorial object, e.g. , challenging vision tasks like semantic scene labeling and monocular pose estimation. We conclude with case studies that demonstrate the power and flexibility of these reductions in problems from document summarization, prediction of the pose of humans in images, to predicting the best set of robotic grasps and purely vision based autonomous flight in densely cluttered environments.