Martin Schmidt - Trier University (Germany)

November 17th, 12pm - 1.15pm / F022, Galion Building, Campus Cergy

The minimum sum-of-squares clustering problem: Techniques for computing global solutions and a robust approach to deal with measurement errors

Abstract: The minimum sum-of-squares clustering (MSSC) problem is a very important problem in data mining and (unsupervised) machine learning with very many applications in, e.g., medicine or social sciences. However, it is known to be NP-hard in all relevant cases and to be notoriously hard to be solved to global optimality in practice. Moreover, in many modern applications the clustering suffers from unstructured measurement errors because the MSSC result then represents a clustering of the erroneous measurements instead of the true but unknown underlying data.

In this talk, we discuss both mentioned issues. First, we present different tailored mixed-integer programming techniques to improve the performance of state-of-the-art MINLP solvers when applied to the problem - among them are cutting planes, propagation techniques, branching rules, or primal heuristics. Our numerical study shows that our techniques significantly improve the performance of the open-source MINLP solver SCIP. Second, we tackle the other issue by applying techniques from robust optimization to hedge the clustering result against unstructured errors in the observed data. To this end, we derive strictly and Gamma-robust counterparts. Since the nominal problem is already NP-hard, global approaches are often not feasible in practice. As a remedy, we develop tailored alternating direction methods by decomposing the search space of the robustified problems to quickly obtain feasible points of good quality. Our numerical results reveal an interesting feature: the less conservative Gamma-approach is clearly outperformed by the strictly robust clustering method. In particular, the strictly robustified clustering method is able to recover clusterings of the original data even if only erroneous measurements are observed.

Presentation: https://drive.google.com/file/d/1k2Wp9QDSauMwsaWiYoMPtYmbL8ctFTK/view?usp=share_link

Yanlu ZHOU - ESSEC Business School

September 02nd, 12pm - 1pm / room will be communicated soon - Campus Cergy

Market Thickness in Online Food Delivery Platforms: The Impact of Food Processing Times

Abstract: Online food delivery (OFD) platforms are expanding rapidly worldwide, allowing customers to order food from a wide array of restaurants on their mobile phones. The pace of this expansion is accelerated in part by changes in consumer preferences during the COVID-19 pandemic. We study new algorithms to match drivers with orders on OFD platforms - a core function of many of these platforms. We develop real-time matching algorithms that use the (highly variable) food processing time to `delay' the assignment of drivers to orders to thicken the `market' of orders and drivers. The policy relies on machine learning techniques to predict the food processing time of an order, and the dispatch timings are obtained through a careful trade-off  analysis of the supply-demand dynamics and the variability

in the food processing times. Traditional methods in this  field include greedy (online matching) and batching (matching in periodic intervals) policies. We introduce a new level-k policy to delay the assignment until we have no more than k matching options left, and we show numerically that this policy performs exceedingly well on real data. Numerical experiments based on a real dataset from Meituan show that our policy can reduce total costs by 42% compared to matching policies that are currently in use. To explain the advantage of the level-k policy, we evaluate its performance on a simplified model and derive several analytical properties - the total costs are quasiconvex in the level of market thickness, implying that an intermediate level of thickness is optimal. We also find that the food processing time is key information for the matching decisions. Finally, our  findings indicate that if the unit delay penalty cost is high, the decision-maker of the platform should match conservatively.