Modern web search engines are federated -a user query is sent to the numerous specialized search engines called verticals like web (text documents), News, Image, Video, etc. and the results returned by these engines are then aggregated and composed into a search result page (SERP) and presented to the user. For a specific query, multiple verticals could be relevant, which makes the placement of these vertical results within blocks of textual web results challenging: how do we represent, assess, and compare the relevance of these heterogeneous entities? In this paper we present a machine-learning framework for SERP composition in the presence of multiple relevant verticals. First, instead of using the traditional label generation method of human judgment guidelines and trained judges, we use a randomized online auditioning system that allows us to evaluate triples of the form <query, web block, verticals> We use a pairwise click preference to evaluate whether the web block or the vertical block had a better users' engagement. Next, we use a hinged feature vector that contains features from the web block to create a common reference frame and augment it with features representing the specific vertical judged by the user. A gradient boosted decision tree is then learned from the training data. For the final composition of the SERP, we place a vertical result at a slot if the score is higher than a computed threshold. The thresholds are algorithmically determined to guarantee specific coverage for verticals at each slot. We use correlation of clicks as our offline metric and show that click-preference target has a better correlation than human judgments based models. Furthermore, on online tests for News and Image verticals we show higher user engagement for both head and tail queries.