Systematic research on instruction-based conceptual change in mathematics and science has typically examined the effectiveness of a single instructional principle in isolation. It is suggested that the field could gain from studying how different instructional principles interact when they are combined. The goal of this research was to systematically study the combined effects of collaborative learning and hypothesis testing on cognitive growth. In a randomized experiment, 496 9th graders solved challenging tasks that required fully developed proportional reasoning. Half of them were given the opportunity to test their solutions. Based on individual pretests, each student was assigned to one of three competency levels (low, medium, high) and randomly assigned to work either alone or with a peer of low, medium, or high competence. The findings show that the effectiveness of hypothesis testing is conditioned by fine-grained differences in the contingencies between the target student's level of competence, the peer partner's level of competence, and the feedback they receive from the objective testing device.

Most of the early research on cognitive growth through peer collaboration focused on the question of optimal dyad composition (e.g., Messer, Joiner, Loveridge, Light & Littleton, 1993; Tudge & Winterhoff, 1993). However, results have overall been inconclusive, and this line of research has largely been abandoned in favor of process-oriented investigations, such as studies of peer dialogue (e.g., Asterhan & Schwarz, 2007, 2009; Schwarz, Neuman & Biezuner, 2000), or of other instructional techniques to elicit cognitive conflict, such as collaborative hypothesis testing (e.g., Howe, Tolmie, Duchak-Tanner & Rattay, 2000; Howe, Tolmie & Rodgers, 1992).

Hypothesis testing tasks require learners to translate their conceptual knowledge into hypotheses and subject these to empirical evaluation.
When a prediction is disconfirmed, the outcome may confront learners with compelling evidence that they should reconsider their prior understanding, even when two learners agree on their predictions (e.g., Howe et al., 2000). Conversely, when a prediction is confirmed, it validates the explanation that led to the prediction.

In this paper, we present findings from a new study that examines whether the effects of hypothesis testing techniques depend on dyad composition. We predict that they do. First of all, hypothesis testing in collaborating dyads may create conflict in W-W dyads (two 'wrong' learners) and settle a social conflict between members of W-R dyads (one 'right' and one 'wrong' learner), who each gave different predictions and explanations. The success of hypothesis testing in socio-cognitive conflict tasks, however, hinges on careful design: only the correct explanation or strategy should lead to a confirmation. Otherwise, the feedback may confirm an individual's naïve, incorrect conception. When designed carefully, hypothesis testing can create strong learning opportunities. For instance, a 'wrong' (W) student who collaborates with a 'right' (R) student will not only be exposed to a higher level of reasoning during the discussion phase, but will also receive empirical confirmation that this reasoning is correct. That is likely to be a quite powerful combination. Students in a Wx-Wx pair, on the other hand, would be expected to reach quick agreement without much discussion, but to be shown wrong in the hypothesis testing phase, forcing them to generate a new, higher-level explanation for these findings all by themselves. Lastly, in Wx-Wy pairs the outcomes are likely to be contingent on the competency level of the particular student: a lower-competency W student (W1) is likely to benefit more from interaction with a slightly more competent W student (W2) when there is no hypothesis testing than when there is.
The reason for this somewhat counterintuitive expectation is that if the W1 student is convinced by W2's reasoning in the discussion phase, this solution will be proven wrong in the hypothesis testing phase. As a result, W1 students may very well regress to their prior level of reasoning, and W2 students may regress as well.

Very few studies have examined whether hypothesis testing techniques are more effective in collaborative or individual conditions. Two studies are particularly relevant to ours and are worth describing in further detail. The first is a study reported by Ellis, Klahr & Siegler (1993) that investigated the effects of feedback and collaboration on 5th graders' use of mathematical rules for decimal fractions. Each of the approximately 120 pupils in this study consistently used one of two incorrect mathematical rules that were equally wrong, but qualitatively different. They were assigned to work either alone or in Wx-Wy, Wx-Wx, or Wy-Wy pairs. The results demonstrated that children who had the opportunity to collaborate with a partner were more likely to use a correct rule on a posttest than children who worked alone, but only if they were given feedback during the interaction as to whether their answers were correct. However, dyadic composition was not found to affect children's understanding on individual tests.

Tudge, Winterhoff and Hogan (1996) also investigated the effects of feedback (hypothesis testing) and dyad composition on early elementary school children's problem-solving performance on a balance beam task (N = 83). Children in this study either worked alone or with a partner who was equally, less, or more competent, and either did or did not receive feedback on the correctness of their predictions. In direct conflict with the findings reported by Ellis et al., the presence of a partner was more effective than working alone only when children did not receive feedback.
When children received feedback, working alone was more effective than working with a partner. Similar to the Ellis et al. findings, no differences were found between the different types of dyad compositions.

The findings from these two studies thus lead to quite different predictions: based on the Tudge et al. findings, students may be expected to profit more from hypothesis testing when they work alone, whereas based on the Ellis et al. study and the findings reported by Howe et al., students are expected to benefit particularly from the combination of hypothesis testing and collaboration. The main aim of the present study is then to reconcile the disparate findings with regard to hypothesis testing and dyad composition in collaborative problem solving and to address the following caveats in the literature. None of the above-mentioned studies systematically tested the effects of hypothesis testing for the full range of dyad compositions, specified by both the target student's and the partner's competence level. In addition, they did not control for nested effects of the individual within the dyad, and the reported effects may thus be overestimated.

The topic domain chosen for this study is proportional reasoning. Research suggests that students experience difficulty with proportional reasoning problems because they over-extend numerical equivalence concepts to proportional equivalence problems (e.g., Mix, Levine, & Huttenlocher, 1999; Tourniaire & Pulos, 1985). Sophisticated tests, such as the Blocks task, have been developed to serve both as instructional interventions and as assessment tools (e.g., Schwarz & Linchevski, 2007).
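
To make the kind of difficulty described above concrete, the following minimal sketch contrasts a multiplicative (proportional) comparison of two ratios with one naive strategy that compares differences instead. The additive strategy, the function names, and the number pairs are assumptions chosen for illustration only; they are not taken from the cited studies and do not represent the Blocks task.

```python
from fractions import Fraction


def proportional_equal(a, b, c, d):
    """Return True if the ratios a:b and c:d are equal (multiplicative comparison)."""
    return Fraction(a, b) == Fraction(c, d)


def additive_equal(a, b, c, d):
    """Hypothetical naive strategy: judge a:b and c:d 'equal' when b - a == d - c."""
    return (b - a) == (d - c)


# Illustrative pairs (assumed, not from the study).
pairs = [((2, 3), (4, 6)), ((2, 3), (4, 5))]
for (a, b), (c, d) in pairs:
    print(
        f"{a}:{b} vs {c}:{d} ->",
        "proportional:", proportional_equal(a, b, c, d),
        "| additive:", additive_equal(a, b, c, d),
    )
```

In this sketch the two strategies disagree on both pairs: 2:3 and 4:6 are proportionally equivalent although their differences differ, whereas 2:3 and 4:5 share a difference of 1 without being proportionally equivalent. A task in which only the multiplicative strategy yields a confirmed prediction would satisfy the design requirement that feedback must not confirm an incorrect conception.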