Biases such as the preference of a particular response for no obvious reason, are an integral part of psychophysics. Such biases have been reported in the common two-alternative forced choice (2AFC) experiments, where participants are instructed to compare two consecutively presented stimuli. However, the principles underlying these biases are largely unknown and previous studies have typically used ad-hoc explanations to account for them. Here we consider human performance in the 2AFC tone frequency discrimination task, utilizing two standard protocols. In both protocols, each trial contains a reference stimulus. In one (Reference-Lower protocol), the frequency of the reference stimulus is always lower than that of the comparison stimulus whereas in the other (Reference protocol), the frequency of the reference stimulus is either lower or higher than that of the comparison stimulus. We find substantial interval biases. Namely, participants perform better when the reference is in a specific interval. Surprisingly, the biases in the two experiments are opposite: performance is better when the reference is in the first interval in the Reference protocol, but is better when the reference is second in the Reference-Lower protocol. This inconsistency refutes previous accounts of the interval bias, and is resolved when experiments statistics is considered. Viewing perception as incorporation of sensory input with prior knowledge accumulated during the experiment accounts for the seemingly contradictory biases both qualitatively and quantitatively. The success of this account implies that even simple discriminations reflect a combination of sensory limitations, memory limitations, and the ability to utilize stimuli statistics.
We consider an infinite two-player stochastic zero-sum game with a Borel winning set, in which the opponent's actions are monitored via stochastic private signals. We introduce two conditions of the signalling structure: Stochastic Eventual Perfect Monitoring (SEPM) and Weak Stochastic Eventual Perfect Monitoring (WSEPM). When signals are deterministic these two conditions coincide and by a recent result due to [Shmaya (2011)] entail determinacy of the game. We generalize [Shmaya (2011)]'s result and show that in the stochastic learning environment SEPM implies determinacy while WSEPM does not.
We performed controlled experiments of human participants in a continuous sequence of ad auctions, similar to those used by Internet companies. The goal of the research was to understand users' strategies in making bids. We studied the behavior under two auction types: (1) the Generalized Second-Price (GSP) auction and (2) the Vickrey–Clarke–Groves (VCG) payment rule, and manipulated also the participants' knowledge conditions: (1) explicitly given valuations and (2) payoff information from which valuations could be deduced. We found several interesting behaviors, among them are: - No convergence to equilibrium was detected; moreover the frequency with which participants modified their bids increased with time. - We can detect explicit "better-response" behavior rather than just mixed bidding. - While bidders in GSP auctions do strategically shade their bids, they tend to bid higher than theoretically predicted by the standard VCG-like equilibrium of GSP. - Bidders who are not explicitly given their valuations but can only deduce them from their gains behave a little less "precisely" than those with such explicit knowledge, but mostly during an initial learning phase. - VCG and GSP yield approximately the same (high) social welfare, but GSP tends to give higher revenue.
Ester Samuel-Cahn Abba M. Krieger. 2014. “A Generalized Secretary Problem”. Publisher's Version Abstract
{A new Secretary Problem is considered, where for fixed k and m one wins if at some time i = m(j .. 1) + 1 up to jm one selects one of the j best items among the first jm items
Alessandro Acquisti Maya Bar-Hillel, Eyal Peer. 2014. “"Heads or Tails?" - A Reachability Bias in Binary Choice”. Publisher's Version Abstract
When asked to mentally simulate coin tosses, people generate sequences which differ systematically from those generated by fair coins. It has been rarely noted that this divergence is apparent already in the very first mental toss. Analysis of several existing data sets reveals that about 80% of respondents start their sequence with Heads. We attributed this to the linguistic convention describing coin toss outcomes as "Heads or Tails", not vice versa. However, our subsequent experiments found the "first-toss" bias reversible under minor changes in the experimental setup, such as mentioning Tails before Heads in the instructions. We offer a comprehensive account in terms of a novel response bias, which we call reachability. It is more general than the first-toss bias, and reflects the relative ease of reaching one option compared to its alternative in any binary choice context. When faced with a choice between two options (e.g., Heads and Tails, when "tossing" mental coins), whichever of the two is presented first by the choice architecture (hence, is more reachable) will be favored. This bias has far-reaching implications extending well beyond the context of randomness cognition, and in particular to binary surveys (e.g., accept vs. reject) and tests (e.g., True-False). In binary choice, there is an advantage to what presents first. Keywords: acquiescence bias; order effects; randomness cognition; reachability; response bias
Maximizing the revenue from selling two goods (or items) is a notoriously difficult problem, in stark contrast to the single-good case. We show that simple "one-dimensional" mechanisms, such as selling the two goods separately, guarantee at least 73% of the optimal revenue when the valuations of the two goods are independent and identically distributed, and at least 50% when they are independent. However, in the general case where the valuations may be correlated, simple mechanisms cannot guarantee any positive fraction of the optimal revenue. We also introduce a "measure of complexity" for mechanisms–-the menu size–-and show that it is naturally related to the fraction of the optimal revenue that can be guaranteed.
Many competitions require investment of nonrefundable resources, e.g., political campaigns, financial markets, sports or courting rituals. One contestant wins the prize for the invested amount, while all others forfeit their investments without receiving compensation. Frequently, contests are asymmetric, due to differing resources or prize valuations. This could lead weaker contestants to avoid investing, and stronger ones to lower their investment. Two experiments explored the effects of asymmetry between the contestants "arising from their endowments or prizes "on investments. Subjects played both symmetric and asymmetric contests, enabling direct within-subject comparisons. We observed an effect of asymmetry only when it concerned endowments: Subjects invested less when their endowments were asymmetric, whereas (a-)symmetry in the prizes did not influence investments. The changes between consecutive investments can be explained by reactions to the previous outcome (win or loss) in terms of regret over the previous investment being too much or too little.
Statistical models in econometrics, biology, and most other areas, are not expected to be correct, and often are not very accurate. The choice of a model for the analysis of data depends on the purpose of the analysis, the relation between the data and the model, and also on the sample or data size. Combining ideas from Erev, Roth, Slonim, and Barron (2007) and the well-known AIC criterion and cross-validation, we propose a variant of model selection approach as a function of the models and the data size, with quantification of the chosen model's relative value. Our research is motivated by data from experimental economics, and we also give a simple biological example.
Yonatan Loewenstein Gianluigi Mongillo, Hanan Shteingart. 2014. “Misbehavior of Reinforcement Learning, The”. Publisher's Version Abstract
Organisms modify their behavior in response to its consequences, a phenomenon referred to as operant learning. The computational principles and neural mechanisms underlying operant learning are a subject of extensive experimental and theoretical investigations. Theoretical approaches largely rely on concepts and algorithms from Reinforcement Learning. The dominant view is that organisms maintain a value function, that is a set of estimates of the cumulative future rewards associated with the different behavioral options. These values are then used to select actions. Learning in this framework results from the update of these values depending on experience of the consequences of past actions. An alternative view questions the applicability of such a computational scheme to many real-life situations. Instead, it posits that organisms exploit the intrinsic variability in their action selection mechanism(s) to modify their behavior, e.g., via stochastic gradient ascent, without the need of an explicit representation of values. In this review, we compare these two approaches in terms of their computational power and flexibility, their putative neural correlates and, finally, in terms of their ability to account for behavior as observed in repeated-choice experiments. We discuss the successes and failures of these alternative approaches in explaining the observed patterns of choice behavior. We conclude by identifying some of the important challenges to a comprehensive theory of operant learning.
Many disputes involve conflicts of rights. A common view is that rights cannot really be in conflict so one of those being claimed must be a mistake. This idea leads to extreme outcomes that cut some parties out. Many studies have investigated how to choose a compromise among rights but they have focus on situations where the incompatibility comes from the degrees of the claims, as when, for example, a deceased person promised his heirs more than his total estate. I analyze a Talmudic problem where the difficulty is the pattern of the rights - each one trumps another in a cycle. The theory of non-transferable utility coalitional games suggests two solutions, one based on Shapley's and Maschler-Owen's values, which are equivalent for the problem, and the other on Harsanyi's and Kalai-Samet's, also equivalent. Each satisfies four out of five desirable properties, better than several other solutions. The NTU games are appropriate not just for power-based negotiation but for disputes over justice, fairness and rights. It is hoped that this analysis will form part of a general understanding of rights conflicts.
Can noncooperative behaviour of merchants lead to a market allocation that prima facie seems anticompetitive? We introduce a model in which service providers aim at optimizing the number of customers who use their services, while customers aim at choosing service providers with minimal customer load. Each service provider chooses between a variety of levels of service, and as long as it does not lose customers, aims at minimizing its level of service; the minimum level of service required to satisfy a customer varies across customers. We consider a two-stage competition, in the first stage of which the service providers select their levels of service, and in the second stage –- customers choose between the service providers. (We show via a novel construction that for any choice of strategies for the service providers, a unique distribution of the customers' mass between them emerges from all Nash equilibria among the customers, showing the incentives of service providers in the two-stage game to be well defined.) In the two-stage game, we show that the competition among the service providers possesses a unique Nash equilibrium, which is moreover super strong; we also show that all sequential better-response dynamics of service providers reach this equilibrium, with best-response dynamics doing so surprisingly fast. If service providers choose their levels of service according to this equilibrium, then the unique Nash equilibrium among customers in the second phase is essentially an allocation (i.e. split) of the market between the service providers, based on the customers' minimum acceptable quality of service; moreover, each service provider's chosen level of service is the lowest acceptable by the entirety of the market share allocated to it. Our results show that this seemingly-cooperative allocation of the market arises as the unique and highly-robust outcome of noncooperative (i.e. free from any form of collusion), even myopic, service-provider behaviour. The results of this paper are applicable to a variety of scenarios, such as the competition among ISPs, and shed a surprising light on aspects of location theory, such as the formation and structure of a city's central business district.
Correlation of players' actions may evolve in the common course of the play of a repeated game with perfect monitoring ("obline correlation). In this paper we study the concealment of such correlation from a boundedly rational player. We show that "strong players, i.e., players whose strategic complexity is less stringently bounded, can orchestrate the obline correlation of the actions of "weak players, where this correlation is concealed from an opponent of "intermediate strength. The feasibility of such "ol concealed correlation is reflected in the individually rational payoff of the opponent and in the equilibrium payoffs of the repeated game. This result enables the derivation of a folk theorem that characterizes the set of equilibrium payoffs in a class of repeated games with boundedly rational players and a mechanism designer who sends public signals. The result is illustrated in two models, each of which captures a different aspect of bounded rationality. In the first, players use bounded recall strategies. In the second, players use strategies that are implementable by finite automata.
Yonatan Loewenstein Hanan Shteingart. 2014. “Reinforcement Learning and Human Behavior”. Publisher's Version Abstract
The dominant computational approach to model operant learning and its underlying neural activity is model-free reinforcement learning (RL). However, there is accumulating behavioral and neuronal-related evidence that human (and animal) operant learning is far more multifaceted. Theoretical advances in RL, such as hierarchical and model-based RL extend the explanatory power of RL to account for some of these findings. Nevertheless, some other aspects of human behavior remain inexplicable even in the simplest tasks. Here we review developments and remaining challenges in relating RL models to human operant learning. In particular, we emphasize that learning a model of the world is an essential step prior or in parallel to learning the policy in RL and discuss alternative models that directly learn a policy without an explicit world model in terms of state-action pairs.
The purpose of this work is to offer for any zero-sum game with a unique strictly mixed Nash equilibrium, a measure for the risk when deviating from the Nash equilibrium. We present two approaches regarding the nature of deviations; strategic and erroneous. Accordingly, we define two models. In each model we define risk measures for the row-player (PI) and the column player (PII), and prove that the risks of PI and PII coincide. This result holds for any norm we use for the size of deviations. We develop explicit expressions for the risk measures in the L1 and L2 norms, and compute it for several games. Although the results hold for all norms, we show that only the L1 norm is suitable in our context, as it is the only norm which is consistent in the sense that it gives the same size to potentially equivalent deviations. The risk measures defined here enables testing and evaluating predictions on the behavior of players. For example: Do players deviate more in a game with lower risks than in a game with higher risk?
We consider a memoryless unobservable single-server queue where customers are homogeneous with respect to their reward (due to service completion) and with respect to their cost per unit of time of waiting. Left to themselves, it is well known that in equilibrium they will join the queue at a rate that is higher than it is socially optimal. We show that if customers draw a random preemptive priority parameter prior to deciding whether or not to join, the resulting equilibrium joining rate coincides with the socially optimal one. We also introduce some variations of this regulation scheme and review a few existing schemes from the literature. We suggest a classification of all these schemes, based on a few key properties, and use it to compare our new schemes with the existing ones.
In operant learning, behaviors are reinforced or inhibited in response to the consequences of similar actions taken in the past. However, because in natural environments the 'same  situation never recurs, it is essential for the learner to decide what 'similar  is so that he can generalize from experience in one state of the world to future actions in different states of the world. The computational principles underlying this generalization are poorly understood, in particular because natural environments are typically too complex to study quantitatively. In this paper we study the principles underlying generalization in operant learning of professional basketball players. In particular, we utilize detailed information about the spatial organization of shot locations to study how players adapt their attacking strategy in real time according to recent events in the game. To quantify this learning, we study how a make'miss from one location in the court affects the probabilities of shooting from different locations. We show that generalization is not a spatially-local process, nor is governed by the difficulty of the shot. Rather, to a first approximation, players use a simplified binary representation of the court into 2pt and 3pt zones. This result indicates that rather than using low-level features, generalization is determined by high-level cognitive processes that incorporate the abstract rules of the game.
Noam Nisan Yannai A. Gonczarowski. 2014. “A Stable Marriage Requires Communication”. Publisher's Version Abstract
The Gale-Shapely algorithm for the Stable Marriage Problem is known to take Theta(n^2) steps to find a stable marriage in the worst case, but only Theta(n log n) steps in the average case (with n women and n men). In 1976, Knuth asked whether the worst-case running time can be improved in a model of computation that does not require sequential access to the whole input. A partial negative answer was given by Ng and Hirschberg, who showed that Theta(n^2) queries are required in a model that allows certain natural random-access queries to the participants' preferences. Using a reduction to the communication complexity of the disjointness problem, we prove a significantly more general - albeit slightly weaker - result, showing that Omega(n^2) Boolean queries of any type are required. Our lower bound generalizes to (A) randomized algorithms, (B) even just verifying the stability of a proposed marriage, (C) even allowing arbitrary separate preprocessing of the women's preferences and of the men's preferences, and (D) several variants of the basic problem, such as whether a given pair is married in every/some stable marriage.
We provide Game-theoretic analysis of the arrival process to a multi-serve r system with a limited queue buffer, which admits customers only during a finite time interval. A customer who arrives at a full system is blocked and do es not receive service. Customers can choose their arrival times with the goal of minimizing their probability of being blocked. We characterize the unique symmetric Nash equilibrium arrival distribution and present a method for computing it. This distribution is comprised of an atom at time zero, an interval with no arrivals (a gap), and a continuous distribution until the closing time. We further present a fluid approximation for the equilibrium behaviour when the population is large, where the fluid solution also admits an atom at zero, no gap, and a uniform distribution throughout the arrival interval. In doing so, we provide an approximation model for the equilibrium behaviour that do es not require a numerical solution for a set of differential equations, as is required in the discrete case. For the corresponding problem of social optimization we provide explicit analysis of some special cases and numerical analysis of the general model. An upper bound is established for the price of anarchy (PoA). The PoA is shown to b e not monotone with respect to population size.
A few species within the genus Colchicum of the Colchicaceae family, a small group of species native to the transitional belt of the Mediterranean and the Middle East deserts, are characterized by unique morphological traits: nectarial appendages that occur at the base of the perianth segments and consist of two lamellae with teeth. The morphology of the nectarial appendages was measured in three species and in a new population with similar traits to this group for the first time. Nectarial appendages and nectar standing crop are larger for the inner whorl of perianth segments in all species, although the perianth segments are themselves usually smaller. Intact flowers received more ant visits in outer than in inner whorl perianth nectaries. Removal of the nectarial appendages resulted in an opposite trend, implying that these organs prevent ant access to nectaries. Ant access to flowers reduced nectar standing crop, which could reduce the fitness of the species assuming that ants do not pollinate. The role of nectarial appendages as nectar-theft deterrents is reinforced in light of the group's harsh habitat and flowering season.
Yehuda (John) Levy Ziv Hellman. 2013. “Bayesian Games With a Continuum of States”. Publisher's Version Abstract
Negative results on the the existence of Bayesian equilibria when state spaces have the cardinality of the continuum have been attained in recent years. This has led to the natural question: are there conditions that characterise when Bayesian games over continuum state spaces have measurable Bayesian equilibria? We answer this in the affirmative. Assuming that each type has finite or countable support, measurable Bayesian equilibria may fail to exist if and only if the underlying common knowledge $sigma$-algebra is non-separable. Furthermore, anomalous examples with continuum state spaces have been presented in the literature in which common priors exist over entire state spaces but not over common knowledge components. There are also spaces over which players can have no disagreement, but when restricting attention to common knowledge components disagreements can exist. We show that when the common knowledge $sigma$-algebra is separable all these anomalies disappear.