Statistical models in econometrics, biology, and most other areas, are not expected to be correct, and often are not very accurate. The choice of a model for the analysis of data depends on the purpose of the analysis, the relation between the data and the model, and also on the sample or data size. Combining ideas from Erev, Roth, Slonim, and Barron (2007) and the well-known AIC criterion and cross-validation, we propose a variant of model selection approach as a function of the models and the data size, with quantification of the chosen model's relative value. Our research is motivated by data from experimental economics, and we also give a simple biological example.
Yonatan Loewenstein Gianluigi Mongillo, Hanan Shteingart. 2014. “Misbehavior of Reinforcement Learning, The”. Publisher's Version Abstract
Organisms modify their behavior in response to its consequences, a phenomenon referred to as operant learning. The computational principles and neural mechanisms underlying operant learning are a subject of extensive experimental and theoretical investigations. Theoretical approaches largely rely on concepts and algorithms from Reinforcement Learning. The dominant view is that organisms maintain a value function, that is a set of estimates of the cumulative future rewards associated with the different behavioral options. These values are then used to select actions. Learning in this framework results from the update of these values depending on experience of the consequences of past actions. An alternative view questions the applicability of such a computational scheme to many real-life situations. Instead, it posits that organisms exploit the intrinsic variability in their action selection mechanism(s) to modify their behavior, e.g., via stochastic gradient ascent, without the need of an explicit representation of values. In this review, we compare these two approaches in terms of their computational power and flexibility, their putative neural correlates and, finally, in terms of their ability to account for behavior as observed in repeated-choice experiments. We discuss the successes and failures of these alternative approaches in explaining the observed patterns of choice behavior. We conclude by identifying some of the important challenges to a comprehensive theory of operant learning.
Many disputes involve conflicts of rights. A common view is that rights cannot really be in conflict so one of those being claimed must be a mistake. This idea leads to extreme outcomes that cut some parties out. Many studies have investigated how to choose a compromise among rights but they have focus on situations where the incompatibility comes from the degrees of the claims, as when, for example, a deceased person promised his heirs more than his total estate. I analyze a Talmudic problem where the difficulty is the pattern of the rights - each one trumps another in a cycle. The theory of non-transferable utility coalitional games suggests two solutions, one based on Shapley's and Maschler-Owen's values, which are equivalent for the problem, and the other on Harsanyi's and Kalai-Samet's, also equivalent. Each satisfies four out of five desirable properties, better than several other solutions. The NTU games are appropriate not just for power-based negotiation but for disputes over justice, fairness and rights. It is hoped that this analysis will form part of a general understanding of rights conflicts.
Can noncooperative behaviour of merchants lead to a market allocation that prima facie seems anticompetitive? We introduce a model in which service providers aim at optimizing the number of customers who use their services, while customers aim at choosing service providers with minimal customer load. Each service provider chooses between a variety of levels of service, and as long as it does not lose customers, aims at minimizing its level of service; the minimum level of service required to satisfy a customer varies across customers. We consider a two-stage competition, in the first stage of which the service providers select their levels of service, and in the second stage –- customers choose between the service providers. (We show via a novel construction that for any choice of strategies for the service providers, a unique distribution of the customers' mass between them emerges from all Nash equilibria among the customers, showing the incentives of service providers in the two-stage game to be well defined.) In the two-stage game, we show that the competition among the service providers possesses a unique Nash equilibrium, which is moreover super strong; we also show that all sequential better-response dynamics of service providers reach this equilibrium, with best-response dynamics doing so surprisingly fast. If service providers choose their levels of service according to this equilibrium, then the unique Nash equilibrium among customers in the second phase is essentially an allocation (i.e. split) of the market between the service providers, based on the customers' minimum acceptable quality of service; moreover, each service provider's chosen level of service is the lowest acceptable by the entirety of the market share allocated to it. Our results show that this seemingly-cooperative allocation of the market arises as the unique and highly-robust outcome of noncooperative (i.e. free from any form of collusion), even myopic, service-provider behaviour. The results of this paper are applicable to a variety of scenarios, such as the competition among ISPs, and shed a surprising light on aspects of location theory, such as the formation and structure of a city's central business district.
Correlation of players' actions may evolve in the common course of the play of a repeated game with perfect monitoring ("obline correlation). In this paper we study the concealment of such correlation from a boundedly rational player. We show that "strong players, i.e., players whose strategic complexity is less stringently bounded, can orchestrate the obline correlation of the actions of "weak players, where this correlation is concealed from an opponent of "intermediate strength. The feasibility of such "ol concealed correlation is reflected in the individually rational payoff of the opponent and in the equilibrium payoffs of the repeated game. This result enables the derivation of a folk theorem that characterizes the set of equilibrium payoffs in a class of repeated games with boundedly rational players and a mechanism designer who sends public signals. The result is illustrated in two models, each of which captures a different aspect of bounded rationality. In the first, players use bounded recall strategies. In the second, players use strategies that are implementable by finite automata.
Yonatan Loewenstein Hanan Shteingart. 2014. “Reinforcement Learning and Human Behavior”. Publisher's Version Abstract
The dominant computational approach to model operant learning and its underlying neural activity is model-free reinforcement learning (RL). However, there is accumulating behavioral and neuronal-related evidence that human (and animal) operant learning is far more multifaceted. Theoretical advances in RL, such as hierarchical and model-based RL extend the explanatory power of RL to account for some of these findings. Nevertheless, some other aspects of human behavior remain inexplicable even in the simplest tasks. Here we review developments and remaining challenges in relating RL models to human operant learning. In particular, we emphasize that learning a model of the world is an essential step prior or in parallel to learning the policy in RL and discuss alternative models that directly learn a policy without an explicit world model in terms of state-action pairs.
The purpose of this work is to offer for any zero-sum game with a unique strictly mixed Nash equilibrium, a measure for the risk when deviating from the Nash equilibrium. We present two approaches regarding the nature of deviations; strategic and erroneous. Accordingly, we define two models. In each model we define risk measures for the row-player (PI) and the column player (PII), and prove that the risks of PI and PII coincide. This result holds for any norm we use for the size of deviations. We develop explicit expressions for the risk measures in the L1 and L2 norms, and compute it for several games. Although the results hold for all norms, we show that only the L1 norm is suitable in our context, as it is the only norm which is consistent in the sense that it gives the same size to potentially equivalent deviations. The risk measures defined here enables testing and evaluating predictions on the behavior of players. For example: Do players deviate more in a game with lower risks than in a game with higher risk?
We consider a memoryless unobservable single-server queue where customers are homogeneous with respect to their reward (due to service completion) and with respect to their cost per unit of time of waiting. Left to themselves, it is well known that in equilibrium they will join the queue at a rate that is higher than it is socially optimal. We show that if customers draw a random preemptive priority parameter prior to deciding whether or not to join, the resulting equilibrium joining rate coincides with the socially optimal one. We also introduce some variations of this regulation scheme and review a few existing schemes from the literature. We suggest a classification of all these schemes, based on a few key properties, and use it to compare our new schemes with the existing ones.
In operant learning, behaviors are reinforced or inhibited in response to the consequences of similar actions taken in the past. However, because in natural environments the 'same  situation never recurs, it is essential for the learner to decide what 'similar  is so that he can generalize from experience in one state of the world to future actions in different states of the world. The computational principles underlying this generalization are poorly understood, in particular because natural environments are typically too complex to study quantitatively. In this paper we study the principles underlying generalization in operant learning of professional basketball players. In particular, we utilize detailed information about the spatial organization of shot locations to study how players adapt their attacking strategy in real time according to recent events in the game. To quantify this learning, we study how a make'miss from one location in the court affects the probabilities of shooting from different locations. We show that generalization is not a spatially-local process, nor is governed by the difficulty of the shot. Rather, to a first approximation, players use a simplified binary representation of the court into 2pt and 3pt zones. This result indicates that rather than using low-level features, generalization is determined by high-level cognitive processes that incorporate the abstract rules of the game.
Noam Nisan Yannai A. Gonczarowski. 2014. “A Stable Marriage Requires Communication”. Publisher's Version Abstract
The Gale-Shapely algorithm for the Stable Marriage Problem is known to take Theta(n^2) steps to find a stable marriage in the worst case, but only Theta(n log n) steps in the average case (with n women and n men). In 1976, Knuth asked whether the worst-case running time can be improved in a model of computation that does not require sequential access to the whole input. A partial negative answer was given by Ng and Hirschberg, who showed that Theta(n^2) queries are required in a model that allows certain natural random-access queries to the participants' preferences. Using a reduction to the communication complexity of the disjointness problem, we prove a significantly more general - albeit slightly weaker - result, showing that Omega(n^2) Boolean queries of any type are required. Our lower bound generalizes to (A) randomized algorithms, (B) even just verifying the stability of a proposed marriage, (C) even allowing arbitrary separate preprocessing of the women's preferences and of the men's preferences, and (D) several variants of the basic problem, such as whether a given pair is married in every/some stable marriage.
We provide Game-theoretic analysis of the arrival process to a multi-serve r system with a limited queue buffer, which admits customers only during a finite time interval. A customer who arrives at a full system is blocked and do es not receive service. Customers can choose their arrival times with the goal of minimizing their probability of being blocked. We characterize the unique symmetric Nash equilibrium arrival distribution and present a method for computing it. This distribution is comprised of an atom at time zero, an interval with no arrivals (a gap), and a continuous distribution until the closing time. We further present a fluid approximation for the equilibrium behaviour when the population is large, where the fluid solution also admits an atom at zero, no gap, and a uniform distribution throughout the arrival interval. In doing so, we provide an approximation model for the equilibrium behaviour that do es not require a numerical solution for a set of differential equations, as is required in the discrete case. For the corresponding problem of social optimization we provide explicit analysis of some special cases and numerical analysis of the general model. An upper bound is established for the price of anarchy (PoA). The PoA is shown to b e not monotone with respect to population size.
A few species within the genus Colchicum of the Colchicaceae family, a small group of species native to the transitional belt of the Mediterranean and the Middle East deserts, are characterized by unique morphological traits: nectarial appendages that occur at the base of the perianth segments and consist of two lamellae with teeth. The morphology of the nectarial appendages was measured in three species and in a new population with similar traits to this group for the first time. Nectarial appendages and nectar standing crop are larger for the inner whorl of perianth segments in all species, although the perianth segments are themselves usually smaller. Intact flowers received more ant visits in outer than in inner whorl perianth nectaries. Removal of the nectarial appendages resulted in an opposite trend, implying that these organs prevent ant access to nectaries. Ant access to flowers reduced nectar standing crop, which could reduce the fitness of the species assuming that ants do not pollinate. The role of nectarial appendages as nectar-theft deterrents is reinforced in light of the group's harsh habitat and flowering season.
Yehuda (John) Levy Ziv Hellman. 2013. “Bayesian Games With a Continuum of States”. Publisher's Version Abstract
Negative results on the the existence of Bayesian equilibria when state spaces have the cardinality of the continuum have been attained in recent years. This has led to the natural question: are there conditions that characterise when Bayesian games over continuum state spaces have measurable Bayesian equilibria? We answer this in the affirmative. Assuming that each type has finite or countable support, measurable Bayesian equilibria may fail to exist if and only if the underlying common knowledge $sigma$-algebra is non-separable. Furthermore, anomalous examples with continuum state spaces have been presented in the literature in which common priors exist over entire state spaces but not over common knowledge components. There are also spaces over which players can have no disagreement, but when restricting attention to common knowledge components disagreements can exist. We show that when the common knowledge $sigma$-algebra is separable all these anomalies disappear.
Optimal Lobbying is the problem a lobbyist or a campaign manager faces in a full-information voting scenario of a multi-issue referendum when trying to influence the result. The Lobby is faced with a profile that specifies for each voter and each issue whether the voter approves or rejects the issue, and seeks to find the smallest set of voters it must influence to change their vote, for a desired outcome to be obtained. This computational problem also describes problems arising in other scenarios of aggregating complex opinions, such as principal-agents incentives scheme in a complex combinatorial problem, and bribery and manipulation in Truth-Functional Judgement Aggregation. We study the computational complexity of Optimal Lobbying when the issues are aggregated using an anonymous monotone function and the family of desired outcomes is an upward-closed family. We analyze this problem with regard to two parameters: the minimal number of supporters needed to pass an issue, and the size of the maximal minterm of the desired set. We show that for the extreme values of the parameters, the problem is tractable, and provide algorithms. On the other hand, we prove intractability of the problem for the non-extremal values, which are common values for the parameters.
We add two results to the theory of consistent voting. Let M be the set of all survivors of some feasible elimination procedure. We prove that i) M can be computed in polynomial time for each profile of preferences and ii) M is characterized by anonymity, non- imposition, Maskin monotonicity, and additive blocking.
For a random permutation on 1,2, ¦,n for fixed n, and for MŠ†1,2, ¦,n, we analyse the distribution of the combined length L=L(,M) of all cycles of that contain at least one element of M. We give a simple, explicit formula for the probability of every possible value for L (backed by three proofs of distinct flavours), as well as closed-form formulae for its expectation and variance, showing that less than 1/(|M|+1) of the elements 1, ¦,n are expected to be contained in cycles of that are disjoint from M, with low probability for a large deviation from this fraction. We furthermore give a simple explicit formula for all rising-factorial moments of L. These results are applicable to the study of manipulation in matching markets.
We consider a fast evolutionary dynamic process on finite stopping games, where each player at each node has at most one move to continue the game. A state is evolutionarily stable if its long-run relative frequency of occurrence is bounded away from zero as the mutation rate decreases to zero. The fast dynamic process allows each individual in each population to change its strategy at every stage. We define a robustness index of backward induction and show examples where the backward induction equilibrium component is not evolutionarily stable for large populations. We show some sufficient conditions for evolutionary stability, which are different from the ones for the conventional evolutionary model. Even for this fast dynamic process, the transition between any two Nash equilibrium components may take very long time.
We consider a basic dynamic evolutionary model with rare mutation and a best-reply (or better-reply) selection mechanism. A state is evolutionarily stable if its long-term relative frequency of occurrence is bounded away from zero as the mutation rate decreases to zero. We prove that, for all finite extensive-form games of perfect information, only Nash equilibria are evolutionarily stable. We show that, in games where a player may play at more than one node along some path, even when the populations increase to infinity, there may be some evolutionarily stable states which are not part of the backward induction equilibrium component. We give a sufficient condition for evolutionary stability and show how much extra value is needed in the terminal payoffs to make an equilibrium evolutionarily stable.
The usual purpose of negotiations is to explore options and reach an agreement, if possible. We investigated a notable exception to this generalization, where a party negotiates without any intention of reaching an agreement. False negotiation occurs when a party gains more by stalling the negotiations until an external change takes place that improves its position considerably. While false negotiators aim to avoid agreement within the current frame of the negotiations, they also aim to keep the negotiation process alive, since walking away from the negotiation table could endanger their position. We report the results of a study that compared the actions of false and sincere negotiators. The false negotiators used competitive tactics that encumbered the negotiations, yet they concealed their intentions by maintaining a fa\Sade of cooperation. Our theoretical discussion is focused on the balancing act involved in false negotiations and the challenges it poses for actors in social, managerial, and political settings. We conclude our analysis with an example from the realm of international negotiations.
Consider a setting in which agents can each take one of two ordered actions and in which the incentive of any given agent to take the high action is positively reinforced by the number of other agents that take it. Furthermore, assume that we don't know any other details about the game being played. What can we say about the details of the structure of the interaction between actions and incentives when we observe a set or a subset of all possible equilibria? In this paper we study 3 nested classes of games: (a) binary games of strategic complements; (b) games in (a) that admit a network representation: and (c) games in (b) in which the network is complete. Our main results are the following: It has long been established in the literature that the set of pure strategy Nash equilibria of any binary game of strategic complements among a set N of agents can be seen as a lattice on the set of all subsets of N under the partial order defined by the set inclusion relation. If the game happens to be strict in the sense that agents are never indifferent among outcomes (games in (a)), then the resulting lattice of equilibria satisfies a straightforward sparseness condition. (1) We show that, in fact, the games in (a) express all such lattices. (2) We characterize the collection of subsets of N that can be weakly expressed as the set of equilibria of some game of thresholds (games in (b)). (3) We characterize the collection of subsets of N that can be weakly expressed as the set of equilibria of some game of thresholds on the complete graph (games in (c)).