An Introduction to Modern Bayesian Econometrics

Howie provides a concise summary of the development of probability and statistics up to the 1930s and then focuses on the debate between H. Jeffreys, who took the Bayesian position, and R. A. Fisher, who argued against it. The application of the Bayesian viewpoint to econometric models was pioneered by A. Zellner starting in the early 1960s. His early work is summarized in his highly influential book, and he continues to contribute to the literature.

The Bayesian analysis of statistical and econometric models is an active area of research by statisticians, econometricians, and probabilists. Several other recent textbooks cover Bayesian econometrics: Poirier, Koop, Lancaster, and Geweke. The book by Poirier, unlike the present book and the others mentioned earlier, compares and contrasts Bayesian methods with other approaches to statistics and econometrics in great detail.

The present book focuses on Bayesian methods, with only occasional comments on the frequentist approach. Textbooks that emphasize the frequentist viewpoint include Mittelhammer et al.

Several statistics books take a Bayesian viewpoint. Berry is an excellent introduction to Bayesian ideas; his discussion of differences between observational and experimental data is highly recommended. Another fine introductory book is Bolstad. Excellent intermediate-level books with many examples are Carlin and Louis and Gelman et al. Although directed at a general statistical audience, three books by Congdon cover many common econometric models and utilize Markov chain Monte Carlo methods extensively.

Schervish covers both Bayesian and frequentist ideas at an advanced level.

Because the Bayesian approach rests on a different interpretation of probability than does the frequentist approach, we begin by stating the basic axioms of probability and explaining the two views. A probability is a number assigned to statements or events. The complement of A is the event that A does not occur; it is denoted by Ac. The probability of event A is denoted by P(A). Probabilities are assumed to satisfy the following axioms:

1. 0 <= P(A) <= 1 for any event A.
2. P(A) = 1 if A is certain to occur.
3. If A and B are mutually exclusive events, P(A or B) = P(A) + P(B).
4. The conditional probability of A given B is defined by P(A|B) = P(A and B)/P(B), provided P(B) > 0.

All the theorems of probability theory can be deduced from these axioms, and probabilities that are assigned to statements will be consistent if these rules are observed.

By consistent we mean that it is not possible to assign two or more different values to the probability of a particular event if probabilities are assigned by following these rules. Assigning some probabilities may put bounds on others.

Consider A1, the event that two or three heads appear when a coin is tossed three times: we can imagine repeating the experiment of tossing a coin three times and recording the number of times that two or three heads were reported. Axiom 1 is satisfied because the ratio of a subset of outcomes to all possible outcomes is between zero and one.

But to those who believe in a subjective interpretation of probability, an even greater problem with the frequency interpretation is its inability to assign probabilities to such statements as A3, which cannot be considered the outcome of a repeated experiment. We next consider the subjective view, under which a probability expresses a person's degree of belief in a statement or event. Degrees of belief cannot be set arbitrarily, however: some assignments would lead to inconsistencies. In particular, when the odds are fair, you will not find yourself in the position that you will lose money no matter which outcome obtains.

We now show that coherent behavior implies that probabilities satisfy the axioms. First, let us review the standard betting setup: in a standard bet on the event A, you buy or sell betting tickets at a price of 1 per ticket, and the money you receive or pay out depends on the betting odds k. We omit the currency unit in this discussion. In this setup, the price of the ticket is fixed and the payout depends on the odds. In the de Finetti betting setup, by contrast, the price of the ticket, denoted by p, is chosen by you, the payout is fixed at 1 per ticket, and your opponent chooses S, the number of tickets bought or sold.

Although you set p, the fact that your opponent determines whether you bet for or against A forces you to set a fair value. We can now show the connection between p and P(A). Accordingly, in the following discussion, you can interpret p as your subjective belief about the value of P(A). Consider a simple bet on or against A, where you have set the price of a ticket at p and you are holding S tickets for which you have paid pS; your opponent has chosen S.

If A occurs, you collect S, having paid pS, for a net gain of (1 - p)S; if A does not occur, you lose pS. Verify that these results are valid for both positive and negative values of S. Axiom 1 is therefore implied by the principle of coherency. This verifies Axiom 2. A positive (negative) S1 again means that you are betting on (against) A1, and similarly for S2 and S3. Your opponent can now set each of W1, W2, and W3, your gains in the three possible outcomes, to negative values and solve for the values of S1, S2, and S3 that create those losses for you.

But you can prevent this by choosing p1, p2, and p3 in such a way that the equations cannot be solved. For the conditional case, we assume that the bet on A|B is cancelled if B fails to occur. The point of this discussion is that the assignment of subjective probabilities must follow the standard axioms if a person is to be coherent in the sense of not setting probabilities in a way that is sure to result in losses.
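To make the coherency argument concrete, here is a small numerical sketch that is not from the book: A1 and A2 are mutually exclusive events, A3 is the event that one of them occurs, and your gain from a unit stake on event Ai at your chosen price pi is 1 - pi if Ai occurs and -pi otherwise. If you set p3 different from p1 + p2, the opponent can solve a linear system for stakes that guarantee you a loss; if p3 = p1 + p2, the system is singular and no such stakes exist. The prices used below are hypothetical.

```python
import numpy as np

def payoff_matrix(p1, p2, p3):
    """Rows: outcomes (A1 occurs, A2 occurs, neither occurs).
    Columns: bets on A1, A2, and A3 = A1 or A2.
    Entry = your gain per unit stake on that bet in that outcome."""
    return np.array([
        [1 - p1,   -p2, 1 - p3],   # A1 occurs (so A3 occurs too)
        [  -p1, 1 - p2, 1 - p3],   # A2 occurs (so A3 occurs too)
        [  -p1,   -p2,   -p3],     # neither occurs
    ])

def dutch_book(p1, p2, p3, target_loss=-1.0):
    """Try to find stakes (S1, S2, S3), chosen by the opponent, that make
    you lose target_loss in every outcome. Returns None if impossible."""
    M = payoff_matrix(p1, p2, p3)
    if abs(np.linalg.det(M)) < 1e-12:      # det equals p1 + p2 - p3
        return None
    return np.linalg.solve(M, np.full(3, target_loss))

# Incoherent prices: p3 != p1 + p2, so a sure loss can be engineered.
S = dutch_book(0.3, 0.4, 0.6)
print("stakes chosen by opponent:", S)
print("your gains in each outcome:", payoff_matrix(0.3, 0.4, 0.6) @ S)

# Coherent prices: p3 = p1 + p2, the system is singular, no sure loss.
print("coherent case:", dutch_book(0.3, 0.4, 0.7))
```

The determinant of the payoff matrix equals p1 + p2 - p3, so the opponent's equations have a solution exactly when the additivity axiom is violated.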

As mentioned above, probability theory is about the consistent setting of probabilities. We now turn to the statistical implications of the subjective view of probability.

In the next chapter, we explain how the posterior distribution can be used to analyze the central issues in inference: point estimates, interval estimates, prediction, and model comparisons. To understand the implications for statistical inference of adopting a subjective view of probability, it is useful to consider a simple example.

From the frequentist point of view, a parameter is not given a probability distribution of its own, because it is not regarded as being the outcome of a repeated experiment. From the subjective point of view, in contrast, since there is uncertainty over its value, it can be regarded as a random variable and assigned a probability distribution. All the models we consider in this book have one or more parameters, and an important goal of statistical inference is learning about their values.

When there is more than one parameter, the posterior distribution is a joint distribution of all the parameters, conditioned on the observed data. This complication is taken up in the next chapter. Before proceeding, we explain some conventions about notation for distributions. For continuous or general y, Bayes' theorem can be written as

π(θ | y) = f(y | θ) π(θ) / f(y),

where θ denotes the parameter, π(θ) is its prior density, f(y | θ) is the density of the data given the parameter, and f(y) is the marginal density of the data. This equation is the basis of everything that follows, and it is necessary to understand it thoroughly. Now consider the right-hand side. The first term in the numerator, f(y | θ), regarded as a function of θ for the observed data y, is the likelihood function. Take the coin-tossing experiment as an example: the likelihood tells us, for each possible value of θ, how probable the observed sequence of heads and tails would be. It is important to note that the likelihood function is not a p.d.f. in θ: viewed as a function of θ, it need not integrate to one.

The second term in the numerator, π(θ), is the prior distribution. The prior distribution usually depends on parameters, called hyperparameters, which may either be supplied by the researcher or given probability distributions of their own. We have already remarked that the denominator, f(y), does not depend on θ; it is the normalizing constant that makes the posterior integrate to one. It is therefore useful to think of the posterior distribution as proportional to the likelihood times the prior, π(θ | y) ∝ f(y | θ) π(θ). We illustrate these ideas with the coin-tossing example. For n independent tosses of a coin with probability θ of a head, we therefore have

p(y1, ..., yn | θ) = θ^y (1 - θ)^(n - y),

where yi = 1 if toss i is a head, yi = 0 otherwise, and y = y1 + ... + yn is the number of heads.

Why choose the beta distribution? First, it is defined on the relevant range, the unit interval. Second, it is capable of producing a wide variety of shapes. A Beta(α, β) distribution has mean α/(α + β) and variance αβ/[(α + β)²(α + β + 1)]; these relationships may be found in Appendix A. A third reason for choosing this distribution is that the beta prior, in combination with the likelihood function above, yields a posterior distribution that is again a beta distribution: multiplying the Beta(α, β) prior density by θ^y (1 - θ)^(n - y) gives a Beta(α + y, β + n - y) posterior. This is an example of a conjugate prior, where the posterior distribution is in the same family as the prior distribution. From this result, the posterior mean is

E(θ | y) = (α + y)/(α + β + n),

which is a weighted average of the prior mean α/(α + β) and the sample proportion y/n, with weights (α + β)/(α + β + n) and n/(α + β + n).

This result shows how the prior distribution and the data contribute to determining the mean of the posterior distribution. The prior, the normalized likelihood, and the posterior for the coin-tossing example can be graphed together for comparison; the likelihood is normalized to integrate to one so that it can be plotted on the same scale as the prior and posterior.
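The calculation behind such a graph is easy to reproduce. The following sketch uses hypothetical hyperparameters and data (not the values underlying the book's figure): a Beta(α, β) prior, a likelihood normalized to integrate to one, and the Beta(α + y, β + n - y) posterior.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

alpha, beta = 2.0, 2.0        # hypothetical prior hyperparameters
n, y = 20, 14                 # hypothetical data: y heads in n tosses

theta = np.linspace(0.001, 0.999, 500)
prior = stats.beta.pdf(theta, alpha, beta)

# Normalized likelihood: theta^y (1-theta)^(n-y) is proportional to a
# Beta(y+1, n-y+1) density, which integrates to one.
likelihood = stats.beta.pdf(theta, y + 1, n - y + 1)

posterior = stats.beta.pdf(theta, alpha + y, beta + n - y)

print("prior mean:       ", alpha / (alpha + beta))
print("sample proportion:", y / n)
print("posterior mean:   ", (alpha + y) / (alpha + beta + n))

plt.plot(theta, prior, label="prior")
plt.plot(theta, likelihood, label="normalized likelihood")
plt.plot(theta, posterior, label="posterior")
plt.xlabel("theta")
plt.legend()
plt.show()
```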

Beta priors, for example, do not easily accommodate bimodal distributions. We describe methods later in the book that can approximate the posterior distribution for any specified prior, even if the prior information does not lead to a posterior distribution of a standard form.

Having established that subjective probabilities must satisfy the usual axioms of probability theory and, therefore, the theorems of probability theory, we derived the fundamental result of Bayesian inference: the posterior distribution of a parameter is proportional to the likelihood function times the prior distribution. A second way to arrive at this result is to apply coherency to a betting scheme like those discussed earlier in the chapter. The next chapter continues with an explanation of how a Bayesian statistician uses the posterior distribution to conduct statistical inference, which is concerned with learning about parameter values in the form of point or interval estimates, making predictions, and comparing alternative models.

We continue by generalizing the concept to include models with more than one parameter and go on to discuss the revision of posterior distributions as more data become available, the role of the sample size, and the concept of identification.

The choice of the prior distribution is somewhat controversial and is discussed in Chapter 4, but the choice of a likelihood function is also an important matter and requires discussion. A central issue is that the Bayesian must specify an explicit likelihood function to derive the posterior distribution. In some cases, the choice of a likelihood function appears straightforward. In the coin-tossing experiment of Chapter 2, for example, the assumptions that the tosses are independent and that each has the same probability of a head lead directly to the Bernoulli likelihood. These assumptions might be considered prior information, but they are conventionally a part of the likelihood function rather than of the prior distribution.

In other cases, the choice requires a substantive distributional assumption. The normal linear regression model, discussed in detail later, is a good example. Jaynes offers arguments for adopting the normal distribution when little is known about the distribution. He takes the position that it is a very weak assumption in the sense that, among distributions with a given mean and variance, it maximizes the uncertainty of the distribution of yi, where uncertainty is measured by entropy.

For example, a Student-t distribution with small degrees of freedom puts much more probability in the tail areas than does a normal distribution with the same mean and variance, and this feature may be reflected in the posterior distribution. Since, for large degrees of freedom, there is little difference between the normal and t distributions, a possible way to proceed is to perform the analysis with several values of the degrees of freedom and choose between them on the basis of posterior odds ratios, discussed later in connection with model comparison.
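The difference in tail behavior is easy to quantify. The sketch below, which is purely illustrative, rescales a Student-t variate so that it has unit variance and compares the probability of an observation more than three standard deviations from the mean with the corresponding normal probability.

```python
from math import sqrt
from scipy import stats

c = 3.0                                   # "three standard deviations"
print("normal   P(|X| > 3):", 2 * stats.norm.sf(c))

for df in (4, 10, 30, 100):
    scale = sqrt((df - 2) / df)           # makes Var(scale * T) = 1
    p = 2 * stats.t.sf(c / scale, df)
    print(f"t, df={df:3d} P(|X| > 3):", p)
```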

In addition, distributions more general than the normal and t may be specified; see Chapter 8. Distributional assumptions also play a role in the frequentist approach to statistical inference. A commonly used estimator in the frequentist literature is the maximum likelihood estimator (MLE), which requires a specific distribution. Accordingly, a frequentist statistician who employs that method must, like a Bayesian, specify a distribution.

Of course, the Bayesian is also required to specify a prior distribution. Other approaches used by frequentist econometricians, such as the generalized method of moments, do not require an explicit distribution. But, since the finite-sample properties of such methods are rarely known, their justification usually depends on a large-sample property such as consistency, which is invoked even with small samples. Although this type of analysis is more general than specifying a particular distribution, the assumptions required to derive large-sample properties are often very technical and difficult to interpret.

The limiting distribution may also be a poor approximation to the exact distribution. In contrast, the Bayesian approach is more transparent because a distributional assumption is explicitly made, and Bayesian analysis does not require large-sample approximations. When a model contains several parameters, the posterior is their joint distribution, and from the joint distribution we may derive marginal and conditional distributions according to the usual rules of probability.

It is important to recognize that the marginal posterior distribution is different from the conditional posterior distribution.

In most applications, the marginal distribution of a parameter is more useful than its conditional distribution because the marginal takes into account the uncertainty over the values of the remaining parameters, while the conditional sets them at particular values.

The marginal posterior distribution of any subset of the parameters may be found, as above, by integrating out the remaining parameters. While the marginal posterior distributions for any number of parameters can be defined, attention is usually focused on one- or two-dimensional distributions because these can be readily graphed and understood. Joint distributions in higher dimensions are usually difficult to summarize and comprehend.

Although it is easy to write down the definition of the marginal posterior distribution, performing the necessary integration to obtain it may be difficult, especially if the integral is not of a standard form.

Parts II and III of this book are concerned with the methods of approximating such nonstandard integrals, but we now discuss an example in which the integral can be computed analytically.

The example is the multinomial model. In this model, each trial, assumed independent of the other trials, results in one of d outcomes, labeled 1, 2, ..., d, where outcome i occurs with probability θi and θ1 + ... + θd = 1. When the experiment is repeated n times and outcome i arises yi times, the likelihood function is

p(y1, ..., yd | θ1, ..., θd) ∝ θ1^y1 θ2^y2 ... θd^yd.

The next step is to specify a prior distribution. To keep the calculations manageable, we specify a conjugate distribution that generalizes the beta distribution employed for the Bernoulli model. It is the Dirichlet distribution (see Appendix A). With a Dirichlet(α1, ..., αd) prior, the posterior distribution is Dirichlet(α1 + y1, ..., αd + yd), and, from the result given in Appendix A, the marginal posterior distribution of an individual θi is a beta distribution with parameters αi + yi and the sum of the remaining posterior parameters; the required integration can therefore be carried out analytically.
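The update and the beta marginal can be verified in a few lines. The prior parameters and counts below are hypothetical, chosen only to illustrate the calculation.

```python
import numpy as np
from scipy import stats

alpha = np.array([1.0, 1.0, 1.0, 1.0])    # hypothetical Dirichlet prior
counts = np.array([12, 7, 3, 8])          # hypothetical outcome counts, n = 30

alpha_post = alpha + counts               # Dirichlet posterior parameters
print("posterior means:", alpha_post / alpha_post.sum())

# Marginal posterior of theta_1 is Beta(alpha_post[0], sum of the rest).
a1 = alpha_post[0]
b1 = alpha_post.sum() - a1
print("analytic marginal mean of theta_1:", a1 / (a1 + b1))

# Check by simulation from the joint Dirichlet posterior.
draws = np.random.default_rng(1).dirichlet(alpha_post, size=100_000)
print("simulated marginal mean of theta_1:", draws[:, 0].mean())
print("95% credibility interval for theta_1:",
      stats.beta.ppf([0.025, 0.975], a1, b1))
```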

To summarize, when dealing with a model that contains more than one parameter, simply redefine the parameter as a vector. Then, all the definitions and concepts discussed in Chapter 2 continue to apply. In addition, the marginal and conditional distributions of individual parameters or groups of parameters can be found by applying the usual rules of probability.

Now consider the situation in which the data arrive in two (or more) sets, y1 and y2, for example from an earlier and a later experiment. Whether or not the data sets are independent, the posterior distribution based on both sets can be written as

π(θ | y1, y2) ∝ f(y2 | y1, θ) π(θ | y1).

Thus, as new information is acquired, the posterior distribution becomes the prior for the next experiment. In this way, the Bayesian updates the prior distribution to reflect new information.

It is important to emphasize that this updating is a consequence of probability theory and requires no new principles or ad hoc reasoning. As a simple example of updating, consider data generated from the Bernoulli example.

To summarize, when data are generated sequentially, the Bayesian paradigm shows that the posterior distribution for the parameter based on new evidence is proportional to the likelihood for the new data, given previous data and the parameter, times the posterior distribution for the parameter, given the earlier data. This is an intuitively reasonable way of allowing new information to influence beliefs about a parameter, and it appears as a consequence of standard probability theory.
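The point is easy to verify numerically. In the sketch below (hypothetical prior and data), updating the beta posterior in two stages gives exactly the same result as processing all of the tosses at once.

```python
# Beta-Bernoulli updating: posterior parameters are just running totals.
alpha0, beta0 = 2.0, 2.0          # hypothetical prior
batch1 = [1, 0, 1, 1, 0]          # first set of tosses (1 = head)
batch2 = [1, 1, 0, 1, 1, 0, 1]    # data that arrive later

def update(a, b, data):
    """Posterior Beta parameters after observing `data`."""
    return a + sum(data), b + len(data) - sum(data)

# Stage-wise: the posterior from batch1 becomes the prior for batch2.
a1, b1 = update(alpha0, beta0, batch1)
a2, b2 = update(a1, b1, batch2)

# All at once.
a_all, b_all = update(alpha0, beta0, batch1 + batch2)

print((a2, b2), (a_all, b_all))   # identical posterior parameters
```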

A closer look at the likelihood function offers important insights about the nature of the posterior distribution, particularly about the relative contributions of the prior distribution and the likelihood function in determining the posterior distribution.

We can now examine the effect of the sample size n on the posterior distribution. For an independent sample, the likelihood can be written as exp[Σ log f(yi | θ)] = exp[n c(θ)], where c(θ) is the average of the log densities, so the posterior is proportional to the product of the prior distribution and an exponential of n times a quantity that does not grow with n. Accordingly, we can expect that the prior distribution will play a relatively smaller role than do the data, as reflected in the likelihood function, when the sample size is large. Conversely, the prior distribution has relatively greater weight when n is small.

As an example of this phenomenon, recall the coin-tossing example: with the prior hyperparameters α and β held fixed, the posterior mean (α + y)/(α + β + n) approaches the sample proportion y/n as n becomes large. This idea can be taken one step further: under fairly general conditions, the posterior distribution becomes more and more concentrated around the true value of the parameter as the sample size increases. This property is similar to the criterion of consistency in the frequentist literature and extends to the multiparameter case.
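A short simulation (hypothetical prior, simulated data) illustrates both points: as n grows with the prior held fixed, the posterior mean approaches the sample proportion and the posterior standard deviation shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 5.0, 5.0            # hypothetical prior, mean 0.5
theta_true = 0.8                  # value used to simulate the data

for n in (10, 100, 1000, 10000):
    y = rng.binomial(n, theta_true)
    a, b = alpha + y, beta + n - y
    post_mean = a / (a + b)
    post_sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    print(f"n={n:6d}  sample prop={y/n:.3f}  "
          f"posterior mean={post_mean:.3f}  posterior sd={post_sd:.4f}")
```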

Finally, we can use these ideas to say something about the form of the posterior distribution for large n: under general conditions, it is approximately normal, centered at the posterior mode. We turn next to the concept of identification. Our starting point is the likelihood function, which is also used by frequentist statisticians to discuss the concept.

In that case, the two models are said to be observationally equivalent. The model or the parameters of the model are not identified or unidentified when two or more models are observationally equivalent. The model is identified or the parameters are identified if no model is observationally equivalent to the model of interest.

A familiar example of this situation and how to deal with it is the specification of a linear regression model with a dummy or indicator variable. It is well known that a complete set of dummy variables cannot be included in a model along with a constant, because the dummies sum to the constant, making them perfectly collinear; this is a symptom of the nonidentifiability of the constant and the coefficients of a complete set of dummies.

The problem is solved by dropping either one of the dummies or the constant. This last result is the main point of our discussion of identification: since the data are only indirectly informative about unidentified parameters — any difference between their prior and posterior distributions is due to the nature of the prior distribution — inferences about such parameters may be less convincing than are inferences about identified parameters.

A researcher should know whether the parameters included in a model are identified through the data or through the prior distribution when presenting and interpreting posterior distributions. There are some situations when it is convenient to include unidentified parameters in a model. Examples of this practice are presented at several places later in the book, where the lack of identification will be noted.

We now turn to point estimation. The Bayesian approach is to specify a loss function that measures the consequences of choosing an estimate when the parameter takes a particular value, and then to choose the estimate that minimizes expected loss, where the expectation is taken over the posterior distribution. It is left for an exercise to derive the optimal estimators under the absolute value and bilinear loss functions. Another exercise considers a loss function that yields the mode of the posterior distribution as the optimal estimator. It is enlightening to contrast the Bayesian approach to point estimation with that of a frequentist statistician.

The frequentist stipulates one or more criteria that an estimator should satisfy and then attempts to determine whether a particular estimator satisfies those criteria. For many models, it is impossible to determine whether an estimator is unbiased; in such cases, a large-sample property, such as consistency, is often substituted.

For other models, there is more than one unbiased estimator, and a criterion such as efficiency is added to choose between them. Although both frequentist and Bayesian approaches to point estimation involve an expected value, it is important to recognize that the expectations are taken over different probability distributions.

The Bayesian calculation is taken over the posterior distribution of the parameter, which is conditioned on the observed data y. The coin-tossing example illustrates this difference.

The frequentist expectation is taken over the sampling distribution of the estimator, which assigns probabilities to data sets that could have been observed but were not. In contrast, Bayesian calculations are based on the posterior distribution, which is conditioned only on data that have been observed. There is another very important difference between the approaches.

In the frequentist approach, it is necessary to propose one or more estimators that are then tested to see whether they satisfy the specified criteria. There is no general method of finding candidates for estimators that are sure to satisfy such criteria. In contrast, the Bayesian approach is mechanical: given a loss function, the problem is to find the estimator that minimizes expected loss.

Under quadratic loss, for example, it is necessary to find the mean of the posterior distribution. While the details of finding the mean may be difficult in some cases, the goal is clear. It is not necessary to devise an estimator for every type of model that might be encountered. We next consider interval estimates. An interval estimate is a pair of values (a, b) such that the posterior probability that the parameter lies between them equals a specified value, for example P(a <= θ <= b | y) = 0.95; of course, 0.95 is only a conventional choice. Bayesians call such intervals credibility intervals or Bayesian confidence intervals to distinguish them from a quite different concept that appears in frequentist statistics, the confidence interval.

If more than one pair is possible, the pair that results in the shortest interval may be chosen; such a pair yields the highest posterior density (h.p.d.) interval. This procedure is possible because probability statements can be made about the values of a parameter.
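For the beta posterior of the coin-tossing example, the point estimates implied by the loss functions discussed above and the two kinds of credibility intervals can all be computed directly. The prior and data below are hypothetical, and the h.p.d. interval is found by a simple numerical search over the lower tail probability.

```python
import numpy as np
from scipy import stats, optimize

alpha, beta = 2.0, 2.0            # hypothetical prior
n, y = 20, 14                     # hypothetical data
a, b = alpha + y, beta + n - y    # posterior Beta parameters
post = stats.beta(a, b)

# Optimal point estimates under three loss functions.
print("posterior mean  (quadratic loss):", post.mean())
print("posterior median (absolute loss):", post.median())
print("posterior mode   (zero-one loss):", (a - 1) / (a + b - 2))

# Equal-tailed 95% credibility interval.
print("equal-tailed interval:", post.ppf([0.025, 0.975]))

# Highest posterior density interval: the shortest interval containing
# 95% probability, found by searching over the lower tail probability.
def width(p_lo, coverage=0.95):
    return post.ppf(p_lo + coverage) - post.ppf(p_lo)

res = optimize.minimize_scalar(width, bounds=(0.0, 0.05), method="bounded")
print("h.p.d. interval:      ", post.ppf([res.x, res.x + 0.95]))
```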

In contrast, frequentists define a confidence interval, which does not involve the probability distribution of a parameter. As in the case of point estimators, this approach makes use of unobserved data: the coverage probability of a confidence interval is computed over repeated samples, a calculation that involves sample means that are not observed. The Bayesian approach, based on the posterior distribution, conditions on the observed data points and does not make use of data that are not observed.

We next consider prediction, that is, assigning a probability distribution to an observation that has not yet been made. To fix ideas, consider the coin-tossing example: the probability that the next toss yields a head, given the observed data, is obtained by averaging θ over its posterior distribution. Notice carefully what we have done: the unknown parameter has been integrated out, so that the predictive probability depends only on the observed data and the prior hyperparameters. The general case has the same form: the predictive density of a new observation z is f(z | y) = ∫ f(z | θ, y) π(θ | y) dθ, where the dependence of z on y, as well as on θ, arises in some models for time series. Consider prediction in the coin-tossing example. Since we found that the posterior distribution is Beta(α + y, β + n - y), the predictive probability that the next toss is a head is the posterior mean, (α + y)/(α + β + n).
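A brief numerical check of the predictive calculation, using the same hypothetical prior and data as in the earlier sketches, follows; the beta-binomial distribution gives the predictive distribution of the number of heads in several future tosses.

```python
import numpy as np
from scipy import stats

alpha, beta = 2.0, 2.0            # hypothetical prior
n, y = 20, 14                     # hypothetical data
a, b = alpha + y, beta + n - y    # posterior Beta parameters

# Predictive probability of a head on the next toss = posterior mean.
print("P(next toss is a head | data) =", a / (a + b))

# Predictive distribution of the number of heads in m future tosses:
# beta-binomial with parameters (m, a, b).
m = 5
k = np.arange(m + 1)
print("P(k heads in next 5 tosses):", stats.betabinom.pmf(k, m, a, b))
```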

We turn now to the comparison of models. Two models may differ in their priors, their likelihoods, or their parameters; a difference in parameters also implies different priors and different likelihood functions. From the definition of the posterior distribution, each model Mi has a marginal likelihood f(y | Mi) = ∫ f(y | θi, Mi) π(θi | Mi) dθi, the denominator of its posterior distribution. The posterior odds ratio in favor of M1 against M2 is R12 = [P(M1)/P(M2)] × [f(y | M1)/f(y | M2)], the prior odds ratio multiplied by the ratio of marginal likelihoods; the latter ratio is called the Bayes factor, B12.

Note that f(y | Mi) is the same quantity that appears as the normalizing constant of the posterior distribution under model Mi. A large value of R12 is evidence that M1 is better supported than is M2 by the data and the prior information, and a small value is evidence that M2 is better supported; values around 1 indicate that both models are supported equally well.

Such pairwise comparisons can also be made when there are more than two models. It is convenient to present log10 R12 rather than R12 because the ratio is often very large or very small, and the logarithm to base 10 is immediately interpreted as a power of 10; Jeffreys suggested guidelines for translating the magnitude of log10 R12 into strength of evidence. If you are reluctant to specify the prior odds ratio, the burden falls on the Bayes factor to discriminate between models. A large Bayes factor indicates that the results favor M1 unless you think M1 to be very improbable a priori compared to M2.

Model choice can be implemented in terms of loss functions for making correct and incorrect choices, but, in practice, models are often informally compared by their Bayes factors or their posterior odds ratios. One possible outcome of such comparisons is that one or more models are effectively eliminated from consideration because other models have much greater support on the basis of these criteria.

Another possibility is that several models that are not eliminated have pairwise Bayes factors or posterior odds ratios close to one (that is, close to zero on the log scale). In this case, it would be reasonable to conclude that two or more models are consistent with the data and prior information and that a choice between them must be delayed until further information becomes available.

When a prediction is to be made and more than one model is being considered, the technique of model averaging can be applied. The frequentist approach to model comparison makes use of hypothesis tests. In this approach, the null hypothesis H0 is rejected in favor of the alternative hypothesis HA if the value of a statistic computed from the data falls in the critical region. The critical region is usually specified to set the probability that H0 is rejected when it is true at a small value, where the probability is computed over the distribution of the statistic.

As mentioned before, this calculation depends on values of the statistic that were not observed. An important advantage of the Bayesian approach to model comparison over the frequentist approaches is that the former can easily deal with nonnested hypotheses, especially with models that deal with different representations of the response variable. A common example is the choice between y and log y as the response variable.

Note also that this result generalizes to multivariate y and z, where the absolute value of the derivative is replaced by the absolute value of the Jacobian of the transformation. For large samples, the log Bayes factor can be approximated as

log B12 ≈ [log{f(y | θ1*, M1) / f(y | θ2*, M2)} - ((k1 - k2)/2) log n] + R,

where θi* is the maximum likelihood estimate and ki is the number of parameters under model Mi, and R collects terms that remain bounded as n grows. The first term in the first square bracket is the logarithm of the likelihood ratio. It will tend to become large if M1 is the true model and small if M2 is true.

The second term shows that the log Bayes factor penalizes models with larger numbers of parameters, where the penalty is log n times the difference in the number of parameters divided by two.

We return to the coin-tossing example to illustrate the use of Bayes factors for model comparison. To specify two competing models, consider the following variation on our basic experiment. A coin is tossed m times by Michaela and then tossed m times by Lila. Suppose we believe it possible that the different ways in which the girls toss the coin result in different probabilities.

We also consider a model in which there is no difference in the probabilities. Computing the marginal likelihoods of the two models for various outcomes shows that the Bayes factor in favor of M1 increases as the proportion of heads for both girls approaches 0.5.
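The Bayes factor for this example reduces to a ratio of beta functions. In the sketch below, each girl's probability has a Beta(a, b) prior under M1 and the common probability has the same Beta(a, b) prior under M2; the binomial coefficients are the same under the two models and cancel. The value m = 10, the uniform prior (a = b = 1), and the outcomes are all hypothetical choices for illustration.

```python
from math import log
from scipy.special import betaln

def log10_bayes_factor(y1, y2, m, a=1.0, b=1.0):
    """log10 Bayes factor for M1 (different probabilities for the two
    girls) against M2 (a common probability), with Beta(a, b) priors."""
    log_m1 = (betaln(a + y1, b + m - y1) - betaln(a, b)
              + betaln(a + y2, b + m - y2) - betaln(a, b))
    log_m2 = betaln(a + y1 + y2, b + 2 * m - y1 - y2) - betaln(a, b)
    return (log_m1 - log_m2) / log(10)

m = 10
for y1, y2 in [(5, 5), (3, 7), (1, 9), (0, 10)]:
    print(f"y1={y1:2d}, y2={y2:2d}:  log10 B12 = "
          f"{log10_bayes_factor(y1, y2, m):+.2f}")
```

Large positive values of log10 B12 favor the model with different probabilities; outcomes in which the two girls produce very different proportions of heads yield strong evidence for M1.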

To summarize, this chapter extended the basic framework in several directions. In particular, we considered models with more than one parameter, updating posterior distributions as additional data become available, how the posterior distribution behaves as the sample size increases, and the concept of identification. We then explained how posterior distributions can be used to find point and interval estimates, make predictions, and compare the credibility of alternative models.

Zellner has proposed the Bayesian method of moments for situations in which there are difficulties in formulating a likelihood function; see Zellner's work for further discussion and references. The exercises for this chapter ask the reader, among other things, to compute and plot posterior distributions under different priors and sample sizes, to compare the results, and to comment on the effect of having a larger sample; in the exercises, 1(A) denotes the indicator function that equals 1 if A is true and 0 otherwise.

One exercise concerns a situation analogous to the coin-tossing comparison: Sam types the first m pages of a manuscript and makes e1 errors in total, and Levi types the last m pages and makes e2 errors.

We turn next to the specification of prior distributions. On the one hand, the prior distribution allows the researcher to include in a systematic way any information he or she has about the parameters being studied; on the other hand, it requires judgments about which reasonable people may differ. This chapter puts forth, in general terms, some ideas on how to specify prior distributions.

The topic is revisited in connection with specific models in Part III. The normal linear regression model, described next, is the primary example for the topics in this chapter. We consider it here because of its wide applicability and because it is a relatively easy model with which to illustrate the specification of hyperparameters. In this model, the response yi is a linear function of covariates xi = (xi1, ..., xiK) plus an additive error ui that is assumed to have mean zero given the covariates. Under the further assumption of joint normality of (ui, xi), the previous assumption implies that each xik is independent of ui.

Such covariates are said to be exogenous. We discuss in Chapter 11 how to proceed when the assumption of independence is untenable. In writing the likelihood function, we invoke the additional assumption that the probability distributions of the covariates do not depend on any of the parameters in the equation for yi.

This assumption is relaxed when the covariates include lagged values of yi, as in the time series models discussed later. Completing the Bayesian specification requires prior distributions for the regression coefficients and the error variance; the remainder of this chapter is devoted to methods for doing this, but we first derive the likelihood function for this model.

Here and in the following, we follow the convention of usually not explicitly including the covariates X in the conditioning set of the posterior distribution. Some researchers attempt to specify priors that are intended to convey little or no information about the parameters. Many such specifications imply improper prior distributions, which are distributions that are not integrable; that is, their integral is infinite. In contrast, we assume that the researcher has sufficient knowledge to specify a proper prior, one that integrates to unity, even if it is highly dispersed.

A uniform ("flat") prior for a regression coefficient over the entire real line is an example: this prior is improper, because its integral is unbounded, and it cannot be normalized to one. The beta distribution prior discussed in connection with the coin-tossing example of Chapter 2 provides another illustration.

In our view, a researcher should be able to provide enough information to specify a proper prior. In the regression model, for example, it is hard to believe that a researcher is so ignorant about a phenomenon that the probability of a regression coefficient falling in any interval of equal length from minus to plus infinity is equal. In addition, a number of methods to aid in the elicitation of prior probabilities from experts in the subject matter of the inquiry have been developed; see the references in Section 4.

The ability to specify proper prior distributions is crucial for the use of Bayes factors and posterior odds ratios for comparing models. When the prior is proper, the value of the marginal likelihood is well defined. Accordingly, we assume proper priors. For the model we are studying, a conjugate prior has the normal-inverse gamma form: conditional on σ², the coefficient vector β is assigned a normal prior with mean β0 and covariance matrix σ²B0, and σ² is assigned an inverted gamma prior with hyperparameters α0/2 and δ0/2. How to specify values for these parameters is discussed later; for now, we concentrate on the mechanics of showing that this is a conjugate prior and on some properties of the posterior distribution.

We present a prior that does not have this property in a later section. Multiplying this prior by the likelihood and rearranging shows that the prior specified above is indeed conjugate: the posterior distribution is again of the normal-inverse gamma form. Indeed, this is the last model considered in the book for which the posterior distribution is available analytically in a standard form. Since this is a known form for which the normalizing constant is known, we can find its moments, derive interval estimates, and plot it.
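The conjugate calculation can be sketched in a few lines of code. The update formulas below are the standard natural-conjugate (normal-inverse gamma) results for the parameterization β | σ² ~ N(β0, σ²B0), σ² ~ IG(α0/2, δ0/2); they are stated from that general result rather than quoted from the book, and the data and hyperparameter values are hypothetical.

```python
import numpy as np

def nig_posterior(X, y, beta0, B0, alpha0, delta0):
    """Natural-conjugate update for the normal linear regression model.

    Prior: beta | sigma^2 ~ N(beta0, sigma^2 * B0) and
           sigma^2 ~ IG(alpha0/2, delta0/2).
    Returns the posterior hyperparameters (beta1, B1, alpha1, delta1)."""
    B0_inv = np.linalg.inv(B0)
    B1_inv = B0_inv + X.T @ X
    B1 = np.linalg.inv(B1_inv)
    beta1 = B1 @ (B0_inv @ beta0 + X.T @ y)
    alpha1 = alpha0 + len(y)
    delta1 = delta0 + y @ y + beta0 @ B0_inv @ beta0 - beta1 @ B1_inv @ beta1
    return beta1, B1, alpha1, delta1

# Hypothetical data and hyperparameters, for illustration only.
rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

beta0 = np.zeros(k)           # prior mean of the coefficients
B0 = 100.0 * np.eye(k)        # dispersed prior scale matrix
alpha0, delta0 = 6.0, 1.0     # weak inverted gamma prior for sigma^2

beta1, B1, alpha1, delta1 = nig_posterior(X, y, beta0, B0, alpha0, delta0)
print("posterior mean of beta:   ", beta1)
print("least-squares estimate:   ", np.linalg.lstsq(X, y, rcond=None)[0])
print("posterior mean of sigma^2:", delta1 / (alpha1 - 2))
```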

Discussion of further properties of this model and its marginal likelihood is pursued in the exercises. Before continuing, we note that many authors work with a different, but equivalent, parameterization of the model.

In that parameterization, the error precision, the reciprocal of the variance, replaces σ². The concept of precision extends to covariance matrices in the case of multivariate random vectors, where the precision matrix is the inverse of the covariance matrix. Both precisions and variances are used in later chapters of this book. Since information is not likely to be available about prior covariances, researchers will often take those to be zero and restrict attention to diagonal B0.

We continue by discussing typical applications to illustrate these ideas. A common use of linear regression is to investigate the effects of prices and income on demand or supply for some product.

In this case, a reparameterization of the model is likely to facilitate the assignment of a prior: expressing quantity, price, and income in logarithms yields a constant-elasticity form in which the coefficients are elasticities. This parameterization frees us from concern about the units of measurement of y, currency units, and absolute price levels; moreover, such elasticities have been estimated in many studies for various types of data sets. Furthermore, the Gaussian assumption on the additive errors is consistent with both positive and negative values of a dependent variable measured in logarithms, but it is inconsistent with using the level of a price or quantity, which must be nonnegative, as the dependent variable.

Given a mean, the properties of the normal distribution can be used to choose a prior standard deviation; for example, the interval within two standard deviations of the prior mean includes about 95 percent of the prior probability, so the standard deviation can be set to make that interval cover the range of elasticities considered plausible. Similarly, an assumption about the income elasticity might start with a prior mean of 1.

In our bread demand example, on the basis of prior information, we might believe that the quantity of bread consumed per week, controlling for household size and other variables, does not exhibit great variation. From the properties of the inverted gamma distribution summarized in Appendix A, hyperparameters for the prior on the error variance can then be chosen to reflect this belief. These calculations are rough (we have taken the mean of the logarithm to be the logarithm of the mean), but, in many cases, it suffices to get orders of magnitude right. This example illustrates how knowledge of the subject matter can be used, within a family of conjugate prior distributions, to specify hyperparameters for the prior.
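Such rough elicitation can be automated. The sketch below assumes the IG(α0/2, δ0/2) parameterization used earlier and solves for hyperparameters that match a stated prior mean and standard deviation for the error variance; the target values are hypothetical, and the moment formulas are the standard inverted gamma ones rather than expressions taken from Appendix A.

```python
from scipy import stats

def ig_hyperparameters(mean, sd):
    """Choose (alpha0, delta0) so that sigma^2 ~ IG(alpha0/2, delta0/2)
    has the given prior mean and standard deviation.

    Uses E[sigma^2] = delta0 / (alpha0 - 2) and
    Var[sigma^2] = 2 * delta0^2 / ((alpha0 - 2)^2 * (alpha0 - 4))."""
    alpha0 = 4.0 + 2.0 * mean**2 / sd**2
    delta0 = mean * (alpha0 - 2.0)
    return alpha0, delta0

# Hypothetical prior beliefs about the error variance (illustrative only).
alpha0, delta0 = ig_hyperparameters(mean=0.1, sd=0.1)
print("alpha0, delta0:", alpha0, delta0)

# Check against scipy's inverted gamma with shape alpha0/2, scale delta0/2.
dist = stats.invgamma(alpha0 / 2.0, scale=delta0 / 2.0)
print("implied mean and sd:", dist.mean(), dist.std())
```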

In most research areas, there are likely to be previous studies that can be used to shed light on likely values for the means and variances of the regression coefficients and the variance.

Since some element of subjectivity is inevitably involved, the sensitivity of results to different prior specifications, as discussed later, should be included in any empirical research.

As an empirical example, we consider the effect of union membership on wages. The data are derived from Vella and Verbeek, but we work with a highly simplified version of their ambitious and sophisticated model.

The data are taken from the Youth sample of the National Longitudinal Survey and consist of observations on young men. The response variable y is the logarithm of hourly wages. The log transformation is made to allow us to think in terms of proportional, rather than absolute, effects, and the transformed variable is consistent with the assumption of additive Gaussian errors. The covariate of interest is a dummy indicator variable for union membership.

Important specification issues taken up in the article are neglected here to present a simple example and to focus on the main point of this discussion. With 31 covariates other than the intercept, we expect the variance of u to be considerably smaller than the variance of y. Accordingly, the hyperparameters of the inverted gamma prior for the error variance are chosen, using the relationships in Appendix A, to be consistent with a small error variance. For the remaining regression coefficients, we assume a mean of 0 and diagonal values in B0 of 1.

This assumption takes a neutral stance about the signs of the coefficients and allows each to have a fairly small impact. This specification of the prior illustrates that choosing hyperparameter values in the context of a particular application can be done without appealing to devices that attempt to capture completely uninformative priors.

In many, if not most, applications, there is relevant information. A specialist in labor economics should be able to assign more appropriate values than we have. We consider later the sensitivity of the results to the prior specification. The resulting posterior distribution of the union membership coefficient can be computed and compared graphically with its prior distribution. An alternative to the conjugate prior is to assign independent prior distributions to the regression coefficients and the error variance; this assignment is considered in a later section, and the approach is often computationally convenient and is widely applied in Part III.

Another approach to specifying prior distributions takes advantage of a type of symmetry that appears in some models. That symmetry, called exchangeability, generalizes the property of statistical independence when applied to observable random variables, as we first explain. We then show how the idea may be applied in specifying a prior for unknown parameters. The formal definition of exchangeability, a concept proposed by de Finetti, is in terms of the joint distribution of a set of random variables zi: the random variables z1, z2, ..., zn are exchangeable if their joint distribution is unchanged by any permutation of the indices.

Exchangeability generalizes the concept of independence: identically distributed and mutually independent random variables are exchangeable, but exchangeability does not imply independence. Loosely speaking, a set of random variables is exchangeable if nothing other than the observations themselves distinguishes any of the zi s from any of the others.

If the only information we have is that three tosses resulted in two heads, then exchangeability requires that we assign the same probability to each of the three orderings in which this can occur (HHT, HTH, and THH). As an example of exchangeability applied to prior distributions, consider the problem of heteroskedasticity in the linear regression model that arises when the assumption that Var(ui) is the same for all i is untenable.

The observations yi themselves are not exchangeable, because the presence of covariates implies different expected values for each observation; that is, knowing the value of i gives us covariate values xi that provide information about the mean level of yi. The observation-specific variance parameters, however, can plausibly be treated as exchangeable and assigned a common prior distribution. While the heteroskedastic regression model is an example of specifying an exchangeable prior, it is of interest in its own right as an extension of the linear model.

But the model has an interesting property that is exploited in Chapter 8. Placing a prior distribution on the hyperparameters of an exchangeable prior yields a hierarchical model; other levels may be added, but this is rarely done. As an example, consider the heteroskedastic linear model described above, in which the individual variance parameters share a common prior whose hyperparameters may themselves be given a distribution. We return to this model in a later section.

Yet another way to specify a prior is to use a training sample. The idea is to take advantage of the Bayesian updating discussed in Chapter 3. A portion of the sample is selected as the training sample. It is combined with a relatively uninformative prior to yield a first-stage posterior distribution. In turn, this becomes the prior for the remainder of the sample. By a relatively uninformative prior, we mean a prior with a large variance and a mean of zero.
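The mechanics are easy to illustrate with a deliberately simple conjugate model, a normal mean with known variance, rather than the regression model of this chapter; the prior and simulated data below are hypothetical. A dispersed first-stage prior is updated with the training portion of the sample, the resulting posterior serves as the prior for the remaining observations, and the final posterior coincides with the one obtained from the full sample and the dispersed prior.

```python
import numpy as np

def normal_update(prior_mean, prior_var, data, sigma2):
    """Posterior for a normal mean when the data variance sigma2 is known."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + data.sum() / sigma2)
    return post_mean, post_var

rng = np.random.default_rng(42)
sigma2 = 4.0
data = rng.normal(loc=2.5, scale=np.sqrt(sigma2), size=200)
train, rest = data[:50], data[50:]        # first portion as training sample

# Stage 1: relatively uninformative prior (mean zero, large variance).
m1, v1 = normal_update(0.0, 1e6, train, sigma2)

# Stage 2: the first-stage posterior becomes the prior for the rest.
m2, v2 = normal_update(m1, v1, rest, sigma2)
print("two-stage posterior:   mean %.4f  var %.6f" % (m2, v2))

# Same answer as using the dispersed prior with the full sample.
print("full-sample posterior:", normal_update(0.0, 1e6, data, sigma2))
```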

We illustrate a sensitivity check with the Vella and Verbeek union data discussed above. Recomputing the posterior mean of the union membership coefficient under alternative prior means and variances shows some sensitivity around our benchmark result. When results seem rather sensitive to the prior mean, the researcher should attempt to justify the choice of this value by referring to the relevant literature. Another possibility for refining this choice might be to take a training sample approach. The Vella and Verbeek data set contains information on the same young men for 8 years.

One possibility might be to take an earlier year as a training sample.


It works through the implications for econometric practice using practical examples and accessible computer software. I have used it as such on several occasions, with a teaching style that emphasizes calculations and the practicality of Bayesian methods, demonstrates sampling algorithms (including the use of Markov chain Monte Carlo procedures) in class, and requires students to solve problems numerically. Graduate students in economics will find it highly accessible.

The mathematics used in the book rarely extends beyond introductory calculus and the rudiments of matrix algebra, and I have tried to limit even this to situations where mathematical analysis clearly seems to give additional insight into a problem.
