This is part four of the Multiple Imputation in Stata series. For a list of topics covered by this series, see the Introduction. This section will talk you through the details of the imputation process. Be sure you've read at least the previous section, Creating Imputation Models, so you have a sense of what issues can affect the validity of your results.
To illustrate the process, we'll use a fabricated data set. Unlike those in the examples section, this data set is designed to have some resemblance to real-world data. Our goal is to regress wages on sex, race, education level, and experience. To see the "right" answers, open the do file that creates the data set and examine the gen command that defines wage. The imputation process creates a lot of output. We'll put the highlights on this page; a complete log file, including the associated graphs, can be found here:
Each section of this article will have links to the relevant section of the log. Click "back" in your browser to return to this page. The first step in using mi commands is to mi set your data. This is somewhat similar to svyset, tsset, or xtset. The mi set command tells Stata how it should store the additional imputations you'll create. We suggest using the wide format, as it is slightly faster.
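For this data set, the setup step is a single command; a minimal sketch:

```stata
* Store the imputations in wide format (slightly faster than mlong)
mi set wide

* Display the current mi settings to confirm
mi query
```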
On the other hand, mlong uses slightly less memory. The formats are not equivalent, however, and you should never use reshape to change the data structure used by mi. Instead, type mi convert wide or mi convert mlong (add , clear if the data have not been saved since the last change).
Most of the time you don't need to worry about how the imputations are stored: the mi commands handle the details for you. But if you need to manipulate the data in a way mi can't do for you, then you'll need to learn about the details of the structure you're using. You'll also need to be very, very careful. If you're interested in such things (including the rarely used flong and flongsep formats), run this do file and read the comments it contains while examining the data in the data browser to see what the data look like in each form.
Imputed variables are variables that mi is to impute or has imputed. Regular variables are variables that mi is not to impute, either by choice or because they are not missing any values. Passive variables are variables that are completely determined by other variables. For example, log wage is determined by wage, or an indicator for obesity might be determined by a function of weight and height. Interaction terms are also passive variables, though if you use Stata's interaction syntax you won't have to declare them as such.
Passive variables are often problematic—the examples on transformations , non-linearity , and interactions show how using them inappropriately can lead to biased estimates.
If a passive variable is determined by regular variables, then it can be treated as a regular variable since no imputation is needed. Passive variables only have to be treated as such if they depend on imputed variables. Registering a variable tells Stata what kind of variable it is.
Imputed variables must always be registered. Passive variables, however, are more often created after imputing: do so with mi passive and they'll be registered as passive automatically. In our example data, all the variables except female need to be imputed. The appropriate mi register command is:
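A sketch of the registration step, assuming the imputed variables' actual names match those used in the discussion (wage, edu, exp, race, urban):

```stata
* Register every variable with missing values as imputed
mi register imputed wage edu exp race urban

* female is never missing, so it can be registered as regular
mi register regular female

* Passive variables created after imputing are registered automatically,
* e.g. a hypothetical log-wage variable:
mi passive: generate lnwage = ln(wage)
```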
Always run each of your imputation models individually, outside the mi impute chained context, to see if they converge and, insofar as it is possible, to verify that they are specified correctly. Note that when categorical variables (ordered or not) appear as covariates, i.varname expands them into sets of indicator variables.
As we'll see later, the output of the mi impute chained command includes the commands for the individual models it runs. Thus a useful shortcut, especially if you have a lot of variables to impute, is to set up your mi impute chained command with the dryrun option to prevent it from doing any actual imputing, run it, and then copy the commands from the output into your do file for testing.
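For example, a dryrun call might look like the following sketch (the methods and variable assignments here are illustrative assumptions, not the article's final model):

```stata
* dryrun reports the individual model commands without imputing anything
mi impute chained ///
    (pmm, knn(5)) wage exp ///
    (ologit) edu ///
    (mlogit) race ///
    (logit) urban ///
    , add(5) dryrun
```

The commands printed in the output can then be pasted into your do file and run one at a time.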
The first thing to note is that all of these models run successfully. Complex models like mlogit may fail to converge if you have large numbers of categorical variables, because that often leads to small cell sizes.
To pin down the cause of the problem, remove most of the variables, make sure the model works with what's left, and then add variables back one at a time or in small groups until it stops working.
With some experimentation you should be able to identify the problem variable or combination of variables. At that point you'll have to decide if you can combine categories or drop variables or make other changes in order to create a workable model. Perfect prediction is another problem to note. The imputation process cannot simply drop the perfectly predicted observations the way logit can. You could drop them before imputing, but that seems to defeat the purpose of multiple imputation.
The alternative is to add the augment or just aug option to the affected methods. This tells mi impute chained to use the "augmented regression" approach, which adds fake observations with very low weights in such a way that they have a negligible effect on the results but prevent perfect prediction. For details see the section "The issue of perfect prediction during imputation of categorical data" in the Stata MI documentation.
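Augmented regression is requested per method; a sketch, assuming the categorical models for race and urban are the ones affected by perfect prediction:

```stata
* augment adds low-weight pseudo-observations to break perfect prediction
mi impute chained ///
    (pmm, knn(5)) wage exp ///
    (ologit) edu ///
    (mlogit, augment) race ///
    (logit, augment) urban ///
    , add(5)
```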
You should also try to evaluate whether the models are specified correctly. A full discussion of how to determine whether a regression model is specified correctly is well beyond the scope of this article, but use whatever tools you find appropriate. Here are some examples. For continuous variables, residual-versus-fitted plots (rvfplot after regress) can be revealing. Consider the plot for experience: note how a number of points are clustered along a line in the lower left, and no points fall below it. This reflects the constraint that experience cannot be less than zero, which means that the fitted values must always be greater than or equal to the negative of the residuals, or alternatively that the residuals must be greater than or equal to the negative of the fitted values.
If the graph had the same scale on both axes, the constraint line would be a 45 degree line. If all the points were below a similar line rather than above it, this would tell you that there was an upper bound on the variable rather than a lower bound. The y-intercept of the constraint line tells you the limit in either case.
You can also have both a lower bound and an upper bound, putting all the points in a band between them. The "obvious" model, regress , is inappropriate for experience because it won't apply this constraint.
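As a sketch of this diagnostic (covariate names assumed), fit the candidate model and plot residuals against fitted values:

```stata
* Candidate imputation model for experience, run outside mi impute chained
regress exp wage edu i.race i.urban i.female

* Residual-versus-fitted plot; look for points piling up along a boundary line
rvfplot, yline(0)
```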
It's also inappropriate for wages for the same reason. Alternatives include truncreg, ll(0) and pmm; we'll use pmm. One way to check for misspecification is to add interaction terms to the models and see whether they turn out to be important. For example, we'll compare the obvious model with a version that adds interaction terms. We'll run similar comparisons for the models of the other variables. This creates a great deal of output, so see the log file for results.
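One way to sketch such a comparison in Stata's factor-variable syntax (variable names assumed):

```stata
* The "obvious" model for experience
regress exp wage edu i.race i.urban i.female

* The same model with female interacted with every other covariate
regress exp i.female##(c.wage c.edu i.race i.urban)

* Joint test of the interaction terms
testparm i.female#c.wage i.female#c.edu i.female#i.race i.female#i.urban
```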
Interactions between female and other variables are significant in the models for exp, wage, edu, and urban. There are a few significant interactions between race or urban and other variables, but not nearly as many, and keep in mind that with this many coefficients we'd expect some false positives at any conventional significance level. We'll thus impute the men and women separately. This is an especially good option for this data set because female is never missing.
If it were, we'd have to drop those observations which are missing female because they could not be placed in one group or the other. In the imputation command this means adding the by female option.
When testing models, it means starting the commands with the by female: prefix. The improved imputation models are thus: each method specifies the method to be used for imputing the following varlist. The possibilities for method are regress, pmm, truncreg, intreg, logit, ologit, mlogit, poisson, and nbreg. N is the number of imputations to be added to the data set. R is the seed to be used for the random number generator; if you do not set this, you'll get slightly different imputations each time the command is run.
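Putting the pieces together, the full command might look like this sketch (the methods, imputation count, and seed value are illustrative assumptions):

```stata
mi impute chained ///
    (pmm, knn(5)) wage exp ///
    (ologit) edu ///
    (mlogit, augment) race ///
    (logit, augment) urban ///
    , add(20) rseed(12345) by(female)

* by(female) runs the entire imputation separately for men and women
```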
The tracefile is a dataset in which mi impute chained will store information about the imputation process. We'll use this dataset to check for convergence.
Options that are relevant to a particular method go with the method, inside the parentheses but following a comma. Options that are relevant to the imputation process as a whole (like by female) go at the end, after the comma. Note that this does not include a savetrace option: as of this writing, by and savetrace cannot be used at the same time, presumably because it would require one trace file for each by group.
Stata is aware of this problem and we hope it will be resolved soon. For purposes of this article, we'll remove the by option when it comes time to illustrate use of the trace file. If this problem comes up in your research, talk to us about work-arounds. There is some disagreement among authorities about how many imputations are sufficient. Some say a small number is enough in almost all circumstances; the Stata documentation suggests at least 20, while White, Royston, and Wood argue that the number of imputations should be roughly equal to the percentage of cases with missing values.
However, we are not aware of any argument that increasing the number of imputations ever causes problems, just that the marginal benefit of another imputation asymptotically approaches zero. Increasing the number of imputations in your analysis takes essentially no work on your part: just change the number in the add option to something bigger. On the other hand, it can be a lot of work for the computer; multiple imputation has introduced many researchers to the world of jobs that take hours or days to run.
You can generally assume that the amount of time required will be proportional to the number of imputations used (e.g. doubling the number of imputations roughly doubles the run time). So here's our suggestion: