line_up
- normtest.ryan_joiner.line_up(x_data, cte_alpha='3/8', weighted=False, seed=42, correct=False)
This function returns a figure with the correlation graphs for the line-up method [1].
- Parameters:
- x_data : numpy array
One-dimensional numpy array with at least 4 observations.
- cte_alpha : str, optional
A str with the cte_alpha value that should be adopted. The options are:
“0”;
“3/8” (default);
“1/2”;
- weighted : bool, optional
Whether to estimate the Normal order statistics treating repeated values by their average (True) or not (False, default). Only has an effect if the data set contains repeated values;
- seed : int, optional
Seed for the pseudo-random number generator. Use a positive integer to make the results reproducible. Default is 42;
- correct : bool, optional
Whether the graph of x_data is to be highlighted in red (True) or drawn in black like the other graphs (False, default);
- Returns:
- fig : matplotlib.figure.Figure
A figure with the generated graphs;
Notes
This function is based on the line-up method, in which 20 correlation graphs are generated. One of these graphs is obtained from the true data (x_data). The other 19 graphs are drawn from pseudo-random data sampled from a Normal distribution with a mean and standard deviation similar to those of x_data.
The objective is to observe the 20 graphs at the same time and pick the one that is the *least similar to the behavior expected for the Normal distribution*.
- If the identified graph corresponds to the true data, it can be concluded that the data set is not similar to a Normal distribution (with 95% confidence), because a graph picked among 20 equally Normal-looking graphs would match the true data by chance only 1 time in 20 (5%);
- If the identified graph does not correspond to that obtained with the true data, it can be concluded that the data set is similar to a Normal distribution (with 95% confidence);
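The construction described above can be sketched with the standard library alone. This is only an illustration of the idea, not the code used by normtest: the helper names (`normal_scores`, `pearson_r`, `build_line_up`) are hypothetical, the plotting position `(i - alpha) / (n - 2 * alpha + 1)` is an assumption about how cte_alpha is applied, and the pseudo-random numbers will not match those produced by `line_up`.

```python
import random
import statistics
from statistics import NormalDist


def normal_scores(n, cte_alpha=3 / 8):
    """Approximate expected Normal order statistics for n points."""
    # Assumed plotting position: p_i = (i - alpha) / (n - 2 * alpha + 1)
    return [
        NormalDist().inv_cdf((i - cte_alpha) / (n - 2 * cte_alpha + 1))
        for i in range(1, n + 1)
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def build_line_up(x_data, n_panels=20, seed=42):
    """Return n_panels sorted samples; exactly one of them is x_data."""
    rng = random.Random(seed)
    mean = statistics.fmean(x_data)
    std = statistics.stdev(x_data)
    # The true data is hidden at a pseudo-random position
    true_position = rng.randrange(n_panels)
    panels = []
    for position in range(n_panels):
        if position == true_position:
            panels.append(sorted(x_data))
        else:
            # Decoy sample drawn from Normal(mean, std), same size as x_data
            decoy = [rng.gauss(mean, std) for _ in range(len(x_data))]
            panels.append(sorted(decoy))
    return panels, true_position


x_exp = [5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6, 5, 4.4, 4.9, 5.4]
panels, true_position = build_line_up(x_exp)
scores = normal_scores(len(x_exp))
# One correlation per panel: sorted data against the Normal scores
correlations = [pearson_r(panel, scores) for panel in panels]
```

Each entry of `panels` plotted against `scores` gives one correlation graph of the line-up, and `true_position` plays the role that `correct=True` plays in the function: it is consulted only after a graph has been chosen.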
References
[1] BUJA, A. et al. Statistical inference for exploratory data analysis and model diagnostics. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, v. 367, n. 1906, p. 4361–4383, 13 Nov. 2009.
Examples
The line-up method must be conducted in two steps. The first step involves generating a figure with 20 graphs from the data, without indicating which graph is the true one.
>>> from normtest import ryan_joiner
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> x_exp = np.array([5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6, 5, 4.4, 4.9, 5.4])
>>> fig = ryan_joiner.line_up(x_exp, seed=42, correct=False)
>>> fig.tight_layout()
>>> # plt.savefig("line_up.png", bbox_inches="tight")
>>> plt.show()
The researcher must identify which of the 20 graphs deviates most from what is expected for a Normal distribution. Suppose, for instance, that the graph in the first row and second column is chosen.
The second step involves determining which graph corresponds to the true data set. This can be accomplished by simply changing the correct parameter from False to True:
>>> fig = ryan_joiner.line_up(x_exp, seed=42, correct=True)
>>> fig.tight_layout()
>>> # plt.savefig("line_up_true.png", bbox_inches="tight")
>>> plt.show()
Given that the true data corresponds to the graph in the second row and first column, which was not identified as deviating from the others, we can conclude that the data set follows the Normal distribution, at least approximately.