Randomness

In the previous chapters we developed skills needed to make insightful descriptions of data. Data scientists also have to be able to understand randomness. For example, they have to be able to assign individuals to treatment and control groups at random, and then try to say whether any observed differences in the outcomes of the two groups are simply due to the random assignment or genuinely due to the treatment.

In this chapter, we begin our analysis of randomness. To start off, we will use Python to make choices at random. The numpy library has a sub-module called random that contains many functions involving random selection. One of these functions is called choice. It picks one item at random from an array, and it is equally likely to pick any of the items. The function call is np.random.choice(array), where array is the array from which to make the choice.

Thus the following code evaluates to the string "treatment" with chance 50%, and the string "control" with chance 50%.

from datascience import *
import numpy as np

two_groups = make_array('treatment', 'control')
np.random.choice(two_groups)
'control'

The big difference between the code above and all the other code we have run thus far is that the code above doesn't always return the same value. It can return either treatment or control, and we don't know ahead of time which one it will pick. With any cell that involves random selection, it is a good idea to run the cell several times to get a sense of the variability in the result.

np.random.choice(two_groups)
'control'
np.random.choice(two_groups)
'treatment'

We can repeat the process in a single function call by providing a second argument, the number of times to repeat the process. In this case, np.random.choice returns an array containing the results of the repetitions.

np.random.choice(two_groups, 10)
array(['treatment', 'control', 'control', 'treatment', 'control',
       'treatment', 'control', 'treatment', 'treatment', 'treatment'], 
      dtype='<U9')
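
One way to check the 50% claim is to draw many times and count how often each label appears. The sketch below is illustrative rather than part of the original text: it builds the array with np.array instead of make_array so that it depends only on numpy, and the number of draws (10,000) is an arbitrary choice. The counts vary from run to run, but each proportion should come out close to 0.5.

```python
import numpy as np

# Same two labels as above, built with plain numpy (an assumption made
# so the example does not depend on the datascience library).
two_groups = np.array(['treatment', 'control'])

num_draws = 10000  # arbitrary; larger values give steadier proportions
draws = np.random.choice(two_groups, num_draws)

# Count how many draws landed on each label.
num_treatment = np.count_nonzero(draws == 'treatment')
num_control = np.count_nonzero(draws == 'control')

print('treatment:', num_treatment / num_draws)
print('control:  ', num_control / num_draws)
```

The same call can be read as assigning each of 10,000 individuals to a group independently, which is why the two group sizes are close to equal but rarely exactly equal.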

Why randomness?

Randomness has many more uses than simulating random assignments in experiments. Random choices can also be, in some sense, "fair." To decide who has the advantage of playing first in a game of chess, you might flip a coin.

A more serious application of this is in jury selection. In the United States, a defendant is supposed to have a "jury of their peers," but obviously it is infeasible for every community member to judge every case. Instead, a small group of potential jurors is selected at random from the whole eligible population. We shall see later in what sense this might produce a "representative" panel of jurors for each trial.
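
As a rough illustration of selecting a small group at random from a larger population, the sketch below draws a panel using np.random.choice. Everything specific here is an assumption made for the example: the population is a hypothetical list of 1,000 ID numbers, and the panel size of 12 is arbitrary. By default np.random.choice samples with replacement, so the sketch passes replace=False to keep any one person from being picked twice.

```python
import numpy as np

# Hypothetical eligible population: 1,000 people labeled 0 through 999.
eligible = np.arange(1000)

panel_size = 12  # illustrative panel size

# replace=False ensures no person is selected more than once.
panel = np.random.choice(eligible, panel_size, replace=False)
print(panel)
```

Each run produces a different panel, and every eligible person has the same chance of being included.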

More immediately, we can use simulations in Python to evaluate whether potential jurors really are selected at random from the eligible population. That is the topic of the next section.
