Gender in Comic Books
In this example, we will explore statistics about characters in comic books. A 2014 article on FiveThirtyEight examined the representation of women in comics published by DC and Marvel, the two largest publishers of comic books. The authors collected a dataset about characters from the publishers' Wikia pages and made it publicly available on GitHub. Using Python, we can load the data directly from the web.
This example illustrates some of the broad themes of this text, and you will see code examples like it throughout. For now, don't worry if the details of the program don't yet make sense. In fact, important details of this program have been hidden from view. Instead, focus on interpreting the results displayed below the code. Later sections will describe most of the features of the Python programming language used below.
First, we read the data about each publisher. The # symbol below starts a comment, which is ignored by the computer but helpful for people reading the code. The = symbols assign a name on the left to the result of some computation described on the right. A uniform resource locator or URL is an address on the Internet for some content; in this case, a table of information about comic book characters. (In Python, a name cannot contain any spaces, and so we will often use an underscore _ to stand in for a space.) So we name the URL for the Marvel data marvel_url, and we name the loaded dataset itself marvel.
# Load datasets from FiveThirtyEight about comics:
marvel_url = "https://github.com/fivethirtyeight/data/raw/master/comic-characters/marvel-wikia-data.csv"
dc_url = "https://github.com/fivethirtyeight/data/raw/master/comic-characters/dc-wikia-data.csv"
marvel = load_and_clean_table(marvel_url)
dc = load_and_clean_table(dc_url)
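The helper load_and_clean_table is one of the details hidden from view, so its real definition is not shown here. As a rough idea of what such a helper might do, the sketch below uses only Python's standard library. The name load_and_clean_table_sketch, the cleaning steps (trimming spaces and lowercasing column labels), and the one-row sample are all assumptions made for illustration, not the actual implementation.

```python
import csv
import io
import urllib.request


def load_and_clean_table_sketch(source):
    """Hypothetical stand-in for the hidden helper: read CSV data and tidy it."""
    # A string is treated as a URL and downloaded; anything else is assumed
    # to be an already-open, file-like object containing CSV text.
    if isinstance(source, str):
        with urllib.request.urlopen(source) as response:
            source = io.StringIO(response.read().decode("utf-8"))

    records = []
    for record in csv.DictReader(source):
        # Assumed cleaning step: trim stray spaces and lowercase column labels.
        cleaned = {
            label.strip().lower(): (value or "").strip()
            for label, value in record.items()
            if label is not None
        }
        records.append(cleaned)
    return records


# A one-row, in-memory sample so the sketch runs without a network connection.
sample_csv = io.StringIO("Name,APPEARANCES,Year\nBatman,3093,1939\n")
sample = load_and_clean_table_sketch(sample_csv)
print(sample)
```

Passing a URL string such as marvel_url instead of the in-memory sample would download the file first. Note that this sketch returns a plain list of records with every value kept as text, whereas the tables displayed in this section are richer objects.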
To display one of the tables, we write its name:
marvel
dc
Each table has a record (displayed in one horizontal row) for every character in the dataset from its respective publisher. Each vertical column contains a piece of information about each character. We can see, for example, that Batman has appeared 3,093 times in DC comics since his first appearance in May (month 5) 1939.
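To make the row and column picture concrete, here is a minimal sketch that stores one record as a Python dictionary. The column labels are illustrative assumptions and may not match the labels in the real dataset; the values are the Batman figures quoted above.

```python
# One record (row) per character. The labels "name", "appearances", "month",
# and "year" are assumed for illustration only.
records = [
    {"name": "Batman", "appearances": 3093, "month": 5, "year": 1939},
]

# Reading across a row gives everything recorded about one character.
batman = records[0]
print(batman["name"], "has", batman["appearances"], "appearances")

# Reading down a column gives one piece of information for every character.
appearances_column = [record["appearances"] for record in records]
print(appearances_column)
```

With the full dataset, the list would hold one dictionary per character, and the same column expression would collect every character's appearance count.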