Captain’s log
Star date 71750.51. Our mission is to use R statistical software to extract star dates mentioned in the captain’s log from the scripts of Star Trek: The Next Generation and observe their progression over the course of the show’s seven seasons. There appears to be some mismatch in the frequency of digits after the decimal point – could this indicate poor ability to choose random numbers? Or something more sinister? We shall venture deep into uncharted territory for answers…
We’re going to:
- iterate reading in text files – containing ‘Star Trek: The Next Generation’ (ST:TNG) scripts – to R and then extract stardates using base R and the {stringr} package
- web scrape episode names using the {rvest} package and join them to the stardates data
- tabulate and plot these interactively with {ggplot2}, {plotly} and {DT}
Also, very minor spoiler alert for a couple of ST:TNG episodes.
Lieutenant Commander Data
I’m using the Star Trek Minutiae website to access all the ST:TNG scripts as text files. You can also download the scripts as a zipped folder of 176 text files.
Each episode has a dedicated URL from which we can read the script with readLines(). We can loop over each episode to get a list element per script. This will take a few moments to run.
# Build URL paths to each script
base_url <- "https://www.st-minutiae.com/resources/scripts/"
ep_numbers <- 102:277  # ep 1 & 2 combined, so starts at 102
ep_paths <- paste0(base_url, ep_numbers, ".txt")
# Preallocate a list to fill with each script
scripts <- vector("list", length(ep_numbers))
names(scripts) <- ep_numbers
# For each URL path, read the script and add to the list
for (ep in seq_along(ep_paths)) {
  txt <- readLines(ep_paths[ep], skipNul = TRUE)
  ep_num <- tools::file_path_sans_ext(basename(ep_paths[ep]))
  scripts[[ep_num]] <- txt
}
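If you’ve downloaded the zipped folder instead, the same named list can be built from local files without an explicit loop. Here’s a minimal base R sketch, assuming all the .txt scripts sit in one local folder; read_scripts() is a hypothetical helper, not part of any package, and the two demo files are made up.

```r
# Hypothetical helper: read every .txt script in a folder into a named list,
# one element per script, named by file name without extension (e.g. "102")
read_scripts <- function(dir) {
  paths <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  scripts <- lapply(paths, readLines, skipNul = TRUE, warn = FALSE)
  names(scripts) <- tools::file_path_sans_ext(basename(paths))
  scripts
}

# Usage sketch with two made-up files in a temporary folder
demo_dir <- file.path(tempdir(), "tng_scripts_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("PICARD V.O.", "Captain's log, stardate 42353.7."),
           file.path(demo_dir, "102.txt"))
writeLines("Captain's log, stardate 41209.2.", file.path(demo_dir, "103.txt"))

demo_scripts <- read_scripts(demo_dir)
names(demo_scripts)
```

Using lapply() means the list is created and filled in one step, so there’s no need to preallocate it.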
We can take a look at some example lines from the title page of the first script.
scripts[["102"]][17:24]
[1] " STAR TREK: THE NEXT GENERATION "
[2] " "
[3] " \"Encounter at Farpoint\" "
[4] " "
[5] " by "
[6] " D.C. Fontana "
[7] " and "
[8] " Gene Roddenberry "
Our first example of a stardate is in the Captain’s log voiceover in line 47 of the first script. (The \t denotes a tab character.)
scripts[["102"]][46:47]
[1] "\t\t\t\t\tPICARD V.O."
[2] "\t\t\tCaptain's log, stardate 42353.7."
Engage!
We want to extract stardate strings from each script in our list. As you can see from Picard’s voiceover above, these are given in the form ‘XXXXX.X’, where each X is a digit.
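We can extract these with str_extract_all() from the {stringr} package, using a regular expression (regex).
Our regex is written date[:space:][[:digit:]\\.[:digit:]]{7}. This means:
- find a string that starts with the word date and is followed by a space (i.e. ‘date ’)
- which is followed by characters that are each either a digit ([:digit:]) or a period (\\.)
- with a total length of seven characters ({7})
Because the square brackets define a set of allowed characters rather than an order, the pattern doesn’t strictly require exactly one period in the middle, which is why the cleaning step further down also strips a trailing double period.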
This creates a list object with an element for each script that contains all the regex-matched strings.
library(stringr)
# Collapse each script to a single element
scripts_collapsed <- lapply(scripts, paste, collapse = " ")
# Declare the regex
stardate_regex <- "date[:space:][[:digit:]\\.[:digit:]]{7}"
# For each script, extract all the stardates
stardate_extract <- lapply(
  scripts_collapsed,
  function(script) str_extract_all(script, stardate_regex)[[1]]
)
stardate_extract[1:3]
$`102`
[1] "date 42353.7" "date 42354.1" "date 42354.2" "date 42354.7" "date 42372.5"
$`103`
[1] "date 41209.2" "date 41209.3"
$`104`
[1] "date 41235.2" "date 41235.3"
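To see what the pattern does and doesn’t catch, here’s a small base R sketch that swaps str_extract_all() for gregexpr() plus regmatches(). POSIX character classes such as [:space:] are only recognised inside a bracket expression in base R’s default engine, so the pattern is rewritten as date[[:space:]][[:digit:].]{7}; treat that translation as an assumed equivalent. The first two lines come from the scripts shown in this post; the last two are invented.

```r
# Base R sketch of the stardate pattern: [[:digit:].] is one character that
# may be either a digit or a literal period, repeated exactly 7 times
stardate_regex_base <- "date[[:space:]][[:digit:].]{7}"

lines <- c(
  "Captain's log, stardate 42353.7.",          # from the first script
  "Personal Log: Stardate 41153.7.",           # capital S, still contains 'date'
  "An invented line ending stardate 12345...", # made-up example
  "No stardate mentioned here."                # made-up example
)

matches <- regmatches(lines, gregexpr(stardate_regex_base, lines))
matches
```

The third match comes back as "date 12345..": five digits plus two periods still satisfy the seven-character set, which is the kind of stray match the later cleaning handles.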
We’re now going to make the data into a tidy dataframe and clean it up so it’s easier to work with. We can use some tidyverse packages for this.
library(dplyr, warn.conflicts = FALSE)
library(tibble)
library(tidyr)
stardate_tidy <- stardate_extract %>%
  enframe() %>%  # list to dataframe (one row per episode)
  unnest(cols = value) %>%  # dataframe with one row per stardate
  transmute(  # create columns and retain only these
    episode = as.numeric(name),
    stardate = str_replace(value, "date ", "")
  ) %>%
  mutate(
    stardate = str_replace(stardate, "\\.\\.$", ""),  # drop a trailing double period
    stardate = as.numeric(stardate)
  )
head(stardate_tidy)
# A tibble: 6 × 2
episode stardate
<dbl> <dbl>
1 102 42354.
2 102 42354.
3 102 42354.
4 102 42355.
5 102 42372.
6 103 41209.
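The stardate column prints as 42354. rather than 42353.7 because tibbles show only a few significant digits by default (see the pillar.sigfig option); the trailing full stop signals that decimals are hidden, not lost. Here’s a base R sketch of the same cleaning steps, with sprintf() used to display one decimal place; clean_stardate() is a hypothetical helper and "date 12345.." is a made-up input.

```r
# Hypothetical helper mirroring the cleaning steps above: drop the "date "
# prefix, drop a trailing double period, then convert to numeric
clean_stardate <- function(x) {
  x <- sub("date ", "", x, fixed = TRUE)
  x <- sub("\\.\\.$", "", x)
  as.numeric(x)
}

stardates <- clean_stardate(c("date 42353.7", "date 42354.1", "date 12345.."))
sprintf("%.1f", stardates) # "42353.7" "42354.1" "12345.0"
```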
Now we can add a couple more columns for convenience: each episode’s season number and the number after the decimal point in each stardate.
stardate_tidy_plus <- stardate_tidy %>%
  mutate(
    season = case_when(
      episode %in% 102:126 ~ 1,
      episode %in% 127:148 ~ 2,
      episode %in% 149:174 ~ 3,
      episode %in% 175:200 ~ 4,
      episode %in% 201:226 ~ 5,
      episode %in% 227:252 ~ 6,
      episode %in% 253:277 ~ 7
    ),
    stardate_decimal = str_sub(stardate, 7, 7)  # 7th character is the decimal
  )
head(stardate_tidy_plus)
# A tibble: 6 × 4
episode stardate season stardate_decimal
<dbl> <dbl> <dbl> <chr>
1 102 42354. 1 7
2 102 42354. 1 1
3 102 42354. 1 2
4 102 42355. 1 7
5 102 42372. 1 5
6 103 41209. 1 2
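The season lookup can also be written with base R’s findInterval(), using the first script number of each season as break points. And because str_sub(stardate, 7, 7) works on the number’s character form, a stardate with a zero decimal (a hypothetical 42354.0, which R turns into "42354") would give an empty string; taking the digit arithmetically avoids that. A sketch with hypothetical helper names, ending with a tabulation of decimal digits of the kind the introduction hints at (using the five stardates from the first script):

```r
# First script number of each season (from the case_when() ranges above)
season_starts <- c(102, 127, 149, 175, 201, 227, 253)

# Hypothetical helper: season number for each script number
season_of <- function(episode) findInterval(episode, season_starts)

# Hypothetical helper: digit after the decimal point, computed numerically
# so that a whole-number stardate gives 0 rather than an empty string
decimal_digit <- function(stardate) round(stardate * 10) %% 10

season_of(c(102, 126, 127, 200, 277))
decimal_digit(c(42353.7, 42354, 30620.1))

# Tabulate digit frequencies, keeping zero counts for digits never seen
digit_counts <- table(factor(
  decimal_digit(c(42353.7, 42354.1, 42354.2, 42354.7, 42372.5)),
  levels = 0:9
))
digit_counts
```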
Prepare a scanner probe
We could extract episode names from the scripts, but another option is to scrape them from the ST:TNG episode guide on Wikipedia.
If you visit that link, you’ll notice that the tables of episodes actually give a stardate, but they only provide one per episode – our script-scraping shows that many episodes have multiple instances of stardates.
We can use the {rvest} package by Hadley Wickham to perform the scrape. This works by supplying a website address and the path of the thing we want to extract – the episode name column of tables on the Wikipedia page. I used SelectorGadget – a point-and-click tool for finding the CSS selectors for elements of webpages – for this column in each of the tables on the Wikipedia page (.wikiepisodetable tr > :nth-child(3)). A short how-to vignette is available for {rvest} + SelectorGadget.
library(rvest)
# read the HTML of the Wikipedia episode list
tng_ep_wiki <- read_html(
  "https://en.wikipedia.org/wiki/List_of_Star_Trek:_The_Next_Generation_episodes"
)
# extract and tidy
tng_ep_names <- tng_ep_wiki %>%  # parsed web page
  html_nodes(".wikiepisodetable tr > :nth-child(3)") %>%  # via SelectorGadget
  html_text() %>%  # extract text
  tibble() %>%  # to dataframe
  rename(episode_title = ".") %>%  # sensible column name
  filter(episode_title != "Title") %>%  # remove table headers
  mutate(episode = row_number() + 101)  # episode number (join key)
head(tng_ep_names)
# A tibble: 6 × 2
episode_title episode
<chr> <dbl>
1 "\"Encounter at Farpoint\"" 102
2 "\"The Naked Now\"" 103
3 "\"Code of Honor\"" 104
4 "\"The Last Outpost\"" 105
5 "\"Where No One Has Gone Before\"" 106
6 "\"Lonely Among Us\"" 107
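The titles keep the literal quotation marks from the Wikipedia tables. If you’d rather drop them, or want to check the header-filtering and numbering logic offline, here’s a base R sketch on a made-up vector of scraped cell text (the real scrape returns many more rows):

```r
# Made-up example of scraped cell text, including a table header cell
cells <- c("Title", "\"Encounter at Farpoint\"", "\"The Naked Now\"",
           "\"Code of Honor\"")

titles <- cells[cells != "Title"]      # remove table headers
titles <- gsub("^\"|\"$", "", titles)  # strip surrounding quotation marks

tng_ep_names_demo <- data.frame(
  episode_title = titles,
  episode = seq_along(titles) + 101    # script number, used as the join key
)
tng_ep_names_demo
```

Numbering by row position only works if no rows are skipped or duplicated in the scrape, so it’s worth eyeballing a few titles against their script numbers.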
So now we can join the episode names to the dataframe generated from the scripts. This gives us a table with a row per stardate extracted, with its associated season, episode number and episode name.
stardate_tidy_names <- stardate_tidy_plus %>%
  left_join(tng_ep_names, by = "episode") %>%
  select(season, episode, episode_title, stardate, stardate_decimal)
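A left join keeps every stardate row and repeats the episode title for each stardate from that episode. Here’s a toy base R illustration with merge(all.x = TRUE), using a few values from the outputs above; the data frames are hand-made, not the real data.

```r
# Hand-made stand-ins for stardate_tidy_plus and tng_ep_names
stardates_demo <- data.frame(
  episode = c(102, 102, 103),
  stardate = c(42353.7, 42354.1, 41209.2)
)
titles_demo <- data.frame(
  episode = c(102, 103),
  episode_title = c("Encounter at Farpoint", "The Naked Now")
)

# all.x = TRUE keeps every row of the left-hand table, like dplyr::left_join()
joined_demo <- merge(stardates_demo, titles_demo, by = "episode", all.x = TRUE)
joined_demo
```

Any stardate whose episode number had no matching title would still appear, with NA in episode_title.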
We can make these data into an interactive table with the DT::datatable() htmlwidget.
library(DT)
datatable(
  stardate_tidy_names,
  rownames = FALSE,
  options = list(pageLength = 5, autoWidth = TRUE)
)
So that’s a searchable list of all the stardates in each episode.
On screen
Let’s visualise the stardates by episode.
We can make this interactive using the {plotly} package – another htmlwidget for R – that conveniently has the function ggplotly() that can turn a ggplot object into an interactive plot. You can hover over each point to find out more information about it.
Obviously there’s a package ({ggsci}) that contains a discrete colour scale based on the shirts of the Enterprise crew. Obviously we’ll use that here.
library(ggplot2) # basic plotting
library(plotly, warn.conflicts = FALSE) # make plot interactive
library(ggsci) # star trek colour scale
library(ggthemes) # dark plot theme
# create basic plot
stardate_dotplot <- stardate_tidy_names %>%
  mutate(season = as.character(season)) %>%
  ggplot() +
  geom_point(  # dotplot
    aes(
      x = episode - 101,  # script 102 is overall episode 1
      y = stardate,
      color = season,  # each season gets own colour
      group = episode_title
    )
  ) +
  labs(title = "Stardates are almost (but not quite) chronological", x = "Episode") +
  theme_solarized_2(light = FALSE) +  # dark background
  theme(legend.position = "none") +
  scale_color_startrek()  # Star Trek uniform colours
The Plotly tools that appear in the top-right on hover let you zoom, download, etc.
# make plot interactive
stardate_dotplot %>%
  ggplotly() %>%
  layout(margin = list(l = 75))  # adjust margin to fit y-axis label
So there were some non-chronological stardates between episodes of the first and second series and at the beginning of the third, but the stardate-episode relationship became more linear after that.
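One way to put a number on that impression is to compare each episode’s first stardate with the previous episode’s. Here’s a base R sketch, assuming a data frame with episode and stardate columns like stardate_tidy; flag_backwards() is a hypothetical helper, and the demo uses the first three scripts’ stardates from the extraction output above.

```r
# Hypothetical helper: flag episodes whose first stardate is lower than the
# first stardate of the preceding episode
flag_backwards <- function(df) {
  firsts <- aggregate(stardate ~ episode, data = df, FUN = function(x) x[1])
  firsts <- firsts[order(firsts$episode), ]
  firsts$went_back <- c(FALSE, diff(firsts$stardate) < 0)
  firsts
}

demo <- data.frame(
  episode = c(102, 102, 103, 103, 104, 104),
  stardate = c(42353.7, 42354.1, 41209.2, 41209.3, 41235.2, 41235.3)
)
flag_backwards(demo)
```

Using the first stardate is a design choice: a flashback date mentioned later in an episode won’t trigger a flag, but an episode that opens in the past would.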
Three points seem to be anomalous with stardates well before the present time period of the episode. Without spoiling them (too much), we can see that each of these episodes takes place in, or references, the past.
‘Identity Crisis’ (season 4, episode 91, stardate 40164.7) takes place partly in the past:
scripts[[91]][127:129]
[1] "\tGEORDI moves into view, holding a Tricorder. (Note:"
[2] "\tGeordi is younger here, wearing a slightly different,"
[3] "\tearlier version of his VISOR.)"
‘Dark Page’ (season 7, episode 158, stardate 30620.1) has a scene involving a diary:
scripts[[158]][c(2221:2224, 2233:2235)]
[1] "\t\t\t\t\tTROI"
[2] "\t\t\tThere's a lot to review. My"
[3] "\t\t\tmother's kept a journal since she"
[4] "\t\t\twas first married..."
[5] "\t\t\t\t\tPICARD"
[6] "\t\t\tThe first entry seems to be"
[7] "\t\t\tStardate 30620.1."
‘All Good Things’ (season 7, episode 176, stardate 41153.7) involves some time travel for Captain Picard:
scripts[[176]][1561:1569]
[1] "\t\t\t\t\tPICARD (V.O.)"
[2] "\t\t\tPersonal Log: Stardate 41153.7."
[3] "\t\t\tRecorded under security lockout"
[4] "\t\t\tOmega three-two-seven. I have"
[5] "\t\t\tdecided not to inform this crew of"
[6] "\t\t\tmy experiences. If it's true that"
[7] "\t\t\tI've travelled to the past, I"
[8] "\t\t\tcannot risk giving them"
[9] "\t\t\tforeknowledge of what's to come."
Speculate
So stardates are more or less chronological across the duration of ST:TNG’s seven series, implying that the writers had a system in place. A few wobbles in consistency during the first few seasons suggest that it took some time to get this right. None of this is new information (see the links in the ‘Open channel’ section below).
It seems the vast majority of episodes take place in the programme’s present with a few exceptions. We may have missed some forays through time simply because the stardate was unknown or unmentioned.
Open channel
Only too late did I realise that there is an RTrek GitHub organisation with a Star Trek package, TNG datasets and some other functions.
A selection of further reading:
- ‘Memory Alpha is a collaborative project to create the most definitive, accurate, and accessible encyclopedia and reference for everything related to Star Trek’, including stardates
- ‘The STArchive is home to the… Ships and Locations lists… [and] a few other technical FAQs’, including a deep-dive into the theories in a Stardates in Star Trek FAQ
- Trekguide’s take on the messiness of stardates also includes a stardate converter
- There’s a handy universal stardate converter at Redirected Insanity
- The scripts were downloaded from Star Trek Minutiae, a site that has ‘obscure references and little-known facts’ and ‘explore[s] and expand[s] the wondrous multiverse of Star Trek’
- A simpler guide to stardates can be found on Mentalfloss
- You can find the full list of The Next Generation episodes on Wikipedia
Full stop!
Environment
Session info
Last rendered: 2023-08-09 23:26:14 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggthemes_4.2.4 ggsci_3.0.0 plotly_4.10.2 ggplot2_3.4.2 DT_0.28
[6] rvest_1.0.3 tidyr_1.3.0 tibble_3.2.1 dplyr_1.1.2 stringr_1.5.0
loaded via a namespace (and not attached):
[1] sass_0.4.7 utf8_1.2.3 generics_0.1.3 xml2_1.3.5
[5] stringi_1.7.12 digest_0.6.33 magrittr_2.0.3 evaluate_0.21
[9] grid_4.3.1 fastmap_1.1.1 jsonlite_1.8.7 httr_1.4.6
[13] purrr_1.0.1 fansi_1.0.4 selectr_0.4-2 viridisLite_0.4.2
[17] crosstalk_1.2.0 scales_1.2.1 lazyeval_0.2.2 jquerylib_0.1.4
[21] cli_3.6.1 rlang_1.1.1 ellipsis_0.3.2 munsell_0.5.0
[25] withr_2.5.0 cachem_1.0.8 yaml_2.3.7 tools_4.3.1
[29] colorspace_2.1-0 curl_5.0.1 vctrs_0.6.3 R6_2.5.1
[33] lifecycle_1.0.3 htmlwidgets_1.6.2 pkgconfig_2.0.3 pillar_1.9.0
[37] bslib_0.5.0 gtable_0.3.3 glue_1.6.2 data.table_1.14.8
[41] xfun_0.39 tidyselect_1.2.0 rstudioapi_0.15.0 knitr_1.43.1
[45] htmltools_0.5.5 labeling_0.4.2 rmarkdown_2.23 compiler_4.3.1
Footnotes
The star date for today’s date (14 April 2018) as calculated using the trekguide.com method; this ‘would be the stardate of this week’s episode if The Next Generation and its spinoffs were still in production’.