<- function(
assign_abstracts
abstracts_df,
sifters_df,assignment_cap = 2,
max_iterations = 1000
) {
# Set up named vectors
<- with(abstracts_df, setNames(affiliation, name))
abstracts <- with(sifters_df, setNames(affiliation, name))
sifters <- with(sifters_df, setNames(cap, name))
sifter_caps <- sifter_caps[!is.na(sifter_caps)]
sifter_caps
# Set up starting variables
<- length(abstracts)
n_abstracts <- seq_len(n_abstracts)
seq_abstracts <- setNames(vector("list", length(sifters)), names(sifters))
sifter_assignments <- rep(0, n_abstracts)
assignment_counts <- 0
iter
repeat {
for (name in names(sifter_assignments)) {
# 1. Find the pool of abstracts available to this sifter (if any)
# a. Check if sifter cap has been met
<- name %in% names(sifter_caps)
sifter_has_cap if (sifter_has_cap) {
<- sifter_caps[[name]]
sifter_cap <- length(sifter_assignments[[name]])
sifter_assignment_count
}if (sifter_has_cap && sifter_assignment_count == sifter_cap) next
# b. Add abstracts to pool if they have <n assignments
<- which(assignment_counts < assignment_cap)
abstracts_under_cap if (length(abstracts_under_cap) == 0) next
# c. Remove abstracts that are already assigned to this sifter
<- sifter_assignments[[name]]
already_assigned_to_sifter <-
abstracts_available !abstracts_under_cap %in% already_assigned_to_sifter]
abstracts_under_cap[if (length(abstracts_available) == 0) next
# d. Remove abstracts by the named sifter
<- which(name == names(abstracts[abstracts_available]))
abstracts_by_sifter if (length(abstracts_by_sifter) > 0) {
<- abstracts_available[-abstracts_by_sifter]
abstracts_available
}if (length(abstracts_available) == 0) next
# e. Remove abstracts with the same affiliation as the sifter
<- unname(sifters[name])
sifter_affiliation <-
abstracts_by_same_affiliation which(sifter_affiliation == unname(abstracts[abstracts_available]))
if (length(abstracts_by_same_affiliation) > 0) {
<-
abstracts_available -abstracts_by_same_affiliation]
abstracts_available[
}if (length(abstracts_available) == 0) next
# 2. Select randomly from pool and assign to sifter
<- .resample(abstracts_available, 1)
abstract_selected <-
sifter_assignments[[name]] c(sifter_assignments[[name]], abstract_selected)
# 3. Increment count for sampled abstract
<-
assignment_counts[abstract_selected] + 1
assignment_counts[abstract_selected] if (all(assignment_counts == assignment_cap)) break
}
# Reorder so sifter with fewest assignments gets next assignment first
<- sifter_assignments[order(lengths(sifter_assignments))]
sifter_assignments
<- iter + 1
iter
if (all(assignment_counts == assignment_cap)) break
if (iter == max_iterations) {
message("max_iterations reached")
break
}
}
<- lapply(sifter_assignments, sort)
sifter_assignments order(names(sifter_assignments))]
sifter_assignments[
}
<- function(x, ...) x[sample.int(length(x), ...)] .resample
tl;dr
A quick and dirty R function to assign abstracts to sifters.
Assign me up
I hacked together a function to assign submitted conference abstracts to sifters for assessment. At its simplest you give it two dataframes: one with a row per abstract, one with a row per sifter. You receive back a list, one element per sifter, with the abstracts they’ve been assigned.
There were several criteria that complicated things. The function:
- tries to make assignment counts equal between sifters where possible
- results in each abstract being assigned n times, to assure fairness in assessment
- ensures each sifter receives a unique set of abstracts
- prevents the sifter seeing their own abstracts, if they submitted any
- prevents the sifter seeing abstracts by authors with the same affiliation as the sifter, if relevant
- respects a maximum assignment ‘cap’, if the sifter has one
- allows the user to set a maximum number of iterations to prevent infinite looping (may be possible under certain conditions)
Process
The function itself is split into three main parts:
- Setup of variables and counters to be used in the iterations.
- A
repeat
loop that will keep assigning abstracts to the set of sifters until the total abstract pool is exhausted and the various criteria are met. - Within the
repeat
loop, afor
loop that iterates over each sifter to assign them an abstract from their pool of viable abstracts.
Within the for
loop are three main steps:
- Find the pool of abstracts available to the sifter given various criteria (if any).
- Select randomly an abstract from the pool and assign it to the sifter.
- Increment the assignment counter for the selected abstract.
The for
loop will go to the next
sifter if the pool of abstracts for the current sifter is zero. The repeat
loop will break
if all of the abstracts have been assigned n number of times, according to the assignment_cap
argument. It will also break
if the number of iterations given by max_iterations
has been met.
Definition
The function was developed quickly, is not optimised, is not fully tested and has no defensive programming. But it fulfilled the requirements for the task. I’m recording it here for posterity.
I’ve added some comments and tried to make variable names informative. The abstracts_df
and sifters_df
inputs are dataframes that have columns for the name
and affiliation
, along with a maximum-assignment cap
column in the sifters_df
.
Of course, it’s too big and should be broken into smaller functions, particularly each of the steps in the for
loop. Also, you feed in dataframes, but these are converted immediately to named vectors for processing. In part this reflects the ease of handling named vectors, but is also a legacy of when the requirements were far simpler. The requirements grew more complicated over time, so it became a Frankenfunction
Note also the bespoke .resample()
function because sample()
operates differently depending on whether you give it a vector or a single value1. If only one abstract is left in the pool, e.g. abstract number 13, then sample(13)
won’t output 13, it will actually output a value from 1 to 13.
Example
Demo data
Let’s create some fake data using the {charlatan} package. Let’s imagine we have some sifters and their affiliations. One sifter only has time to do 10 assessments, so they have an assignment cap
value.
set.seed(1)
<- 5
n_sifters <- charlatan::ch_company(n_sifters)
sifter_companies <- charlatan::ch_name(n_sifters)
sifter_names
<- data.frame(
(sifters_df name = sifter_names,
affiliation = sifter_companies,
cap = c(10, rep(NA_real_, n_sifters - 1))
))
name affiliation cap
1 Justen Powlowski Paucek Inc 10
2 Jon Blick-Erdman Ziemann-Ziemann NA
3 Cannon Hegmann Wyman-Wyman NA
4 Nichelle Schoen Daugherty, Daugherty and Daugherty NA
5 Earley Monahan Walsh-Walsh NA
Now let’s create some fake abstracts, again with names and affiliations. The abstract titles here are just random species names, so let’s pretend we’re at a taxonomists’ conference or something. Let’s make it so the sifters have each submitted an abstract of their own and that there’s at least one other submission from their organisation.
Of course, your abstract dataset is likely to contain additional information, like the actual text of the abstract and other details like the author’s geographic location and talk-type preference (poster, plenary, etc). If you’ve used an online survey service then you can usually download a CSV of the results or connect to their API to get the data.
<- 30
total_abstracts
<- data.frame(
abstracts_df name = c(sifter_names, charlatan::ch_name(total_abstracts - n_sifters)),
affiliation = c(
rep(sifter_companies, 2),
::ch_company(total_abstracts - (2 * n_sifters))
charlatan
),title = charlatan::ch_taxonomic_species(total_abstracts)
)
<- abstracts_df[sample(nrow(abstracts_df)), ] # shuffle
abstracts_df row.names(abstracts_df) <- NULL
head(abstracts_df)
name affiliation title
1 Jon Blick-Erdman Ziemann-Ziemann Coniogramme euantha
2 Tavaris Reilly Wyman-Wyman Tristemon egena
3 Shavonne Ziemann PhD Satterfield-Satterfield Pichleria majungense
4 Nichelle Schoen Daugherty, Daugherty and Daugherty Zelkova vigilans
5 Durrell Mertz Jaskolski-Jaskolski Papuechites denutatum
6 Mikayla Rau DDS Stark-Stark Oldfieldia mohriana
Run
Let’s provide the abstracts_df
and sifters_df
dataframes to the function, along with the number of times each abstract will need to be assessed.
<- 2
n
<- assign_abstracts(
assignments
abstracts_df,
sifters_df,assignment_cap = n
)
Here’s what the output looks like. It’s a named list with one element per sifter. The values are the index of that abstract in the vector provided to the abstracts_df
argument.
assignments
$`Cannon Hegmann`
[1] 1 4 6 8 10 12 16 19 20 26 28 29
$`Earley Monahan`
[1] 1 4 5 7 10 12 13 17 22 25 27 28
$`Jon Blick-Erdman`
[1] 7 8 11 14 15 16 17 18 21 22 23 24 30
$`Justen Powlowski`
[1] 2 3 5 9 14 18 20 23 26 29
$`Nichelle Schoen`
[1] 2 3 6 9 11 13 15 19 21 24 25 27 30
These indices can be matched back to the original dataset. Here’s an example for the first sifter.
<- abstracts_df[assignments[[1]], ]
assignment_df head(assignment_df)
name affiliation
1 Jon Blick-Erdman Ziemann-Ziemann
4 Nichelle Schoen Daugherty, Daugherty and Daugherty
6 Mikayla Rau DDS Stark-Stark
8 Earley Monahan Walsh-Walsh
10 Heriberto Feeney Ziemann-Ziemann
12 Muhammad Stoltenberg-Hermiston Tillman LLC
title
1 Coniogramme euantha
4 Zelkova vigilans
6 Oldfieldia mohriana
8 Isoetella flaccidum
10 Tylophora serrulata
12 Microtropis turubalense
You could wrangle this into an anonymised dataframe with columns for the sifter to provide their assessment.
<- assignment_df[, "title", drop = FALSE]
anon_df $score <- NA_real_
anon_df$comments <- NA_character_
anon_df anon_df
title score comments
1 Coniogramme euantha NA <NA>
4 Zelkova vigilans NA <NA>
6 Oldfieldia mohriana NA <NA>
8 Isoetella flaccidum NA <NA>
10 Tylophora serrulata NA <NA>
12 Microtropis turubalense NA <NA>
16 Coussapoa anatuyana NA <NA>
19 Chiloglottis brideliifolius NA <NA>
20 Crocus candollei NA <NA>
26 Pistacia weinmannii NA <NA>
28 Baillaudea dodsonii NA <NA>
29 Syngonium tangutica NA <NA>
And then you can return this back to the sifter. The low-tech mechanism would be to put this into a spreadsheet output with {openxlsx}, for example. Much better would be to create a simple Shiny app hosted on Posit Connect or something, allowing each sifter to see their assigned abstracts and submit their assessments.
Check
Great, but the output actually meet the initial requirements for the system? Let’s take a look.
Was each abstract assigned the number of times specified by assignment_cap
?
all(table(unlist(assignments)) == n)
[1] TRUE
Here you can see that sifters received a near-equal number of abstracts, apart from the sifter who had a specified maximum-assignment cap.
lengths(assignments)
Cannon Hegmann Earley Monahan Jon Blick-Erdman Justen Powlowski
12 12 13 10
Nichelle Schoen
13
Was each sifter assigned a unique set of abstracts?
all(lengths(lapply(assignments, unique)) == lengths(assignments))
[1] TRUE
Did anyone receive their own abstract?
<- sifters_df[sifters_df$name %in% names(assignments), "name"]
sifter_names
<- vector("list", length = length(sifter_names)) |>
has_own_abstract setNames(sifter_names)
for (i in seq_along(sifter_names)) {
<- sifter_names[i]
sifter_name <- abstracts_df$name[assignments[[i]]]
abstract_names <- all(sifter_name == abstract_names)
has_own_abstract[[i]]
}
unlist(has_own_abstract)
Justen Powlowski Jon Blick-Erdman Cannon Hegmann Nichelle Schoen
FALSE FALSE FALSE FALSE
Earley Monahan
FALSE
Did any of the sifters get assigned abstracts from their own affiliation?
<-
affiliations $name %in% names(assignments), "affiliation"]
sifters_df[sifters_df
<- vector("list", length = length(assignments)) |>
has_affiliate_abstract setNames(affiliations)
for (i in seq_along(affiliations)) {
<- affiliations[i]
sifter_affiliation <- abstracts_df$affiliation[assignments[[i]]]
abstract_affiliations <- all(sifter_affiliation == abstract_affiliations)
has_affiliate_abstract[[i]]
}
unlist(has_affiliate_abstract)
Paucek Inc Ziemann-Ziemann
FALSE FALSE
Wyman-Wyman Daugherty, Daugherty and Daugherty
FALSE FALSE
Walsh-Walsh
FALSE
Okey-doke.
What now?
This could definitely be better.
As mentioned, there’s a lot of refactoring that could be done, recognising that it was developed rapidly with changing requirements. I’m reflecting on it now that it’s solved the problem, but eventually it may be refactored or rewritten from scratch.
This would make sense if we (or you) want to use it in other scenarios or as part of a more generic package in future.
Or, as usual, this functionality probably exists in some package already and you can tell me all about it.
Environment
Session info
Last rendered: 2024-03-27 15:42:32 GMT
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] digest_0.6.33 utf8_1.2.4 R6_2.5.1 fastmap_1.1.1
[5] xfun_0.41 magrittr_2.0.3 glue_1.7.0 tibble_3.2.1
[9] knitr_1.45 pkgconfig_2.0.3 htmltools_0.5.6.1 rmarkdown_2.25
[13] lifecycle_1.0.4 cli_3.6.2 fansi_1.0.6 vctrs_0.6.5
[17] compiler_4.3.1 rstudioapi_0.15.0 tools_4.3.1 whisker_0.4.1
[21] pillar_1.9.0 evaluate_0.23 charlatan_0.5.1 yaml_2.3.8
[25] rlang_1.1.3 jsonlite_1.8.7 htmlwidgets_1.6.2
Footnotes
The ‘single sample switch’ as Patrick Burns puts it in The R Inferno (section 8.2.33).↩︎