Streamlining Midsemester Reports With The Tidyverse

r
tidyverse
Working with LMS data using the tidyverse
Published March 11, 2022

At GSU, we have Early Alert, a program meant to connect students with resources if they are not doing well in the first few weeks of classes. While setting up your rules of thumb is up to you, the reporting itself can quickly soak up an entire day if you go row by row through a class of 60 or more students. To streamline the process I turned to R, since at its heart this is a fairly simple data cleaning task.

Our learning management software likes to add lots of extra stuff to the column names in our data. While most of us would come up with a more concise name like Chapter_4, our software exports column names with the assignment name, the grading scheme, and the assigned weights baked in. So a column ends up looking like Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>. We could use janitor::clean_names to eliminate some of the extraneous stuff, but iCollege will get grumpy at us when the names no longer match the gradebook. We could do this in Excel, and I have done so in the past, but I figured I could speed this up in R while avoiding the headache of checking that each and every column is where it should be.
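Since we can't permanently rename the columns without upsetting iCollege, one workable pattern (a sketch of my own, not something the LMS provides) is to stash the original names, work with shortened ones, and restore the originals before exporting:

```r
library(tidyverse)

# Toy gradebook with LMS-style names (purely illustrative values)
gradebook <- tibble(
  `Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = c(90, 85),
  `Exam 1 Points Grade <Numeric MaxPoints:55 Weight:10 Category:Exams>` = c(50, 48)
)

# Stash the originals so iCollege gets back exactly what it exported
original_names <- names(gradebook)

# Everything before " Points Grade" is the human-readable part
names(gradebook) <- str_remove(original_names, " Points Grade.*$")

# ... do the analysis with the short names ...

# Restore the ugly names before writing the file back out
names(gradebook) <- original_names
```
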

pacman::p_load(tidyverse, DT)

set.seed(1994)

students = 26

dat = tibble(id = 1:students,
             Students = LETTERS,
             `Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = rnorm(students, mean = 90, sd = 5),
             `Chapter 4.3 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = rnorm(students, mean = 90, sd = 5),
             `Chapter 4.4 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = rnorm(students, mean = 90, sd = 5),
             `Chapter 4.5 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = rnorm(students, mean = 90, sd = 5),
             `Exam 1 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams` = rnorm(students, mean = 85, sd = 3))

In this simple example there are only 5 columns; they have annoying names, sure, but it is not that bad. We could probably copy and paste them and be fine. In my real data, however, there are 13 or so chapters with a few subsections each, so this gets out from under us quickly, and copying and pasting does not make our lives any easier. We also usually get columns that do not help us: the ID variable provides the same information as the student name variable, only less transparently, along with more minor items like surveys, which do not carry much weight in the final grade.

I just used rnorm for convenience; in practice your data is more likely to have some missing values because students did not do the work, so it looks like this.

id Students Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.3 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.4 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.5 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Exam 1 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams surveys
1 A 83.56343 81.70468 85.71587 85.97115 83.71489 100
2 B 91.44406 86.12473 91.53892 91.80590 84.62108 100
3 C 98.50185 89.76872 91.79794 NA 85.11662 100
4 D 96.99304 88.53903 83.42538 92.42625 86.08429 100
5 E 90.49552 92.64055 93.27846 89.24096 80.39882 100
6 F 83.30126 86.91250 90.75522 83.04509 78.98100 100
7 G 94.03755 81.34807 87.57366 99.04721 80.11158 100
8 H 89.47185 89.62792 91.37094 96.30433 89.58399 100
9 I 92.78107 NA 96.44479 87.20809 81.31118 100
10 J 83.04019 88.74052 NA 96.58848 85.93511 100
11 K 91.57146 NA 87.94661 93.73501 88.43763 100
12 L 94.18791 89.64253 90.53395 88.29499 86.82630 100
13 M 95.96341 94.94366 91.53490 85.17583 82.57437 100
14 N 84.91042 90.50202 93.41616 87.64364 83.46703 100
15 O 92.76743 81.32158 84.05944 103.32883 87.00560 100
16 P 92.48953 93.73259 91.46155 93.64285 87.11260 100
17 Q 91.88890 91.99284 95.01500 95.61172 84.63614 100
18 R 87.25405 89.89440 91.07664 91.93496 89.64706 100
19 S 83.62513 82.48662 NA 92.74020 90.45662 100
20 T 87.91310 92.87887 95.78937 84.74539 86.05332 100
21 U 91.36955 96.67913 90.59384 87.86449 79.76733 100
22 V 91.30800 87.35611 88.29575 90.40188 89.97015 100
23 W NA 89.35205 97.02888 NA 82.47724 100
24 X 94.06035 83.28452 97.71377 82.26974 90.26051 100
25 Y NA 91.24920 96.15636 88.80131 90.26133 100
26 Z 94.57923 93.57157 91.66163 91.83234 84.33456 100
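The dat_miss object used below is never constructed explicitly in the post; it is just dat with some missing values. A sketch like this (with truncated column names and made-up missingness) produces the same shape, including the surveys column that shows up in the table:

```r
library(tidyverse)
set.seed(1994)

students <- 26

# Stand-in for `dat`, with the column names truncated for readability
dat <- tibble(id = 1:students,
              Students = LETTERS,
              `Chapter 4.2` = rnorm(students, mean = 90, sd = 5),
              `Chapter 4.3` = rnorm(students, mean = 90, sd = 5))

# Blank out a few random cells per chapter column and add the surveys column
dat_miss <- dat %>%
  mutate(across(starts_with("Chapter"),
                ~replace(., sample(length(.), size = 3), NA)),
         surveys = 100)
```
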

So for the purpose of the report I treat NAs as zeros. If you are dealing with multiple columns this is a pretty easy step: just use mutate(across) with some combination of starts_with, contains, ends_with, or everything to select the desired columns.

imputed = dat_miss %>%
  mutate(across(c(starts_with("exam"), starts_with("chapter")),
                ~replace(., is.na(.), 0)))
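tidyr's replace_na does the same thing and reads a bit more directly; a quick sketch on a stand-in tibble (column names truncated for brevity):

```r
library(tidyverse)

# Stand-in for dat_miss (illustrative)
dat_miss <- tibble(Students = c("A", "B"),
                   `Chapter 4.2` = c(90, NA),
                   `Exam 1` = c(NA, 85))

# starts_with() is case-insensitive by default, so "exam" matches "Exam 1"
imputed <- dat_miss %>%
  mutate(across(c(starts_with("exam"), starts_with("chapter")),
                ~replace_na(., 0)))
```
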

So that should take care of the NAs, but we still need to generate our indicators. The assignments that carry the most weight are the exams and the chapters, so I focus on those. In my use case taking the sum makes sense, but for yours the average is probably the better option. Thankfully, while the learning management software names are cumbersome, they do share something in common, so we can use mutate with rowwise to make our lives easier. rowwise is a pretty neat little function that works perfectly for this task, where you are doing something for each student. Then you can use case_when or ifelse to generate a logical and create your flag. This is a toy example, but we can quickly build it out for specific use cases. Using a mixture of apply and select you can achieve the same thing.

flag = imputed %>% 
  rowwise() %>% 
  mutate(flag_dplyr = round(sum(c_across(starts_with("chapter")))))


flag$flag_apply = imputed %>% 
  select(contains("chapter")) %>% 
  apply(., 1, function(row){
    round(sum(row))
  })
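The case_when step that turns the sum into an actual flag isn't shown above; here is a sketch with a made-up 100-point cutoff (swap in whatever matches your early-alert rule of thumb):

```r
library(tidyverse)

# Toy data with truncated column names; the cutoff of 100 is invented
flag <- tibble(Students = c("A", "B"),
               `Chapter 4.2` = c(95, 40),
               `Chapter 4.3` = c(90, 0)) %>%
  rowwise() %>%
  mutate(chapter_sum = sum(c_across(starts_with("Chapter"))),
         early_alert = case_when(chapter_sum < 100 ~ "flag",
                                 TRUE ~ "on track")) %>%
  ungroup()
```
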

Cool, we can use this for our basic stuff, but I tend to weight exams by how well students did: your highest exam score counts for more, and your lowest exam carries the least weight. As with lots of things in R, you can do this a few ways. There is probably a more concise way of doing this than with apply; it is ugly, but it works.

exams_complete = imputed %>% 
  mutate(`Exam 2 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams` = rnorm(students, mean = 70, sd = 11),
         `Exam 3 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams` = rnorm(students, mean = 75, sd = 11))



exams_complete$higest_exam_apply = exams_complete %>%  
  select(contains("Exam")) %>% 
  apply(., 1, function(row){
    round(max(row))
  })


exams_complete = exams_complete %>% 
  rowwise() %>% 
  # starts_with("Exam") avoids also sweeping in the helper column created
  # above (tidyselect matching is case-insensitive, so contains("Exam")
  # would match higest_exam_apply too)
  mutate(hig_exam_dplyr = round(max(c_across(starts_with("Exam")))))

exams_complete$lowest_exam_apply = exams_complete %>%  
  # starts_with avoids matching the helper columns created above
  select(starts_with("Exam")) %>% 
  apply(., 1, function(row){
    round(min(row))
  })

exams_complete = exams_complete %>% 
  rowwise() %>% 
  mutate(low_exam_dplyr = round(min(c_across(starts_with("Exam")))))

So this is easy enough because we are just changing the summary function, but what about the second-highest exam score? In this case you are going to have to use some trickery to get what you want.

exams_complete$second_highest = exams_complete %>% 
  select(starts_with("Exam")) %>% 
  apply(., 1, function(row){
    round(sort(row, decreasing = TRUE)[2])
  })
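Once you can pull out the sorted scores, the weighted exam grade falls out of the same trick. The weights here (50/30/20 from highest to lowest) are made up for illustration:

```r
library(tidyverse)

weights <- c(0.5, 0.3, 0.2)  # highest, middle, lowest (hypothetical)

exam_scores <- tibble(Students = c("A", "B"),
                      `Exam 1` = c(80, 70),
                      `Exam 2` = c(90, 60),
                      `Exam 3` = c(70, 95))

# Sort each student's exams from best to worst, then apply the weights
weighted <- exam_scores %>%
  rowwise() %>%
  mutate(exam_weighted = sum(sort(c_across(starts_with("Exam")),
                                  decreasing = TRUE) * weights)) %>%
  ungroup()
```
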

This is simple enough, and you can just use select and filter to get the info you want. However, as we all know, we have to do some grading. You can use all your favorite dplyr tricks to grade and impute grades. This is the easy part, and now you can start expanding this out to using R to automate calculating grades. One super neat assignment all Intro to Government students at Georgia State do is an adult field trip of sorts that is free for them: the students go to the National Center for Civil and Human Rights and do a tour and a simulation of the lunch counter sit-ins. As part of the proof that they have done it, they submit a unique code. Naturally, some students just did not do it, but that is no big deal: just use our friend left_join, which already retains every student in the gradebook. Adding keep = TRUE keeps both copies of the join column, so you can see who never submitted a code and make sure each student still gets graded.

codes_data = tibble(id = 1:10,
             Students = LETTERS[1:10],
             code = 100)


exams_complete = mutate(exams_complete, Code = NA)


grades_with_codes = left_join(exams_complete, codes_data, by = "Students", keep = TRUE) %>% 
  mutate(Code = code,
         Code = ifelse(is.na(Code), 0, Code)) %>% 
  rename(id = id.x,
         students = Students.x) %>% 
  select(-id.y, -Students.y, -code)
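Before finalizing grades it's worth checking exactly who is getting a zero; anti_join returns the students with no matching code submission (a quick sanity check of my own, not from the original workflow):

```r
library(tidyverse)

codes_data <- tibble(id = 1:10,
                     Students = LETTERS[1:10],
                     code = 100)

# Stand-in for the full gradebook roster
roster <- tibble(id = 1:26, Students = LETTERS)

# Everyone on the roster who never submitted a code
no_code <- anti_join(roster, codes_data, by = "Students")
```
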

In the real data I join by last names, which works for the most part. But you may need to check that your LMS has spelled your students' last names correctly or, just as importantly, that they spelled their own last names correctly. Hopefully this helps somebody; if not, at least it is tucked into a nice blog post. Be sure to check that everything is working, but if it is, you should end up with a nice graded dataset.

id students Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.3 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.4 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.5 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Exam 1 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams surveys Exam 2 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams Exam 3 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams higest_exam_apply hig_exam_dplyr lowest_exam_apply low_exam_dplyr second_highest Code
1 A 83.56343 81.70468 85.71587 85.97115 83.71489 100 58.06166 68.83867 84 84 58 58 69 100
2 B 91.44406 86.12473 91.53892 91.80590 84.62108 100 69.96645 73.51084 85 85 70 70 74 100
3 C 98.50185 89.76872 91.79794 0.00000 85.11662 100 66.30306 74.42536 85 85 66 66 74 100
4 D 96.99304 88.53903 83.42538 92.42625 86.08429 100 65.79288 85.94456 86 86 66 66 86 100
5 E 90.49552 92.64055 93.27846 89.24096 80.39882 100 88.72094 64.87935 89 89 65 65 80 100
6 F 83.30126 86.91250 90.75522 83.04509 78.98100 100 80.41976 73.06732 80 80 73 73 79 100
7 G 94.03755 81.34807 87.57366 99.04721 80.11158 100 86.56743 66.79291 87 87 67 67 80 100
8 H 89.47185 89.62792 91.37094 96.30433 89.58399 100 87.48637 74.13581 90 90 74 74 87 100
9 I 92.78107 0.00000 96.44479 87.20809 81.31118 100 69.63665 68.36826 81 81 68 68 70 100
10 J 83.04019 88.74052 0.00000 96.58848 85.93511 100 69.71254 71.65925 86 86 70 70 72 100
11 K 91.57146 0.00000 87.94661 93.73501 88.43763 100 71.99278 78.33438 88 88 72 72 78 0
12 L 94.18791 89.64253 90.53395 88.29499 86.82630 100 66.30946 99.16532 99 99 66 66 87 0
13 M 95.96341 94.94366 91.53490 85.17583 82.57437 100 71.14509 62.94670 83 83 63 63 71 0
14 N 84.91042 90.50202 93.41616 87.64364 83.46703 100 66.72115 67.17081 83 83 67 67 67 0
15 O 92.76743 81.32158 84.05944 103.32883 87.00560 100 88.98898 66.12589 89 89 66 66 87 0
16 P 92.48953 93.73259 91.46155 93.64285 87.11260 100 63.42139 77.70564 87 87 63 63 78 0
17 Q 91.88890 91.99284 95.01500 95.61172 84.63614 100 70.82987 53.49421 85 85 53 53 71 0
18 R 87.25405 89.89440 91.07664 91.93496 89.64706 100 69.95649 86.20525 90 90 70 70 86 0
19 S 83.62513 82.48662 0.00000 92.74020 90.45662 100 78.62938 64.34364 90 90 64 64 79 0
20 T 87.91310 92.87887 95.78937 84.74539 86.05332 100 84.55215 80.42847 86 86 80 80 85 0
21 U 91.36955 96.67913 90.59384 87.86449 79.76733 100 75.08180 64.40788 80 80 64 64 75 0
22 V 91.30800 87.35611 88.29575 90.40188 89.97015 100 68.53960 79.79600 90 90 69 69 80 0
23 W 0.00000 89.35205 97.02888 0.00000 82.47724 100 88.95738 70.28810 89 89 70 70 82 0
24 X 94.06035 83.28452 97.71377 82.26974 90.26051 100 63.12942 78.58372 90 90 63 63 79 0
25 Y 0.00000 91.24920 96.15636 88.80131 90.26133 100 88.82365 86.03999 90 90 86 86 89 0
26 Z 94.57923 93.57157 91.66163 91.83234 84.33456 100 65.54887 82.58063 84 84 66 66 83 0