Streamlining Midsemester Reports With The Tidyverse

r
tidyverse
Working with LMS data using the tidyverse
Published March 11, 2022

At GSU, we have Early Alert, a program meant to connect students with resources if they are not doing well in the first few weeks of classes. While setting up your rules of thumb is up to you, the reporting itself can quickly soak up an entire day if you go row by row through a class of 60 or more students. To streamline the process I turned to R, since at its heart this is a fairly simple data cleaning task.

Our learning management software likes to add lots of extra stuff to the column names in our data. While most of us would come up with a more concise name like Chapter_4, our software exports column names with the assignment name, the grading scheme, and the assigned weights baked in. So a column ends up looking like Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>. We could use janitor::clean_names to eliminate some of the extraneous stuff, but iCollege will get grumpy at us when the names no longer match the gradebook. We could do this in Excel, and I have done so in the past, but I figured I could speed this up in R while avoiding the headache of checking that each and every column is where it should be.
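Since we can't permanently rename the columns without upsetting iCollege, one workable pattern (a sketch of my own, not something the LMS provides) is to stash the original names, work with shortened ones, and restore the originals before exporting:

```r
library(tidyverse)

# Toy gradebook with LMS-style names (purely illustrative values)
gradebook <- tibble(
  `Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = c(90, 85),
  `Exam 1 Points Grade <Numeric MaxPoints:55 Weight:10 Category:Exams>` = c(50, 48)
)

# Stash the originals so iCollege gets back exactly what it exported
original_names <- names(gradebook)

# Everything before " Points Grade" is the human-readable part
names(gradebook) <- str_remove(original_names, " Points Grade.*$")

# ... do the analysis with the short names ...

# Restore the ugly names before writing the file back out
names(gradebook) <- original_names
```
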

pacman::p_load(tidyverse, DT)

set.seed(1994)

students = 26

dat = tibble(id = 1:students,
             Students = LETTERS,
             `Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = rnorm(students, mean = 90, sd = 5),
             `Chapter 4.3 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = rnorm(students, mean = 90, sd = 5),
             `Chapter 4.4 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = rnorm(students, mean = 90, sd = 5),
             `Chapter 4.5 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23>` = rnorm(students, mean = 90, sd = 5),
             `Exam 1 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams` = rnorm(students, mean = 85, sd = 3))

In this simple example there are only 5 columns; they have annoying names, sure, but it is not that bad. We could probably copy and paste them and be fine. In my real data, however, there are 13 or so chapters with a few subsections each, so this gets out from under us quickly, and copying and pasting does not make our lives any easier. We also usually get columns that do not help us: the ID variable provides the same information as the student name variable, only less transparently, along with more minor items like surveys, which do not carry much weight in the final grade.

I just used rnorm for convenience; in practice your data is more likely to have some missing values because students did not do the work, so it looks like this.

id Students Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.3 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.4 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.5 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Exam 1 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams surveys
1 A 83.56343 81.70468 85.71587 85.97115 83.71489 100
2 B 91.44406 86.12473 91.53892 91.80590 84.62108 100
3 C 98.50185 89.76872 91.79794 NA 85.11662 100
4 D 96.99304 88.53903 83.42538 92.42625 86.08429 100
5 E 90.49552 92.64055 93.27846 89.24096 80.39882 100
6 F 83.30126 86.91250 90.75522 83.04509 78.98100 100
7 G 94.03755 81.34807 87.57366 99.04721 80.11158 100
8 H 89.47185 89.62792 91.37094 96.30433 89.58399 100
9 I 92.78107 NA 96.44479 87.20809 81.31118 100
10 J 83.04019 88.74052 NA 96.58848 85.93511 100
11 K 91.57146 NA 87.94661 93.73501 88.43763 100
12 L 94.18791 89.64253 90.53395 88.29499 86.82630 100
13 M 95.96341 94.94366 91.53490 85.17583 82.57437 100
14 N 84.91042 90.50202 93.41616 87.64364 83.46703 100
15 O 92.76743 81.32158 84.05944 103.32883 87.00560 100
16 P 92.48953 93.73259 91.46155 93.64285 87.11260 100
17 Q 91.88890 91.99284 95.01500 95.61172 84.63614 100
18 R 87.25405 89.89440 91.07664 91.93496 89.64706 100
19 S 83.62513 82.48662 NA 92.74020 90.45662 100
20 T 87.91310 92.87887 95.78937 84.74539 86.05332 100
21 U 91.36955 96.67913 90.59384 87.86449 79.76733 100
22 V 91.30800 87.35611 88.29575 90.40188 89.97015 100
23 W NA 89.35205 97.02888 NA 82.47724 100
24 X 94.06035 83.28452 97.71377 82.26974 90.26051 100
25 Y NA 91.24920 96.15636 88.80131 90.26133 100
26 Z 94.57923 93.57157 91.66163 91.83234 84.33456 100
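The dat_miss object used below is never constructed explicitly in the post; it is just dat with some missing values. A sketch like this (with truncated column names and made-up missingness) produces the same shape, including the surveys column that shows up in the table:

```r
library(tidyverse)
set.seed(1994)

students <- 26

# Stand-in for `dat`, with the column names truncated for readability
dat <- tibble(id = 1:students,
              Students = LETTERS,
              `Chapter 4.2` = rnorm(students, mean = 90, sd = 5),
              `Chapter 4.3` = rnorm(students, mean = 90, sd = 5))

# Blank out a few random cells per chapter column and add the surveys column
dat_miss <- dat %>%
  mutate(across(starts_with("Chapter"),
                ~replace(., sample(length(.), size = 3), NA)),
         surveys = 100)
```
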

So for the purpose of the report I treat NAs as zeros. If you are dealing with multiple columns this is a pretty easy step: just use mutate(across) with some combination of starts_with, contains, ends_with, or everything to select the desired columns.

imputed = dat_miss %>%
  mutate(across(c(starts_with("exam"), starts_with("chapter")),
                ~replace(., is.na(.), 0)))
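tidyr's replace_na does the same thing and reads a bit more directly; a quick sketch on a stand-in tibble (column names truncated for brevity):

```r
library(tidyverse)

# Stand-in for dat_miss (illustrative)
dat_miss <- tibble(Students = c("A", "B"),
                   `Chapter 4.2` = c(90, NA),
                   `Exam 1` = c(NA, 85))

# starts_with() is case-insensitive by default, so "exam" matches "Exam 1"
imputed <- dat_miss %>%
  mutate(across(c(starts_with("exam"), starts_with("chapter")),
                ~replace_na(., 0)))
```
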

So that should take care of the NAs, but we still need to generate our indicators. The assignments that carry the most weight are the exams and the chapters, so I focus on those. In my use case taking the sum makes sense, but for yours the average is probably the better option. Thankfully, while the learning management software names are cumbersome, they do share something in common, so we can use mutate with rowwise to make our lives easier. rowwise is a pretty neat little function that works perfectly for this task, where you are doing something for each student. Then you can use case_when or ifelse to generate a logical and create your flag. This is a toy example, but we can quickly build it out for specific use cases. Using a mixture of apply and select you can achieve the same thing.

flag = imputed %>% 
  rowwise() %>% 
  mutate(flag_dplyr = round(sum(c_across(starts_with("chapter")))))


flag$flag_apply = imputed %>% 
  select(contains("chapter")) %>% 
  apply(., 1, function(row){
    round(sum(row))
  })
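The case_when step that turns the sum into an actual flag isn't shown above; here is a sketch with a made-up 100-point cutoff (swap in whatever matches your early-alert rule of thumb):

```r
library(tidyverse)

# Toy data with truncated column names; the cutoff of 100 is invented
flag <- tibble(Students = c("A", "B"),
               `Chapter 4.2` = c(95, 40),
               `Chapter 4.3` = c(90, 0)) %>%
  rowwise() %>%
  mutate(chapter_sum = sum(c_across(starts_with("Chapter"))),
         early_alert = case_when(chapter_sum < 100 ~ "flag",
                                 TRUE ~ "on track")) %>%
  ungroup()
```
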

Cool, we can use this for our basic stuff, but I tend to weight exams by how well students did: your highest exam score counts for more, and your lowest exam carries the least weight. As with lots of things in R, you can do this a few ways. There is probably a more concise way of doing this than with apply; it is ugly, but it works.

exams_complete = imputed %>% 
  mutate(`Exam 2 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams` = rnorm(students, mean = 70, sd = 11),
         `Exam 3 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams` = rnorm(students, mean = 75, sd = 11))



exams_complete$higest_exam_apply = exams_complete %>%  
  select(contains("Exam")) %>% 
  apply(., 1, function(row){
    round(max(row))
  })


exams_complete = exams_complete %>% 
  rowwise() %>% 
  # starts_with("Exam") avoids also sweeping in the helper column created
  # above (tidyselect matching is case-insensitive, so contains("Exam")
  # would match higest_exam_apply too)
  mutate(hig_exam_dplyr = round(max(c_across(starts_with("Exam")))))

exams_complete$lowest_exam_apply = exams_complete %>%  
  # starts_with avoids matching the helper columns created above
  select(starts_with("Exam")) %>% 
  apply(., 1, function(row){
    round(min(row))
  })

exams_complete = exams_complete %>% 
  rowwise() %>% 
  mutate(low_exam_dplyr = round(min(c_across(starts_with("Exam")))))

So this is easy enough because we are just changing the summary function, but what about the second-highest exam score? In this case you are going to have to use some trickery to get what you want.

exams_complete$second_highest = exams_complete %>% 
  select(starts_with("Exam")) %>% 
  apply(., 1, function(row){
    round(sort(row, decreasing = TRUE)[2])
  })
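Once you can pull out the sorted scores, the weighted exam grade falls out of the same trick. The weights here (50/30/20 from highest to lowest) are made up for illustration:

```r
library(tidyverse)

weights <- c(0.5, 0.3, 0.2)  # highest, middle, lowest (hypothetical)

exam_scores <- tibble(Students = c("A", "B"),
                      `Exam 1` = c(80, 70),
                      `Exam 2` = c(90, 60),
                      `Exam 3` = c(70, 95))

# Sort each student's exams from best to worst, then apply the weights
weighted <- exam_scores %>%
  rowwise() %>%
  mutate(exam_weighted = sum(sort(c_across(starts_with("Exam")),
                                  decreasing = TRUE) * weights)) %>%
  ungroup()
```
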

This is simple enough, and you can just use select and filter to get the info you want. However, as we all know, we have to do some grading. You can use all your favorite dplyr tricks to grade and impute grades. This is the easy part, and now you can start expanding this out to using R to automate calculating grades. One super neat assignment all Intro to Government students at Georgia State do is an adult field trip of sorts that is free for them: the students go to the National Center for Civil and Human Rights and do a tour and a simulation of the lunch counter sit-ins. As part of the proof that they have done it, they submit a unique code. Naturally, some students just did not do it, but that is no big deal: just use our friend left_join, which already retains every student in the gradebook. Adding keep = TRUE keeps both copies of the join column, so you can see who never submitted a code and make sure each student still gets graded.

codes_data = tibble(id = 1:10,
             Students = LETTERS[1:10],
             code = 100)


exams_complete = mutate(exams_complete, Code = NA)


grades_with_codes = left_join(exams_complete, codes_data, by = "Students", keep = TRUE) %>% 
  mutate(Code = code,
         Code = ifelse(is.na(Code), 0, Code)) %>% 
  rename(id = id.x,
         students = Students.x) %>% 
  select(-id.y, -Students.y, -code)
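Before finalizing grades it's worth checking exactly who is getting a zero; anti_join returns the students with no matching code submission (a quick sanity check of my own, not from the original workflow):

```r
library(tidyverse)

codes_data <- tibble(id = 1:10,
                     Students = LETTERS[1:10],
                     code = 100)

# Stand-in for the full gradebook roster
roster <- tibble(id = 1:26, Students = LETTERS)

# Everyone on the roster who never submitted a code
no_code <- anti_join(roster, codes_data, by = "Students")
```
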

In the real data I join by last names, which works for the most part. But you may need to check that your LMS has spelled your students' last names correctly or, just as importantly, that they spelled their own last names correctly. Hopefully this helps somebody; if not, at least it is tucked into a nice blog post. Be sure to check that everything is working, but if it is, you should end up with a nice graded dataset.

id students Chapter 4.2 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.3 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.4 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Chapter 4.5 Points Grade <Numeric MaxPoints:100 Weight:3.125 Category:Area9 Chapters CategoryWeight:23> Exam 1 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams surveys Exam 2 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams Exam 3 Points Grade <Numeric MaxPoints:55 Weight 10 Category: Exams higest_exam_apply hig_exam_dplyr lowest_exam_apply low_exam_dplyr second_highest Code
1 A 83.56343 81.70468 85.71587 85.97115 83.71489 100 58.06166 68.83867 84 84 58 58 69 100
2 B 91.44406 86.12473 91.53892 91.80590 84.62108 100 69.96645 73.51084 85 85 70 70 74 100
3 C 98.50185 89.76872 91.79794 0.00000 85.11662 100 66.30306 74.42536 85 85 66 66 74 100
4 D 96.99304 88.53903 83.42538 92.42625 86.08429 100 65.79288 85.94456 86 86 66 66 86 100
5 E 90.49552 92.64055 93.27846 89.24096 80.39882 100 88.72094 64.87935 89 89 65 65 80 100
6 F 83.30126 86.91250 90.75522 83.04509 78.98100 100 80.41976 73.06732 80 80 73 73 79 100
7 G 94.03755 81.34807 87.57366 99.04721 80.11158 100 86.56743 66.79291 87 87 67 67 80 100
8 H 89.47185 89.62792 91.37094 96.30433 89.58399 100 87.48637 74.13581 90 90 74 74 87 100
9 I 92.78107 0.00000 96.44479 87.20809 81.31118 100 69.63665 68.36826 81 81 68 68 70 100
10 J 83.04019 88.74052 0.00000 96.58848 85.93511 100 69.71254 71.65925 86 86 70 70 72 100
11 K 91.57146 0.00000 87.94661 93.73501 88.43763 100 71.99278 78.33438 88 88 72 72 78 0
12 L 94.18791 89.64253 90.53395 88.29499 86.82630 100 66.30946 99.16532 99 99 66 66 87 0
13 M 95.96341 94.94366 91.53490 85.17583 82.57437 100 71.14509 62.94670 83 83 63 63 71 0
14 N 84.91042 90.50202 93.41616 87.64364 83.46703 100 66.72115 67.17081 83 83 67 67 67 0
15 O 92.76743 81.32158 84.05944 103.32883 87.00560 100 88.98898 66.12589 89 89 66 66 87 0
16 P 92.48953 93.73259 91.46155 93.64285 87.11260 100 63.42139 77.70564 87 87 63 63 78 0
17 Q 91.88890 91.99284 95.01500 95.61172 84.63614 100 70.82987 53.49421 85 85 53 53 71 0
18 R 87.25405 89.89440 91.07664 91.93496 89.64706 100 69.95649 86.20525 90 90 70 70 86 0
19 S 83.62513 82.48662 0.00000 92.74020 90.45662 100 78.62938 64.34364 90 90 64 64 79 0
20 T 87.91310 92.87887 95.78937 84.74539 86.05332 100 84.55215 80.42847 86 86 80 80 85 0
21 U 91.36955 96.67913 90.59384 87.86449 79.76733 100 75.08180 64.40788 80 80 64 64 75 0
22 V 91.30800 87.35611 88.29575 90.40188 89.97015 100 68.53960 79.79600 90 90 69 69 80 0
23 W 0.00000 89.35205 97.02888 0.00000 82.47724 100 88.95738 70.28810 89 89 70 70 82 0
24 X 94.06035 83.28452 97.71377 82.26974 90.26051 100 63.12942 78.58372 90 90 63 63 79 0
25 Y 0.00000 91.24920 96.15636 88.80131 90.26133 100 88.82365 86.03999 90 90 86 86 89 0
26 Z 94.57923 93.57157 91.66163 91.83234 84.33456 100 65.54887 82.58063 84 84 66 66 83 0