Italian inventor Bill Marconi established himself in Chelmsford to launch his trans-Atlantic wireless communications enterprise. The exhibits show the evolution of wireless radio, from signals broadcast by wide-spectrum spark generators and received by a cat’s whisker tickling a crystal. Subsequent exhibits show television, radar, and my godparents’ enterprise, infrared engineering.
Marconi chose Chelmsford to launch his enterprise because it had a reliable supply of electricity and skilled technical workers already engaged in manufacturing electrical equipment.
Wholesale grocery supplier William Lever (b. 1851) was sufficiently wealthy in his early 30s that he was considering retiring. However, he decided to sell a product of his own: soap affordable for the working classes. He bought, packed and branded soap, and determined to sell it to every housewife in the country. The product’s success led to Lever manufacturing soap in rented facilities. Later, a purpose-built factory was established on marshy swampland, surrounded by a workers’ village: Port Sunlight.
Port Sunlight was distinctive for its architectural diversity and for workers’ facilities such as a school, library, hospital, theatre, and museum. The village contrasted starkly with the cheap industrial slum housing built near factories in other English towns during the industrial revolution.
Boumphrey, I. (2009). Port Sunlight: A pictorial history 1888-1953. Yesterday’s Wirral. Ian & Marilyn Boumphrey. Retrieved from www.yesterdayswirral.co.uk
These photo albums provide a tour of the village today and of the village museum.
Mellalieu, P. J. (2011). Port Sunlight. Retrieved from http://petermellalieu.zenfolio.com/f955526299
Predicting success, excellence, and retention from early course performance: a comparison of statistical and machine learning methods in a tertiary education programme
Part 1: Statistical analysis
Students entering a first-year tertiary course have yet to be inculcated into the challenging requirements of contemporary Western tertiary education, the specific performance requirements of their learning institution and qualification programme, and the idiosyncrasies of the course and its teacher. However, success in first-year courses is crucial for building students’ capability. Perhaps the first-year experience is even more important for building students’ confidence in their future progress through their qualification. Consequently, I assert that it is prudent to provide performance feedback to students at the earliest possible stage of their tertiary academic career: specifically, feedback on their current performance and on their likely success should they continue their studies in their ‘business as usual’ fashion.
This study analyses data on students’ performance collected through a 12-week semester. Unremarkably, traditional statistical regression and correlation analysis of the data shows that students’ overall course grades can be forecast with good precision from formative assessments conducted within the first three weeks of the semester. Armed with this information, future students, under wise guidance from their teachers, should be able to overcome their apparently predestined grade by undertaking specific and focussed interventions during their course of study.
The purpose of this study, however, extends beyond the use of rudimentary statistical procedures to predict students’ future success from early, quasi-formative assessment results. Specifically, my quest is to explore how the more recent field of machine learning/data mining might provide more accurate and/or earlier predictions of the ultimate performance of a specific student in a class. Furthermore, can the ‘knowledge’ extracted from this data mining exercise give a student and teacher greater clarity about the academic advice offered to that student?
Last week, I continued to explore my new world of data mining and machine learning. As part of my self-learning, I’ve begun a mini-project to compare and contrast my traditional statistical approaches to exploring data with my first attempts at using machine learning. Specifically, my objective is to explore the extent to which I can predict a student’s final grade in my courses from the grades they gain early in the class. The earlier I can identify the likely best and weakest performers, the earlier I can implement interventions to help students overcome the seemingly inevitable, should the prognosis be a poor grade. Furthermore, perhaps I could ‘buddy up’ potentially high-achieving students with weaker performers in the hope that the weaker performers might learn some relevant tricks of the trade from the stronger performers through a process of cascade learning or osmosis. I’ve chosen to learn about data mining/machine learning using this type of data set because I regularly conduct semester reviews of the data from the courses I teach.
The data set
My data set contains about 50 instances. Each instance relates to the grades from one student. There are 16 data items for each instance (student), pertaining to the assessment items completed by the student over the semester. As is usual in an academic course, there is a strict arithmetic relationship between the grades and the final mark. The final mark is a weighted sum of the contribution from each of the assignment sub-components.
The assessment regime for the course comprises the elements in Table 1.
Table 1: Assessment components for BSNS 5391 Innovation & Entrepreneurship
Assignment 1: Case Study Analysis, 15 per cent, comprising:
  1a Case Study (In progress) 3%
  1b Case Study (Final) 11.4%
  1c Writing Quality Assessment (Final) 0.6%
Assignment 2: Group Project, 40 per cent, comprising:
  2a Workshop Presentation 20%
  2b Multi-Media Resource 10%
  2c Test 10%
Assignment 3: Professional Learning Agenda (PLA), 25 per cent, comprising:
  3a Strengths Quest Assessment 6.25%
  3b PLA (In progress) 6.25%
  3c PLA and reflective essay (Final) 12.5%
Assignment 4: Test, 20 per cent, comprising:
  4a Multi-choice questions 10%
  4b Short essays 10%
The final grade is calculated as follows:
Final grade = 0.15 x Assignment 1 + 0.40 x Assignment 2 + 0.25 x Assignment 3 + 0.20 x Assignment 4
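The weighted-sum relationship can be checked with a short Python sketch. It encodes the sub-component weights from Table 1 (as percentages of the course), confirms that they total 100, and computes a final mark from component grades. The names `COMPONENT_WEIGHTS`, `assignment_weight` and `final_mark` are illustrative, and the sketch assumes every component is graded on a 0 to 100 scale; it is not the spreadsheet actually used for the course.

```python
import math

# Sub-component weights from Table 1, as percentages of the whole course.
COMPONENT_WEIGHTS = {
    "1a": 3.0, "1b": 11.4, "1c": 0.6,     # Assignment 1: Case Study, 15%
    "2a": 20.0, "2b": 10.0, "2c": 10.0,   # Assignment 2: Group Project, 40%
    "3a": 6.25, "3b": 6.25, "3c": 12.5,   # Assignment 3: PLA, 25%
    "4a": 10.0, "4b": 10.0,               # Assignment 4: Test, 20%
}

# The sub-component weights should account for the whole course.
assert math.isclose(sum(COMPONENT_WEIGHTS.values()), 100.0)


def assignment_weight(assignment: str) -> float:
    """Total course weight (%) of one assignment, e.g. '1' -> 15.0."""
    return sum(w for k, w in COMPONENT_WEIGHTS.items() if k.startswith(assignment))


def final_mark(component_grades: dict[str, float]) -> float:
    """Final mark (0-100) from component grades, each assumed to be on a 0-100 scale."""
    missing = set(COMPONENT_WEIGHTS) - set(component_grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    # Weighted sum: each component contributes weight% of its grade.
    return sum(w / 100 * component_grades[k] for k, w in COMPONENT_WEIGHTS.items())
```

Because each assignment’s weight is the sum of its components’ weights, this gives the same result as the assignment-level formula whenever each assignment grade is the weighted average of its components.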
Further details about the course BSNS 5391 are here:
The Course syllabus
Course structure and assessment strategy
Two pieces of assessment, Assignments 1a and 3a, are completed within the first three weeks of the 15-week course. These two assessments are quasi-formative in the sense that they:
For example, in one of the submissions, Assignment 1a, the student presents draft answers to Part One of a three-part Case Study assignment. The teacher provides written feedback on both the ideas submitted (content) and the quality of writing. However, the grade for Assignment 1a is assessed by a rubric that relates solely to the quality of writing. The rubric uses the ‘Six-trait method for evaluating writing quality’. The six traits, weighted equally, are: ideas and content, word choice, grammatical conventions, organisation, voice (personality), and sentence fluency (Course Syllabus, Mellalieu 2010, pp. 23-24, modified from Norton, modified from Maryvale Elementary, Mobile, TX).
When students submit their complete Assignment 1 (classified as Assignment 1b), they are required (or rather, permitted) to rewrite Assignment 1a to accommodate the feedback provided by the teacher. Appropriate writing quality earns the student little direct credit: Assignment 1c (writing quality) carries just 5 per cent of the weight of the final Case Study submission (0.6% out of the 12% for Assignments 1b and 1c combined). In effect, this is just 0.6% of the entire course weighting. However, the devil is in the detail. Appropriate professional writing standards are an absolute requirement for the course. If a student submits an assignment with a writing quality of less than 24/30 on the six-trait rubric mentioned earlier, then two MAJOR consequences arise for the student. First, the student receives NO credit for the assignment until it is resubmitted at an acceptable writing quality. Second, the original assessment grade stands: no extra credit is gained for the repeat submission. This policy is derived from Haswell’s Minimal Marking policy (Haswell, 1983).
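The writing-quality gate can be sketched in Python. The sketch assumes that each of the six equally weighted traits is scored from 0 to 5, so that the traits total 30 points. The trait identifiers and the functions `writing_quality` and `credited_grade` are hypothetical names for illustration. The second function applies the rule described above: below 24/30, the assignment earns no credit until an acceptable resubmission arrives, and even then only the original grade counts.

```python
# Six equally weighted traits from the six-trait rubric.
TRAITS = ("ideas_and_content", "word_choice", "conventions",
          "organisation", "voice", "sentence_fluency")
MAX_PER_TRAIT = 5      # assumption: 6 traits x 5 points = 30
PASS_THRESHOLD = 24    # professional standard, out of 30


def writing_quality(scores: dict[str, int]) -> int:
    """Sum the six trait scores into a writing-quality total out of 30."""
    for trait in TRAITS:
        if not 0 <= scores[trait] <= MAX_PER_TRAIT:
            raise ValueError(f"{trait} must be between 0 and {MAX_PER_TRAIT}")
    return sum(scores[t] for t in TRAITS)


def credited_grade(assessed_grade: float, quality: int,
                   resubmitted_ok: bool = False) -> float:
    """Grade actually credited under the resubmission rule.

    Below-standard writing earns nothing until an acceptable resubmission;
    after that, the original assessed grade stands (no extra credit).
    """
    if quality >= PASS_THRESHOLD or resubmitted_ok:
        return assessed_grade
    return 0.0
```

For example, a draft scoring 4 on every trait totals exactly 24/30 and is credited in full, while a 23/30 draft is credited 0 until it is resubmitted.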
The second “staging point” in students’ progress through the course occurs midway through the semester. At this point, the students submit two further pieces of assessment: progress work on their Professional Learning Agenda (Assignment 3b), and the final submission of Assignment 1 (Assignment 1b, with 1c being the writing quality component of that assignment).
Over the remaining six weeks of the semester, the students present the results of their Group Projects. Their study culminates in the Final Test. Some students choose not to sit the Final Test. These students may be international study abroad students who need only gain a pass grade in the course. They prefer to allocate their study time to another course … or take early leave for a tour of the delights of Middle Earth/New Zealand. Consequently, the data set contains missing values for these students’ Final Test results.
My usual approach to analysing student grades at the conclusion of a course is to construct scatter plots between the various assignment elements. I add a trend line and calculate the coefficient of determination, R². These results are presented in the Figures.
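The trend line and R² that a spreadsheet adds to a scatter plot can be reproduced with a small least-squares sketch in Python. As an assumption about how missing data are handled, any student lacking either grade (for example, a missing Final Test result) is simply dropped from that pair of assessments. The function name `trend_line` is illustrative. For a single-predictor line, R² equals the square of Pearson’s correlation coefficient r.

```python
from typing import Optional, Sequence


def trend_line(xs: Sequence[Optional[float]],
               ys: Sequence[Optional[float]]) -> tuple[float, float, float]:
    """Fit y = slope * x + intercept by least squares and return
    (slope, intercept, r_squared), skipping students with a missing grade."""
    # Pairwise deletion: keep only students who have both grades.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("grades must vary to fit a line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # square of Pearson's r
    return slope, intercept, r_squared


# Hypothetical grades; the third student has a missing value and is skipped.
print(trend_line([1, 2, None, 3, 4], [3, 5, 100, 7, 9]))  # (2.0, 1.0, 1.0)
```

On Python 3.10 or later, `statistics.linear_regression` and `statistics.correlation` return the same slope, intercept and r once the missing values have been filtered out.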
By inspection, several observations are salient:
No association was found in these situations:
Assignment 2 is a group project, so one would not expect a strong association at all between the two assignments, particularly since I allocated the students to their teams.
A high grade in the course BSNS 5391 is associated STRONGLY with students who gain high grades in the Case Study, Assignment 1 (R² = 0.47). This is despite the fact that the weighting of the Case Study assignment is just 15% of the overall assessment weighting for the course. Students’ grades from the Case Study assignment, therefore, are a strong predictor of general academic performance in three other quite different academic assessments: the Group Project (Assignment 2), the Professional Learning Agenda (Assignment 3), and the Final Test (Assignment 4).
Given this key finding, how early in the course can we predict students’ likely performance (strong or weak) through the remainder of the course? The statistical analysis shows that once the teacher has assessed students’ draft Case Study (Assignment 1a), submitted by week 3, the students are predestined towards a high or low grade in the course overall through the following chain of events:
High formal writing quality evidenced in a student’s draft is STRONGLY likely to be evidenced in the writing quality of their final Assignment 1 submission. Furthermore, that high writing quality contributes strongly, but indirectly, to a high grade for the assignment overall. I say ‘indirectly’ because writing quality carries just 3.6% of the course weighting overall (3% for the draft, Assignment 1a, plus 0.6% for the final submission, Assignment 1c). From a grade contribution point of view, students’ motivation to write adequately is not strong, apart from the teacher’s implementation of the fiendish Haswell Minimal Marking strategy, which requires students to resubmit if their writing does not meet the professional standard of 24/30 on the six-trait writing evaluation rubric mentioned earlier.
Secondly, a high grade for Assignment 1 is slightly associated (R² = 0.15) with a high grade for the subsequently-submitted Group Project, Assignment 2. This is curious since the teacher attempted to create seven-person groups through a stratified, uniformly random scheme. However, some subterranean reconfiguration of group membership became apparent in several instances. An all-Chinese team emerged, which was certainly not how I had configured the initial groups. In addition, the large teams of seven were permitted to bifurcate into two smaller teams by mutual negotiation.
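The post does not specify how the strata were defined, so the following Python sketch is only one plausible reading of a ‘stratified, uniformly random’ allocation. The function `stratified_teams` and its `stratum_of` argument are hypothetical: students are grouped by a stratum label (for example, a grade band), shuffled within each stratum, and dealt round-robin into teams of about seven, so each team receives a near-equal share of each stratum.

```python
import random
from collections import defaultdict


def stratified_teams(students, stratum_of, team_size=7, seed=None):
    """Shuffle students within each stratum, then deal them round-robin into
    teams of roughly `team_size`, spreading each stratum as evenly as possible."""
    rng = random.Random(seed)
    n_teams = max(1, round(len(students) / team_size))
    strata = defaultdict(list)
    for student in students:
        strata[stratum_of(student)].append(student)
    ordered = []
    for label in sorted(strata):      # stratum labels must be sortable
        members = strata[label]
        rng.shuffle(members)          # uniform random order within the stratum
        ordered.extend(members)
    teams = [[] for _ in range(n_teams)]
    for i, student in enumerate(ordered):
        teams[i % n_teams].append(student)   # round-robin deal
    return teams
```

Dealing round-robin after shuffling keeps team sizes within one of each other and spreads each stratum across the teams; it cannot, of course, prevent students from reorganising themselves afterwards.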
Thirdly, a high grade for Assignment 1 is slightly associated (R² = 0.16) with a high grade for the subsequently-submitted Assignment 3.
Finally, a high grade for Assignment 1 is associated slightly (R² = 0.15) with high grades for the Final test, specifically the multi-choice component of the Final Test. The latter point is somewhat surprising, as the BSNS 5391 multi-choice test depended primarily on reading and course material recall, rather than the careful reading, demanding analysis, and professional report-style writing required in the Case Study assessment.
A curiosity remains: there was NO correlation between the writing quality of students’ first assignment (1c) and their grades for the short (five-paragraph) essays written in the Final Test.
Predicting a student’s specific final grade
Overall, the regression analysis (Figure 5) yields the following equation for predicting a student’s overall course grade from the writing quality of their first draft assessment, Assignment 1a:
final = 0.86 q + 61
q = writing quality on a scale 1… 30 assessed by the six-trait writing evaluation rubric
final = the course grade on a scale 0 … 100.
Example: A student gains a writing quality, q, of 20/30. Their likely overall course grade will be: 0.86 x 20 + 61 = 17 + 61 = 78. However, given the modest coefficient of determination (R² = 0.24), there is still plenty of opportunity for the student to do better - or worse - than this result!
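The first prediction equation translates directly into a small Python function. The function name is illustrative; the coefficients are those reported above, and the input check simply enforces the 1 to 30 rubric scale.

```python
def predict_final_from_writing(q: float) -> float:
    """Course grade (0-100) predicted from the writing quality q of draft
    Assignment 1a, using final = 0.86 * q + 61 (R² = 0.24)."""
    if not 1 <= q <= 30:
        raise ValueError("q must be on the 1-30 six-trait rubric scale")
    return 0.86 * q + 61


print(round(predict_final_from_writing(20)))  # 78, matching the worked example
```

With R² = 0.24, the output is best treated as a rough starting point for a conversation with the student rather than a precise forecast.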
Once the student has submitted all of Assignment 1, their overall course grade can be predicted with greater precision from the equation:
final = 0.29 assignment1 + 57
assignment1 = the grade for Assignment 1 (all components), on a scale 0… 100.
Example: A student gains a grade of 71 in Assignment 1. Their likely overall course grade will be: 0.29 x 71 + 57 = 21 + 57 = 78. However, given the higher coefficient of determination (R² = 0.47), the student will now need to work very diligently in their remaining assignments to gain a superior mark. They are now almost on a predestined path … unless they are lazy and fail the remaining assignments.
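The second equation can be coded the same way. To illustrate why the student must ‘work very diligently’, the sketch also uses the final-grade formula to compute the weighted average a student would need across Assignments 2 to 4 to reach a target grade, given their Assignment 1 grade. The function names and the target of 85 in the usage lines are hypothetical.

```python
def predict_final_from_assignment1(a1: float) -> float:
    """Course grade (0-100) predicted from the full Assignment 1 grade,
    using final = 0.29 * assignment1 + 57 (R² = 0.47)."""
    if not 0 <= a1 <= 100:
        raise ValueError("Assignment 1 grade must be on a 0-100 scale")
    return 0.29 * a1 + 57


def required_remaining_average(a1: float, target: float) -> float:
    """Weighted average needed over Assignments 2-4 (weights 0.40, 0.25 and
    0.20, i.e. 0.85 of the course) to reach `target`, derived from
    final = 0.15 * A1 + 0.40 * A2 + 0.25 * A3 + 0.20 * A4."""
    return (target - 0.15 * a1) / 0.85


a1 = 71
print(round(predict_final_from_assignment1(a1)))     # 78, as in the example above
print(round(required_remaining_average(a1, 85), 1))  # 87.5
```

In other words, a student with 71 in Assignment 1 who aims for 85 overall must average about 87.5 across the remaining, more heavily weighted assignments.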
Summary and conclusion
Statistical regression and correlation analysis of student grade results was used to explore the relationship between early-semester and late-semester performance in a first-year course of tertiary study. The analysis revealed that it is possible to identify students who are likely to be strong or weak academic performers from the grade awarded for their draft work on a Case Study assignment submitted in week three of the 12-week course. Notably, this submission was assessed solely with a generic six-trait method for evaluating writing quality. This information enables the teacher to give the student specific advice about their prospective grade and why that grade is likely to ensue. Furthermore, the teacher can prescribe options the student can pursue to lift their performance above the grade apparently predestined by the statistical prediction. In general, as a minimum, weak students need guidance to improve their formal writing skills.
Curiously, formal writing skill was required to be demonstrated for just 30 per cent of the overall course. Nevertheless, students demonstrating strong writing skill in Assignment 1 demonstrated higher performance in several subsequent assignment components including a complex group project, the construction of a Professional Learning Agenda, and the final multi-choice test. These tasks, whilst not requiring formal writing skills, do require interpersonal oral communication skills, good comprehension of written course study materials, and good self-study skills.
This investigation used relatively straightforward statistical regression and correlation analysis of student grade results to explore the relationship between early-semester and late-semester performance. Future investigations will explore the extent to which data mining/machine learning can extract more subtle nuances from the data. Comparisons will be made between the insights gained and the practicalities of an intelligent teacher without specialised training using a data mining approach.
I could attempt more sophisticated statistical analysis using multiple regression and discriminant analysis. But now it’s time to explore machine learning. A Naive Bayes classifier will probably be an early tool in my exploration.
At this stage of the project, the following resources have been used:
Statistical analysis conducted using NeoOffice 3.0 Patch 5.
Mellalieu, P. J. (2010, August 21). Course Handbook and Syllabus Unitec BSNS 5391 Innovation and Entrepreneurship. Scribd. Retrieved August 21, 2010, from http://www.scribd.com/doc/36191676/Course-Handbook-and-Syllabus-Unitec-BSNS-5391-Innovation-and-Entrepreneurship
The year 2007 had to have been one of the worst in the history of British Petroleum plc (BP). In the span of four months, two separate independent reports (the first one commissioned by BP itself) had identified a deeply rooted “culture of risk” within BP where money and profits were valued above worker and environmental safety. These reports were in response to an explosion in 2005 at an oil refinery in Texas City, in the United States, which killed 15 people and injured more than 180, but the reports also referred to pipeline leaks in Alaska as well as other serious safety lapses throughout BP’s global operations. The Texas City explosion was the worst but not the first major incident at a BP facility, and the revelations in the reports severely damaged the credibility the so-called super-major oil company had earned over the last decade. The job of restoring investor and stakeholder confidence as well as the firm’s reputation fell to the BP board and its star group chief executive, Lord John Browne. The B case, product 9B08M003, examines the role played by the board with respect to the personal integrity of Lord Browne.
These cases were designed to examine the role of a board with respect to risk management and its social responsibilities to various stakeholders. The cases are primarily targeted towards a course on corporate governance but could easily be used in a course on corporate social responsibility or strategic management. The richness of these cases allows a deep analysis of the types of risk in which a large, complex multinational organization is engaged. One of the main learning points is to identify and discuss risk minimization strategies, and the role of the board, for each of the following types of risk: environmental, safety, labour, reputation/image, legitimacy, political, country, asset appropriation, price of supply and production volatility, and joint venture (JV) partner compatibility. Additional learning points include: how the board’s responsibility extends to the maintenance, development, or alteration of corporate culture; the board’s role in a crisis situation; and the importance of legitimacy and the board’s role in stakeholder management.