DAVID WILEY: So George, welcome back. Week 6.

GEORGE SIEMENS: Thank you. It's good to see you again, David.

DAVID WILEY: You too. It's been a long time.

GEORGE SIEMENS: It's been a tough slog, but rewarding.

DAVID WILEY: Almost a week. This week we're talking about data and algorithms and competencies, and I want to jump right in on the data side. As we have said in the weeks leading up to this, when you're using content, when you're interacting with other people, when you're doing that in technology-mediated ways, those interactions all throw off a lot of data.

What sometimes is called data exhaust, right? It's just the data byproduct of the primary activities we're engaged in. You've been thinking about this for a very long time, as you talked about in your introduction all the way back in week 1, that analytics is part of your path. It's how you've grown through Open. And I wonder if you'd talk a little bit more about the relationship between Open and analytics, and where the field is right now.

GEORGE SIEMENS: Well as a field, I think learning analytics has developed fairly quickly. Initially in 2011 with the first conference, it was essentially a bricolage field, meaning we stole a lot of ideas from other areas. So we spent time looking at things like discourse analysis applied in education.

We looked at social network analysis to get a bit of a sense of what was going on in different parts-- taking methods from sociology and applying them to what happened in a classroom, and so on. I think more recently it's become more sophisticated, just as the entire big data ecosystem, if you want to call it that, has matured as well. Now certainly even in a MOOC like this-- you look at edX courses.

They've been quite valuable in helping researchers start understanding student interaction and student-related data at a scale we've never had before. So you've had courses on edX that will have 200,000 or 300,000 people, even though the norm today is probably closer to 3,000 to 5,000 or 7,000 students in a course. So when you have that many students, consider that a single faculty member teaching one MOOC will interact with more students than she would in a lifetime as a traditional faculty member.

The difficulty that we have then with that is, How do you make sense of those things? It's not Google-style big data, where you've got terabytes of data. But you're dealing with gigabytes of data, and you're dealing with-- just look at an example in this course.

So for a student that's been involved in this course, when they've read a piece of text, there's a click there. When they've logged on, there's a notation of that log on. When they've moved to a video, when they've paused a video, there's a data point captured at each of those levels.

Pre-course surveys collect data. If there's a student that decides to engage in social media around the course, there's a hashtag. And then you can start to bridge those different profiles and different spaces to get a sense of what students are doing.
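
A rough sketch of the kind of event records being described, with hypothetical field names and values rather than the actual edX tracking-log schema:

    # Illustrative clickstream events for one learner (hypothetical schema,
    # not the real edX tracking-log format).
    events = [
        {"user": "george", "source": "edx", "event": "login"},
        {"user": "george", "source": "edx", "event": "page_view", "item": "week6_reading"},
        {"user": "george", "source": "edx", "event": "video_play", "item": "wiley_interview"},
        {"user": "george", "source": "edx", "event": "video_pause", "item": "wiley_interview", "position_sec": 192},
        {"user": "george", "source": "survey", "event": "precourse_survey_submitted"},
        {"user": "@gsiemens", "source": "twitter", "event": "tweet", "hashtag": "#course_hashtag"},
    ]

    # "Bridging" profiles across spaces means mapping the separate identities
    # (edX username, survey respondent, Twitter handle) to one learner record.
    identity_map = {"george": "learner_001", "@gsiemens": "learner_001"}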

Now a lot of that data, as you and I have talked about before, comes back to this idea of proxy. The educational research that we're doing quite often is dealing with proxy elements. We can't get directly into a student's brain and massage their neurons and see exactly what they're thinking.

So we rely on things, and sometimes it's a direct proxy that says, George logged on, he watched 3 minutes and 12 seconds of this tremendously engaging video with David Wiley, then he clicked, and he did a quick survey or a quiz, and then he read an article. So these things are all captured. Now what does that mean?

So from an analytics perspective, that's just basically your raw data that you're looking at to understand student behavior. Almost everything we do, though, is computed or derived variables. So we would say, as an example, if we're trying to understand what engagement is, what it means to be engaged in a course-- well, there's no direct answer. We have a few data points that say George spent 30 minutes in edX, he spent 3 minutes on the video, he spent a few minutes reading an article, he posted in a discussion forum.

All of those things are our base data elements, but they don't tell us anything about the things that we're going to compute later on, which is, is George engaged? If you're trying to get at some affective dimensions, is he experiencing the course in a positive way, or is he writing angry comments to Stephen Downes in the discussion forum? So that's where you start to get the more sophisticated element of analytics, which is what we're maturing into.
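
To make the idea of derived variables concrete, here is a minimal sketch of computing an "engagement" score from those base data elements; the weights and threshold are arbitrary assumptions for illustration, not a validated measure:

    # Base data elements for one learner, as listed above.
    base = {
        "minutes_in_edx": 30,
        "minutes_on_video": 3,
        "minutes_reading": 5,
        "forum_posts": 1,
    }

    def engagement_score(d):
        # Weighted sum of raw activity measures -- one of many possible proxies.
        return (0.5 * d["minutes_in_edx"]
                + 2.0 * d["minutes_on_video"]
                + 1.0 * d["minutes_reading"]
                + 5.0 * d["forum_posts"])

    score = engagement_score(base)
    engaged = score >= 25   # threshold chosen for illustration only
    print(score, engaged)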

We have tortured log files, we have tortured Twitter hashtags, and now we're starting to move into more sophisticated analysis that will get at emotion and affect and engagement. The difficulty there is we begin to create student models, and student models are effective to a point, but they're always a complex set of if-then statements. And so variables can change, so this idea of a generic student model that tells us about a student's level of engagement may be true in one condition, but MOOCs are global-- you've got 120 countries represented in a course like this.
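
A minimal sketch of the kind of generic, rule-based student model being described; the if-then rules are invented for illustration, and the point is how a fixed rule set can misread learners in different contexts:

    # A generic "if-then" student model. The thresholds are hypothetical.
    def engagement_label(minutes_on_video, forum_posts):
        if minutes_on_video >= 10 and forum_posts >= 1:
            return "engaged"
        elif minutes_on_video >= 3:
            return "partially engaged"
        else:
            return "disengaged"

    # A learner on a low-bandwidth connection who reads transcripts instead of
    # streaming video gets labeled "disengaged" by these rules, even though
    # they are actively participating in the forums.
    print(engagement_label(minutes_on_video=0, forum_posts=2))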

What does that reflect, and how do we get a better understanding of it? So I think, getting at the core of it, the data is valuable. It's going to be a tremendous contributor to research. However, more and more of the work happening on the analytics end is moving behind the scenes, into closed algorithmic structures.

DAVID WILEY: Yes, so let's pick up on that. The researcher in me just cringes at the idea that this algorithm is a complete black box. I can't audit it, I can't understand it, I can't look to see how it works.

In fact, I don't know-- is that algorithm even behaving in ways that are completely ethical? Is it looking at me and saying, Well, based on your race and based on this, I'm going to make a recommendation to you in this way? Without the capability for peer review, which is the fundamental mechanism of the advancement of knowledge-- without peer review, where does this go? How do we deal with this issue of these algorithms being black boxes?

GEORGE SIEMENS: Well, I'll give you one example, and I'm fine naming names. There's a company that's quite well regarded, or was early on, in the personalized learning space. It was Knewton, and they've promoted a lot of their toolsets and platforms.

For a period, they were very tightly aligned with Pearson, even though there's been a bit of a change in that relationship. Through SoLAR, the Society for Learning Analytics Research, we've had interactions asking if we can have a look at their data to determine whether what they're saying is accurate. Now this is not a slight on Knewton as a company, but it presents this problem: if we are in a data-centric world, which we absolutely are, and the analysis of that data is closed, there are only a few eyes that can see what's happening, which is very much against the ethos of what we've been talking about this whole course. The argument has been, Science advances most effectively when ideas can be openly exchanged, when they can be critiqued and challenged and, more importantly, improved upon by others.

Now, I'm not saying that Knewton doesn't have the world's greatest algorithms ever created in history, or that ever will be created in future generations. That may be the case, but the ethos of science is critique, openness, transparency, collaboration. And that's, I think, the thing that concerns me the most, because we can't validate when a company tells us, We had a 78% increase in learning gains, defined by successful completion of a course, due to the use of this personalized or adaptive learning strategy. If we can't quiz that, test it, validate it, we're moving away from the spirit of science.

DAVID WILEY: It seems like another big part of research in the academy is reproducibility, and there's been a lot of dialogue about that recently and how that's been problematic in some disciplines. But when does--

GEORGE SIEMENS: Psychology?

DAVID WILEY: [IMITATES COUGHING] Psychology. When the primary analytical tool is a secret, it's impossible to reproduce.

GEORGE SIEMENS: There's no reproducibility crisis then.

DAVID WILEY: It's like the whole field is Han Solo saying, Trust me. Yes.

GEORGE SIEMENS: Yes, exactly.

DAVID WILEY: That seems to be bad for the field.

GEORGE SIEMENS: I would absolutely agree, and that's an interesting point. So I think in some ways the battle for open education-- publishers have conceded at least part of the content battle. They've said, OK, we can't compete with that.

They've moved into a slightly higher level of escalation, meaning they're looking at analytics as a competitive advantage or value-add. They're looking at different resources and tools and simulations that they provide to educators. So on the one hand, openness is winning at the content level.

But the war itself has escalated to a different level that I haven't seen a lot of individuals playing in. And this gets to the point that I know you've talked about-- competencies. The idea of more granular assessments of what individuals know.

So rather than a course, you're going to assess based on competencies. And in theory, it wouldn't matter whether I acquired that competency by studying at BYU or by studying on my own, watching YouTube videos or reading articles. The important thing is, Do I have the competency?

That's the thing that matters. But as you've noted, the challenge with competency-based education and the granularization of assessment to the level of competencies rather than courses is that the pathways through the curriculum, namely the competency maps, are not open. Explain why that's a concern to you.

DAVID WILEY: Yes, well, content-- what's the right metaphor? Content is like the ingredients in a kitchen, right? They can all be there, they can be available, and if they're in your kitchen, presumably they're free, free to use.

But just having access to lots of raw ingredients doesn't turn out great food all the time. There's something about the recipe-- about directions, about how to bring things together, what order to put them together in, how long to bake it at what temperature, and things like that-- that we're just missing on the open educational resources side. There's lots of open content.

The most recent estimate that Creative Commons has published, in conjunction with some search engines, is about 1.1 billion openly licensed resources published online. But how do you take those, and what do you do with them? Without competency statements or learning outcomes or whatever it is that you want to call them-- without those outcomes, without some structure of those outcomes into a suggested path, or three or eight suggested pathways through them, without the content being aligned to those outcomes, and without assessments also being aligned to the content and the outcomes-- we're just left with a pile of content that it's hard to figure out what to do with.
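
As a rough illustration of that alignment structure, here is a sketch with hypothetical identifiers: outcomes, content aligned to outcomes, assessments aligned to the same outcomes, and a suggested pathway through them:

    # Hypothetical competency map. IDs and descriptions are invented.
    competencies = {
        "stats.1": "Interpret a frequency distribution",
        "stats.2": "Compute and interpret a mean and standard deviation",
    }

    # Content items aligned to outcomes.
    content = {
        "video_intro_distributions": {"aligned_to": ["stats.1"]},
        "reading_descriptive_stats": {"aligned_to": ["stats.1", "stats.2"]},
    }

    # Assessments aligned to the same outcomes (and, through them, the content).
    assessments = {
        "quiz_distributions": {"aligned_to": ["stats.1"]},
        "quiz_descriptives": {"aligned_to": ["stats.2"]},
    }

    # One suggested pathway (of possibly several) through the outcomes.
    pathway = ["stats.1", "stats.2"]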

And I'm particularly interested in continuous improvement, and I think that learning analytics and OER are so complementary in this way. With learning analytics, we would traditionally look at students: early warning, early intervention, how can we keep the student on track? I think there's a great opportunity to also look at content.

Where is the content performing? Where is the content falling down? What kind of a job is the content doing in supporting student learning? With OER, we have permission to make changes and improvements, but when we only have OER, we don't know what to change or improve.

With learning analytics, we can have great insight into where the content is standing up and where it's falling down. But without permissions, we can't do anything about what the analytics tell us. And so bringing those two together in a you've got your chocolate in my peanut butter kind of way enables continuous improvement that's not possible otherwise.
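
A small sketch of what that combination might look like in practice; the item names, data, and threshold are invented for illustration:

    # Content-level analytics for continuous improvement: for each content item,
    # look at how learners who used it performed on assessments aligned to the
    # same outcome.
    item_stats = {
        "video_intro_distributions": {"users": 480, "aligned_quiz_pass_rate": 0.81},
        "reading_descriptive_stats": {"users": 515, "aligned_quiz_pass_rate": 0.54},
    }

    # Flag items whose aligned assessments suggest the content is falling down.
    needs_revision = [item for item, s in item_stats.items()
                      if s["aligned_quiz_pass_rate"] < 0.65]

    # Because the items are OER, the open license permits actually revising
    # and republishing the flagged content, not just observing the problem.
    print(needs_revision)   # ['reading_descriptive_stats']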

But I also think that we'll find that there are sequencing effects. There are things that will flow from, Which content did I use first, and how did that set me up for success in understanding these topics later on? Without having the competency statements, the competency maps and assessments, and content aligned to those statements, it just seems like it's hard to make the kind of systematic, rigorous progress that we'd like to make in terms of understanding effective ways to support learning.