Friday, November 18, 2016

5 reasons why library analytics is on the rise

Since joining my new institution more than a year ago, I've focused a lot on this thing called "library analytics".

It's an emerging field, and books like "Library Analytics and Metrics: Using Data to Drive Decisions and Services", followed by others, are starting to appear.

Still, the definition and scope of anything new are always hazy, and as such my thoughts on the matter are going to be pretty unrefined, so please let me think aloud.

But why library analytics? Libraries have always collected data and analysed them (hopefully), so what's new this time around?

In many ways, interest in library analytics can be seen to arise from a confluence of factors both within and outside academic libraries. Here are some reasons why.

Trend 1: Rising interest in big data, data science and AI in general

I wouldn't say that what we libraries deal in is really big data (probably the biggest data sets we handle are EZproxy logs, which can be manageable depending on the size of your institution), but we are increasingly told that data scientists are sexy, and we are seeing more and more data mining, machine learning, deep learning and the like used to generate insights and aid decision making.

Think glamour projects like IBM Watson and Google's AlphaGo. In Singapore, we have the Smart Nation initiative, which leads to many opportunities to work with researchers and students who see the library as a rich source of data for collaboration.

In case you think these are pie-in-the-sky projects: IBM Watson is already threatening to replace law librarians, and I've read of libraries starting projects to use IBM Watson at reference desks.

Academic libraries are unlikely to draw hard-core data scientists as employees, but we are usually blessed to be situated near pockets of talent and research scientists who can collaborate with the library.

As universities start offering courses focusing on analytics and data science, you will get hordes of students looking for clients to practise on, and the academic library is a very natural candidate.

Trend 2: Library systems are becoming more open and more capable at analytics

Recently, I saw someone tweeting that Jim Tallman, CEO of Innovative Interfaces, declared that libraries are 8-10 years behind other industries in analytics.

Well, if we are, a big culprit is the integrated library system (ILS) that libraries have been using for decades. I haven't had much experience poking at the back end of systems like Millennium (owned by Innovative), but I've always been told that report generation is pretty much a pain beyond the fixed standard reports.

As a sidenote, I always enjoy watching conventionally trained IT people come into the library industry and then hear them rant about ILS. :)

In any case, with the rise of more open library services platforms like Alma and Sierra (though someone told me that all Sierra really adds is SQL access, that's still a big improvement), more and more data can be easily uncovered and exposed.

A good example is Ex Libris's Alma Analytics. Unlike in the old days, when most library systems were black boxes and you had great difficulty generating all but the simplest reports, Alma and other library services platforms of its class are built almost from the ground up to support analytics.

You don't even have to be a hard-core IT person to drill into the data, though you can still use SQL commands if you want.

With Alma you can access COUNTER usage statistics uploaded via UStat (eventually UStat is to be absorbed into Alma) using Alma Analytics. Add Primo Analytics and the Google Analytics (or similar) that most universities use, and a big part of users' digital footprints is captured.

Alma Analytics - COUNTER usage of journals from one platform

Want a list of users and the number of loans by school made in Alma? A couple of clicks and you have it.

Unfortunately, there still seems to be no easy way to track usage of electronic resources by individual users, as COUNTER statistics are not granular enough. The only way is by mining EZproxy logs, which can get complicated, particularly if you are interested in downloads and not just sessions.
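To give a flavour of what mining EZproxy logs involves, here is a minimal sketch. It assumes logs in an NCSA combined-style format and treats successful requests for ".pdf" URLs as downloads; real EZproxy field order depends on your LogFormat directive, and real download detection is messier than this.

```python
import re
from collections import Counter

# Matches an NCSA combined-style EZproxy log line. Field order is
# configurable in EZproxy, so adjust this to your own LogFormat.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]+" (?P<status>\d{3}) (?P<size>\S+)'
)

def count_pdf_downloads(log_lines):
    """Count successful requests per user that look like PDF downloads."""
    downloads = Counter()
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # skip malformed lines
        if m.group("status") == "200" and ".pdf" in m.group("url").lower():
            downloads[m.group("user")] += 1
    return downloads

sample = [
    '10.0.0.1 - student42 [18/Nov/2016:10:00:00 +0800] '
    '"GET http://example.com/article.pdf HTTP/1.1" 200 52344',
    '10.0.0.1 - student42 [18/Nov/2016:10:01:00 +0800] '
    '"GET http://example.com/search HTTP/1.1" 200 1024',
]
print(count_pdf_downloads(sample))
```

Even this toy version shows the complication: sessions fall out of the logs almost for free, but deciding which requests are "downloads" requires knowing each platform's URL patterns.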

These are still early days, of course, but things will only get better with open APIs and the like.

Trend 3: Assessment and increasing demand to show value are hot trends

A common entry on top-trends lists for academic libraries in recent years (whether lists by ACRL or the Horizon reports) is assessment and/or showing value, and library analytics has the potential to allow academic libraries to do both.

Both assessment (understanding in order to improve or make decisions) and advocacy (showing value) require data and analytics.

For me, the most stereotypical way for an academic library to show value would be to run correlations showing that high usage of library services is correlated with good grades (GPA).

But that's not the only way. 

ACRL has led the way with reports like the Value of Academic Libraries report and projects like Assessment in Action (AIA): Academic Libraries and Student Success to help librarians on the road towards showing value.

But as noted in the Assessment in Action: Academic Libraries and Student Success report, a lot of the value in such projects comes from the experience of collaborating with units outside the library.

Academic libraries that do such studies in isolation are likely to experience less success.

Trend 4: Rising interest in learning analytics

A library focus on analytics also ties in nicely as universities themselves are starting to focus on learning analytics (with the UK, supported by JISC, probably in the lead).

A lot of the current learning analytics field focuses on LMS (learning management system) data, as vendors such as Blackboard, Desire2Learn and Moodle provide learning analytics modules that can be used.

But as libraries are themselves a rich store of data on learning (the move towards reading list management software like Leganto, Talis Aspire and Rebus:list helps too), many libraries, like Nottingham Trent University's, find themselves involved in learning analytics approaches.

So, for example, Nottingham Trent University provides all students with an engagement dashboard allowing them to benchmark themselves against others. Sources used to make up the engagement score include access to learning management systems and use of library and university buildings.

Trend 5: Increasing academic focus on managing research data provides synergy

From the academic library side, we increasingly focus on the challenges of collecting, curating, managing and storing research data. Rising fields like GIS and digital humanities put the spotlight on data. We no longer focus just on open access for articles, but on open data, if not open science.

While library analytics is a separate job from research data management, there is synergy to be had between the two functions, as both deal with data. Both jobs require skills in handling large data sets, protecting sensitive data, data visualization and so on.

For example, the person doing library analytics can act as a client for the research data management librarian to practise on when producing reports and research papers. In return, the latter can gain experience handling relatively large datasets by doing analytics projects.

But what does library analytics entail? Here are some common types of activities that might fall under that umbrella.

Assisting with operational aspects of decision making. 

Traditionally a large part of this involves collection development and evaluation.

In many institutions like mine, it involves using Alma Analytics, EZproxy logs, Google Analytics, gate counts and other systems that track user behaviour.

This in many ways isn't anything new, though these days there are typically more such systems to use, and products are starting to compete on the quality of the analytics available.

This type of activity can be opportunistic, ad hoc and in some libraries siloed within individual library areas.

Implementation and operational aspects of library dashboard projects 

An increasingly hot trend: many libraries are starting to pull their data together from diverse systems into one central dashboard using systems like QlikView, Tableau, or free JavaScript libraries like D3.js.

Typically such dashboards can be set up for public view or, more commonly, for internal users (usually within the library, ideally institution-wide), but the main characteristic is that they go beyond showing data from one library system or function (so, for example, an Alma dashboard or a Google Analytics dashboard doesn't quite qualify as a library dashboard the way I define it here).

Remember I mentioned above that library systems are becoming more "open" with APIs? This helps to keep dashboards up to date without much manual work.
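As a rough illustration of what "no manual work" means in practice: a scheduled script can pull figures from each system's API and flatten them into tidy rows for the dashboard tool to ingest. The endpoint and payload shape below are hypothetical; real APIs (Alma Analytics, Google Analytics, etc.) each return their own structures.

```python
import json

def rows_for_dashboard(payload):
    """Flatten an API payload of {metric: {label: value}} into tidy rows
    that a dashboard tool (Tableau, QlikView, D3) can ingest as CSV/JSON.
    The payload shape here is illustrative, not any vendor's real schema."""
    rows = []
    for metric, series in payload.items():
        for label, value in series.items():
            rows.append({"metric": metric, "label": label, "value": value})
    return rows

# A made-up nightly payload, as if fetched from the various systems' APIs.
payload = json.loads("""{
  "loans_by_school": {"Business": 120, "Law": 45},
  "gate_count": {"Main Library": 3400}
}""")

for row in rows_for_dashboard(payload):
    print(row)
```

Run nightly (cron, scheduled task), a script like this replaces the monthly ritual of exporting spreadsheets from each system by hand.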

I'm aware of many academic libraries in Singapore and internationally creating library dashboards using commercial or open-source systems like Tableau, QlikView etc., but they tend to be private.

Here is my Google Sheet list of public ones.

Setting up the dashboard is relatively straightforward, technically speaking; sustaining it is the harder part. What data should we present? How should we visualize it? Is the data presented useful to decision makers? How can we tell? What level of decision maker are we targeting? Should the data be made public?

This type of activity breaks down barriers between library functions, though it can still be siloed in the sense that it is just the work of a university library, separate from the rest of the university.

Implementation of, or involvement in, correlation studies and impact studies for the value of libraries.

The idea of showing library impact by doing correlation studies of student success (typically GPA) and library usage seems to be popular these days, with pioneers like the libraries at the University of Huddersfield (with other UK libraries, supported by JISC), the University of Wollongong and the University of Minnesota leading the way.

Such studies could be one-off studies, in which case the value is arguably much less compared to an approach like the University of Wollongong's Library Cube, where a data warehouse is set up to provide dynamic, up-to-date data that people can explore.
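At their core, these studies are just correlations between a usage measure and an outcome measure. A minimal sketch, using made-up per-student numbers purely for illustration:

```python
import numpy as np

# Made-up figures for eight students: library loans and final GPA.
# Real studies use thousands of records and control for confounders.
loans = np.array([0, 2, 5, 8, 12, 15, 20, 25])
gpa = np.array([2.1, 2.5, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8])

# Pearson correlation between library usage and student outcome.
r = np.corrcoef(loans, gpa)[0, 1]
print(f"correlation r = {r:.2f}")
```

The statistics are the easy part; the hard parts are getting access to outcome data like GPA (which needs cooperation from other university units) and remembering that correlation is not causation: motivated students may both borrow more and score higher.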

Predictive analytics/learning analytics

Studies that show the impact of library services on student success are all well and good, but the next step beyond them, I believe, is getting involved in predictive analytics or learning analytics, which will help people, whether students, lecturers or librarians, use the data to improve their own performance.

I've already mentioned Nottingham Trent University's engagement scores, where students can log into the learning management system to see how well they are doing compared to their peers.

The dashboard is also able to tell them things like "Historically, 80% of people who scored X in engagement went on to get Y results".
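The simplest version of such a statement is just a conditional frequency computed over past cohorts. A sketch with made-up records (real systems would, of course, use far larger historical datasets and proper models):

```python
# Made-up historical records: (engagement_score, got_good_degree)
history = [
    (90, True), (85, True), (80, True), (78, True),
    (75, False), (60, False), (55, True), (40, False),
]

def historical_rate(records, threshold):
    """Share of past students at or above an engagement threshold
    who went on to a good outcome; None if no one qualifies."""
    cohort = [outcome for score, outcome in records if score >= threshold]
    return sum(cohort) / len(cohort) if cohort else None

rate = historical_rate(history, 75)
print(f"Historically {rate:.0%} of students scoring >= 75 got a good degree")
```

From here it is a short step to full predictive models (logistic regression and the like) that flag at-risk students early, which is where the real learning analytics systems are headed.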

This type of analytics I believe is going to be the most impactful of all.

Hierarchy of analytics use in libraries

I propose that the activities listed above are in increasing order of capability and perhaps impact.

It goes from

Level 1 - Any analysis done is library-function specific. Typically ad hoc analytics, though there might be dashboards created for only one specific area (e.g. a collection dashboard for Alma or a web dashboard for Google Analytics)

Level 2 - A centralised library wide dashboard is created covering most functional areas in the library

Level 3 - Library "shows value" by running correlation studies etc.

Level 4 - Library ventures into predictive analytics or learning analytics

Many academic libraries are at level 1 or 2, and a few leaders are at level 3 or even level 4.

Analytics requires deep collaboration 

This way of looking at things, I think, misses an important element. I believe that as you move up the levels, silos increasingly get broken down and collaboration increases.

For instance, while you can easily do analytics for specific library functions in a siloed way (level 1), building a library dashboard that covers library-wide areas breaks down the silos between library functions (level 2).

In fact, there are two ways to reach level 2.

Firstly, libraries can go their own way and implement a solution specific to just their library. Better still is when there is a university-wide platform that the university is pushing, and the library is just one among various departments implementing dashboards.

The reason the latter is better is that if there is a university-wide push for dashboards, the next stage is much easier to achieve, because data is already on the university dashboard and there is already university-wide familiarity with thinking about and handling data.

Similarly, at level 3, where you show value and run correlation and assessment studies, you could go two ways. You could request one-off access to student data (you particularly need cooperation for many student outcome variables like GPA, though some data, like class of degree and honours lists, can be publicly accessible), or, if there is already a university-wide push towards a common dashboard platform, you could connect the data together, creating a data warehouse. The latter is more desirable, of course.

By the time you reach level 4, it would be almost impossible for the library to go it alone.

Obviously I've presented a rosy picture of library analytics. But as always, new emerging areas in libraries tend to be at the mercy of the hype cycle. Though conditions seem ripe for a focus on library analytics, it's unclear what the best way is to organize the library to push for it.

Should the library designate one person whose sole responsibility is analytics? But beware the coordinator syndrome! Should it be a team? A standing committee? A taskforce? An intergroup? It's unclear.

Monday, November 7, 2016

Learning about proposed changes in copyright law - text data mining exceptions

Recently, a researcher I was talking to remarked that university staff can be jumpy around copyright questions, and some would immediately duck for cover the moment they heard the word "copyright". I'm not that bad, but as an academic librarian my knowledge of copyright is not as good as I want it to be.

But last month, I attended a great engagement session at my library by the Intellectual Property Office of Singapore (IPOS) and the Ministry of Law, where the speakers gave a great talk on copyright in Singapore and addressed some of the proposed changes. They managed to concisely summarize copyright law in Singapore, the current situation (the irony that copyright law in Singapore pretty much copied the Australian one, which itself is based on the UK's, was not lost on the speaker) and the rationale for change.

Given that understanding basic copyright is going to be increasingly one of the fundamental skill sets needed by academic librarians, I benefited a great deal from attending.

There were many interesting and beneficial proposed changes for the education sector, but I was most captivated by the proposed changes with respect to text and data mining in Singapore, designed to support Singapore's Smart Nation initiative.

This proposed change, I believe, is very similar to the exception already in place in the UK, except that the UK's covers only non-commercial use. The EU, I believe, is also mulling a similar law.

As in the UK law, I believe the proposed change will also disallow restricting text and data mining via contract.

Why is this proposed change important?

One of the most common issues we face today is that increasingly many researchers are starting to do text and data mining (TDM) on content in our subscribed databases, whether newspaper databases (e.g. Factiva), journals (e.g. ScienceDirect) or other resources.

Many researchers, I find, aren't quite aware that for the most part, when the library signs an agreement for access, such rights exclude TDM (or do not state TDM as an allowed use).

Most databases we subscribe to also have systems to detect "mass downloads", and as such any TDM is most likely going to be detected (though I believe some researchers may try to bypass this by scripting human-like behaviour).

Businesses are never ones to forgo a revenue opportunity, and many databases require us to pay an additional, often expensive, fee on top to allow TDM.

Others have a more "come talk to me and we will see" style of policy, and the rare few enlightened ones, like JSTOR, actually allow it up to certain limits. Many academic libraries have created guides like this and this to try to keep track of things.

As text and data mining is more easily done via API than by scraping, another approach is to offer a guide to the APIs that can be used. One example is MIT's LibGuide.
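To see why APIs are the friendlier route, consider that an API lets you request records in documented, predictable batches rather than hammering the human-facing interface page by page. The endpoint, parameters and API key below are entirely hypothetical; any real vendor API has its own documented query format and rate limits that you should check first.

```python
from urllib.parse import urlencode

# Hypothetical full-text API, for illustration only.
BASE_URL = "https://api.example.com/v1/fulltext"

def batched_queries(dois, api_key, batch_size=25):
    """Yield batched query URLs, requesting records a chunk at a time
    instead of scraping the human-facing interface one page at a time."""
    for i in range(0, len(dois), batch_size):
        params = {
            "ids": ",".join(dois[i:i + batch_size]),
            "format": "json",
            "apikey": api_key,
        }
        yield f"{BASE_URL}?{urlencode(params)}"

urls = list(batched_queries(["10.1000/a", "10.1000/b"], api_key="KEY"))
print(urls[0])
```

Batching and declared credentials are exactly what distinguishes sanctioned TDM from the "mass download" patterns that vendor systems are built to flag.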

The proposed law would have two effects. Firstly, the status of researchers doing data mining of the open web has always been hazy. In theory, if you mine, say, reviews on Blogger and use them for your research, I understand the content owners of the blog could possibly sue you for copyright infringement. The proposed changes clarify this and allow TDM of such data (but not mere aggregation of it).

More interestingly, for data that researchers have legitimate access to, i.e. subscribed databases, there is no longer any distinction between reading an article and doing text and data mining. And such a right cannot be excluded by contract by the vendors.
The position paper set out by the Ministry of Law/IPOS here is a great read, and it points out that if such a change comes into effect, it is likely that vendors who already charge for TDM will "price in" the cost of TDM, because they can no longer exclude these rights.

Will the exception disadvantage libraries whose users won't do TDM?

There was an interesting Q&A afterwards, mostly centering on the TDM exemption.

One of the more obvious points made was: is it necessarily desirable to put in these exemptions when they will lead to vendors "pricing in" TDM rights for database packages automatically? While the bigger universities and institutions will probably have staff who do TDM, smaller institutions would be unfairly affected, facing higher prices for no benefit. Why not allow each institution to negotiate with vendors and allow exclusion of TDM depending on each institution's needs?

I am sympathetic to this viewpoint.

But my current gut feel is that overall this will be beneficial.

Let me try out this line of argument.

Libraries tend to be in a far weaker negotiating position than vendors (because a lot of vendor material is unique), and what often happens under current law is that many libraries simply play it safe and pay only for basic read access, not TDM, because it's very hard to predict who will want to do TDM, even at big universities. Some librarians will even refuse on principle to pay for TDM.

So vendors will not be sure at first how much they are losing by not charging for TDM, as whatever they are getting now is probably less than true demand.

The proposed changes package everything into one, turning the negotiation into a game of chicken. While the vendor might want to price things as high as possible, even recapturing all the possible TDM revenue, there is a need to compromise (anchored around current prices that exclude TDM) or they will end up earning nothing.

That should put a cap on exorbitant price increases, at least initially (though in future periods vendors might be able to properly estimate the real TDM demand and price accordingly). I suspect the net effect is that while prices will go up, a lot more TDM will occur overall; if the intent is to encourage TDM, that is a win, and if TDM generates sufficient benefits, it will be a win.

But this is a wild guess.

I'm also wondering: once the law forbids vendors from preventing TDM when libraries have paid for lawful access to the database, can they say, "Okay, you can now do TDM, but only via method A (probably API) and not via scraping or a script that automatically downloads via the usual human-facing interface"? This seems to suggest no.

It would be great if we could learn from the UK experience, and I started asking around my usual international network of librarians, but came up empty.

One librarian pointed out to me that even though the law was passed in 2014, given subscription cycles of a year or more, plus research lag time, any such research is probably still in the works!

Still, I ask readers of my blog: if you work in the UK as an academic librarian, what was your experience like? Did you find that prices of databases that are frequent targets of data mining started to rise even faster? Did the salespeople cite the change in law as a reason? If you are a researcher in the UK who has done TDM under this law, what was your experience like?

Even anecdotes would be nice. You can comment below or send me emails privately if you like, and I will preserve your anonymity.

What law are the contracts signed under?

Another, more damaging, point brought up was: when libraries sign contracts with database vendors, which jurisdiction's law will the contract be under? If the contract is under US law (fairly common?), the changes in the copyright act would have no sway over a breach of contract, effectively making them toothless.

I'm not a lawyer, so I do not know what would happen if a library was sued for breach of contract overseas, outside Singapore, and damages were awarded.

Other comments and questions

The Q&A was a good exchange of opinions and views between the speakers and the audience (made up of faculty and librarians). Topics covered included open access (gold open access is usually frowned upon by librarians in Asia, which I think is quite different from the West), copyright for MOOCs and more.

One interesting point made by the speaker was that he was a bit surprised to see that while there was organization on the author/creator side, with groups like The Copyright Licensing and Administration Society of Singapore Limited (CLASS) and the Composers and Authors Society of Singapore (COMPASS) representing author rights, there wasn't such a group on the user side.

He suggested that perhaps the universities in Singapore could band together to negotiate collectively on some agreed core content. Is this what we call a library consortium?

Then again, Singapore is a really small market, so who knows; perhaps the law will make little difference and vendors might just let it go?


This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.