Open data: an open debate?



The way science is communicated is changing fast, super-fast. 

 

Only a few years ago, scientists would access research within their field of interest by reading published papers, accessed through subscriptions to journals. Recall that back then everything was printed on paper (and in black and white) and bound in issues and volumes, which were mailed (snail-mailed) to paying subscribers. In the larger universities and wealthier countries, libraries held most major journals and had agreements in place with other libraries to access works they did not harbor themselves. To access a paper, you just needed to walk into the library or ask the librarian for help.

 

In the Global South (gee… I hate this term!) some basic loopholes were already in place. We knew some colleague or research group (abroad, for instance, but not exclusively) to whom we would write, kindly asking them to photocopy (and fax?) that critical paper they had and that we had found by spending hours going through Current Contents. These were tiny printed booklets listing all work published in a given period, organized by subject matter, authors, key words (yes!), etc., that most libraries received (and paid for, I believe). Worst case scenario, we would send an email to the corresponding author (yes, another evolutionary vestige, if I may) who, in a very friendly manner, would drop a copy of their work in the mail (snail mail again). Once their work had been published, most authors received a set of printed copies of the paper, clasped together, to send out. Needless to say, the number of papers published was significantly lower than what we see today. Also, research results were essentially non-existent until the issue in which the paper reporting them was printed. Hence the year of publication, as we state it in our CVs, was paramount.

 

This model, known as the “subscription model”, evolved into the digital age. Journals slowly adopted online PDFs of papers, which made them ‘widely’ available. Your subscription would allow you to download papers, which were easily found, or else you could pay to read (a rental model) or pay to download. Rates were (are) ridiculously high, under the argument that the publishing companies were struggling to make ends meet (more on this in another piece soon). Paper issues were also still around for a time. Often, people would get involved in heated discussions over coffee on whether print was better to read than PDFs, or the other way around. This new version of the subscription model introduced the term “paywall”, which essentially refers to the fact that scientific findings were hidden behind an expensive fee. In the developed world, libraries moved on to providing access to staff via computer terminals and passwords, usable while within the work institution and occasionally allowing remote access after pressure (I assume here) from the scientific community. Again, in the Global South, some colleague befriended some other colleague abroad, who would kindly share the access code to their institutional library, or download papers and email them. Also, the (illegal) online posting of full papers or ‘proofs’ on personal websites began to grow and, of course, the notorious Sci-Hub website appeared. This model allowed for really wide access to scientific publications, if to the grief of commercial publishers. And it also began to mess up publication dates. Accepted papers began to be visible as ‘online-first’ or ‘early-view’ postings on the journals’ websites, maybe several months before actual publication.

 

Enter Open Access. As the paywall model met resistance (I assume mostly from the developed countries, as I explained before that cheating was/is simple, effective and widespread in the Global South), an alternative model arose, essentially because paywalls violated a basic principle of science: sharing our discoveries with the world. By the way, it is interesting to note here that this conversation did not occur (to my knowledge) when journals were printed and mailed to subscribers. Did we care about paywalls then?

 

The Open Access model, initially led by a few journals such as PLOS (Public Library of Science), gained momentum. Here, authors would pay the costs of publication and anyone, in every corner of the world, would be able to read, download and cite published work. From its name, this model suggests a positive change, but this is far from true. As authors pay (and a lot of money) to see their paper published, it is better termed the “authors-pay model”, and it brings about several controversial issues and queries: how does quality meet costs? Are journals, especially non-societal journals, pushed to publish more? Are bad papers published simply because they pay? Are good papers lost to science because they cannot pay? And, no less importantly, the model excludes the Global South. For starters, the cost of publishing (numbers which I struggle to fully grasp) is becoming unaffordable for a growing number of the world's scientists (I need not explain how equity in income and investment in science among countries is far from being reached). But also, cheating (i.e., Sci-Hub access) no longer helps, because the cost now falls on publishing rather than on reading. An end result is the growing ill feeling about Open Access in the scientific ecosystem, and the dropping impact and attractiveness of journals that offer only this model. Still, it is well established and likely to be adopted throughout the science publishing world sooner rather than later.

 

And now, in parallel, we discuss Open Data. Here we are so convinced that data sharing is important to science that we hardly recall why we did not talk about it before. We are taught early on that replicability of experiments is an essential component of research. To achieve this we need to see the raw data and the code used to analyze it. We also need both during the peer review process, to guarantee that the science is good and honest. Open Data has clear advantages, and yet this wheel is still spinning only slowly. Journals need to adopt it at different stages of the submission process, and authors need to be convinced their data is ready to be shared.

 

Why is it slow?

 

At least three issues arise. First, data is often sensitive and confidential and not ready to be shared. What data is actually sensitive? How do scientists working in different countries and cultures deal with this concept? Is it OK to share widely, say, data on a biodiversity hotspot in the Amazon rainforest? Aren’t we allowing it to be accessed by the ‘bad guys’? Or does a given data set have some intrinsic value at a national/regional level that could be affected by accessibility? (I will also write on this in another post: what do national boundaries and sovereignty mean in science?) Think here of applied ecology, medicine, land management or geology.

 

Second, and very importantly, how does Open Data pair with Open Access? We need to pay to publish and, sometimes, also pay to harbor data in a repository. We worry enormously about sharing data, and then we also need to pay to get our work published? Note that unpublished work means ‘no data’. Of course, there are pre-print servers, and data may be shared outside a publication, but how has quality been checked there? (This is another topic altogether.)

 

Finally, uploading machine-readable spreadsheets upon submission or acceptance needs checking. Repositories need to be linked to submission websites. Repositories need to be kept clean and safe. For instance, just to ensure things are done properly upon submission, journals need, or will need, to hire staff or seek help from seasoned data editors. This holds even if the data is uploaded to free, government-run repositories. Also, data editors are becoming more and more important and soon (if not now) will also run code and check for inconsistencies. Fraud has always been around, but more papers mean the chances are higher. So, you can figure that all of this is not free in this world. The key question is, who will pay? Will Open Access rates increase? Back to square one: the costs of Open Access.

 

And then, there is the Global South. Is this mostly a ‘white people’s problem’? The rapid changes we are seeing in the publication world demand nimble behavior from the science community. Should we discuss the way we do science today, just as we adapted to paper publication dates in our CVs? In any case, conversations should include all (the Global South, all disciplines), as science is now more global and diverse than ever. While the ‘whys’ of Open Access and Open Data are clear to most, the ‘hows’ remain nebulous.
