Friday, January 18, 2013

OSK solves the journal problem

I've enjoyed writing little pieces for OSK. Unfortunately, as I leave Kentucky, my writing days for Kentucky's open science source of record must end. I want to leave this last piece documenting thoughts my colleagues and I had when starting this endeavor a year or two ago. We tried to bring it to fruition, but we simply did not have the time or resources to do so. We hope one of you will.

This is not a comprehensive characterization, but it's the beginning of a comprehensive solution. Big ups to my partners on this project, Adam Robison, Keevin Bybee, and Patrick Bybee. Thanks to many others who've contributed time and mind.


The academic publishing enterprise is archaic at best, unethical at worst. Read commentary here.

Below is our solution to this problem. It would be a huge undertaking to build, and we don’t have the requisite skills or time to do so. We just want to float it on the web so someone else might pick it up and start making it a reality.

Crowd-sourced rating systems, on a large enough scale, can be very effective. We envision a publishing system where scientists upload novel research to a cloud-based platform in a somewhat traditional manuscript form, including introduction, methods, results, conclusions, limitations, etc. Peers will evaluate these manuscripts by the same criteria used today. Unlike today, each manuscript will be linked to the data supporting it. The value of a manuscript will be decided by the peer community rather than by three to five anonymous “experts” requested by the author or selected by the journal’s editor. Non-anonymous researchers will evaluate the manuscript for quality and importance, leaving a “score” and supporting discussion for posterity (the comments themselves will also be scored by other reviewers). An algorithm will combine these ratings into a comprehensive, dynamic score for each manuscript (more about the algorithm below). Good manuscripts will make the “front page” and gain visibility as they’re targeted to “high value” lists, while poor-quality manuscripts will fall to the bottom. However, all manuscripts will remain visible and searchable, allowing an idea ahead of its time to receive proper recognition with age. Authors will be listed as they are now, and each author will be linked to a historical research profile. Every manuscript and comment score feeds into the author’s historical user score, which in turn weights the importance of the comments he/she leaves on other manuscripts and comments.
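To make the scoring idea a little more concrete, here is a minimal Python sketch of one way a manuscript’s dynamic score could work: each rating is weighted by the rater’s historical user score, and the “front page” is simply the top of a sorted list. The class names, the 0-10 rating scale, and the weighted-mean formula are all assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass, field


@dataclass
class Rating:
    reviewer_score: float  # historical user score of the (non-anonymous) reviewer
    value: float           # the rating itself; a 0-10 scale is assumed here


@dataclass
class Manuscript:
    title: str
    ratings: list[Rating] = field(default_factory=list)

    def score(self) -> float:
        """Reputation-weighted mean of all ratings so far (0.0 if unrated).

        Recomputed on every call, so the score changes as new ratings arrive.
        """
        total_weight = sum(r.reviewer_score for r in self.ratings)
        if total_weight == 0:
            return 0.0
        return sum(r.reviewer_score * r.value for r in self.ratings) / total_weight


def front_page(manuscripts, n):
    """Return the n highest-scoring manuscripts.

    Nothing is deleted: lower-scoring manuscripts stay in `manuscripts`
    (and stay searchable); they just don't make this list.
    """
    return sorted(manuscripts, key=lambda m: m.score(), reverse=True)[:n]


# Usage: reviewer weights 3.0 and 1.0 give (3*8 + 1*4) / 4 = 7.0
a = Manuscript("A", [Rating(3.0, 8.0), Rating(1.0, 4.0)])
b = Manuscript("B", [Rating(1.0, 9.0)])
c = Manuscript("C")
print(a.score())                                     # 7.0
print([m.title for m in front_page([a, b, c], 2)])   # ['B', 'A']
```

Because the score is recomputed from the ratings every time, it stays dynamic: a new rating from a high-scoring reviewer shifts it immediately, and no manuscript is ever removed, only ranked lower.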

In the end, submission is completely transparent: every manuscript has supporting data and includes relevant critical commentary traditionally seen only by authors. The system eliminates the "third reviewer" phenomenon, rewards authors with novel ideas without regard to institution, financial backing, or geography, and overcomes problems documented at the Polymath Project. It would be completely open access. It would eliminate user fees, freeing information to all. It would have other perks, but you get the idea.

We envision this system laid out in a three-tier model with a “dashboard”-style user interface for ease of use. The “bottom” level of this system would include all datasets. Datasets would include relevant epidemiological characteristics. Each dataset would be linked to the citations of it within articles, identifying which pieces of data, equations, and interpretations each article used. Checking for accurate interpretation of data by peers would be simple, and limits could be built into the citation software to prevent inaccurate use of data. All new manuscripts would be tied to data in a dataset, though the dataset used might not be new. An independent user might identify something novel in a dataset that the original poster missed, and he/she could create a separate manuscript from it, giving full credit to the original submitter through data links. Original posters would be rewarded, through the historical user scores mentioned throughout this article, each time their dataset was used successfully (encouraging submission of data for community processing). This system would reward transparency and thoroughness at original submission.
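The data links in the bottom level could be modeled roughly like this. In this hypothetical Python sketch, a citation records which columns of a dataset a manuscript used, the registry refuses citations to columns that don’t exist (one very simple stand-in for the “limits” on inaccurate data use), and each accepted citation credits the dataset’s original poster. All names are made up, and treating an accepted citation as a “successful” use is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    dataset_id: str
    owner: str          # the original poster, who receives credit
    columns: frozenset  # names of the variables in the dataset


@dataclass
class DataCitation:
    manuscript_id: str
    dataset_id: str
    columns_used: frozenset  # exactly which pieces of the data the manuscript relied on


class DataRegistry:
    """Hypothetical bottom-tier store linking datasets to the manuscripts that cite them."""

    def __init__(self):
        self.datasets = {}   # dataset_id -> Dataset
        self.citations = []  # every accepted DataCitation
        self.credit = {}     # owner -> number of accepted citations of their data

    def add_dataset(self, ds):
        self.datasets[ds.dataset_id] = ds
        self.credit.setdefault(ds.owner, 0)

    def cite(self, manuscript_id, dataset_id, columns_used):
        """Record a data citation, rejecting ones that point at data that isn't there."""
        ds = self.datasets.get(dataset_id)
        if ds is None:
            raise KeyError(f"unknown dataset {dataset_id!r}")
        missing = frozenset(columns_used) - ds.columns
        if missing:
            raise ValueError(f"{dataset_id!r} has no columns {sorted(missing)}")
        citation = DataCitation(manuscript_id, dataset_id, frozenset(columns_used))
        self.citations.append(citation)
        # Assumption: an accepted citation counts as a "successful" use of the data.
        self.credit[ds.owner] += 1
        return citation

    def manuscripts_using(self, dataset_id):
        """Every manuscript built on this dataset, so peers can check the interpretation."""
        return [c.manuscript_id for c in self.citations if c.dataset_id == dataset_id]


# Usage: another author builds manuscript "m1" on alice's dataset.
reg = DataRegistry()
reg.add_dataset(Dataset("d1", "alice", frozenset({"age", "dose", "outcome"})))
reg.cite("m1", "d1", {"age", "outcome"})
print(reg.credit["alice"])  # 1
```

In a real system the credit counter would feed into the owner’s historical user score rather than sit on its own, and the validity check would need to be far richer than “does this column exist.”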

The “middle” level of the system would include manuscripts and their associated comments. We envision this level as a split screen: users read a new manuscript on one side, while older manuscripts cited in it come directly into view on the other (this could be customizable so that multiple manuscripts, datasets, and outside sources can be viewed simultaneously for cross-referencing). Here users could read new manuscripts, rate them, and leave comments with citations in the form of a discussion. Comments could directly address issues with a new manuscript or with the data underlying it. Consequently, rather than as discrete manuscripts, new information would be disseminated as a novel dialogue, allowing important commentary on new ideas to be included in the historical record (ensuring accountability for new information, providing context, and facilitating quicker synthesis of new ideas with older ones). As above, good manuscripts and comments would float to the top via strong scores, rewarding the submitter and the scientific community.

The “top” level of this system would be an information synthesis similar to Wikipedia. This level would allow users to cite and synthesize information from manuscripts and comments (as well as, initially, from outside sources) in discrete articles “about” scientific topics. This level eliminates the need for independent external publishing sources, because every piece of information at this level could be linked directly to the manuscripts and comments discussing it, making “truth” easy to trace back to primary sources.

Underlying this process is an algorithm accounting for the full spectrum of user contributions. A user’s individual score would account for all of his/her historical contributions to the project, including scores for manuscripts provided; when, where, and how often each manuscript is cited by other users; constructive comments left; and datasets submitted. Credentials (appointments, higher education, etc.) could similarly be factored into the individual user score. The value of the comments he/she leaves on other manuscripts would take this historical user score into account, weighting his/her comments fairly for historical contributions. Similarly, new manuscripts submitted by a user would receive higher or lower baseline ratings based on historical contributions. We envision this score being included on academic curricula vitae, replacing absurdly long lists of publications and abstracts that give no indication of the quality of individual manuscripts.
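One simple way to fold these contributions into a single user score is a weighted sum, with that score then nudging the baseline rating of the user’s next manuscript. This is only a sketch: the weights, the component names, and the `neutral`/`reference`/`max_shift` parameters below are hypothetical placeholders, since the proposal doesn’t say how the pieces should be balanced.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the proposal does not say how the components should be balanced.
WEIGHTS = {
    "manuscripts": 1.0,  # scores of manuscripts provided
    "citations": 0.5,    # how often those manuscripts are cited by other users
    "comments": 0.3,     # scores of constructive comments left
    "datasets": 0.8,     # times the user's datasets were used by others
    "credentials": 0.2,  # appointments, higher education, etc.
}


@dataclass
class UserHistory:
    manuscript_scores: list[float] = field(default_factory=list)
    citations_received: int = 0
    comment_scores: list[float] = field(default_factory=list)
    dataset_uses: int = 0
    credential_points: float = 0.0


def user_score(h: UserHistory) -> float:
    """Weighted sum of a user's historical contributions."""
    components = {
        "manuscripts": sum(h.manuscript_scores),
        "citations": h.citations_received,
        "comments": sum(h.comment_scores),
        "datasets": h.dataset_uses,
        "credentials": h.credential_points,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


def baseline_rating(h: UserHistory, neutral=5.0, reference=50.0, max_shift=2.0) -> float:
    """Starting rating for a user's new manuscript (all parameters hypothetical).

    Users scoring above `reference` start above `neutral`, users below it start
    below, and the shift is clamped to +/- `max_shift` so history can't dominate.
    """
    shift = max_shift * (user_score(h) - reference) / reference
    return neutral + max(-max_shift, min(max_shift, shift))


# Usage: a brand-new user versus a prolific one.
print(baseline_rating(UserHistory()))                              # 3.0
print(baseline_rating(UserHistory(manuscript_scores=[10.0] * 20)))  # 7.0
```

Worth noticing: with these placeholder numbers a brand-new user starts below neutral. Whether newcomers should start lower at all is a real design question, given the goal of rewarding novel ideas regardless of institution; a floor of `neutral` for new users would be one alternative.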

The beauty of this system is that it allows complete transparency while offering each user a rating assigned by his/her peers. It would include translation software, eliminating linguistic barriers. It would allow new literature to be targeted directly to user preferences: users could customize lists of manuscripts to read and review by keyword, discipline, the user score behind a new manuscript, “hot” manuscripts receiving a lot of attention from readers and commenters, country/state/neighborhood of origin, etc. It would facilitate streamlined and effective literature searches, allowing users to search for keywords across the entire body of scientific knowledge, including comments and datasets. It would force improvement of data quality by making data subject to inspection along with commentary on that data. It would facilitate high-quality meta-analyses by allowing users to compile original data, rather than processed data, for analysis. It would eliminate siloed departmental research by opening commentary to all disciplines and encouraging interdisciplinary collaboration.

This system could be supported financially by grants, government funding, or other charitable contributions. Alternatively, we envision this system having the potential to support itself (and be quite lucrative) through advertising revenue. Because each user’s entire historical research archive, search and review history, biographical sketch, and discipline would be on the platform, advertisers would have a wealth of information for targeting. For example, a manufacturer selling pipette tips would know that user X recently posted a dataset in which 30,000 pipette tips were used. That company would also know that user X had searched for manuscripts from users doing similar research. Consequently, it would be clear that user X purchased many, many pipette tips, and advertising could reflect this.

As I said, this is not a complete characterization of this project. Leave your thoughts and questions in the comments!


This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 United States License.