All of accumulated scientific knowledge, at everyone's fingertips!
Unfortunately, much of today's research is still stored in separate silos, e.g. by conference and journal. Too many research articles are available only if you are willing to pay a steep price as an individual not affiliated with an institution.
The new database, the General Index, contains only snippets of up to five consecutive words from the research papers. This is probably not too helpful!
Services like Google Scholar, Semantic Scholar, and Microsoft Academic make a difference by making papers easier to find, but it is still quite tedious to access research across authors, journals, conferences, etc.
"... The catalogue, which was released on 7 October and is free to use, holds tables of more than 355 billion words and sentence fragments listed next to the articles in which they appear. It is an effort to help scientists use software to glean insights from published work even if they have no legal access to the underlying papers ...
Malamud says that because his index doesn’t contain the full text of articles, but only sentence snippets up to five words long, releasing it does not breach publishers' copyright restrictions on the re-use of paywalled articles."
"... The General Index consists of 3 tables derived from 107,233,728 journal articles. A table of n-grams, ranging from unigrams to 5-grams, is extracted using SpaCy. Each of the 355,279,820,087 rows of the n-gram table consists of an n-gram coupled with a journal article id. A second table is constructed using Yake and consists of 19,740,906,314 rows, each with a keywords and an article id. A third table associates an article id with metadata. ..."
The General Index (an archive.org project)
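To make the n-gram table in the description above more concrete, here is a minimal sketch of how rows pairing an n-gram with an article id could be produced. It is not the project's actual pipeline: the quoted description says SpaCy was used for extraction, whereas this sketch assumes simple lowercase whitespace tokenization, and the function names and the article id "article-001" are hypothetical.

```python
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], max_n: int = 5) -> Iterator[Tuple[str, ...]]:
    """Yield every run of 1 to max_n consecutive tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def ngram_rows(article_id: str, text: str, max_n: int = 5) -> List[Tuple[str, str]]:
    """Build (n-gram, article id) rows for one article.

    Assumption: whitespace tokenization stands in for the SpaCy
    tokenizer mentioned in the quoted description.
    """
    tokens = text.lower().split()
    return [(" ".join(gram), article_id) for gram in ngrams(tokens, max_n)]


# Hypothetical article id and text, for illustration only.
rows = ngram_rows("article-001", "open access to science matters")
for gram, article in rows:
    print(f"{gram!r} -> {article}")
```

Because no row holds more than five consecutive words, the full text cannot be read directly from a single row, yet software can still look up which articles contain a given phrase.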