Thursday, November 26, 2020

TLDR: Extreme Summarization of Scientific Documents

This new summarization feature, TLDRs (Too Long; Didn't Read), is so extreme as to be almost useless, like tweets squeezed into Twitter's original 140-character limit!

Semantic Scholar (SS) has recently introduced this new feature. As a very heavy, longtime user of SS, I do not find it very helpful!

The idea behind this new feature makes sense: "Staying up to date with scientific literature is an important part of any researchers’ workflow, and parsing a long list of papers from various sources by reading paper abstracts is time-consuming."
However, reducing a whole research paper to a summary of about one ordinary sentence is going too far in the wrong direction!

Andrew Ng: "... TLDR was able to summarize research articles that averaged 5,000 words long using around 20 words. ... We’re thinking: Some papers can be summed up in a couple of dozen words, but many are so complex that no single sentence can do them justice. We look forward to n-sentence summarizers. ... Why it matters: At least 3 million scientific papers are published annually, Semantic Scholar estimates, and a growing portion of them describe innovations in AI, according to the AI Index from Stanford Human-Centered Artificial Intelligence. ..."

Perhaps Semantic Scholar should have focused on extracting the novelty or innovations from new research papers instead of brutally summarizing them!

Last, but not least: It appears that Semantic Scholar relied too heavily on supervised learning for this research project. This means it cannot be easily or effectively scaled up to millions of papers. "... To facilitate study on this task [TLDR], we introduce SciTLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers. SciTLDR contains both author-written and expert-derived TLDRs ..."

"... For the moment, the software generates sentences only for the ten million computer-science papers covered by Semantic Scholar, but papers from other disciplines should be getting summaries in the next month or so, once the software has been fine-tuned ..."

[2004.15011] TLDR: Extreme Summarization of Scientific Documents (I have not yet read the paper)

Here is the Nature article on this:
"tl;dr: this AI sums up research papers in a sentence". Subtitle: "Search engine’s tool for summarizing studies promises easier skim-reading."

Andrew Ng just covered this paper in the most recent issue of his newsletter, The Batch, dated 11/25/2020.
