This paper presents a novel application of a semantic annotation system, named Cerno, to analyze research publications in electronic format.
Specifically, we address the problem of providing automatic support for authors who need to deal with large volumes of research documents.
Author affiliations: ... of Computer and Management Sciences, University of Trento, Italy; School of Computing, Queen's University, Kingston, Canada.
Cerno is a semantic annotation system based on code analysis techniques and tools that have proven effective in the software analysis domain for processing billions of lines of legacy source code.
Cerno has already been applied in several case studies involving the analysis of differently structured documents from the tourism sector [2,3].
These services provide access to large knowledge bases of research publications, allowing a user to search for a paper and retrieve the details of its publication.
In digital libraries, it is also possible to see the abstract and citations present in the paper.
Section 3 explains how this method can be adapted to the analysis and annotation of electronic literature, providing some insights into the implementation details.
Section 4 illustrates the document analysis process on a specific example. The tool performance has been evaluated on a set of papers and preliminary evaluation results are promising. The backend of Biblio uses a standard relational database to store the results. The system has demonstrated good performance and scalability while yielding good quality results. In this work we also present preliminary experimental results of the technique on a set of published papers.
- Document source files may be stored as PDF, MS Word, LaTeX, PostScript, HTML and other electronic formats;
- Page layout. Depending on the requirements of the publisher, the layout of a document varies; for example, pages may be organized in a one- or two-column fashion; for papers published in journals, the header may be present on some pages; footnotes may be allowed or not, and so on;
- Document structure. At first sight, one may assume that scientific documents are semi-structured; this means that the arrangement of document elements can be predicted with some certainty: for instance, first there is a title, then a list of authors with affiliations, abstract, introduction and other sections; unfortunately, such structure is not universally accepted;
- Semantic analysis. The key elements that a reader (for example, a reviewer) looks for are the problem considered and the main contributions.
In this work, we address all these issues and provide semantic and structural annotation of document sections. This paper presents Biblio, a tool that is based on Cerno and is intended to support the analysis of research articles.
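To make the ideas of structural annotation and relational storage more concrete, the following is a minimal sketch in Python. It is not Cerno or Biblio code: the heading vocabulary, the `split_sections` and `store_sections` helpers, and the `annotation` table schema are all hypothetical and chosen only for illustration. The sketch assumes plain text has already been extracted from the source file and that section headings appear on their own lines, which, as noted above, does not hold for every paper.

```python
import re
import sqlite3

# Hypothetical heading vocabulary (an assumption for this sketch);
# real papers deviate from it, since the structure is not universal.
HEADING_RE = re.compile(
    r"^(?:\d+\.?\s+)?(abstract|introduction|related work|conclusions?|references)\s*$",
    re.IGNORECASE,
)


def split_sections(text):
    """Return (label, body) pairs; text before the first heading is 'front_matter'."""
    sections = []
    label, lines = "front_matter", []
    for line in text.splitlines():
        match = HEADING_RE.match(line.strip())
        if match:
            # Close the section collected so far and start a new one.
            sections.append((label, "\n".join(lines).strip()))
            label, lines = match.group(1).lower(), []
        else:
            lines.append(line)
    sections.append((label, "\n".join(lines).strip()))
    # Drop sections whose body is empty.
    return [(name, body) for name, body in sections if body]


def store_sections(conn, paper_id, sections):
    """Store labelled sections in a hypothetical 'annotation' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation ("
        "paper_id TEXT, position INTEGER, label TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?, ?)",
        [(paper_id, i, label, body) for i, (label, body) in enumerate(sections)],
    )
    conn.commit()


sample = """A Sample Paper Title
A. Author, B. Author

Abstract
We study a problem.

1 Introduction
Motivation goes here.
"""

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
sections = split_sections(sample)
store_sections(conn, "paper-001", sections)
rows = conn.execute(
    "SELECT label, body FROM annotation WHERE paper_id = ? ORDER BY position",
    ("paper-001",),
).fetchall()
for label, body in rows:
    print(label, "->", body.splitlines()[0])
```

A keyword-based splitter like this only covers the structural side of the problem. Identifying the problem statement and the main contributions within a section would require semantic rules on top of it, which this sketch does not attempt.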