An Introduction to excerptr

Andreas Dominik Cullmann

2019-04-23, 17:04:49

excerptr is an R interface to the Python package excerpts. See that package's documentation for the motivation behind it.

Suppose you have a script such as

path <- system.file("tests", "files", "some_file.R", package = "excerptr")
cat(readLines(path), sep = "\n")

#######% % All About Me
#######% % Me
####### The above defines a pandoc markdown header.
####### This is more text that will not be extracted.
#######% **This** is an example of a markdown paragraph: markdown 
#######% recognizes only six levels of heading, so we use seven or
#######% more levels to mark "normal" text.
#######% Here you can use the full markdown 
#######% [syntax](http://daringfireball.net/projects/markdown/syntax).
#######% *Note* the trailing line: markdown needs an empty line to end
#######% a paragraph.
#######%

#% A section
##% A subsection
### Not a subsubsection but a plain comment.
############% Another markdown paragraph.
############%
####### More text that will not be extracted.

and you want to excerpt the comments marked by ‘%’ into a file that gives you the table of contents of your script. Then

excerptr::excerptr(file_name = path, run_pandoc = FALSE, output_path = tempdir())

## [1] 0

gives you

cat(readLines(file.path(tempdir(), sub("\\.R$", ".md", basename(path)))), 
    sep = "\n")

% All About Me
% Me
**This** is an example of a markdown paragraph: markdown 
recognizes only six levels of heading, so we use seven or
more levels to mark "normal" text.
Here you can use the full markdown 
[syntax](http://daringfireball.net/projects/markdown/syntax).
*Note* the trailing line: markdown needs an empty line to end
a paragraph.

# A section
## A subsection
Another markdown paragraph.
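
The relation between the script and the excerpted markdown above can be sketched in a few lines of R. This is only an illustration of the rule visible in the example: comment characters immediately followed by ‘%’ mark a line for extraction, up to six comment characters become a markdown heading, and seven or more become plain text. It is not the code the excerpts package uses, and `excerpt_sketch` and its arguments are hypothetical names.

```r
# Hypothetical helper, not part of excerptr or excerpts: it only mimics
# the extraction rule that the example above suggests.
excerpt_sketch <- function(lines, comment_char = "#", magic_char = "%") {
    # One or more comment characters, directly followed by the magic
    # character, an optional blank and the text to keep.
    pattern <- paste0("^(", comment_char, "+)", magic_char, " ?(.*)$")
    marked <- grepl(pattern, lines)
    hashes <- nchar(sub(pattern, "\\1", lines[marked]))
    text <- sub(pattern, "\\2", lines[marked])
    # Up to six comment characters stay markdown headings,
    # seven or more are treated as plain text.
    ifelse(hashes <= 6, paste(strrep("#", hashes), text), text)
}

script <- c("#######% % All About Me",
            "####### Not extracted.",
            "#% A section",
            "##% A subsection",
            "### A plain comment.",
            "#######% Some text.",
            "#######%")
cat(excerpt_sketch(script), sep = "\n")
```

Lines without the ‘%’ marker, including ordinary comments, are dropped, and a marked line without text yields the empty line that ends a markdown paragraph.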

If you have pandoc installed, you can convert the markdown output into HTML:

# Both executables must be on the search path.
is_pandoc_installed <- nchar(Sys.which("pandoc")) > 0 &&
    nchar(Sys.which("pandoc-citeproc")) > 0
is_pandoc_version_sufficient <- FALSE
if (is_pandoc_installed) {
    reference <- "1.12.3"
    # The first line of `pandoc --version` reads "pandoc <version>".
    version <- strsplit(system2(Sys.which("pandoc"), "--version", stdout = TRUE),
                        split = " ")[[1]][2]
    if (utils::compareVersion(version, reference) >= 0) {
        is_pandoc_version_sufficient <- TRUE
    }
}
if (is_pandoc_version_sufficient) {
    excerptr::excerptr(file_name = path, pandoc_formats = "html",
                       output_path = tempdir())
}

## [1] 0
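
The version check above splits the first line of `pandoc --version` at blanks. A slightly more defensive variant, given here only as a sketch, extracts the first dotted number with a regular expression and compares it via base R's `numeric_version()`. The helper names `parse_version` and `is_sufficient` are hypothetical and not part of excerptr; the reference version 1.12.3 is taken from the code above.

```r
# Hypothetical helpers, not part of excerptr.
parse_version <- function(version_output) {
    # Only the first line of the output is expected to carry the version.
    first_line <- version_output[1]
    match <- regmatches(first_line,
                        regexpr("[0-9]+(\\.[0-9]+)+", first_line))
    # No dotted number found: signal this with NULL.
    if (length(match) == 0) return(NULL)
    numeric_version(match)
}

is_sufficient <- function(version_output, reference = "1.12.3") {
    version <- parse_version(version_output)
    !is.null(version) && version >= numeric_version(reference)
}

# Illustrative input strings, not the output of a real pandoc call:
is_sufficient("pandoc 2.9.2.1")
is_sufficient("pandoc 1.9")
```

In practice you would pass the result of `system2(Sys.which("pandoc"), "--version", stdout = TRUE)` as `version_output`.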

This runs pandoc on your excerpted comments and generates an HTML file whose source you can print via:

if (is_pandoc_version_sufficient) 
    cat(readLines(file.path(tempdir(), sub("\\.R$", ".html", basename(path)))), 
        sep = "\n")

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <meta name="author" content="Me" />
  <title>All About Me</title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<div id="header">
<h1 class="title">All About Me</h1>
<h2 class="author">Me</h2>
</div>
<p><strong>This</strong> is an example of a markdown paragraph: markdown recognizes only six levels of heading, so we use seven or more levels to mark &quot;normal&quot; text. Here you can use the full markdown <a href="http://daringfireball.net/projects/markdown/syntax">syntax</a>. <em>Note</em> the trailing line: markdown needs an empty line to end a paragraph.</p>
<h1 id="a-section"><span class="header-section-number">1</span> A section</h1>
<h2 id="a-subsection"><span class="header-section-number">1.1</span> A subsection</h2>
<p>Another markdown paragraph.</p>
</body>
</html>

You can open it in your browser via

browseURL(file.path(tempdir(), sub("\\.R$", ".html", basename(path))))
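
The path of the output file is built by the same expression several times above. A small helper can keep that in one place. This is a sketch under the assumption, taken from the examples above, that the output file keeps the script's base name and only swaps the `.R` extension; `output_file` is a hypothetical name, not a function of excerptr.

```r
# Hypothetical helper, not part of excerptr: derive the path of an
# output file from the script's path and the wanted extension.
output_file <- function(script, extension, directory = tempdir()) {
    file.path(directory,
              sub("\\.R$", paste0(".", extension), basename(script)))
}

# The script path and directory here are made up for illustration.
output_file("/some/dir/some_file.R", "html", directory = "/tmp")
```

With it, the last call would read `browseURL(output_file(path, "html"))`.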