Text corpus data analysis, with full support for international text (Unicode). Functions for reading data from newline-delimited 'JSON' files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies, including n-grams.
| Version: | 0.10.1 |
| Depends: | R (≥ 3.3) |
| Imports: | stats, utf8 (≥ 1.1.0) |
| Suggests: | knitr, Matrix, testthat |
| Enhances: | quanteda, tm |
| Published: | 2020-04-16 |
| Author: | Leslie Huang [cre, ctb], Patrick O. Perry [aut, cph], Finn Årup Nielsen [cph, dtc] (AFINN Sentiment Lexicon), Martin Porter and Richard Boulton [ctb, cph, dtc] (Snowball Stemmer and Stopword Lists), The Regents of the University of California [ctb, cph] (Strtod Library Procedure), Carlo Strapparava and Alessandro Valitutti [cph, dtc] (WordNet-Affect Lexicon), Unicode, Inc. [cph, dtc] (Unicode Character Database) |
| Maintainer: | Leslie Huang <lesliehuang at nyu.edu> |
| BugReports: | https://github.com/leslie-huang/r-corpus/issues |
| License: | Apache License (== 2.0) \| file LICENSE |
| URL: | https://leslie-huang.github.io/r-corpus/, https://github.com/leslie-huang/r-corpus |
| NeedsCompilation: | yes |
| CRAN checks: | corpus results |
| Reference manual: | corpus.pdf |
| Vignettes: | Chinese text handling, Introduction to corpus, Stemming Words, Text data in Corpus and other packages |
| Package source: | corpus_0.10.1.tar.gz |
| Windows binaries: | r-devel: not available, r-release: corpus_0.10.1.zip, r-oldrel: corpus_0.10.1.zip |
| macOS binaries: | r-release: corpus_0.10.1.tgz, r-oldrel: corpus_0.10.1.tgz |
| Old sources: | corpus archive |
| Reverse imports: | GenEst, stylest |
Please use the canonical form https://CRAN.R-project.org/package=corpus to link to this page.