Corpus analysis software
Tool | Description | Categories | Platform | Pricing |
---|---|---|---|---|
@nnotate | Semi-automatic annotation of corpus data | annotation | Solaris, Linux | Free (with licence agreement) |
aConCorde | Multilingual concordance tool (English and Arabic) | concordancer | Linux, Mac, Windows | Free |
almaneser / SALTA | Semantic Parser/POS Tagger for English | parser, pos tagger, tagging | | Free (with licence agreement) |
AMALGAM | Tool for grammatical annotation (POS and phrase structure). Tags texts submitted via email. | annotation | Web | Free |
ANC2go | A web service that allows users to create custom sub-corpora of the ANC | ANC, sampling | Web | Free |
ANNIS | Search and visualization tool for multi-layer linguistic corpora with diverse types of annotation | search, visualization | Web (or Linux, Mac, Windows) | Free |
AntCLAWSGUI | Front-end interface for CLAWS tagger | pos tagger, tagging | Windows | Free |
AntConc | Corpus analysis toolkit | wordlists, concordancer, keywords | Linux, Mac, Windows | Free |
AntCorGen | A freeware discipline-specific corpus creation tool. | compilation, text analysis | Windows, Mac, Linux | Free |
AntFileConverter | Freeware tool to convert PDF and Word (DOCX) files into plain text | converter | Windows, Mac | Free |
AntFileSplitter | A freeware text file splitting tool. | compilation | Windows, Mac, Linux | Free |
AntGram | A freeware n-gram and p-frame (open-slot n-gram) generation tool. | text analysis, n-grams, p-frames, lexical bundles, lexical frames | Windows, Mac, Linux | Free |
AntMover | Tool for text structure (moves) analysis | text analysis | Windows | Free |
AntPConc | Corpus analysis toolkit for files encoded with UTF-8 | wordlists, concordancer | Windows, Mac | Free |
AntWordProfiler | Tool for profiling vocabulary level and text complexity | text complexity | Linux, Mac, Windows | Free |
ANVIL | A tool for video annotation. | video, annotation | Windows, Linux, Mac | Free |
ATLAS.ti | Sophisticated QDA software for mixed methods approaches | qda, mixed methods | Windows, Mac, Android, iOS | Commercial |
Atomic | Multi-layer corpus annotation platform. | annotation | Linux, Mac, Windows | Free |
Authorial Voice Analyzer (AVA) | A tool for the analysis of interactional metadiscourse features. | discourse, voice | Mac | Free |
BFSU Collocator | A collocation analysis toolkit | collocation, statistics | Windows | Free |
BFSU English Sentence Segmenter | A simple sentence segmenter | segmentation | Windows | Free |
BFSU Qualitative Coder | A tool for manual coding of corpora | coding, annotation | Windows | Free |
BFSU Sentence Collector | A pedagogic concordancer | concordancer, ddl, pedagogy, language learning | Windows | Free |
BFSU Stanford Parser | A simple parser | parser | Windows | Free |
BNCWeb | BNCweb is a web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC). | analysis, concordancer | Web | Free |
BootCat | Tool for crawling and compiling data from the web with a list of seed words. | crawler, compilation | ||
Bow | Statistical Language Modeling, Text Retrieval, Classification and Clustering | text analysis | UNIX, Linux | Free |
BFSU ParaConc | A parallel concordancer | concordancer, parallel | Windows | Free |
BFSU PowerConc | A fairly powerful concordancer | concordancer | Windows | Free |
BFSU Stanford POS Tagger | A PoS tagger | pos tagger, tagging | Windows | Free |
CasualConc | CasualConc is a concordance program that runs natively on Mac OS X 10.9 or later | concordancer | OSX | Free |
CATMA (Computer Assisted Text Markup and Analysis) | An undogmatic, complex annotation and analysis package | markup, analysis, visualization, annotation | Web | Free |
Chared | Tool for detecting the character encoding of a text | text analysis | Python 2.6 or later | Free |
Chi-Square and Log Likelihood Calculator | A simple tool for calculating Chi-squared and LL | statistics | Windows | Free |
CLaRK | XML Based System For Corpora Development | compilation | | Free (with licence agreement) |
CLAWS POS-Tagger | The CLAWS part-of-speech tagger for English | pos tagger, tagging | Web | Via licence or in-house tagging at Lancaster |
CLiC | A corpus tool to support the analysis of literary texts. | concordancer | Web | Free |
Coh-Metrix | A web-based system to compute cohesion and coherence metrics. | cohesion, coherence | Web | Free |
Colligator 2.0 | A colligation query/analysis toolkit | colligation | Windows | Free |
Collocate | Tool for the extraction of concordances and collocations | concordancer | Windows | 35 USD |
CoMOn | A tool for corpus matching analysis | matching | Web | Free |
ConcGramCore | A modern rewrite of ConcGram (Greaves 2005) that allows efficient searching for concgrams. | collocation, concgram | Windows | Open Source |
Concordance Randomizer | A concordance randomizer | concordancer | Windows | Free |
Concordancer | Online tool for frequency counts and text clouds | concordancer | Web | Free |
CorpKit | An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora. | wordlists, parsing, concordancer, visualization | Linux, Mac, Windows (Python) | Free |
CorporaCoCo | A set of R functions used to compare co-occurrence between corpora | collocation | R | Free |
Corpus Presenter | Tree tagger and corpus analysis software | wordlists, parsing, concordancer, visualization | Windows | Free |
Corpus-Tools | Text annotation and analysis tool | text analysis | | Free |
CorpusExplorer | A complex corpus analysis toolkit combining 45 interactive tools. | visualization, exploration, tagging, text analysis | Windows | Free, Open Source |
CorpusSearchLite | Searches parsed corpora in the Penn Treebank format | searching | ||
CPQWeb | Overview of and access to a wide range of corpora | database | Web | Free (once registered) |
DART | An annotation tool and research environment for annotating dialogues. | dialogues, annotation | Windows | Free |
DepCluster | A tool used for lexeme-based collexeme analysis. | lexis, collexeme | ||
DeTagging Tool | A tool that strips annotation/tags from files | cleaning, annotations | Windows | Free |
Dexter | Tool for text annotation | annotation | Linux, Mac, Windows | Free |
DISCO | Corpus pre-processing tool for a variety of languages that allows retrieval of the semantic similarity between arbitrary words and phrases | tokenization, annotation | Windows, Linux, Solaris, and MacOS | Free |
DisMo | An automatic multi-level annotator for spoken language corpora. | spoken, multilevel, multi-layer, pos tagger, annotation, tagging | ||
DocuScope | A tool for computer-aided rhetorical analysis | rhetorical analysis, text analysis, visualization | Windows (Java) | Free |
ELAN | Transcription and annotation of sound or video files | transcription, annotation | Linux, Mac, Windows | Free |
Emdros | A database engine for analyzed and annotated text. | database, annotation, query | Windows, Linux, Mac | Free, Open Source |
EncodeAnt | Tool for the detection and conversion of character encodings | converter | Windows, Mac | Free |
EXMARaLDA | Tool for transcription, annotation, and corpus analysis of spoken data | transcription, annotation, analysis | | Free |
f4analyse | QDA software specifically geared towards interview (spoken) data | qda, spoken | Windows, Mac, Linux | Commercial |
f4transkript | Software for transcribing audio data | transcription, spoken | Windows, Mac, Linux | Commercial |
FireAnt | Social media analysis toolkit | downloader, converter | Windows, Mac | Free |
FLAIR (2.0) | An online tool for language teachers and learners that analyzes grammatical constructions and readability on the fly. | constructions, readability | Web | Free |
Flesh PC | Calculates Flesch readability scores | readability, statistics | Windows | Free |
FrameNet | Dictionary of more than 10,000 word senses, tagged for semantic roles (according to Fillmorean Frame Semantics) | semantic parser | Web | Free |
gensim | Deep learning via word2vec | word2vec | Multi (Python) | Free, Open Source |
Gephi | A toolkit for network analysis | network analysis, graphs | Windows, Linux, Mac | Free |
Google Ngrams | An ngram-viewer for the whole of Google Books | ngrams | Web | Free |
GraphColl | Tool for building and exploring networks of linguistic collocations | visualization | Windows, Mac | Free |
Gsearch | Tool for syntactic pattern matching | pattern matching | ? | Down |
HeidelGram Web-Based Tools | Basic corpus analysis toolkit for the HeidelGram Corpus | wordlists, concordancer | Web | Free |
HeidelTime | A multilingual, domain-sensitive temporal tagger | temporal tagger, timex3 | Java | Free, Open Source |
Heimdall | A tool that searches a text for sequences written in other languages. | language detection | Linux, Windows, Mac | Open Source |
HGSimpleCorpusNetwork | Batch frequency analysis on corrupted (e.g. OCR) corpus data and generation of network analysis data. | wordlists, network analysis | Multi (Python) | Free, Open Source |
HTST Samuels | Historical Thesaurus Semantic Tagger via web-interface | semantic tagger | Web | Free |
ICARUS | Search and visualization tool for dependency trees | visualization | | Free |
ICEweb | A tool for compiling, downloading, and analyzing web corpora in accordance with the ICE | ICE, compilation, crawler | Windows | Free |
IMS Corpus Workbench | Tool for sorting frequencies in corpora | wordlists, concordancer | Web and local version | Free |
Intelligent Archive | Managing corpora for stylometry | stylometry, management | Windows, Unix, Linux, Mac | Free |
jTokenizer | Tokenizing natural language | tokenizer | | Free |
JusText | Tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages | boilerplate remover | Python | Free |
juxta | Comparing and collating multiple witnesses to single textual works | textual criticism, witnesses | Windows, Unix, Linux, Mac | Free |
Kaleidographic | A dynamic and interactive visualization tool for multivariate data. | visualization | Web | Free |
KAT Tool | Grouping patterns based on search terms | patterns, concordancer | Windows | Free |
kdiff3 | KDiff3 is a diff and merge program. | comparison | Windows, Linux, OSX | Free, Open Source |
Keyword Plus | A keyword generation/analysis tool | keywords | Windows | Free |
kfNgram | A simple tool for generating n-grams | n-grams, p-frames | Windows | Free |
Khepri | A view-based tool for exploring (historical sociolinguistic) data | sociolinguistics, visualization | JavaScript, Web | Free, Open Source |
KoGra-R | An R-based online tool that provides statistical measures for corpus-based frequencies | statistics, frequency analysis | Web | Free |
KorAP | A complex platform for corpus analysis developed at the IDS in Mannheim | analysis, multilevel, multi-layer | Web | Free, Open Source |
LancsBox | The Lancaster Desktop Corpus Toolbox; Software package for the analysis of language data and corpora | collocation, frequency analysis, keywords | Java | Free (CC) |
langid.py | A standalone language identification tool written in Python. | language detection | Linux, Windows, Mac | Open Source |
LDA-Toolkit | A toolkit for linguistic discourse and image analysis. | discourse, images | Windows | Free |
Leipzig Corpus Miner | A modern text mining infrastructure for qualitative data analysis | qda, mixed methods, text mining, lexicometrics, topic models, information retrieval | Linux, Windows, Mac (via VM) | Free |
LEXA | A complex lemmatizer. | lexis, lemmatizer | | Free |
LexisNexis | A database containing (new and old) news articles. They also have other (business) data. | news, data | Web | Commercial |
lexpan | A tool to analyze syntagmatic structures in corpora. Especially useful to analyze fillers and slots. | syntagmatic, slots | Windows, Linux, Mac | Free |
LightSide | A machine learning workbench. | machine learning | Linux, Windows | Free, Open Source |
Linguistica | Word segmentation and morphological analysis? | segmentation, morphological tagger | Linux, Mac, Windows | Free |
LIWC | A tool that tries to compute scores for different emotions, thinking styles, and social concerns. | lexical analysis, style | Web | Free (but commercial) |
MALLET | Package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text | statistical nlp | Windows | Free |
MAT - Multidimensional Analysis Tagger | A tagger for MDA (Biber et al.) by Andrea Nini. | tagging, MDA | Windows, Mac | Free |
MAXQDA | Sophisticated QDA software that works with multimodal data and supports mixed methods approaches | qda, mixed methods | Windows, Mac, Android, iOS | Commercial |
MLCT | Tool for building and processing corpora | concordancer, sentence boundary detector | | Free |
MMAX2 | A multi-level annotation tool | annotation, multilevel, multi-layer | Java | Free, Open Source |
MonoConc Esy | Concordancing and text search tool that allows primary and secondary concordancing | concordancer, sentence boundary detector | | Free for non-commercial research |
MorphAdorner | Tool for performing morphological tagging of texts | morphological tagger | | Free |
N-Gram Processor (NGP) | A Perl-based tool for the creation and processing of n-gram lists out of text files. | n-grams | Linux, Windows, Mac | Open Source |
NATAS | A Python library for processing historical English corpora, especially for studying neologisms. | historical, python, lexis | Linux, Windows, Mac | Open Source |
Natural Language Toolkit | Platform for building Python programs to work with human language data | tokenizer, tagger | Unix, Mac, Windows (+Python 3.4) | Free |
NooJ | Tags texts and corpora (i.e. sets of text files) at the Orthographical, Lexical, Morphological, Syntactic and Semantic levels | multilevel tagger | Windows, Mac, LINUX and BSD Unix | Free |
NoSketch Engine | Word sketches, thesaurus, keyword computation, corpus creation | corpus creation, semantic analysis, wordlists | | Free |
Onion | Tool for removing duplicate parts from large collections of texts | duplicate remover | | Free |
Online Graded Text Editor | Tool for profiling a text's vocabulary level and complexity | text analysis, editing, vocabulary | OSX, Windows | Free |
OpenConc | Tool for concordancing | concordancer | | Free |
PALinkA | Annotation tool | annotation | | Down |
ParaConc | A bilingual/multilingual concordancer | concordancer | | Non-Free |
Pareidoscope | Pareidoscope is a collection of tools for determining the association between arbitrary linguistic structures, such as collocations, collostructions or between structures. | collocation, constructions | | Free |
PatCount | A pattern counting tool with powerful statistic capabilities and regex support | patterns | Windows | Free |
Pattern Builder | A tool helping with regular expressions and PoS tags | regex, tagging | Windows | Free |
Pepper | Conversion between linguistic formats, e.g. from TEI to ANNIS to Tiger XML to EXMARaLDA. | conversion | | Free |
Phonological CorpusTools (PCT) | Phonological analysis on transcribed corpora | phonology | Multi (Python) | Free |
PhraseContext | Tool for wordlists, concordancing, collocation, TTR | wordlists, concordancer | | 35€ |
Pipoca (formerly openQDA) | A web-based QDA software | qda, mixed methods | Web | Free, Open Source |
Praaline | Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3) |
PRAAT | A tool for doing phonetics by computer | phonetics, spoken | Windows, Mac, Linux | Open Source |
ProtAnt | Tool for prototypical text analysis | wordlists | Windows, Mac | Free |
pysupersensetagger | Analyses texts for MWE and supersenses. | text analysis | Unix, Mac (Python) | Free |
PyXMLConc | Concordancer for XML files with automatic tag and attribute detection. | concordancer | Multi (Python), Windows | Free, Open Source |
Quanteda | An R package for the quantitative analysis of textual data. | R | Linux, Windows, Mac | Open Source |
Query Tool for the Edinburgh Associative Thesaurus | A query tool for the EAT | query, thesaurus | Windows | Free |
Readability Analyzer | A tool for generating various readability statistics | readability, statistics | Windows | Free |
Readability Webfx | A tool to check how easy or difficult (readability) a given text is. | readability | Web | Free |
RSTTool | Tool that can annotate texts for constituency and rhetorical structure | annotation | Windows, Macintosh, UNIX and LINUX | Free |
Salt | Meta models for linguistic data. | meta modelling | | Free |
SarAnt | Tool for batch search and replacing | editing, searching | Windows | Free |
SegmentAnt | Tool for the segmentation of Japanese and Chinese | segmentation, tokenizing | Windows, Mac, Linux | Free |
ShinyConc | ShinyConc is a framework for generating custom web-based concordancers and is written in R and R Shiny. | concordancer, kwic, r | R | Free, Open Source |
Simple Concordance Program | Tool for concordance and word listing that works with many languages | concordancer | Windows, Mac | Free |
SketchEngine | Word sketches, thesaurus, keyword computation, corpus creation | corpus creation, semantic analysis, wordlists, keywords | | 30 day trial or 4,85€/month |
SpiderLing | Software for obtaining text from the web useful for building text corpora | crawler | | Free |
SPPAS | A tool for the automatic annotation and analysis of speech. | speech, spoken, annotation | Windows, Mac, Linux | Free, Open Source |
SPre | Tool for segmenting and annotating texts | annotation | | Free |
Stanford Log-linear POS Tagger | POS Tagger (with Penn Treebank Tagset) for English, Arabic, Chinese, German | pos tagger, tagging | | Free |
Stanford Topic Modeling Toolbox | The Stanford Topic Modeling Toolbox (TMT) allows users to perform topic modeling on texts imported from spreadsheets. It supports both LDA and labelled LDA. | topic modeling | Java | Free |
Stylo for R | Tool for computational stylistic analysis (authorship attribution, genre analysis) | text analysis | | Free |
Sub-Corpus Creator | A tool for creating sub-corpora based on searches and metadata | compilation | Windows | Free |
Synpathy | Tool for manual syntactic annotation | annotation | Windows, Mac, Linux | Free |
TAACO | TAACO is a tool that calculates 150 indices of textual/lexical cohesion. | cohesion, lexical sophistication | All | Free, Open Source |
TAALES | TAALES measures over 400 indices of lexical sophistication. | lexical sophistication | Mac, Linux, Windows | Open Source |
TagAnt | Part-of-speech tagging tool built on Tree Tagger | pos tagger, tagging | Windows, Mac, Linux | Free |
TagCrowd | A simple tool for generating tag/word clouds online | word clouds, visualization | Web | Free |
Tagxedo | A tool for generating word clouds. | word clouds, visualization | Web | Free |
TASX-Annotator | Tool for multilevel annotation and transcription of (multi-channel) video and audio data. | multilevel tagger, transcription | Windows, Mac, Linux, Solaris | Down |
Text Analysis Computing Tools (TACT) | A simple, fairly old concordancer. | concordancer | | Commercial |
Text Variation Explorer | The Text Variation Explorer TVE is a tool for exploring the effect of window size on various common linguistic measures. It visualizes these measures and allows for PCA/Cluster analysis. | visualization, variation analysis | Java | Free |
Text Visualization Browser | A survey/gallery of text visualizations | visualization | Web | Free |
Textanz | Language analysis program that produces frequency lists, word lists, parts of speech tags. | wordlists, concordancer, pos tagger, dictionary | Any OS | Free, Open Source |
TextArc | A tool for visualizing the structure of texts. | visualization | ||
TextDirectory | TextDirectory is a tool for aggregating text files based on various filters and transformation functions. | compilation, text-processing, python | Windows, Linux, OSX | Free, Open Source |
Textplot | A tool for mapping a document into a network of terms in order to visualize the topic structure. | visualization, network analysis | Python | Free, Open Source |
Textplot | A tool for converting documents into (semantic) networks based on KDE. | semantics, network analysis, graphs | Linux, Windows, Mac | Open Source |
TextSmith Tools | A tool for genre-informed phraseological profiles | phraseology, segmentation | Windows | Free |
TextSTAT | Tool for creation and manipulation of linguistic data from different languages | corpus creation, concordancer | Windows, GNU/Linux and MacOS | Free |
The (Phonetic) Transcription Editor | An editor for creating phonetic transcriptions | transcription | Windows | Free |
The Great American Word Mapper | A visualization tool for the top 100,000 words used in American English twitter data. | twitter, lexis, social media | Web | Free |
The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free |
The Simple PoS Tagger | A simple PoS tagger utilizing Perl's Lingua::EN::Tagger | pos tagger, tagging | Windows | Free |
The SPAADIA concordancer | A concordancer for the SPAADIA corpus | concordancer, SPAADIA | Windows | Free |
The Text Feature Analyser | A tool for investigating textual features and various measures | text analysis, concordancer | Windows | Free |
Thesaurus.com | English language thesaurus with links to English dictionary and translation sites. | efl, esl, linguistics | Web | Free |
TigerSearch | Tool for searching syntactically and POS-tagged corpora | search tool, pos tagger | | Free |
TnT - Thorsten Brants's PoS Tagger | A simple PoS-Tagger | pos tagger, tagger, tagging | Windows/Unix | Available via Stanford |
Tree Editor TrEd 2.0 | Graphical editor and viewer for tree-like structures. | visualization | Windows, GNU/Linux and MacOS | Free |
TreeTagger | Tool for annotating text with part-of-speech and lemma information | pos tagger, annotation | Windows, Mac, Linux | Free |
TurboParser | Multilingual dependency parser with linear programming | parser | | Free |
Twarc | A command line tool (and Python library) for archiving Twitter JSON | twitter, social media | Python, Windows, Linux, Mac | Free, Open Source |
Tweet NLP | Tweet tokenizer, POS Tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools. Clusters: http://www.cs.cmu.edu/~ark/TweetNLP/cluster_viewer.html | pos tagger, tokenizer, parser | | Free |
TWINT | A Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API. | twitter, social media, scraping | Linux, Windows, Mac | Open Source |
TXM | XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment. | text analysis, concordancer, r, statistics, search tool, tokenizer, xml | Windows, Mac, Linux, Tomcat | Free |
UAM CorpusTool | Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation | annotation, multi-layer | | Free |
UAM ImageTool | Image annotation tool for visual data corpora | annotation | | Free |
Unitok | Tool that splits texts into tokens | tokenizer | | Free |
VARD | Spelling variant detection and deletion in historical corpora (particularly EModE) | variant detector | | Free (with academic email) |
VariAnt | Tool for the detection of spelling variants | variant detector | Windows | Free |
Voyant | A web-based reading/analysis toolkit for digital texts. | reading, text analysis | Web | Free |
VU Amsterdam Metaphor Identification Corpus | Corpus tool for metaphor identification | metaphor identification, metaphors | Web and local version | Free |
WConcord 3.0 | A full featured concordancer | concordancer | | Free |
WebAnno | A web-based annotation tool | annotation, web-based | Web | Free |
WebLicht | WebLicht is an execution environment for automatic annotation of text corpora embedded with the CLARIN-D project. | annotation | Web | Free (CLARIN-D Account needed) |
Wmatrix | Tool for corpus analysis and comparison | wordlists, concordancer, pos tagger, semantic tagger, keywords | Web | £50 per username per year |
WordCruncher | A tool for analyzing ebooks. | concordancer, frequency, ebooks | Windows, Mac, iOS | Free |
WordFish | Extract political positions from text documents. | political science | R | Free |
WordHoard | Close reading and scholarly analysis of deeply tagged texts | close reading | Windows, Unix, Linux, Mac | Free |
Wordle | A tool for generating word clouds. | word clouds, visualization | Web | Free |
WordMap | A simple web-based word-map / wordcloud generator. | visualization, web-based | Web | Free |
Wordscores | A tool (approach) to extract dimensional information from political texts | political science, information retrieval | | Free |
Wordsmith | One of the most established corpus toolkits providing a variety of functionality | concordancer, wordlists, statistics, keywords | Windows | 60€ per licence |
Wordstatix | Corpus analysis tool | concordancer | | Free |
Worldbuilder | Tool for annotation and visualisation in analysis applying text-world-theory | annotation, visualization | ||
Xaira | Indexing and analysis of XML resources | indexing, xml | Windows | Free, Open Source |
YACSI Chinese Tokeniser / PoS Tagger | A Chinese tokenizer and PoS tagger | chinese, tokenizer, pos tagger | Windows | Free |
Log-Likelihood and Effect-Size Calculator | An online calculator for log-likelihood and effect sizes. | statistics | Web | Free |
CorefAnnotator | An annotation tool for coreference. | coreference, annotation | Windows, Linux, Mac | Open Source |
SoMaJo | A tokenizer and sentence splitter for German and English web and social media texts. | tokenizer, sentence boundary detector | Linux, Mac, Windows | Free, Open Source |
SoMeWeTa | A part-of-speech tagger with support for domain adaptation and external resources. | tagging, pos, pos tagger | Linux, Mac, Windows | Free, Open Source |
COCA_MWU20 ColloGram | A collocation analysis tool based on a COCA collocation family list. | collocation | Windows | Free |
RDQA | An R package for Qualitative Data Analysis (QDA). | qda | Windows, Linux/FreeBSD, Mac | Free |
KWords | A tool for keyword identification and analysis. | keywords, CADS, concordancer, collocation analysis | Windows, Linux, Mac | Free |
Range Program (formerly VocabProfiler) (Paul Nation) | A tool for analyzing the vocabulary load of texts. | vocabulary, lexis | Windows | Free |
Frequency Program (Paul Nation) | A tool that turns a text or texts into a word list with frequency figures. | vocabulary, frequency, lexis | Windows | Free |
Compleat Lexical Tutor | A website featuring various tools and materials for data-driven language learning. | vocabulary, language learning, lexis, web-based, ddl | Web | Free |
WordSift | A word cloud generator, with dynamic filters, links to images, and KWIC capabilities. Works with various types/formats of word lists. | word cloud, vocabulary profiling, lexis, vocabulary, language teaching | Web | Free |
KHCoder | Free software for quantitative content analysis or text mining that supports multiple languages. | correspondence analysis, collocation analysis, frequency analysis | Windows, Mac, Linux | Free, Open Source |
MaltParser | A system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model. | parser, dependency parsing | Windows, Mac, Linux | Free |
MaltOptimizer | A system for parser optimization using the open-source system MaltParser. | parser, dependency parsing | Windows, Mac, Linux | Free |
Link Grammar Parser | A syntactic parser of English, Russian, Arabic and Persian (and others), based on Link Grammar. | parser, syntax, grammar | Linux, Mac, Windows | Free |
ANTLR | ANother Tool for Language Recognition is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. | parser generator | Linux, Mac, Windows | Free, Open Source |
GOLD Parsing System | A parsing system that can be used to develop programming languages, scripting languages and interpreters. | parser generator | Linux, Mac, Windows | Free |
JavaCC | A popular parser generator for use with Java applications. | parser generator | Linux, Mac, Windows | Free |
Lextutor Web Concordancers | Web concordancers targeted towards DDL | collocations, concordancer, DDL | Web | Free |
wordspace | An R package for distributional semantics | semantics, distributional semantics, R | R | Free |
UCS Toolkit | A toolkit (libraries and scripts) for the statistical analysis of co-occurrence data. | collocation, co-occurrence, statistics | R, Perl | Free |
Coquery | A free corpus query tool to search, analyze, and visualize corpora | query, visualization | Linux, Mac, Windows | Free |
WordWanderer | A web-based visualization/analysis tool which allows its users to "wander" a text. | visualization, concordancer | Web | Free |
Cortext Manager | A scriptable "ecosystem" for modeling and exploring corpora. Especially useful for creating topic models and co-occurrence networks. | NER, topic models, visualization, word2vec, collocation, keywords | Web | Free |
AMesure | A web-based system to analyse the reading complexity of French texts | text complexity, readability | Web | Free |
CEFRLex | A web-based tool to analyse the lexical complexity of words in texts according to the CEFR scale in various languages. | text complexity, readability, language learning | Web | Free |
PACTE | A flexible collaborative text annotation platform that is currently in development. | annotation | Web | Free (for research) |
CLAN | A tool for searching and analyzing child language data in the CHAT transcription format. | search, wordlists, collocation, child language, CHILDES | Windows, Mac, Unix | Free, Open Source |
NVIVO | A commercial Computer-Assisted Qualitative Data Analysis Software (CAQDAS) package that works with both qualitative and mixed methods data | qda, mixed methods | Windows, Mac | Commercial |
QDA Miner | A commercial QDA tool for coding, annotating, retrieving and analyzing collections of documents and images. | qda, mixed methods, text analysis | Windows | Commercial |
tagtog | A text annotation tool specifically built to train AI/ML models. | machine learning, annotation | Cloud-Based | Commercial |
Calc: Corpus Calculator | A web-based tool to calculate basic corpus statistics, for example, comparing frequencies across corpora. | statistics | Web | Free |
gwic | A very basic KWIC tool written in Go. | concordancer, KWIC | Windows, Mac, Linux | Open Source |
ACTRES Rhetorical Move Tagger | A tool for tagging rhetorical moves. | tagging, rhetorics | Web | Commercial |
ACTRES Corpus Browser | A tool for retrieving tagged information in more than one language. | tagging | Web | Commercial |
ACTRES Corpus Manager | A corpus compilation and analysis platform with a focus on multilingual and parallel corpora. | compilation, corpus management, annotation, multilingual | Web | Commercial |
VideoAnt | A web-based tool to annotate and discuss web-hosted videos. | annotation, video | Web | Free |