Corpus analysis software

From michaelbeijer.co.uk
Tools for Corpus Linguistics - A comprehensive list of 242 tools used in corpus analysis. Original source: https://corpus-analysis.com/
Tool Description Categories Platform Pricing
@nnotate Semi-automatic annotation of corpus data annotation Solaris, Linux Free (with licence agreement)
aConCorde Multilingual concordance tool (English and Arabic) concordancer Linux, Mac, Windows Free
almaneser / SALTA Semantic Parser/POS Tagger for English parser, pos tagger, tagging Free (with licence agreement)
AMALGAM Tool for grammatical annotation (POS and phrase structure). Tagging a text that was entered via email. annotation Web Free
ANC2go A web service that allows users to create custom sub-corpora of the ANC ANC, sampling Web Free
ANNIS Search and visualization tool for multi-layer linguistic corpora with diverse types of annotation search, visualization Web (or Linux, Mac, Windows) Free
AntCLAWSGUI Front-end interface for CLAWS tagger pos tagger, tagging Windows Free
AntConc Corpus analysis toolkit wordlists, concordancer, keywords Linux, Mac, Windows Free
AntCorGen A freeware discipline-specific corpus creation tool. compilation, text analysis Windows, Mac, Linux Free
AntFileConverter Freeware tool to convert PDF and Word (DOCX) files into plain text converter Windows, Mac Free
AntFileSplitter A freeware text file splitting tool. compilation Windows, Mac, Linux Free
AntGram A freeware n-gram and p-frame (open-slot n-gram) generation tool. text analysis, n-grams, p-frames, lexical bundles, lexical frames Windows, Mac, Linux Free
AntMover Tool for text structure (moves) analysis text analysis Windows Free
AntPConc Corpus analysis toolkit for files encoded with UTF-8 wordlists, concordancer Windows, Mac Free
AntWordProfiler Tool for profiling vocabulary level and text complexity text complexity Linux, Mac, Windows Free
ANVIL A tool for video annotation. video, annotation Windows, Linux, Mac Free
ATLAS.ti A sophisticated QDA software for mixed methods approaches qda, mixed methods Windows, Mac, Android, iOS Commercial
Atomic Multi-layer corpus annotation platform. annotation Linux, Mac, Windows Free
Authorial Voice Analyzer (AVA) A tool for the analysis of interactional metadiscourse features. discourse, voice Mac Free
BFSU Collocator A collocation analysis toolkit collocation, statistics Windows Free
BFSU English Sentence Segmenter A simple sentence segmenter segmentation Windows Free
BFSU Qualitative Coder A tool for manual coding of corpora coding, annotation Windows Free
BFSU Sentence Collector A pedagogic concordancer concordancer, ddl, pedagogy, language learning Windows Free
BFSU Stanford Parser A simple parser parser Windows Free
BNCWeb BNCweb is a web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC). analysis, concordancer Web Free
BootCat Tool for crawling and compiling data from the web with a list of seed words. crawler, compilation
Bow Statistical Language Modeling, Text Retrieval, Classification and Clustering text analysis UNIX, Linux Free
BFSU ParaConc A parallel concordancer concordancer, parallel Windows Free
BFSU PowerConc A fairly powerful concordancer concordancer Windows Free
BFSU Stanford POS Tagger A PoS tagger pos tagger, tagging Windows Free
CasualConc CasualConc is a concordance program that runs natively on Mac OS X 10.9 or later concordancer OSX Free
CATMA (Computer Assisted Text Markup and Analysis) An undogmatic, complex annotation and analysis package markup, analysis, visualization, annotation Web Free
Chared Tool for detecting the character encoding of a text text analysis Python 2.6 or later Free
Chi-Square and Log Likelihood Calculator A simple tool for calculating Chi-squared and LL statistics Windows Free
CLaRK XML Based System For Corpora Development compilation Free (with licence agreement)
CLAWS POS-Tagger CLAWS- POS Tagger pos tagger, tagging Web Via licence or in-house tagging at Lancaster
CLiC A corpus tool to support the analysis of literary texts. concordancer Web Free
Coh-Metrix A web-based system to compute cohesion and coherence metrics. cohesion, coherence Web Free
Colligator 2.0 A colligation query/analysis toolkit colligation Windows Free
Collocate Tool for the extraction of concordances and collocations concordancer Windows 35 USD
CoMOn A tool for corpus matching analysis matching Web Free
ConcGramCore A modern rewrite of ConcGram (Greaves 2005) that allows efficiently searching for concgrams. collocation, concgram Windows Open Source
Concordance Randomizer A concordance randomizer concordancer Windows Free
Concordancer Online tool for frequency counts and text clouds concordancer Web Free
CorpKit An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora. wordlists, parsing, concordancer, visualization Linux, Mac, Windows (Python) Free
CorporaCoCo A set of R functions used to compare co-occurrence between corpora collocation R Free
Corpus Presenter Tree tagger and corpus analysis software wordlists, parsing, concordancer, visualization Windows Free
Corpus-Tools Text annotation and analysis tool text analysis Free
CorpusExplorer A complex corpus analysis toolkit combining 45 interactive tools. visualization, exploration, tagging, text analysis Windows Free, Open Source
CorpusSearchLite Searches parsed corpora in the Penn Treebank format searching
CQPweb Overview of and access to a wide range of corpora database Web Free (once registered)
DART An annotation tool and research environment for annotating dialogues. dialogues, annotation Windows Free
DepCluster A tool used for lexeme-based collexeme analysis. lexis, collexeme
DeTagging Tool A tool that strips annotation/tags from files cleaning, annotations Windows Free
Dexter Tool for text annotation annotation Linux, Mac, Windows Free
DISCO Corpus pre-processing tool for a variety of languages that allows retrieving the semantic similarity between arbitrary words and phrases tokenization, annotation Windows, Linux, Solaris, and MacOS Free
DisMo An automatic multi-level annotator for spoken language corpora. spoken, multilevel, multi-layer, pos tagger, annotation, tagging
DocuScope A tool for computer-aided rhetorical analysis rhetorical analysis, text analysis, visualization Windows (Java) Free
ELAN Transcription and annotation of sound or video files transcription, annotation Linux, Mac, Windows Free
Emdros A database engine for analyzed and annotated text. database, annotation, query Windows, Linux, Mac Free, Open Source
EncodeAnt Tool for the detection and conversion of character encodings converter Windows, Mac Free
EXMARaLDA Tool for transcription, annotation, corpus analysis of spoken data transcription, annotation, analysis Free
f4analyse QDA software specifically geared towards interview (spoken) data qda, spoken Windows, Mac, Linux Commercial
f4transkript Software for transcribing audio data transcription, spoken Windows, Mac, Linux Commercial
FireAnt Social media analysis toolkit downloader, converter Windows, Mac Free
FLAIR (2.0) An online tool for language teachers and learners that analyzes grammatical constructions and readability on the fly. constructions, readability Web Free
Flesh PC Calculating Flesch scores readability, statistics Windows Free
FrameNet Dictionary of more than 10,000 word senses, tagged for semantic roles (according to Fillmorean Frame Semantics) semantic parser Web Free
gensim Deep learning via word2vec word2vec Multi (Python) Free, Open Source
Gephi A toolkit for network analysis network analysis, graphs Windows, Linux, Mac Free
Google Ngrams An ngram-viewer for the whole of Google Books ngrams Web Free
GraphColl Tool for building and exploring networks of linguistic collocations visualization Windows, Mac Free
Gsearch Tool for syntactic pattern matching pattern matching ? Down
HeidelGram Web-Based Tools Basic corpus analysis toolkit for the HeidelGram Corpus wordlists, concordancer Web Free
HeidelTime A multilingual, domain-sensitive temporal tagger temporal tagger, timex3 Java Free, Open Source
Heimdall A tool that searches a text for sequences written in other languages. language detection Linux, Windows, Mac Open Source
HGSimpleCorpusNetwork Batch frequency analysis on corrupted (e.g. OCR) corpus data and generation of network analysis data. wordlists, network analysis Multi (Python) Free, Open Source
HTST Samuels Historical Thesaurus Semantic Tagger via web-interface semantic tagger Web Free
ICARUS Search and visualization tool for dependency trees visualization Free
ICEweb A tool for compiling, downloading, and analyzing web corpora in accordance with the ICE ICE, compilation, crawler Windows Free
IMS Corpus Workbench Tool for sorting frequencies in corpora wordlists, concordancer Web and local version Free
Intelligent Archive Managing corpora for stylometry stylometry, management Windows, Unix, Linux, Mac Free
jTokenizer Tokenizing natural language tokenizer Free
JusText Tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages boilerplate remover Python Free
juxta Comparing and collating multiple witnesses to single textual works textual criticism, witnesses Windows, Unix, Linux, Mac Free
Kaleidographic A dynamic and interactive visualization tool for multivariate data. visualization Web Free
KAT Tool Grouping patterns based on search terms patterns, concordancer Windows Free
kdiff3 KDiff3 is a diff and merge program. comparison Windows, Linux, OSX Free, Open Source
Keyword Plus A keyword generation/analysis tool keywords Windows Free
kfNgram A simple tool for generating n-grams n-grams, p-frames Windows Free
Khepri A view-based tool for exploring (historical sociolinguistic) data sociolinguistics, visualization JavaScript, Web Free, Open Source
KoGra-R An R-based online tool that provides statistical measures for corpus-based frequencies statistics, frequency analysis Web Free
KorAP A complex platform for corpus analysis developed at the IDS in Mannheim analysis, multilevel, multi-layer Web Free, Open Source
LancsBox The Lancaster Desktop Corpus Toolbox; Software package for the analysis of language data and corpora collocation, frequency analysis, keywords Java Free (CC)
langid.py A standalone language identification tool written in Python. language detection Linux, Windows, Mac Open Source
LDA-Toolkit A toolkit for linguistic discourse and image analysis. discourse, images Windows Free
Leipzig Corpus Miner A modern text mining infrastructure for qualitative data analysis qda, mixed methods, text mining, lexicometrics, topic models, information retrieval Linux, Windows, Mac (via VM) Free
LEXA A complex lemmatizer. lexis, lemmatizer Free
LexisNexis A database containing (new and old) news articles. They also have other (business) data. news, data Web Commercial
lexpan A tool to analyze syntagmatic structures in corpora. Especially useful to analyze fillers and slots. syntagmatic, slots Windows, Linux, Mac Free
LightSide A machine learning workbench. machine learning Linux, Windows Free, Open Source
Linguistica Word segmentation and morphological analysis? segmentation, morphological tagger Linux, Mac, Windows Free
LIWC A tool that tries to compute scores for different emotions, thinking styles, and social concerns. lexical analysis, style Web Free (but commercial)
MALLET Package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text statistical nlp Windows Free
MAT - Multidimensional Analysis Tagger A tagger for MDA (Biber et al.) by Andrea Nini. tagging, MDA Windows, Mac Free
MAXQDA Sophisticated QDA software that works with multimodal data and supports mixed methods approaches qda, mixed methods Windows, Mac, Android, iOS Commercial
MLCT Tool for building and processing corpora concordancer, sentence boundary detector Free
MMAX2 A multi-level annotation tool annotation, multilevel, multi-layer Java Free, Open Source
MonoConc Esy Concordancing and text search tool that allows primary and secondary concordancing concordancer, sentence boundary detector Free for non-commercial research
MorphAdorner Tool for performing morphological tagging of texts morphological tagger Free
N-Gram Processor (NGP) A perl based tool for the creation and processing of n-gram lists out of text files. n-grams Linux, Windows, Mac Open Source
NATAS A Python library used to study neologisms in historical English corpora. historical, python, lexis Linux, Windows, Mac Open Source
Natural Language Toolkit Platform for building Python programs to work with human language data tokenizer, tagger Unix, Mac, Windows (+Python 3.4) Free
NooJ Tags texts and corpora (i.e. sets of text files) at the Orthographical, Lexical, Morphological, Syntactic and Semantic levels multilevel tagger Windows, Mac, LINUX and BSD Unix Free
NoSketch Engine Word sketches, thesaurus, keyword computation, corpus creation corpus creation, semantic analysis, wordlists Free
Onion Tool for removing duplicate parts from large collections of texts duplicate remover Free
Online Graded Text Editor Tool for profiling a text's vocabulary level and complexity text analysis, editing, vocabulary OSX, Windows Free
OpenConc Tool for concordancing concordancer Free
PALinkA Annotation tool annotation Down
ParaConc A bilingual/multilingual concordancer concordancer Non-Free
Pareidoscope Pareidoscope is a collection of tools for determining the association between arbitrary linguistic structures, such as collocations, collostructions or between structures. collocation, constructions Free
PatCount A pattern counting tool with powerful statistic capabilities and regex support patterns Windows Free
Pattern Builder A tool helping with regular expressions and PoS tags regex, tagging Windows Free
Pepper Conversion between linguistic formats, e.g. from TEI to ANNIS to Tiger XML to EXMARaLDA. conversion Free
Phonological CorpusTools (PCT) Phonological analysis on transcribed corpora phonology Multi (Python) Free
PhraseContext Tool for wordlists, concordancing, collocation, TTR, wordlists, concordancer 35€
Pipoca (formerly openQDA) A web-based QDA software qda, mixed methods Web Free, Open Source
Praaline Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis Windows, Mac, Linux Free / Open Source (GPL3)
PRAAT A tool for doing phonetics by computer phonetics, spoken Windows, Mac, Linux Open Source
ProtAnt Tool for prototypical text analysis wordlists Windows, Mac Free
pysupersensetagger Analyses texts for MWE and supersenses. text analysis Unix, Mac (Python) Free
PyXMLConc Concordancer for XML files with automatic tag and attribute detection. concordancer Multi (Python), Windows Free, Open Source
Quanteda An R package for the quantitative analysis of textual data. R Linux, Windows, Mac Open Source
Query Tool for the Edinburgh Associative Thesaurus A query tool for the EAT query, thesaurus Windows Free
Readability Analyzer A tool for generating various readability statistics readability, statistics Windows Free
Readability Webfx A tool to check how easy or difficult (readability) a given text is. readability Web Free
RSTTool Tool that can annotate texts for constituency and rhetorical structure annotation Windows, Macintosh, UNIX and LINUX Free
Salt Meta models for linguistic data. meta modelling Free
SarAnt Tool for batch search and replacing editing, searching Windows Free
SegmentAnt Tool for the segmentation of Japanese and Chinese segmentation, tokenizing Windows, Mac, Linux Free
Shinyconc ShinyConc is a framework for generating custom web-based concordancers and is written in R and R Shiny. concordancer, kwic, r Open Source / R Free
Simple Concordance Program Tool for concordance and word listing that works with many languages concordancer Windows, Mac Free
SketchEngine Word sketches, thesaurus, keyword computation, corpus creation corpus creation, semantic analysis, wordlists, keywords 30 day trial or 4,85€/month
SpiderLing Software for obtaining text from the web useful for building text corpora crawler Free
SPPAS A tool for the automatic annotation and analysis of speech. speech, spoken, annotation Windows, Mac, Linux Free, Open Source
SPre Tool for segmenting and annotating texts annotation Free
Stanford Log-linear POS Tagger POS Tagger (with Penn Treebank Tagset) for English, Arabic, Chinese, German pos tagger, tagging Free
Stanford Topic Modeling Toolbox The Stanford Topic Modeling Toolbox (TMT) allows users to perform topic modeling on texts imported from spreadsheets. It supports both LDA and labelled LDA. topic modeling Java Free
Stylo for R Tool for computational stylistic analysis (authorship attribution, genre analysis) text analysis Free
Sub-Corpus Creator A tool for creating sub-corpora based on searches and metadata compilation Windows Free
Synpathy Tool for manual syntactic annotation annotation Windows, Mac, Linux Free
TAACO TAACO is a tool that calculates 150 indices of textual/lexical cohesion. cohesion, lexical sophistication All Free, Open Source
TAALES TAALES measures over 400 indices of lexical sophistication. lexical sophistication Mac, Linux, Windows Open Source
TagAnt Part-of-speech tagging tool built on Tree Tagger pos tagger, tagging Windows, Mac, Linux Free
TagCrowd A simple tool for generating tag/word clouds online word clouds, visualization Web Free
Tagxedo A tool for generating word clouds. word clouds, visualization Web Free
TASX-Annotator Tool for multilevel annotation and transcription of (multi-channel) video and audio data. multilevel tagger, transcription Windows, Mac, Linux, Solaris Down
Text Analysis Computing Tools (TACT) A simple, fairly old concordancer. concordancer Commercial
Text Variation Explorer The Text Variation Explorer TVE is a tool for exploring the effect of window size on various common linguistic measures. It visualizes these measures and allows for PCA/Cluster analysis. visualization, variation analysis Java Free
Text Visualization Browser A survey/gallery of text visualizations visualization Web Free
Textanz Language analysis program that produces frequency lists, word lists, parts of speech tags. wordlists, concordancer, pos tagger, dictionary Any OS Free, Open Source
TextArc A tool for visualizing the structure of texts. visualization
TextDirectory TextDirectory is a tool for aggregating text files based on various filters and transformation functions. compilation, text-processing, python Windows, Linux, OSX Free, Open Source
Textplot A tool for mapping a document into a network of terms in order to visualize the topic structure. visualization, network analysis Python Free, Open Source
Textplot A tool for converting documents into (semantic) networks based on KDE. semantics, network analysis, graphs Linux, Windows, Mac Open Source
TextSmith Tools A tool for genre-informed phraseological profiles phraseology, segmentation Windows Free
TextSTAT Tool for creation and manipulation of linguistic data from different languages corpus creation, concordancer Windows, GNU/Linux and MacOS Free
The (Phonetic) Transcription Editor An editor for creating phonetic transcriptions transcription Windows Free
The Great American Word Mapper A visualization tool for the top 100,000 words used in American English twitter data. twitter, lexis, social media Web Free
The Simple Corpus Tool A corpus analysis toolkit that supports XML annotations. concordancer, annotation, xml, frequency Windows Free
The Simple PoS Tagger A simple PoS tagger utilizing Perl's Lingua::EN::Tagger pos tagger, tagging Windows Free
The SPAADIA concordancer A concordancer for the SPAADIA corpus concordancer, SPAADIA Windows Free
The Text Feature Analyser A tool for investigating textual features and various measures text analysis, concordancer Windows Free
Thesaurus.com English language thesaurus with links to English dictionary and translation sites. efl, esl, linguistics Web Free
TigerSearch Tool for searching syntactically and POS-tagged corpora search tool, pos tagger Free
TnT - Thorsten Brants's PoS Tagger A simple PoS-Tagger pos tagger, tagger, tagging Windows/Unix Available via Stanford
Tree Editor TrEd 2.0 Graphical editor and viewer for tree-like structures. visualization Windows, GNU/Linux and MacOS Free
TreeTagger Tool for annotating text with part-of-speech and lemma information pos tagger, annotation Windows, Mac, Linux Free
TurboParser Multilingual dependency parser with linear programming parser Free
Twarc A command line tool (and Python library) for archiving Twitter JSON twitter, social media Python, Windows, Linux, Mac Free, Open Source
Tweet NLP Tweet tokenizer, POS Tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools. Clusters: http://www.cs.cmu.edu/~ark/TweetNLP/cluster_viewer.html pos tagger, tokenizer, parser Free
TWINT A Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API. twitter, social media, scraping Linux, Windows, Mac Open Source
TXM XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment. text analysis, concordancer, r, statistics, search tool, tokenizer, xml Windows, Mac, Linux, Tomcat Free
UAM CorpusTool Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation annotation, multi-layer Free
UAM ImageTool Image annotation tool for visual data corpora annotation Free
Unitok Tool that splits texts into tokens tokenizer Free
VARD Spelling variant detection and deletion in historical corpora (particularly EModE) variant detector Free (with academic email)
VariAnt Tool for the detection of spelling variants variant detector Windows Free
Voyant A web-based reading/analysis toolkit for digital texts. reading, text analysis Web Free
VU Amsterdam Metaphor Identification Corpus Corpus tool for metaphor identification metaphor identification, metaphors Web and local version Free
WConcord 3.0 A full featured concordancer concordancer Free
WebAnno A web-based annotation tool annotation, web-based Web Free
WebLicht WebLicht is an execution environment for automatic annotation of text corpora embedded with the CLARIN-D project. annotation Web Free (CLARIN-D Account needed)
Wmatrix Tool for corpus analysis and comparison wordlists, concordancer, pos tagger, semantic tagger, keywords Web £50 per username per year
WordCruncher A tool for analyzing ebooks. concordancer, frequency, ebooks Windows, Mac, iOS Free
WordFish Extract political positions from text documents. political science R Free
WordHoard Close reading and scholarly analysis of deeply tagged texts close reading Windows, Unix, Linux, Mac Free
Wordle A tool for generating word clouds. word clouds, visualization Web Free
WordMap A simple web-based word-map / wordcloud generator. visualization, web-based Web Free
Wordscores A tool (approach) to extract dimensional information from political texts political science, information retrieval Free
Wordsmith One of the most established corpus toolkits providing a variety of functionality concordancer, wordlists, statistics, keywords Windows 60€ per licence
Wordstatix Corpus analysis tool concordancer Free
Worldbuilder Tool for annotation and visualisation in analysis applying text-world-theory annotation, visualization
Xaira Indexing and analysis of XML resources, indexing, xml Windows Free, Open Source
YACSI Chinese Tokeniser / PoS Tagger A Chinese tokenizer and PoS tagger chinese, tokenizer, pos tagger Windows Free
Log-Likelihood and Effect-Size Calculator An online calculator for log-likelihood and effect sizes. statistics Web Free
CorefAnnotator An annotation tool for coreference. coreference, annotation Windows, Linux, Mac Open Source
SoMaJo A tokenizer and sentence splitter for German and English web and social media texts. tokenizer, sentence boundary detector Linux, Mac, Windows Free, Open Source
SoMeWeTa A part-of-speech tagger with support for domain adaptation and external resources. tagging, pos, pos tagger Linux, Mac, Windows Free, Open Source
COCA_MWU20 ColloGram A collocation analysis tool based on a COCA collocation family list. collocation Windows Free
RDQA An R package for Qualitative Data Analysis (QDA). qda Windows, Linux/FreeBSD, Mac Free
KWords A tool for keyword identification and analysis. keywords, CADS, concordancer, collocation analysis Windows, Linux, Mac Free
Range Program (formerly VocabProfiler) (Paul Nation) A tool for analyzing the vocabulary load of texts. vocabulary, lexis Windows Free
Frequency Program (Paul Nation) A tool that turns a text or texts into a word list with frequency figures. vocabulary, frequency, lexis Windows Free
Compleat Lexical Tutor A website featuring various tools and materials for data-driven language learning. vocabulary, language learning, lexis, web-based, ddl Web Free
WordSift A word cloud generator, with dynamic filters, links to images, and KWIC capabilities. Works with various types/formats of word lists. word cloud, vocabulary profiling, lexis, vocabulary, language teaching Web Free
KHCoder Free software for quantitative content analysis or text mining that supports multiple languages. correspondence analysis, collocation analysis, frequency analysis Windows, Mac, Linux Free, Open Source
MaltParser A system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model. parser, dependency parsing Windows, Mac, Linux Free
MaltOptimizer A system for parser optimization using the open-source system MaltParser. parser, dependency parsing Windows, Mac, Linux Free
Link Grammar Parser A syntactic parser of English, Russian, Arabic and Persian (and others), based on Link Grammar. parser, syntax, grammar Linux, Mac, Windows Free
ANTLR ANother Tool for Language Recognition is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. parser generator Linux, Mac, Windows Free, Open Source
GOLD Parsing System A parsing system that can be used to develop programming languages, scripting languages and interpreters. parser generator Linux, Mac, Windows Free
JavaCC A popular parser generator for use with Java applications. parser generator Linux, Mac, Windows Free
Lextutor Web Concordancers Web concordancers targeted towards DDL collocations, concordancer, DDL Web Free
wordspace An R package for distributional semantics semantics, distributional semantics, R R Free
UCS Toolkit A toolkit (libraries and scripts) for the statistical analysis of co-occurrence data. collocation, co-occurrence, statistics R, Perl Free
Coquery A free corpus query tool to search, analyze, and visualize corpora query, visualization Linux, Mac, Windows Free
WordWanderer A web-based visualization/analysis tool which allows its users to "wander" a text. visualization, concordancer Web Free
Cortext Manager A scriptable "ecosystem" for modeling and exploring corpora. Especially useful for creating topic models and co-occurrence networks. NER, topic models, visualization, word2vec, collocation, keywords Web Free
AMesure A web-based system to analyse the reading complexity of French texts text complexity, readability Web Free
CEFRLex A web-based tool to analyse the lexical complexity of words in texts according to the CEFR scale in various languages. text complexity, readability, language learning Web Free
PACTE A flexible collaborative text annotation platform that is currently in development. annotation Web Free (for research)
CLAN A tool for searching and analyzing child language data in the CHAT transcription format. search, wordlists, collocation, child language, CHILDES Windows, Mac, Unix Free, Open Source
NVIVO A commercial Computer-Assisted Qualitative Data Analysis Software (CAQDAS) package that works with both qualitative and mixed methods data qda, mixed methods Windows, Mac Commercial
QDA Miner A commercial QDA tool for coding, annotating, retrieving and analyzing collections of documents and images. qda, mixed methods, text analysis Windows Commercial
tagtog A text annotation tool specifically built to train AI/ML models. machine learning, annotation Cloud-Based Commercial
Calc: Corpus Calculator A web-based tool to calculate basic corpus statistics, for example, comparing frequencies across corpora. statistics Web Free
gwic A very basic KWIC tool written in Go. concordancer, KWIC Windows, Mac, Linux Open Source
ACTRES Rhetorical Move Tagger A tool for tagging rhetorical moves. tagging, rhetorics Web Commercial
ACTRES Corpus Browser A tool for retrieving tagged information in more than one language. tagging Web Commercial
ACTRES Corpus Manager A corpus compilation and analysis platform with a focus on multilingual and parallel corpora. compilation, corpus management, annotation, multilingual Web Commercial
VideoAnt A web-based tool to annotate and discuss web-hosted videos. annotation, video Web Free