kavgan

Starred repositories

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, …

Jupyter Notebook 1,184 786 Updated Dec 2, 2020

Robustness Gym is an evaluation toolkit for machine learning.

Python 447 38 Updated Jun 28, 2022

Curated List of Blog Posts From Opinosis Analytics

2 1 Updated Aug 14, 2021

Python word cloud library for use within Jupyter notebook and Python apps.

Jupyter Notebook 49 14 Updated May 15, 2024

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Python 17,208 3,739 Updated Jun 2, 2023

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

Python 7,005 2,252 Updated Oct 14, 2025

Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English.

Python 131 45 Updated Jul 15, 2019

Jupyter Notebook 28 13 Updated Sep 30, 2016

Jupyter Notebook 46 45 Updated Feb 25, 2018

A few exercises for use at events.

Jupyter Notebook 1,436 665 Updated Apr 27, 2021

CNN text classification using keras

Python 16 6 Updated Nov 27, 2017

ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.

Java 221 37 Updated Apr 9, 2020

Cool links & research papers related to Machine Learning applied to source code (MLonCode)

6,561 837 Updated Dec 3, 2020

Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mine related concepts by leveraging the volume within large amounts of clinical no…

25 11 Updated Jan 22, 2018

OpinRank Dataset. Dataset containing user reviews for entities, namely cars and hotels. Full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).

44 12 Updated May 28, 2021

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to t…

Java 1,024 351 Updated Feb 6, 2026

Examples of code in spark

Python 11 6 Updated Dec 2, 2017

RxNLP APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from html or URL, computing similarity between texts and more.

15 7 Updated Jan 24, 2020

This repo contains code and dataset for the Opinosis Summarization Framework

53 18 Updated Nov 14, 2019

Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...

Java 2,742 415 Updated Jun 1, 2022