Machine Learning - Open Source (2017 - Present)

I've developed many open source projects on topics such as NLP, MLOps, deep learning, machine learning, OCR, and image processing.

Custom ChatGPT Chatbot

  • Enter the name of a website.
  • Ask questions about it and get the most relevant answers.
  • Used the OpenAI chat completion and completion APIs along with web scraping.

Almost All NLP Application

  • Assembled almost all NLP concepts in a single use case.
  • Created an application where a user can search for an entity (user query) in news articles (knowledge base).
  • Performed sentiment analysis, topic modeling, custom NER, and knowledge graph creation on the fetched articles.
  • Used MLOps best practices to develop the project.

Custom Named Entity Recognition

  • Used doccano to annotate news data.
  • Trained a model using spaCy after converting the data to spaCy format.
  • Trained a model using the transformers Trainer after converting the data to BILOU format.
  • Created production-ready code for inference and deployment.
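
The BILOU conversion step listed above can be sketched as follows. This is a minimal illustration, assuming token-level entity spans with an exclusive end index; the function name `spans_to_bilou` and the sample sentence are hypothetical and are not taken from the project.

```python
from typing import List, Tuple


def spans_to_bilou(num_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert token-level spans (start, end_exclusive, label) to BILOU tags.

    B = beginning, I = inside, L = last, O = outside, U = unit (single token).
    Assumes the spans do not overlap.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if end - start == 1:
            # A single-token entity gets the U- prefix.
            tags[start] = f"U-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"L-{label}"
    return tags


# Hypothetical example sentence and annotations.
tokens = ["Reserve", "Bank", "of", "India", "hired", "Asha"]
spans = [(0, 4, "ORG"), (5, 6, "PER")]
for token, tag in zip(tokens, spans_to_bilou(len(tokens), spans)):
    print(f"{token}\t{tag}")
```

Annotation tools usually export character offsets, so a real pipeline would first align those offsets to tokens before applying a conversion like this one.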

Text Classification

  • Created a production-ready text classification pipeline using different methods.
  • Used doccano to annotate news data with POSITIVE, NEGATIVE, and NEUTRAL labels.
  • Input vectors used: TF-IDF, word embeddings from DistilBERT, and word embeddings from sentence transformers.
  • Modeling techniques used: custom ML using sklearn, and custom neural networks using Keras, PyTorch, and the transformers Trainer.

Knowledge Graphs

  • Coreference resolution using the coreferee library
  • Named entity recognition using a custom NER model
  • Entity linking using the Wikidata API
  • Relationship extraction using the REBEL and Stanford OpenIE libraries
  • Knowledge graph creation using Neo4j and NetworkX

Semantic Search Engine

  • Used Elasticsearch to retrieve relevant news articles for an input query
  • Applied sentence transformer embeddings and cosine similarity
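
The cosine similarity step can be sketched as below. This is a minimal illustration and not the project's code: it assumes the embeddings are used to re-rank articles already retrieved by the search engine, and the toy three-dimensional vectors stand in for real sentence transformer embeddings. The names `cosine_similarity`, `rank_candidates`, and the article IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query_vec: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and sort by descending similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for sentence embeddings (assumption for illustration).
candidate_vectors = {
    "article_a": [0.9, 0.1, 0.0],
    "article_b": [0.1, 0.9, 0.2],
    "article_c": [0.7, 0.3, 0.1],
}
query_vector = [1.0, 0.0, 0.0]

ranking = rank_candidates(query_vector, candidate_vectors)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```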

Code Template

  • A code template for the machine learning application development lifecycle
  • Includes a folder structure and boilerplate code for a typical ML project

News API

  • API to fetch news articles matching a search query

NLP Text Cleaner

  • Utility module for the text cleaning and preprocessing required in NLP projects
  • Published on PyPI using CircleCI CI/CD

Audio Books

  • A project to create an audio book from a book in any format (PDF, JPG, etc.)
  • Helped an NGO working for visually impaired children reduce book creation time from 1 month to a few hours
  • Featured Post

FAQ Chatbot

  • COVID-19 FAQ chatbot in Python with a user interface

Machine Learning - Professional (2017 - Present)

The following are some of the projects I built using NLP and Python while working in different organizations, mainly in the banking domain.

Call Report Actions

  • Built a classification service to find actionable items in call reports
  • Built a custom algorithm to find product mentions in call reports
  • Achieved 90% recall and helped users track minutes of meetings
  • Responsible for all activities from requirements to deployment and monitoring
  • transformers, spaCy, NLTK

Feedback Routing

  • Created a classification service to automate the feedback routing process, saving around $50k/year
  • Achieved 85% accuracy, which helped minimise manual intervention
  • Responsible for all activities from requirements to production deployment
  • transformers, NLTK

Online Marketplace

  • Built an inference pipeline for an online marketplace
  • Consulted on moving existing data science activities to AWS SageMaker
  • Created ML guidelines and processes for the data science team

Text.ai

  • Created an ensemble of regex, spaCy NER, and a known-entities dictionary to find the relevant entities in trade finance documents such as LC, Bill of Lading, Sanctions, etc.
  • Created an image processing pipeline to handle noisy images for better OCR output
  • Built a bounding box solution using OCR to extract entities from W-2 forms and invoices.
    For key-value based entities, take the word nearest to the key as the value. For value-only entities, use predefined templates to determine their location.
  • Python, Java, SQL, Regex, OCR (on-prem, cloud), OpenCV, GATE, spaCy, NLTK
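
The nearest-word rule for key-value entities can be sketched as below. This is a minimal illustration under stated assumptions: OCR output is modeled as words with axis-aligned bounding boxes, and "nearest" is taken to mean the smallest Euclidean distance between box centers. The `Word` class, the `nearest_value` function, and the sample coordinates are hypothetical, and the actual solution may have used a different distance measure or directional constraints.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """An OCR word with its bounding box (x0, y0) top-left and (x1, y1) bottom-right."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def nearest_value(key: Word, words: List[Word]) -> Optional[Word]:
    """Return the word whose box center is closest to the key's center, excluding the key."""
    candidates = [w for w in words if w is not key]
    if not candidates:
        return None
    kx, ky = key.center
    return min(candidates, key=lambda w: math.hypot(w.center[0] - kx, w.center[1] - ky))


# Hypothetical OCR output for a small invoice fragment.
key = Word("Invoice No", 10, 10, 60, 20)
words = [
    key,
    Word("INV-1042", 70, 10, 120, 20),  # on the same line, right of the key
    Word("Total", 10, 100, 50, 110),    # far below the key
]

match = nearest_value(key, words)
print(match.text if match else None)
```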

L2 Assistant Chatbot

  • A chatbot to assist the L2 support team with frequently occurring issues on a trade settlement platform
  • Created the dataset by parsing design docs, troubleshooting guides, and JIRA
  • Used word2vec and text similarity techniques to find the most closely related questions in the dataset
  • Created the user interface in Flask
  • NLTK, Chatterbot, Flask, Gensim, JIRA APIs

Test Scripts Recommender

  • A recommender system to assist the QA team in prioritizing test scripts
  • Created the dataset using the Git and JIRA APIs
  • Created a ranking algorithm based on code check-ins for the current release
  • Helped the QA team reduce the time taken to report issues by 30%
  • Python, JIRA API, Git API

Java Development - Professional (2011 - 2017)

Worked on multiple projects in the banking and PLM domains for 6 years.
Core Java, J2EE, Struts, Spring, SQL, Hibernate, Shell Script, JSP, HTML, CSS

Skills

NLP

  • Extensive usage: Text preprocessing, Data Annotation, Word Embeddings, Text Classification, Named Entity Recognition, OCR, Regular Expressions, Image Processing
  • Moderate usage: Coreference resolution, Entity linking, Relationship extraction, Knowledge graph, Text clustering, Topic modeling, Search, Chatbot, Dimensionality Reduction

Tools/Libraries

  • Extensive usage: pytorch, transformers, spacy, NLTK, sklearn, pandas, numpy, pytest, pylint, black, opencv, regex
  • Moderate usage: gensim, keras, tensorflow, neo4j, doccano, GATE, elasticsearch, openai

Deployment

  • Extensive usage: flask, FastAPI
  • Moderate usage: docker, circleci, jenkins, streamlit

Languages

  • Extensive usage: python
  • Moderate usage: java, SQL, shell script, javascript, html

Cloud

  • AWS: S3, EC2, Sagemaker, Textract
  • GCP: GCS, BigQuery, Vision API

Certifications and Education

Blogs

Machine Learning, Cooking, Poems