Posts

  • Migration

    This blog has travelled to a new location without moving

    Without moving… how’s that possible? Check out the video.

    [ Read more... ]

  • Prediction using NLP and Keras Neural Net

    This notebook focuses on NLP techniques combined with neural networks built in Keras. The goal is to complete an end-to-end project and to learn, through hands-on practice, the best approaches to text processing with neural networks. The tutorial shows how to prepare data for a neural network in Keras and how to implement and run it.

    Project description: predict whether a film review is positive or negative. The dataset is a set of IMDb reviews labeled as positive or negative.

    It is inspired by a deep learning with NLP crash course by Dr. Jason Brownlee.

    [ Read more... ]

  • Filter Spam with Machine Learning

    This guide presents a very simple yet powerful machine learning technique for filtering spam!

    Multinomial Naive Bayes with minimal preprocessing yields remarkably good results! I was quite amazed by how efficiently it learns from text.

    Please note that this technique comes from the Applied Text Mining in Python course by the University of Michigan, which I highly recommend to anyone in computer science and machine learning.

    Ok, let’s dive in.
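
    For a feel of the idea before diving in, here is a minimal sketch of multinomial Naive Bayes in plain Python. It is not the code from the post or the course: the function names (`tokenize`, `train`, `predict`), the smoothing value and the four toy messages are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Minimal preprocessing: lowercase and split into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages, labels, alpha=1.0):
    """Count words per class; alpha is the Laplace smoothing constant (assumed 1.0)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return {
        "word_counts": word_counts,
        "class_counts": class_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            if word not in model["vocab"]:
                continue  # ignore words never seen in training
            # smoothed log likelihood of the word given the class
            score += math.log((counts[word] + alpha) / (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only
toy_messages = [
    "win a free prize now",
    "free cash claim now",
    "are we meeting for lunch",
    "see you at the meeting tomorrow",
]
toy_labels = ["spam", "spam", "ham", "ham"]

toy_model = train(toy_messages, toy_labels)
print(predict(toy_model, "free prize"))     # spam
print(predict(toy_model, "lunch meeting"))  # ham
```

    On a real dataset a library implementation would be the more practical choice; the sketch only shows that the technique boils down to counting words per class.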

    [ Read more... ]

  • text mining - HackerNews Buzz-headlines

    This project explores the frequency and popularity of headlines.

    The dataset is a collection of headlines from the HackerNews portal gathered over the period 2006-2015.

    Note: to see the final result, scroll to the end.

    Let’s dive into an exploration of the so-called metaparameters and find out:

    • How does popularity correlate with headline length?
    • How does popularity vary across quarterly periods?
    • How does overall popularity on the website change over time?
    • Which buzzphrases appeared in the headlines, and how do they change over time?

    Intuitively, we can tell why these particular articles were popular: that most probably depends on the content, the relevance, and the author. Yet let’s look for some non-obvious correlations and then visualize the most popular headlines!

    [ Read more... ]

  • ML framework with Kaggle Titanic competition

    Such a renowned Kaggle competition. Everyone into machine learning has tried to predict who is more likely to survive: a family man, a gentleman with an expensive ticket, or a child? Or maybe someone holding a royal title? Yes, there are quite a number of features for a machine to learn!

    Let’s dive into this tutorial, which is more of a presentation of my top-notch framework for machine learning so far! Yes! It is the framework I’ve built, courtesy of dataquest.io, after spending quite some time getting my best competition results.

    Btw, dataquest.io is a great platform and community for learning some great things!

    [ Read more... ]

  • Word Frequency From a Text

    Suppose you want to get the most frequent words from a text. This task quickly reveals the caveats: there are swarms of words, each with dozens of forms, all those n’t and ‘s, and also commas and periods… All of these should be accounted for.

    Luckily for us, there are very powerful tools for word processing and text mining: libraries that handle these tasks in the best way possible.

    Get familiar with nltk - a powerful library for NLP (natural language processing)!
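
    To make the problem concrete, here is a rough sketch that uses only the Python standard library rather than nltk. The `top_words` name, the tiny stop-word set and the sample sentence are illustrative assumptions; nltk's tokenizers, stop-word lists and lemmatizers handle far more cases than this does.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word set; a real list is much longer
STOPWORDS = {"the", "is", "isn't", "a", "and", "it"}


def top_words(text, n=3):
    """Return the n most frequent words, ignoring punctuation and stop words."""
    # Letters, optionally followed by an apostrophe part such as 's or n't
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    words = []
    for token in tokens:
        if token.endswith("'s"):
            token = token[:-2]  # fold "cat's" into "cat"
        if token and token not in STOPWORDS:
            words.append(token)
    return Counter(words).most_common(n)


sample = (
    "The cat's toy is the cat's favourite. "
    "The dog isn't a cat, and the dog knows it."
)
print(top_words(sample, 2))  # [('cat', 3), ('dog', 2)]
```

    Even this small example needs special handling for possessives and contractions, which is exactly the kind of work a dedicated library takes over.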

    [ Read more... ]

  • ML Modelling Workflow Tutorial

    This is a solid and quick machine learning tutorial that leads you through the steps of building the best prediction model. It helps you understand the process and provides code examples.

    To save space, I’ll be using logistic regression and trees (either gradient-boosted trees or ensembles work well), although I encourage you to try different classifiers.

    It was originally built on a project I completed for the University of Michigan course Applied Machine Learning, which I highly recommend to anyone passionate about data science.

    [ Read more... ]

  • ML Model Evaluation cheatsheet

    A handy cheatsheet on tools for model evaluation. It briefly explains the key concepts and ends with the powerful GridSearch tool, providing code snippets along the way.

    [ Read more... ]

  • Machine Learning Algorithms Cheatsheet

    A useful cheatsheet of machine learning algorithms, with a brief description of where each is best applied, along with code examples.

    The cheatsheet lists various models as well as a few techniques (at the end) to complement model performance.

    [ Read more... ]

  • NYC Best School Districts

    My NYC Schools Exploration project, completed with interesting results and a great visual. I’ve produced quite a number of lines of code, all for the sake of finding out how good matplotlib can be at drawing visuals. Turns out it is quite capable!

    Scroll to the very bottom to marvel at the data-mined results wrapped up in a nice visual!

    This research addresses the following questions:

    • Determining whether there’s a correlation between class size and SAT scores
    • Figuring out which neighborhoods have the best schools
      • In combination with a dataset containing property values for NY districts, we could find the least expensive neighborhoods that have good schools.

    [ Read more... ]

  • Graph Aesthetics

    It is always nice to have tidy and visually appealing graphs.

    As a matter of fact, a few lines of code are always necessary to beautify default graphs. This post contains those handy snippets and more.

    There are a few ways to go:

    • tune matplotlib-plotted graphs: dejunkify, set a color scheme, apply additional tweaks
    • use the seaborn library with a predefined scheme, plus a few more tweaks via the library wrapper

    [ Read more... ]

  • Mapping data Geographically

    Basemap is a powerful yet simple tool for Python that lets you plot geographic visualizations from coordinates.

    This quick tutorial is a good reference for the syntax and techniques of drawing a basemap with the Mercator projection. To do that, we need coordinates such as latitude and longitude.

    A high-level review of the basemap library.
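
    For intuition, the spherical Mercator projection itself can be sketched in a few lines of plain Python. This is not Basemap's API, only the textbook formula behind the projection; the `mercator` function name and the default radius (the WGS84 equatorial radius) are assumptions chosen for illustration.

```python
import math

# Assumed sphere radius in metres (WGS84 equatorial radius)
EARTH_RADIUS_M = 6378137.0


def mercator(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Project latitude/longitude in degrees to spherical Mercator x, y in metres."""
    if not -90.0 < lat_deg < 90.0:
        # The projection stretches to infinity at the poles
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# Example call with approximate coordinates for New York City
x, y = mercator(40.7, -74.0)
print(round(x), round(y))
```

    The formula shows why latitude and longitude are all the projection needs: x depends only on longitude and y only on latitude.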

    [ Read more... ]

  • Prepare Datasets cheatsheet

    This is a cheatsheet, meaning it contains code snippets. It is based on the Data Exploration course from Dataquest.

    Dataset preparation often consists of combining several messy datasets into a single clean one to make further analysis easier. It is always good to get acquainted with the domain the data represents, for example by reading Wikipedia (for public data), to gain insight into what’s relevant (in practice, which rows or columns matter to us).

    [ Read more... ]

  • Jekyll tutorial with GitHub pages

    There is great satisfaction in setting up your own blog on GitHub Pages. It took me a while to gather all the information from various sources and combine it to create a stylish website.

    This article will lead you through all the steps and explain how to set up your own open, professional-looking blog, which is perfect for showing off a portfolio, sharing knowledge, and posting easy-to-access tech articles for your future self.

    [ Read more... ]

  • My First Post

    Hello World! Hope to get some happy-blog-birthday comments :)

    [ Read more... ]
