Page 3 of 4 for DataGeeko.com

Posts

Go Even Deeper with Char-Gram+CNN
Apr 29, 2018 deep-learning kaggle
This is a repost of my Kaggle kernel, which several members of the community have told me they found helpful. It is one of my kernels tackling the interesting Toxic Comment Classification Challenge on Kaggle, which aims to identify and classify toxic online comments.
Do Pretrained Embeddings Give You The Extra Edge?
Apr 29, 2018 deep-learning kaggle
This is a repost of my Kaggle kernel, which several members of the community have told me they found helpful. It is one of my kernels tackling the interesting Toxic Comment Classification Challenge on Kaggle, which aims to identify and classify toxic online comments.
Tackling Toxic Using Keras
Apr 28, 2018 deep-learning kaggle machine-learning
This is a repost of my Kaggle kernel, which several members of the community have told me they found helpful. It is one of my kernels tackling the interesting Toxic Comment Classification Challenge on Kaggle, which aims to identify and classify toxic online comments.
How to build an Apache Spark Cluster with Hadoop HDFS and YARN
Oct 25, 2017 hadoop spark
In our earlier post, we built a pretty light two-node Apache Spark cluster without using Hadoop HDFS or YARN underneath. We didn’t point the Spark installation to any Hadoop distribution or set “HADOOP_HOME” as an environment variable, and we deliberately set the “master” parameter to the Spark master node.
Do you know what the most talked-about topics in Singapore are?
Aug 21, 2017 data-mining data-preprocessing data-visualization news toy-projects
There are tons of news publications in Singapore to grab your latest news fix from: traditional ones like The Straits Times and Today, digital ones like ChannelNewsAsia.com, and many more. But let’s face it. *Cough Cough*. It’s beginning to look like they publish what they want to write, not what we want to read and know.
How to deploy a Hadoop/Spark Cluster with multiple machines
Aug 3, 2017 hadoop spark
When you take your machine learning models to production, especially in an enterprise setting, you need them to give fast and reliable responses. This is where Spark comes into the picture. Spark offers a reliable distributed computing framework that can run on top of the Hadoop ecosystem, and if you go the extra mile of configuring HDFS and YARN, it can achieve even more resiliency in your product. To keep things small, let’s start with Spark and see how the other components fit in.
How to install Kaggle's Most Winning Algorithm - XGBoost (Screenshots included)
Jul 22, 2017 machine-learning
If you are on this page, chances are you have heard of the incredible capability of XGBoost. Not only does it “boast” higher accuracy than similar boosted tree algorithms like GBM (Gradient Boosting Machine), thanks to a more regularized model formalization that controls over-fitting, it has also helped many Kaggle Masters win Kaggle competitions. In fact, it’s probably the most popular machine learning algorithm in the data science space right now!
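For reference, the “more regularized” formalization refers to XGBoost’s training objective, which adds a complexity penalty $\Omega$ for every tree $f_k$ on top of the usual training loss $l$. Here $\hat{y}_i = \sum_{k} f_k(x_i)$ is the ensemble prediction, $T$ is a tree’s number of leaves, $w$ its vector of leaf weights, and $\gamma$ and $\lambda$ are the penalty strengths (exposed as the `gamma` and `lambda`/`reg_lambda` parameters in the library).

```latex
\mathcal{L} = \sum_{i} l\left(\hat{y}_i, y_i\right) + \sum_{k} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert w \rVert^{2}
```

Setting $\gamma = \lambda = 0$ drops the penalty and leaves the plain gradient-boosting loss, which is one way to see where the extra control over over-fitting comes from.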
Titanic Disaster Use Case: Using Seaborn to gain insights
Jul 2, 2017 data-preprocessing data-visualization
There are tons of Python-based visualisation tools out there, but my favourite has to be Seaborn. Some would say using Seaborn is a form of cheating. Well, after all, Seaborn is just a wrapper around matplotlib, and instead of framing it as Seaborn vs matplotlib, we should see it as an upgraded, flashier version of the old trusty matplotlib library. What I like about Seaborn is that it comes with a ready set of color palettes that not only make your data visualisation look tasty but also shout professionalism in just a line or two.
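To illustrate the “line or two” point, here is a minimal sketch using seaborn’s `color_palette` and `set_palette`. The palette name `"husl"`, the six colours and the toy bar chart are arbitrary choices for illustration, not taken from the post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Line 1: build a named palette (6 evenly spaced hues from the "husl" colour space).
palette = sns.color_palette("husl", 6)

# Line 2: make it the default colour cycle for later matplotlib/seaborn plots.
sns.set_palette(palette)

# Toy chart (illustrative only): one bar per palette colour.
fig, ax = plt.subplots()
for i, color in enumerate(palette):
    ax.bar(i, i + 1, color=color)
plt.close(fig)
```

Because seaborn writes the palette into matplotlib’s own colour cycle, even plain matplotlib calls made afterwards pick up the same colours.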
Practical Data Problems - How to read big data files efficiently
Jun 25, 2017 data-preprocessing
Pandas’ read_table or read_csv is probably the number one method that comes to mind when you need to read rows of data into a DataFrame. After all, you could do that in just 2 lines:
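The excerpt ends before its two-line example. Separately from that, here is a minimal sketch of the common memory-saving options of `pandas.read_csv` (`usecols`, `dtype`, `chunksize`). It assumes a hypothetical file `big_data.csv` with made-up columns `id`, `label` and `value`, generated on the fly, and is not necessarily the approach the full post takes.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical demo file standing in for a large CSV that would already exist on disk.
path = os.path.join(tempfile.gettempdir(), "big_data.csv")
n = 100_000
pd.DataFrame({
    "id": np.arange(n),
    "label": np.random.default_rng(0).choice(["a", "b", "c"], size=n),
    "value": np.ones(n),
}).to_csv(path, index=False)

# Naive approach: load every row and column into memory at once.
df = pd.read_csv(path)

# Leaner approach: read only the needed columns, with compact dtypes, in
# fixed-size chunks so peak memory is bounded by roughly `chunksize` rows.
totals = {}
for chunk in pd.read_csv(
    path,
    usecols=["label", "value"],
    dtype={"label": "category", "value": "float32"},
    chunksize=10_000,
):
    # Aggregate each chunk, then fold the partial sums into a running total.
    partial = chunk.groupby("label", observed=True)["value"].sum()
    for label, s in partial.items():
        totals[label] = totals.get(label, 0.0) + float(s)

print(totals)
```

With `chunksize` set, `read_csv` returns an iterator of DataFrames rather than a single DataFrame, so only one chunk has to fit in memory at a time; the trade-off is that aggregations must be combined across chunks by hand, as the running `totals` dict does here.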
Reality sucks - dealing with imbalanced data
Jun 21, 2017 data-preprocessing machine-learning
You stumble upon an intriguing patient cancer dataset that seems to be the last remaining puzzle piece in humanity’s war against cancer, one that will make this world a better place for everyone, and you excitedly download the dataset.