ideas

Creating a Large Elasticsearch Index Using Limited Hardware

First of all, let me tell you my current requirement, and you'll want to keep reading if you have a similar use case. I'm trying to submit a paper to SIGIR 2018, and for this paper I need to index 50,000,000 unique documents into 15 different Elasticsearch indices. According to my…
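As a rough illustration of the kind of indexing work this involves, here is a minimal sketch of bulk indexing with the official Python Elasticsearch client; the host, index name, and document generator are assumptions for illustration, not details from the post.

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

# Assumed local cluster; the post's actual hardware and settings are not shown here.
es = Elasticsearch("http://localhost:9200")

def actions(index_name, docs):
    # Wrap each raw document in a bulk action targeting the given index.
    for doc_id, body in docs:
        yield {"_index": index_name, "_id": doc_id, "_source": body}

def index_corpus(index_name, docs, chunk_size=1000):
    # streaming_bulk sends documents in fixed-size chunks, so memory stays
    # bounded even when the corpus has tens of millions of documents.
    ok_count = 0
    for ok, _ in streaming_bulk(es, actions(index_name, docs), chunk_size=chunk_size):
        ok_count += ok
    return ok_count
```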

Finding the best fasttext hyperparameters

If you check the fasttext info page, you will see that fasttext has a lot of different input parameters, both for training and for the dictionary. If you have ever tried to tune your model's accuracy, you will have seen that changing these parameters changes the model's precision and recall dramatically. So I decided to make a…
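For context, a minimal sketch of the kind of parameter sweep this implies, using the fasttext Python bindings; the file names and the parameter grid here are assumptions for illustration.

```python
import fasttext

# Hypothetical training/validation files in fasttext's __label__ format.
TRAIN, VALID = "train.txt", "valid.txt"

best = None
for lr in (0.1, 0.5, 1.0):
    for epoch in (5, 25):
        for ngrams in (1, 2):
            model = fasttext.train_supervised(
                input=TRAIN, lr=lr, epoch=epoch, wordNgrams=ngrams)
            # test() returns (number of samples, precision@1, recall@1).
            _, precision, recall = model.test(VALID)
            if best is None or precision > best[0]:
                best = (precision, recall,
                        dict(lr=lr, epoch=epoch, wordNgrams=ngrams))

print("best precision/recall and parameters:", best)
```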

Trying fasttext classifier models with different corpora

After running some experiments with Stack Overflow data, I wondered how these models would perform on a different corpus. Is it a good idea to predict tags from a question's body with a model that was trained on titles? I used the models from this post and a simple methodology for this experiment:…
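A minimal sketch of what such a cross-corpus check could look like: load a model trained on one field (titles) and measure precision@1 on text from another field (bodies). The model path, data format, and helper name are assumptions, not the post's actual methodology.

```python
import fasttext

# Hypothetical: a model trained on question titles, evaluated on question bodies.
model = fasttext.load_model("titles_model.bin")

def precision_at_1(model, examples):
    # examples: iterable of (text, true_label) pairs, e.g. parsed from a data dump.
    correct = total = 0
    for text, true_label in examples:
        # predict() expects single-line input, so strip newlines from bodies.
        labels, _ = model.predict(text.replace("\n", " "))
        correct += int(labels[0] == true_label)
        total += 1
    return correct / total if total else 0.0
```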

Fasttext classifier for stackoverflow data

I think I'm a bit obsessed with open data and open data processing. Every website that is driven by user content (like Wikipedia, Twitter, Tumblr, Facebook) should provide an API or a data dump so the community can use this information. It's a fair trade, right? People provide the information, and then…

Speed up MongoDB aggregation functions

I love MongoDB, and last month I started to learn MongoDB's amazing Aggregation API. It has a lot of useful aggregation stages, and its documentation can be found here. I tried all of the aggregation stages in the mongo shell, and I also ran explain on my aggregations to understand how MongoDB…
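As a small illustration of the aggregation-plus-explain workflow mentioned above, here is a sketch using pymongo; the database, collection, fields, and pipeline are assumptions for illustration.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["blog"]        # hypothetical database
orders = db["orders"]      # hypothetical collection

# A typical pipeline: filter first with $match so later stages see fewer documents.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]

results = list(orders.aggregate(pipeline))

# The explain command reports how MongoDB plans to execute the aggregation,
# e.g. whether the $match stage can use an index.
plan = db.command({
    "explain": {"aggregate": "orders", "pipeline": pipeline, "cursor": {}},
    "verbosity": "queryPlanner",
})
```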