ideas

Creating Large Elasticsearch Index by Using Limited Hardware

First of all, let me tell you about my current requirement; you'll want to keep reading if you have a similar use case. I'm trying to submit a paper to SIGIR 2018, and for this paper I need to index 50,000,000 unique documents into 15 different Elasticsearch indexes. According to…
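The excerpt cuts off before the post's actual setup, but for loads of this size a memory-friendly approach is the Python client's streaming bulk helper. Here is a minimal sketch, assuming the official elasticsearch-py client (8.x style); the index name, chunk size, and the `read_documents()` loader are illustrative assumptions, not the post's exact configuration:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch("http://localhost:9200")

def read_documents():
    # Hypothetical stand-in for the real document source.
    yield {"title": "example title", "body": "example body"}

def generate_actions():
    # Yield one action per document so the whole corpus never sits in memory.
    for doc_id, doc in enumerate(read_documents()):
        yield {"_index": "sigir-docs", "_id": doc_id, "_source": doc}

# Pausing refresh during a large load eases the pressure on limited hardware.
es.indices.put_settings(index="sigir-docs", settings={"refresh_interval": "-1"})

for ok, item in streaming_bulk(es, generate_actions(), chunk_size=500):
    if not ok:
        print("failed:", item)

# Restore a normal refresh interval once the bulk load is done.
es.indices.put_settings(index="sigir-docs", settings={"refresh_interval": "1s"})
```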

Finding the best fastText hyperparameters

If you check the fastText info page, you will see that fastText has many different input parameters for training and for the dictionary. If you have ever tried to tune your model's accuracy, you will have seen that changing these parameters changes the model's precision and recall dramatically. So I decided to make a…
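To make the effect of these parameters concrete, here is a minimal sketch of a hyperparameter sweep using the fasttext Python module. The file names and the parameter grid are illustrative assumptions; the training and validation files are expected in fastText's `__label__` supervised format:

```python
import itertools
import fasttext

# Illustrative grid; real sweeps from the post may differ.
grid = {
    "lr": [0.1, 0.5, 1.0],
    "epoch": [5, 25],
    "wordNgrams": [1, 2],
}

best = None
for lr, epoch, ngrams in itertools.product(
    grid["lr"], grid["epoch"], grid["wordNgrams"]
):
    model = fasttext.train_supervised(
        input="so_train.txt", lr=lr, epoch=epoch, wordNgrams=ngrams
    )
    # test() returns (number of samples, precision@1, recall@1).
    _, precision, recall = model.test("so_valid.txt")
    if best is None or precision > best[0]:
        best = (precision, recall, {"lr": lr, "epoch": epoch, "wordNgrams": ngrams})

print("best precision@1: %.3f, recall@1: %.3f, params: %s" % best)
```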

Trying fastText classifier models with different corpora

After running some experiments with Stack Overflow data, I wondered how these models would work with a different corpus. Is it a good idea to predict tags from the body with a model that was trained on titles? I used the models from this post and a simple methodology for this experiment: …
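The cross-corpus check itself is short to express in code. A minimal sketch, assuming a classifier already trained on question titles and a validation file built from question bodies (both file names are assumptions, in fastText's `__label__` format):

```python
import fasttext

# Load a model trained on question titles.
title_model = fasttext.load_model("titles_model.bin")

# Evaluate it on body text; test() returns
# (number of samples, precision@1, recall@1).
n, precision, recall = title_model.test("bodies_valid.txt")
print(f"samples={n} precision@1={precision:.3f} recall@1={recall:.3f}")
```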

fastText classifier for Stack Overflow data

I admit I'm obsessed with open data and open data processing. I think every website that is driven by its users' content (like Wikipedia, Twitter, Tumblr, Facebook) should provide an API or a data dump so the community can use this information. It's a fair trade, right? People provide information, and then…

OpenNLP Turkish Sentence Model

In a recent project I needed to extract sentences for the Turkish language, so I decided to train a model for OpenNLP that covers Turkish grammar rules. You can download the model and use it in your project: http://bit.ly/opennlp This model was generated using 800,000…
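If you just want to try the downloaded model, OpenNLP's command-line SentenceDetector tool reads text from stdin and prints one detected sentence per line. A minimal sketch driving it from Python; the `opennlp` launcher being on PATH and the `tr-sent.bin` model file name are assumptions for your installation:

```python
import subprocess

# Run OpenNLP's command-line SentenceDetector with the Turkish model.
# "opennlp" and "tr-sent.bin" are assumed paths; adjust for your setup.
result = subprocess.run(
    ["opennlp", "SentenceDetector", "tr-sent.bin"],
    input="Merhaba dünya. Bu ikinci cümle.",
    capture_output=True,
    text=True,
    check=True,
)

# The tool writes one detected sentence per line to stdout.
for sentence in result.stdout.splitlines():
    print(sentence)
```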