ideas

Trying fasttext classifier models with different corpora

After running some experiments with Stack Overflow data, I wondered how these models perform on a different corpus. Is it a good idea to predict tags from question bodies with a model that was trained on titles? I used the models from this post and followed a simple methodology for this experiment:

Fasttext classifier for stackoverflow data

I think I'm obsessed with open data and open data processing. I think every website that is driven by user content (like Wikipedia, Twitter, Tumblr, Facebook) needs to provide an API or a data dump so the community can use this information. It's a fair trade, right? People provide information and then