Multimodal learning has shown promising performance in content-based recommendation due to the auxiliary user and item information of multiple modalities such as text and images. However, the problem of incomplete and missing modality is rarely explored and most existing methods fail in learning a recommendation model with missing or corrupted modalities. In this paper, we propose LRMM, a novel framework that mitigates not only the problem of missing modalities but also more generally the cold-start problem of recommender systems. We propose modality dropout (m-drop) and a multimodal sequential autoencoder (m-auto) to learn multimodal representations for complementing and imputing missing modalities. Extensive experiments on real-world Amazon data show that LRMM achieves state-of-the-art performance on rating prediction tasks. More importantly, LRMM is more robust to previous methods in alleviating data-sparsity and the cold-start problem.