The issues of algorithmic fairness and bias have recently featured prominently in many publications highlighting the fact that training the algorithms for maximum performance may often result in predictions that are biased against various groups. Educational applications based on NLP and speech processing technologies often combine multiple complex machine learning algorithms and are thus vulnerable to the same sources of bias as other machine learning systems. Yet such systems can have high impact on people’s lives especially when deployed as part of high-stakes tests. In this paper we discuss different definitions of fairness and possible ways to apply them to educational applications. We then use simulated and real data to consider how test-takers’ native language backgrounds can affect their automated scores on an English language proficiency assessment. We illustrate that total fairness may not be achievable and that different definitions of fairness may require different solutions.