Audio deepfake detection is essential for addressing societal challenges such as distinguishing real news from fake content or authenticating voice recordings in legal contexts. However, identifying whether a voice is human or AI-generated requires knowing which characteristics to examine, and there is currently little guidance on which voice features to choose for this task. This gap motivates the systematic review presented in this paper. Starting from the hypothesis that human voices exhibit more intra-speaker variation than deepfakes, this review summarizes and analyzes the published studies on intra-speaker variation in the human voice. The findings highlight speaking style as a major factor in intra-speaker variation, affecting 10 different acoustic parameters. We conclude that experts should focus on the voice features that exhibit intra-speaker variation when analyzing potential deepfakes.