re:sample



A Guide on Sentence Mixing

Hello again(?)

CH FR

Hi, I’m CH FR. The article you’re reading here is the revised version of a guide I wrote a few years ago. I went back over some of the wording and made it a bit more interactive.

Intro

When learning to make YTPMV and Otomad audio, some parts will be easier to learn than others. It can depend on your affinities, but also on how much of it is based on concrete theory versus just “vibes”. Some techniques can even have both of those aspects at once: while the technicalities of “correcting a sample with VocalShifter” are easy to grasp, decisions like “do I leave the initial pitch slide intact or do I correct it too?” are not that clear-cut.

Sentence mixing kind of feels like it’s almost entirely “guts” and “feeling”. It’s based on lots of tiny considerations and habits that aren’t often put into words, making it all look like some sort of innate skill.

Because it’s such a varied technique, it’s hard to give advice in the form of broad, sweeping statements. As a result, when someone does give advice, it’s usually in a one-on-one, private context. This means there aren’t that many “guides” to Otomad sentence mixing.

However, I think there are parts that can be taught and passed down in a more generic way, and this is what I’ll aim to do with this guide. I believe in the importance of having public learning resources, so that the next generations can get to a higher point than us.

That’s why I’ve written down some of the stuff I picked up over the years. This guide is intended for people who are already familiar with basic audio-making. In it, I will explain the whats and whys of sentence mixing, give a few basic tips and pitfalls to avoid, and finally go over some of the mental categories that I’ve made up based on the structure and intent of the sentencing.

I’ll use the words “sentence mixing” and “sentencing” interchangeably; they’re the same thing.

With that out of the way, let’s get started!

Use cases for sentence mixing, what’s it for anyways?

Here are some of the reasons why you’d want to add sentencing to your audio.

Rhythm

Sentence mixing is another way to add a layer of rhythm to your audios, either from the flow of the syllables, or even as drums.

Uniqueness

More generally, sentence mixing is one of the reasons Otomads are what they are: most videos in Niconico’s all-time rankings have some element of sentencing. It would be hard to imagine what Otomad or YTPMV would look like without those standout videos.

Humor

Humor is another way to make your work memorable, for example by turning the dialog of your source material into dirty jokes, halting the rhythm in awkward places, etc.

There are also niche methods such as referencing other works or BGM PVs in clever ways, messing with the pitch (vibrato + wave warp to imply that someone is crying), diverting from a fad in an unexpected way, etc.

Lyrics

Having a lyrical section in your audio can really amp up its cool factor, and it makes for a nice surprise if it only comes in later. You can stick to the original lyrics, change them up, or even make up lyrics to songs that don’t have any.

After all, timing vocals to the beat and pitching them are just one tiny step away!… Just kidding, tuning alone is its own can of worms and you run into all sorts of complications when you ask the simple question “will this pitch well?”. As a result, I will not give tuning tips in this article. However, this doesn’t mean sentence mixing skills aren’t part of the process, which is why I’m still bringing it up here.

The basics

Vocabulary check

Let’s review the basics. These are concepts that can be useful no matter what style you’re going for. We’ll focus on rhythm.

Here’s a quick reminder of some of the terms you’ll see in this article:

  1. BPM: Beats Per Minute (you need to know where or how to find it).
  2. Time signature (same here).
  3. The marker for a beat. Since the time signature is 4/4, you have 4 beats per measure.
  4. The marker for a measure.
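To put numbers on these terms, here is a minimal sketch of how BPM and a 4/4 time signature translate into beat and measure lengths, and into the grid positions a DAW would snap to. The function names and the 120 BPM figure are assumptions made up for illustration, and the sketch assumes the BPM counts quarter-note beats.

```python
def beat_seconds(bpm: float) -> float:
    """Length of one beat in seconds."""
    return 60.0 / bpm


def measure_seconds(bpm: float, beats_per_measure: int = 4) -> float:
    """Length of one measure in seconds (4 beats per measure in 4/4)."""
    return beats_per_measure * beat_seconds(bpm)


def grid_times(bpm: float, beats_per_measure: int = 4,
               divisions_per_beat: int = 4) -> list[float]:
    """Snap positions (in seconds) inside one measure.

    divisions_per_beat=4 gives a grid line every quarter of a beat.
    """
    step = beat_seconds(bpm) / divisions_per_beat
    count = beats_per_measure * divisions_per_beat
    return [i * step for i in range(count)]


if __name__ == "__main__":
    bpm = 120.0  # hypothetical tempo, not the song used in this article
    print(beat_seconds(bpm))     # 0.5 seconds per beat
    print(measure_seconds(bpm))  # 2.0 seconds per measure
    print(grid_times(bpm)[:5])   # first few quarter-beat grid lines
```

At 120 BPM a beat lasts half a second, so a quarter of a beat is only 125 ms. That gives a sense of how small the timing differences discussed in the next sections are.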

Let’s now move on to some actual tips.

Think with syllables

The first piece of advice I can give is to think in terms of syllables. Just timing by word isn’t enough.

In this particular example, the difference is pretty subtle, which is even worse! It means that if you don’t notice it, you’ll be stuck with sentencing that just “sounds slightly off” without ever knowing why.

Timing by word means operations such as stretching will affect all of the syllables, so if just one syllable is off-time, fixing it will mess up every other syllable.

By the way, this is the song that I am using in this example

This might sound like a lot of extra work if you’ve done it with whole words until now, but it gets easier once you’ve built up enough muscle memory with your DAW. That’s why it’s essential to know and set your shortcuts.

Shortcuts are very important here, since you need snapping on (for timing) and off (for cutting syllables), and you’ll be going back and forth between the two all the time.

For example, in Reaper:

Soft and hard consonants

So let’s say that you have timed your syllables down to a T, but the timing still sounds off.

In that case, soft consonants may be at fault.

I learnt about this concept when watching Mikami’s UTAU tutorial.

Soft consonants are drawn out (m, n, sh, s, …) while hard consonants sound percussive and shorter (t, k, p, b, …).

The problem here is that soft consonants can be pretty long, yet you only really “register” the syllable in the rhythm when the vowel starts. So the consonant offsets the “perceived start” of the syllable, making it sound off-time.

In this specific case, we hear the vowel a whole quarter of a beat too late.

The solution? Go off the grid! In your DAW I mean…

Offset your items so that the vowel starts earlier. The waveform is usually enough to guess where the transition between consonant and vowel happens.

And finally, make sure you don’t overdo it: soft consonants are not always that long.

If it sounds fine from the start, then that’s fine! You don’t have to mechanically apply those rules every single time.
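As a rough sketch of the arithmetic behind going off the grid: if you measure how long the soft consonant lasts (by eye on the waveform), you can see how late the vowel lands in beats, and where the item should start so the vowel sits on the grid line. The function names and the numbers below are hypothetical, chosen only to illustrate the idea.

```python
def lateness_in_beats(consonant_seconds: float, bpm: float) -> float:
    """How late the vowel is heard, in beats, when the item starts on the grid."""
    return consonant_seconds / (60.0 / bpm)


def item_start_for_vowel(target_seconds: float, consonant_seconds: float) -> float:
    """Item start position that puts the vowel onset on the target grid line."""
    return target_seconds - consonant_seconds


if __name__ == "__main__":
    bpm = 120.0        # hypothetical tempo
    consonant = 0.125  # hypothetical length of an "sh" sound, in seconds
    target = 1.0       # grid line where the syllable should be felt

    print(lateness_in_beats(consonant, bpm))        # 0.25, a quarter of a beat
    print(item_start_for_vowel(target, consonant))  # 0.875
```

This is only a starting point. As noted above, your ears get the final say, and a consonant that already sounds fine doesn’t need to be moved.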

Stretching

You could just stretch the whole syllable, but if you overdo it, you’ll stumble onto one big issue: Consonants don’t stretch well.

If it’s a soft consonant, it will stretch all right but throw off the timing of the vowel sound. If it’s a hard consonant, stretching it too much will have undesirable effects: a “t” sound, for example, can start to sound like “d”. There are situations where you could exploit this, but that’s off-topic here.

One way to deal with that is by making a cut between the consonant and vowel part, then stretching only the latter.

Then extend either item so that you have a bit of overlap.

If you don’t do that, you’ll hear the gap between the two items.

If there’s a previous syllable, then make sure that no syllable is ever completely covered, or it will mess with the automatic crossfade.
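The cut-then-stretch approach above can be sketched as simple arithmetic. This is a minimal illustration under stated assumptions, not tied to any DAW: the `Item` class, the `split_and_stretch` function and the 10 ms default overlap are all made up for the example, and `rate` stands for a playback rate where values below 1.0 mean the audio is stretched longer.

```python
from dataclasses import dataclass


@dataclass
class Item:
    start: float       # position on the timeline, in seconds
    length: float      # length on the timeline, in seconds
    rate: float = 1.0  # playback rate; below 1.0 means stretched longer


def split_and_stretch(start: float, consonant_len: float, vowel_len: float,
                      target_len: float, overlap: float = 0.01) -> tuple[Item, Item]:
    """Cut a syllable at the consonant/vowel boundary and stretch only the vowel.

    The consonant keeps its original speed and is extended by `overlap`
    so the two items crossfade instead of leaving an audible gap.
    """
    if target_len <= consonant_len:
        raise ValueError("target length leaves no room for the vowel")
    consonant = Item(start=start, length=consonant_len + overlap)
    new_vowel_len = target_len - consonant_len
    vowel = Item(start=start + consonant_len,
                 length=new_vowel_len,
                 rate=vowel_len / new_vowel_len)
    return consonant, vowel


if __name__ == "__main__":
    # Hypothetical syllable: 0.1 s consonant + 0.2 s vowel, to fill 0.5 s.
    consonant, vowel = split_and_stretch(0.0, 0.1, 0.2, 0.5)
    print(consonant)
    print(vowel)  # the vowel plays at half speed, the consonant is untouched
```

The point of the sketch is that the consonant’s rate stays at 1.0 no matter how long the syllable needs to be. Only the vowel absorbs the stretch.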

Having to manage all that can be a bit of a pain, so if you’re using Reaper, you could learn how to use its Stretch Markers instead.

Sometimes you won’t be able to stretch the vowel part because it’s silenced. This can happen when syllables like し (shi), す (su), つ (tsu), etc. are at the end of certain words. In that case, you can still keep the consonant part as the main rhythmic element.

Let’s look back at the previous example, with the timings now readjusted for soft consonants.

Dealing with gaps

You may have some gaps between your items. How you deal with them can vary depending on the situation.

  1. You could stretch the items to entirely fill the gaps, in which case, you’ll have to be careful about consonants as seen previously.
  2. You could add some repeating syllables or stuttering.
  3. You could preserve, or even embrace, the gaps to make the next syllable more impactful. Here’s an example (see the gap in Ara-i san; this doesn’t really apply to Fenne-c because the gap is part of the regular pronunciation for フェネック).

About point 3: you can also introduce tiny gaps at the end of your syllables to make the next one easier to hear. This can be used in fast-paced sentence mixing, and will make the whole thing sound pluckier with added compression.
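The tiny-gap idea can be sketched as a small calculation: each item ends a little before the next syllable starts. The function name and the numbers are assumptions for illustration only. A good gap size depends on the tempo and the sample, so treat the value as something to tune by ear.

```python
def lengths_with_gaps(onsets: list[float], end_time: float, gap: float) -> list[float]:
    """Item lengths that leave `gap` seconds of silence before the next syllable.

    onsets: start time of each syllable, in seconds, in ascending order.
    end_time: where the last syllable would normally end.
    """
    bounds = list(onsets) + [end_time]
    return [max(0.0, (nxt - cur) - gap) for cur, nxt in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical: three syllables, one per beat at 120 BPM (0.5 s apart),
    # each trimmed by a quarter of a beat (0.125 s).
    print(lengths_with_gaps([0.0, 0.5, 1.0], 1.5, 0.125))  # [0.375, 0.375, 0.375]
```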

In summary

Here’s an example of working with these concepts in mind, by Kolina.

Styles of sentence mixing

From this point on, we’ll talk style over technique. This part of the article is more subjective, since I have to label vague ideas about sentence mixing.

Knowing the different ways you can do sentence mixing, and what effects each of them can achieve, will help you to get a sense of what style best fits your needs.

The way sentencing is done can vary from person to person, video to video, and even scene to scene. By listening attentively, you’ll learn to pick up on reoccurring characteristics in many videos.

Here are some makeshift categories that I’ve found useful to slap a label on.

Meaning-focused

Defined by the commitment to making complete, structured sentences.

Here are some examples:

The sentence integrity is preserved, and the rhythm stays relatively consistent. At 0:34, the rhythm from the original lyrics is completely ignored in order to deliver the source’s dialog more smoothly.

At 00:48, we’ll see that most syllables are the same length (1/4th of a beat). In this case, the variations are dictated by the dialog itself: syllables are being elongated (― = ~) and put together (like the “ni n”).

Here, meaning takes precedence over rhythm: each syllable gets roughly the same amount of time, and some sentences end up being longer than others.

The sentencing in this one has a less consistent rhythm, with syllable length varying wildly because of the anime’s frequent tirades and the song’s fast tempo. Depending on the song, this pacing may sound good. Otherwise, you’ll need to consider eliminating some sentences to get more space.

Looking at 00:32, the shortest syllables are still 1/4th of a beat. The gaps and the composed syllables (“sen”, “sei”, “ren”, “ai”) help make the pacing more digestible. They give the listener time to catch up on the meaning.

Additional examples

Catchphrase-focused

They say that limitation breeds creativity, and I think Otomads are no exception. You’ve probably seen a few videos where the author gets more mileage than you’d expect from just a few seconds of footage. The simplicity of that kind of sentencing makes it a lot easier to remember, so those Otomads can often become earworms.

Examples

Rhythm-focused

What happens when even words become optional? You get to the logical opposite of meaning-focused sentencing, where the rhythm dominates every decision.

A lot of EDM artists regularly do that, like PSYQUI or t+pazolite.

Examples

Chaos

A powerful love letter to Otomads that makes for a great collab finale or a short, supercharged part.

Examples

Mixing styles together

It’s common for more than one style to be used at once. Here’s some general advice when you do this.

Here are a few examples:




Where we blur the line

The last thing you’d want to do after reading this article is to think that this is all there is to know about sentence mixing, or that you must adhere to only one of those styles. Hopefully the later sections have done a good enough job of showing that most authors will switch a lot between styles and put their own twist on them, and that’s what you should do too!

Think of the styles I presented as a rough outline to get started when brainstorming for a video.

…But what if those instructions were the very reason you’d want to go with a different style? What would a story-driven Haunted Dance Otomad sound like? Those are the thoughts I’d like you to keep in mind.

So go crazy! Understand what makes the sentencing in the Otomads you love what it is, learn to identify the different techniques and patterns, and then do the wild things no one’s ever seen!

With all that said, does it mean that an audio needs to have sentence mixing in order to tell a story? Fuck no.

Conclusion

In this article, we first revisited some of the basic item-editing tricks that are needed for sentence mixing, as well as the mistakes that can make your timing sound “off”.

Then, we took a broader and more opinionated look at some of the styles I was able to identify and label over time.

I don’t think the diagrams I showed in this article will be of any use as a blueprint. Rather, they illustrate what I believe were some of the concessions the authors had to make when balancing rhythm and meaning.

That should be about it, thanks for reading thus far, and I hope this has been a helpful read.
