In the last couple of years, my Data Science attention has gone mostly towards text (NLP / NLU), but that does not prevent me from playing around with video. Inspired by Trump’s response to his Corona approach and Jim Carrey vs Alison Brie, here is my first attempt at playing with deep learning for video and DeepFakes.
While it is by no means perfect, I must say that I am very impressed with how easy it was to achieve this result. In my opinion, it is therefore a good example of the democratization of AI. Although these trends carry risks (from fake news to garbage in, garbage out), I am a big believer that this trend will bring us more good than bad.
It took me four days of training time; well, not me, but my computer (an MSI Prestige with an NVIDIA GTX 1070). While it was crunching, I was actually away for the weekend. Personally, I don’t think I spent more than two or three hours working on it. Most of that time went into recording a video of myself to create training data, and some after effects.
Another trend that is emerging is a greater focus on efficiency. Days, weeks, or months of training, and pretrained models with billions of parameters that are GBs or even TBs in size, are not always easy or quick to get into production. Yet algorithms can only add value once they are in production. A while back I heard that only 20% of algorithms make it into production. Looking back on my years as a data science consultant, I’ve seen quite a few organizations that didn’t even reach 20%. But that’s a story for another time.
For now, let’s get back to DeepFakes, and how you can do this yourself! This is not going to be a detailed tutorial, but rather the steps I took and what I learned along the way. You also don’t need to be very techy to be able to do this!
- Find a video onto which you want to put a face (yours or someone else’s).
- Create a video of around 20 minutes of yourself. I just recorded myself while I was working. I actually found the destination video while I was recording; looking back, I would definitely find the video first. The reason is quite simple: if you imitate the destination video, the results will be better.
- Some tips:
- Try to match the lighting (mine was too bright).
- Imitate the speech (as I was filming while working, I didn’t really speak, so in the results my teeth are somewhat blurry and Jeroen Scott doesn’t articulate very well).
- Maybe shave the same way as the person you’re imitating.
- Then go through the batch files:
- clear workspace.bat
- extract images from video data_src.bat
- extract images from video data_dst FULL FPS.bat
- data_src faceset extract.bat
- data_dst faceset extract.bat
- train SAEHD.bat
I used SAEHD for training. There are other options, but in all the blogs I read, this one was used most.
- TIP: go away for the weekend. Your computer will make a lot of noise, and you won’t be able to concentrate on other things without earplugs. My main question here was how long I needed to train, and I found multiple answers. One said three to seven days (wow, big help); another said until you reach a loss of 0.02. Well, I got to 0.16 after four days. The next day I had to go back to work and didn’t want to be distracted by the noise of my computer, so I stopped, and I am quite happy with the results. (You can always pick up the training at a later stage!)
- merge SAEHD.bat
- merged to mp4.bat. There is an interactive shell that lets you alter things like the size of your head (I didn’t need to change it), skin color (I ended up using mix-m), and other settings. The secret here: almost nobody knows what is really happening, so just try some things out.
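If you would rather not click through the batch files one by one, the order above can be written down in a few lines of Python. This is only a rough sketch, not something I actually used. It assumes all batch files sit in one folder, that their names match the list above exactly (in your download they may carry numeric prefixes or differ per version), and that you are on Windows. `STEPS`, `missing_steps`, and `run_steps` are names I made up for illustration. Several steps still ask for input or need to be stopped by hand (training, for example), so this only chains them, it does not make them unattended.

```python
import subprocess
from pathlib import Path

# Batch files in the order described above.
# Assumption: these exact file names exist in your folder.
STEPS = [
    "clear workspace.bat",
    "extract images from video data_src.bat",
    "extract images from video data_dst FULL FPS.bat",
    "data_src faceset extract.bat",
    "data_dst faceset extract.bat",
    "train SAEHD.bat",
    "merge SAEHD.bat",
    "merged to mp4.bat",
]


def missing_steps(folder):
    """Return the batch files from STEPS that are not present in folder."""
    folder = Path(folder)
    return [name for name in STEPS if not (folder / name).is_file()]


def run_steps(folder, start=0):
    """Run the batch files in order from index `start` (Windows only).

    check=True raises an error if a step exits with a failure code,
    so later steps do not run on top of a broken earlier step.
    """
    folder = Path(folder)
    for name in STEPS[start:]:
        print(f"Running: {name}")
        # "cmd /c" runs a single batch file and returns when it finishes.
        subprocess.run(["cmd", "/c", name], cwd=folder, check=True)
```

A cautious way to use it is to call `missing_steps(r"C:\path\to\your\folder")` first (the path is a placeholder) and only call `run_steps` when that returns an empty list. To pick up training again later, `run_steps(folder, start=STEPS.index("train SAEHD.bat"))` skips the clearing and extraction steps.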