Evolution of Tech Content Creators


Question

YouTube’s analytics team is studying how tech content creators evolve. They have historical data showing how creators’ content changes over time. Each video is rated on two scales, technical depth and entertainment value, and each creator posts one video every week.

You have a dataset of 100 creators spread across 52 weeks. Each line in the dataset contains the <tech value, entertainment value> of a creator’s previous video and the <tech value, entertainment value> of the next video posted by the same creator. Analyzing these pairs shows how the content evolves over time.
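The dataset files are not reproduced here, so the row layout below is an assumption: each line is taken to hold four numbers, (tech_prev, ent_prev, tech_next, ent_next). This is a minimal sketch of turning such rows into previous-state and next-state arrays; the two rows are made-up toy values, and split_transitions is a hypothetical helper name.

```python
import numpy as np

# ASSUMED row layout: (tech_prev, ent_prev, tech_next, ent_next); toy values only.
rows = [
    (5.0, 3.0, 4.0, 3.0),
    (2.0, 7.0, 3.0, 6.0),
]

def split_transitions(rows):
    """Hypothetical helper: split rows into (n, 2) previous-state and next-state arrays."""
    data = np.asarray(rows, dtype=float)
    prev_states = data[:, :2]  # [tech, entertainment] of the earlier video
    next_states = data[:, 2:]  # [tech, entertainment] of the following video
    return prev_states, next_states

prev_states, next_states = split_transitions(rows)
print(prev_states.shape, next_states.shape)
```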

You are then given a different list of 30 creators, each with their current state of content <tech value, entertainment value>. Among these 30 creators, find:

the creator who will have the highest technical depth after 4 weeks
the creator who will have the highest entertainment value after 4 weeks
the creators who will switch from tech-focused to entertainment-focused, and those who will switch from entertainment-focused to tech-focused

Report each creator by their index in the list of 30 creators (starting from 0).

Datasets

Solution

Here’s the code for reference and some notes on the solution below.

We need to use the data to calculate the transformation matrix: a 2x2 matrix that describes how much the current technical depth and entertainment value influence the future technical depth and entertainment value.

To estimate this matrix, we use least squares regression. Either follow the link above or ask your favourite LLM tool to build an understanding of the method. Applying it to the data gives the following matrix.
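A minimal least-squares sketch, assuming the transitions have been loaded into two (n, 2) NumPy arrays of previous and next states, and assuming the model has no intercept term, i.e. next ≈ A @ prev. Because states are stored as rows, the regression solves prev @ A.T ≈ next with np.linalg.lstsq and transposes the result. fit_transition_matrix is a hypothetical name, and the self-check uses synthetic, noise-free data, not the original dataset.

```python
import numpy as np

def fit_transition_matrix(prev_states, next_states):
    """Least-squares estimate of A in next ≈ A @ prev, with one state per row.

    With rows as states the model is next_states ≈ prev_states @ A.T,
    so lstsq solves for A.T and we transpose the result.
    """
    A_T, *_ = np.linalg.lstsq(prev_states, next_states, rcond=None)
    return A_T.T

# Self-check on synthetic, noise-free data generated from a known matrix.
rng = np.random.default_rng(0)
A_true = np.array([[0.7, 0.2],
                   [0.1, 0.9]])
prev = rng.uniform(0, 10, size=(200, 2))
nxt = prev @ A_true.T  # apply A_true to every row
A_hat = fit_transition_matrix(prev, nxt)
print(np.round(A_hat, 6))
```

With real, noisy data the estimate will not be exact; lstsq returns the matrix that minimizes the squared prediction error over all transitions.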

[[0.70500624 0.19902547]
 [0.09087316 0.89926622]]

0.70500624 represents how much current tech depth influences future tech depth
0.19902547 represents how much current entertainment value influences future tech depth
0.09087316 represents how much current tech depth influences future entertainment value
0.89926622 represents how much current entertainment value influences future entertainment value
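Written out as equations, with a creator’s state in week t taken as the column vector x_t = (tech_t, ent_t) (notation introduced here for illustration), the four coefficients above read:

```latex
\begin{aligned}
x_{t+1} &= A\,x_t, \qquad
  x_t = \begin{bmatrix} \text{tech}_t \\ \text{ent}_t \end{bmatrix} \\
\text{tech}_{t+1} &= 0.70500624\,\text{tech}_t + 0.19902547\,\text{ent}_t \\
\text{ent}_{t+1}  &= 0.09087316\,\text{tech}_t + 0.89926622\,\text{ent}_t
\end{aligned}
```

Each row of the matrix produces one component of the next state, which is why the off-diagonal entry in the first row (0.19902547) is the influence of entertainment value on future tech depth.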

Now that we have the transformation matrix, we can use it to predict the future state of any creator. The idea is to multiply the transformation matrix with the current state of the creator to get the future state.
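For example, one step for a hypothetical creator whose current state is tech depth 8 and entertainment value 2 (illustrative values, not from the dataset):

```python
import numpy as np

# Transformation matrix reported above (rows: future tech, future entertainment).
A = np.array([[0.70500624, 0.19902547],
              [0.09087316, 0.89926622]])

x0 = np.array([8.0, 2.0])  # hypothetical current state: [tech depth, entertainment value]
x1 = A @ x0                # state one week (one video) later
print(x1)
```

By hand, the new tech depth is 0.70500624 × 8 + 0.19902547 × 2 = 6.03810086 and the new entertainment value is 0.09087316 × 8 + 0.89926622 × 2 = 2.52551772.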

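To compute the kth state, we have two options:

Multiply the transformation matrix with the current state k times
Use eigenvalues and eigenvectors

The second option scales better with k. Once A is factored as A = V D V^-1 (V holds the eigenvectors as columns, D is the diagonal matrix of eigenvalues), A^k = V D^k V^-1, and D^k only requires raising each eigenvalue to the kth power. This requires A to be diagonalizable, which holds here because its two eigenvalues are distinct. For k = 4 on a 2x2 matrix, both options are effectively instant.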

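import numpy as np

def predict(A, x0, k):
    # A = V diag(w) V^-1 (A must be diagonalizable), so A^k x0 = V diag(w**k) V^-1 x0
    eigenvalues, eigenvectors = np.linalg.eig(A)
    return eigenvectors @ np.diag(eigenvalues ** k) @ np.linalg.inv(eigenvectors) @ x0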
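To check that the approaches agree, here is a minimal sketch comparing option 1 (a loop), option 2 (eigendecomposition) and NumPy’s np.linalg.matrix_power, using the matrix above and the same hypothetical creator state [8, 2]. The eigen version solves V c = x0 with np.linalg.solve instead of forming the inverse explicitly, a swapped-in (numerically preferable) variant of predict; np.real discards a zero imaginary part in case eig returns complex values. The function names are hypothetical.

```python
import numpy as np

A = np.array([[0.70500624, 0.19902547],
              [0.09087316, 0.89926622]])  # matrix reported above

def predict_loop(A, x0, k):
    """Option 1: apply A to the state k times."""
    x = np.asarray(x0, dtype=float)
    for _ in range(k):
        x = A @ x
    return x

def predict_eig(A, x0, k):
    """Option 2: A^k x0 = V diag(w**k) V^-1 x0 (requires A to be diagonalizable)."""
    w, V = np.linalg.eig(A)
    c = np.linalg.solve(V, x0)  # coordinates of x0 in the eigenvector basis
    return np.real(V @ (w ** k * c))

def predict_power(A, x0, k):
    """Reference: NumPy computes A^k by repeated squaring."""
    return np.linalg.matrix_power(A, k) @ x0

x0 = np.array([8.0, 2.0])  # hypothetical tech-heavy creator
for f in (predict_loop, predict_eig, predict_power):
    print(f.__name__, f(A, x0, 4))
```

np.linalg.matrix_power needs only O(log k) matrix multiplications, so it is a simple alternative when the eigenvectors themselves are not needed.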

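Applying predict with k = 4 to each of the 30 creators in the test list gives their final states, stacked as the rows of final_state (tech depth in column 0, entertainment value in column 1). Then:

np.argmax(final_state[:, 0]) gives the creator with the highest technical depth after 4 weeks
np.argmax(final_state[:, 1]) gives the creator with the highest entertainment value after 4 weeks
Comparing the row-wise argmin (np.argmin(state, axis=1)) of the initial and final states shows which creators switched focus: 1 means entertainment is the lower score (tech-focused) and 0 means tech depth is lower (entertainment-focused), so a change from 1 to 0 is a switch from tech-focused to entertainment-focused, and a change from 0 to 1 is the reverse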
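A minimal end-to-end sketch of the three answers, assuming the current states are in an (n, 2) array with tech depth in column 0. It uses np.linalg.matrix_power in place of the eigendecomposition (the result is the same A^4), and analyze is a hypothetical helper name. The random input is only a stand-in for the real 30-creator list. Ties (tech equal to entertainment) count as entertainment-focused here, because np.argmin returns the first index.

```python
import numpy as np

A = np.array([[0.70500624, 0.19902547],
              [0.09087316, 0.89926622]])

def analyze(A, current, k=4):
    """current: (n, 2) array of [tech, entertainment] rows. Returns the answers as a dict."""
    current = np.asarray(current, dtype=float)
    # Row-vector form of x_k = A^k x_0, applied to every creator at once.
    final = current @ np.linalg.matrix_power(A, k).T
    # Row-wise argmin is 1 when entertainment is the smaller score (tech-focused)
    # and 0 when tech depth is the smaller score (entertainment-focused).
    before = np.argmin(current, axis=1)
    after = np.argmin(final, axis=1)
    return {
        "top_tech": int(np.argmax(final[:, 0])),
        "top_entertainment": int(np.argmax(final[:, 1])),
        "tech_to_entertainment": np.flatnonzero((before == 1) & (after == 0)).tolist(),
        "entertainment_to_tech": np.flatnonzero((before == 0) & (after == 1)).tolist(),
    }

# Hypothetical stand-in for the 30-creator list (random, not the real test data).
rng = np.random.default_rng(42)
current = rng.uniform(0, 10, size=(30, 2))
print(analyze(A, current))
```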

Why does this matter?

A transition (or transformation) matrix can be used to predict the future state of any system whose next state is, at least approximately, a linear function of its current state
It is used for forecasting, for analysing system stability, and in recommendation systems
It is the core of Markov chains, where it holds the probabilities of moving from one state to another
It also appears in computer graphics, financial modelling, NLP, and social media analysis

Staff Engg at GCP Memorystore, Creator of DiceDB, ex-Staff Engg for Google Ads and GCP Dataproc, ex-Amazon Fast Data, ex-Director of Engg. SRE and Data Engineering at Unacademy. I spark engineering curiosity through my no-fluff engineering videos on YouTube and my courses