Have you ever wanted to get into a podcast episode,
but didn’t listen because it was too long?

Reading a condensed summary of a podcast episode can give you a sense of what a new show focuses on or help you catch up on hours’ worth of content in minutes. Whether you are catching up on previous episodes or browsing for something new, TL;DL gives you synopses of the podcasts of your choice and helps you make better use of your time.

Why TL;DL?

Podcasts and their listenership have exploded in recent years. The podcast industry is projected to reach $1 billion in 2021 (Business Insider). A 16% increase in listeners is expected over the course of 2021, with the total number of listeners having doubled since 2016. The principal players in the US market include Apple, Spotify, YouTube, and Google, with the first two holding the majority share.

Topics covered by podcasts are as diverse as their listeners. Though content is more readily available than ever before, what has not changed is the time needed to consume it. According to Spotify's Podcast Dataset, the average episode length is around 34 minutes, so spending five minutes or less to read a detailed, accurate summary of an episode offers significant time savings. Furthermore, the episode descriptions found on popular podcast platforms aim to entice the listener rather than accurately summarize the content.

Beyond summarizing content for individual listeners, TL;DL can offer substantial support to entities that curate audio content. Whether for public libraries or private companies, TL;DL can support a range of content-curation needs within the audio industry.

TL;DL's Value Proposition


TL;DL aims to provide concise, accurate, and comprehensive summaries of podcast episodes that a listener can read within a few minutes in lieu of listening to episodes in their entirety. In our survey of podcast listeners (detailed below), we found that:

  • 62% said they would use summaries to explore new podcasts
  • 38% said they would read summaries instead of listening when they're short on time
In both cases, it's essential that generated summaries are not only quick and easy to read, but also accurate and comprehensive enough to stand in for listening to an episode when needed.

    Sample Summary

    We generated a summary of "Elise Strobach of AeroShield: a clear (and energy efficient) window to the future", from the Understory podcast; you can find the audio and transcript here.

    Elise Strobach grew up in a small town in Wisconsin, where "we really spent a lot of time out in the environment and really saw the appreciation of just how magnificent the world we live in is," she says. So when she and a co-founder of her company, AeroShield, realized that 80% of new windows in the US aren't energy efficient and lose $40 billion a year because of them, "the more we became excited about trying to solve it." Their solution? A "super clear, transparent solid insulator" that's made up of 95% air, Strobach tells the Understory podcast. "It allows these tiny little pores that we've engineered to be so small that the wavelengths of light don't interact with them," she says. "It allows these tiny little pores to do all the job of superinsulation regardless of the wavelengths of light." Strobach, CEO and co-founder, says the company's " airship material" is ready to drop into existing window manufacturing so that "we can create better window products from the ground up and that's pretty exciting."


    Spotify Podcast Dataset

    The podcast dataset contains about 100,000 podcast episodes, filtered to documents that the creators tagged as English and further screened by a language filter applied to the creator-provided title and description. Episodes were sampled from both professional and amateur podcasts: some were produced in a studio with dedicated equipment by trained professionals, while others were self-published from a phone app, so audio quality varies with the creator's experience and equipment. The dataset and full documentation may be accessed at https://podcastsdataset.byspotify.com/

    Pipeline for Text Summarization

    The supporting pipeline for this project is straightforward. Given a podcast audio file, the audio is transcribed and the resulting text is sent to the summarization model. The output is an easily digestible summary approximately a paragraph in length.
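    As a rough illustration of that flow, the sketch below chains the two steps. The function and parameter names are hypothetical stand-ins, not the project's actual code; the two callables represent the transcription and summarization components described in the sections that follow.

        # High-level sketch of the TL;DL pipeline: audio file -> transcript -> summary.
        # The two callables are hypothetical stand-ins for the components described below.
        from typing import Callable

        def summarize_podcast(
            audio_uri: str,
            transcribe: Callable[[str], str],   # e.g. a Google Speech-to-Text wrapper
            summarize: Callable[[str], str],    # e.g. a PEGASUS wrapper
        ) -> str:
            transcript = transcribe(audio_uri)  # speech-to-text step
            return summarize(transcript)        # abstractive summarization step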

    Podcast Transcription

    Google Cloud’s Speech to Text API handles the transcription portion of the pipeline. This out-of-the-box solution has many advantages, though it does impose some limitations on the final output.

    PEGASUS Model for Abstractive Text Summarization

    PEGASUS is a state-of-the-art model for abstractive text summarization developed by Google Research and published at the 2020 International Conference on Machine Learning.

    Podcast Transcription

    Advantages

    Google Cloud’s Speech to Text API provides access to extensively developed speech recognition models. Further, these robust models perform well with multiple speakers. Since many podcasts have multiple speakers, this feature is essential to this application. Finally, with an eye to the future, this solution is also highly scalable, as the computational requirements of processing audio data are offloaded to the cloud, which simplifies management of computational resources.
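    As an illustration, here is a minimal sketch of how a long episode stored in Cloud Storage might be transcribed with the Python client library. The bucket path, audio encoding, sample rate, and speaker counts are illustrative assumptions, not the project's actual configuration.

        # Sketch: asynchronous transcription of a long episode with speaker diarization.
        # The gs:// path, encoding, sample rate, and speaker counts are placeholder values.
        from google.cloud import speech

        client = speech.SpeechClient()

        audio = speech.RecognitionAudio(uri="gs://example-bucket/episode.flac")
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
            sample_rate_hertz=44100,
            language_code="en-US",
            use_enhanced=True,                      # request an enhanced model when available
            enable_automatic_punctuation=True,
            diarization_config=speech.SpeakerDiarizationConfig(
                enable_speaker_diarization=True,    # tag words by speaker
                min_speaker_count=2,
                max_speaker_count=4,
            ),
        )

        # Long audio must be processed asynchronously from Cloud Storage.
        operation = client.long_running_recognize(config=config, audio=audio)
        response = operation.result(timeout=3600)

        transcript = " ".join(r.alternatives[0].transcript for r in response.results)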

    Disadvantages

    Cost is the biggest disadvantage of this transcription approach. Using the enhanced model needed to handle audio with multiple speakers, and with data-logging opt-in enabled to reduce costs, transcription costs 0.6 cents per 15-second increment. That translates to 2.4 cents per minute, or $1.44 for a 60-minute episode. For an individual episode this is not a concern, but it would become an issue at scale. The approach also places a key element of the pipeline's functionality on third-party resources, which requires staying vigilant about changes to the API and its provider.
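    Those per-episode numbers are easy to verify with a quick back-of-the-envelope check:

        # Quick check of the per-episode transcription cost cited above.
        import math

        cost_per_increment = 0.006                # USD per 15-second increment
        episode_minutes = 60
        increments = math.ceil(episode_minutes * 60 / 15)
        print(increments * cost_per_increment)    # 1.44 (USD for a 60-minute episode)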

    PEGASUS Model for Abstractive Text Summarization

    Initial summarization efforts focused on extractive synopses, but these failed to meet the intent of TL;DL: the approach tended to latch onto advertisements or other frequently repeated phrasing at the expense of capturing the overall content of the episode. An abstractive model proved to be the appropriate solution for the TL;DL application. Numerous summarization models were explored with varying degrees of success; the model that generated the best and most consistent results on podcast content was PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization).

    PEGASUS follows a transformer encoder-decoder architecture with a novel gap-sentence pre-training objective: whole sentences are masked from a document, the encoder is trained on a masked language modeling task, and the decoder learns to generate the missing ("gap") sentences, which are selected as the most informative sentences in the document by ROUGE score. The model is pre-trained on two large text corpora, C4 (750GB) and HugeNews (3.8TB). More information may be found in the PEGASUS paper cited in the references below.

    After experimenting with the 12 datasets used to fine-tune PEGASUS, clear winners emerged for short and long summaries. For the short summary, fine-tuning on the XSum dataset generated the most concise one-sentence summarizations. The XSum dataset is sourced from 227,000 diverse BBC news articles from 2010 to 2017, each accompanied by a professionally written single-sentence summary for training. While this produced extremely accurate, high-quality summaries, they were not comprehensive enough to adequately replace listening to an entire episode.

    For the long summary, the best results came from fine-tuning on Multi-News, which contains 56,000 articles and accompanying human-written summaries from the website newser.com. The other datasets produced either incomprehensible or repetitive summaries, both common problems in text summarization. Below is a sample of the results obtained with the versions of PEGASUS fine-tuned on XSum and Multi-News (short and long, respectively):
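    As a minimal sketch of how these two variants can be run, the snippet below loads the publicly released Hugging Face checkpoints fine-tuned on XSum and Multi-News; handling transcripts longer than the model's input window (for example, by chunking) is omitted here for brevity.

        # Sketch: short (XSum) and long (Multi-News) summaries from public PEGASUS checkpoints.
        from transformers import PegasusForConditionalGeneration, PegasusTokenizer

        def pegasus_summary(text: str, checkpoint: str) -> str:
            tokenizer = PegasusTokenizer.from_pretrained(checkpoint)
            model = PegasusForConditionalGeneration.from_pretrained(checkpoint)
            inputs = tokenizer(text, truncation=True, return_tensors="pt")  # truncates long input
            summary_ids = model.generate(**inputs)
            return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

        transcript = "..."  # transcript produced by the speech-to-text step
        short_summary = pegasus_summary(transcript, "google/pegasus-xsum")        # one sentence
        long_summary = pegasus_summary(transcript, "google/pegasus-multi_news")   # about a paragraph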

    Entity Hallucination

    In addition to fine-tuning our PEGASUS model with the Multi-News dataset, we also generated sixty summaries of podcasts found in the Spotify dataset in order to further fine-tune the model on relevant data. However, we found that this method led to entity hallucination, in which summaries generated by the model contained information that was not factual or pertinent to the podcast itself. Given that accuracy is a major requirement for TL;DL, we decided to continue with the zero-shot model that was not fine-tuned on our labeled podcast data.

    Evaluation

    ROUGE is a widely used metric for evaluating automatic summarization methods, in which generated summaries and reference summaries are compared at the n-gram level to determine model efficacy. The reference summaries available in Spotify's Podcast Dataset are episode descriptions, and when comparing our generated summaries to these descriptions, we found an overall ROUGE-1 F1 score of 0.14. While this is relatively poor, it is important to note that these episode descriptions aim to entice the listener to consume a podcast rather than provide a detailed summary of its contents. Given the differing goals of our available reference summaries and TL;DL summaries, such comparisons are unlikely to yield a holistic evaluation metric.
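    For reference, a comparison like this can be computed with the rouge-score package; the two strings below are placeholders for an actual description/summary pair rather than project data.

        # Sketch: ROUGE-1 F1 between a generated summary and the creator-provided description.
        from rouge_score import rouge_scorer

        episode_description = "..."  # reference: creator-provided episode description
        generated_summary = "..."    # candidate: TL;DL output for the same episode

        scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
        scores = scorer.score(episode_description, generated_summary)  # (target, prediction)
        print(scores["rouge1"].fmeasure)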

    For this reason, we also relied on feedback from a podcast creator to assess the quality of six of TL;DL's generated summaries. This user feedback found that:

  • The average time needed to read a TL;DL summary was 41 seconds
  • On a scale from 1-5, the average score for accuracy was 3.5
  • On a scale from 1-5, the average score for comprehensiveness was 2.7

    Additionally, the team will continue to collect ongoing feedback on TL;DL (submit below!) until we can create or obtain a dataset of podcast episode summaries that are accurate and comprehensive.

    References

  • Aneesh Vartakavi and Amanmeet Garg. "PodSumm -- Podcast Audio Summarization". arXiv:2009.10315v1 (2020). [link]
  • Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu. "PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization". arXiv:1912.08777 (2020). [link]
  • Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, and Bing Xiang. "Entity-level Factual Consistency of Abstractive Text Summarization". In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2727–2733 (2021). [link]
    User Surveys

    To capture insights from current podcast listeners, a survey was distributed to a variety of audiences via LinkedIn, Facebook groups, and various channels in the MIDS Slack organization. We obtained a sample of 40 respondents, and the survey results (1) delineated the general habits of podcast listeners and (2) gauged the utility of the TL;DL concept by asking about users' interest and likely use cases.

    Understanding Podcast Listener Habits

    Based on survey responses, we found that around half of users spend an hour or more listening to podcasts per week. From this, we defined three distinct audience groups within the podcast listenership:

  • Casual listeners (listen to less than 1 hour of podcasts per week)
  • Moderate listeners (listen to 1-3 hours of podcasts per week)
  • Enthusiastic listeners (listen to more than 3 hours of podcasts per week)

    Around 30-37% of casual listeners report not listening to full episodes most of the time. However, this drops off for moderate (17%) and enthusiastic (8%) listeners, indicating that casual listeners are more likely to use TL;DL in lieu of listening to a full episode.

    Additionally, casual and moderate listeners are more likely to think it’s not necessary to keep current with a podcast in order to listen to its latest episodes. TL;DL may give casual and moderate listeners a chance to catch up on older episodes that they otherwise wouldn’t have listened to.

    Enthusiasts, by contrast, consider it important to be caught up on a podcast’s older episodes. TL;DL may allow enthusiasts to keep current with a podcast without having to listen to every episode in its entirety.


    Concept Utility

    After asking users about their podcast listening habits, we introduced the concept of TL;DL as a tool that would provide accurate and comprehensive summaries of the podcasts they listen to. When asked whether TL;DL would enhance their listening experience:

  • Moderate and enthusiastic listeners were more likely to say that TL;DL would provide little to no value
  • Casual listeners were the most receptive to TL;DL: 20% of the group indicated the concept added a large amount of value

    Additionally, we asked users in which scenarios they would most likely use TL;DL-generated summaries. The plot below shows the distribution of the reported likely use cases:

    Thus, 62% of users said they would most likely use TL;DL to explore new podcast content, while 38% reported they would read summaries of podcasts when they were short on time.

    Send us your feedback!

    The Team

    Inspiration for this project came from avid podcast listeners who were also practicing data scientists. Brought together by the UC Berkeley MIDS program, this project represents a capstone experience that not only supported degree requirements, but also honed skills necessary to tackle present and future data challenges.

    Jill Cheney
     LinkedIn
    Patricia Domínguez
     LinkedIn
    Kenneth Pong
     LinkedIn
    Cecily Sun
     LinkedIn