The opinions stated here are my own, not those of my company.
“the medium makes the model impossible to scale without degrading the user experience”
Before thinking further about scale, I want to briefly analyze the medium itself, as layers, as well as the content made in and for the medium.
To not bury the lede though, here’s my tl;dr. IMHO the medium is ultimately digital audio. Podcasting is content in this medium, albeit one near and dear to many people’s hearts. Scaling the medium itself, i.e., being creative with audio’s possibilities, is definitely achievable through innovation. Doing this without “degrading the user experience” is subjective, depending on what someone considers a degraded experience. Inserting ads into the audio stream is definitely not going to be the only game in town.
The lowest layer of the medium is digital audio. But even this layer is non-obvious, as audio can, with technology, be arranged into parallel streams of audio bytes. One stream or many? Who can read and write to which streams, that is, who can listen to which streams, and who can send audio back (e.g. through speaking, playing music, live microphone in a crowd or public place, triggering sound FX, etc.) to which channels? How do the streams themselves interact: are they kept separate, can they be merged, are there hierarchies of participation and experience? Are the channels transient (a la Snap) or persisted? Will the stream be open, or DRM-protected? And so on.
Clubhouse got traction in part by innovating at this lowest layer, the audio itself, but there are many more intriguing combinations here to consider and possibly pursue.
The next layer up from audio is the set of complementary experiences created by related non-audio mediums that either “participate” with the audio intensely or just embellish it in some way. These too can be many. Consider text: live transcription of the audio content, allowing people to “tweet” short text synchronized in time (a la SoundCloud), automatically publishing the transcript as blog posts for further discussion, etc. Heck, you can even convert text to audio using text-to-speech synthesis, allowing people to participate in the audio even if they can’t currently speak live (e.g., on a plane), or can’t speak at all (e.g., muteness, or pan-language discussions, as I can’t speak Portuguese, Thai, Swedish, etc.).
And text isn’t the only complementary medium. You can ask similar questions for visuals (photos, video, art, the almighty screenshot, etc) too.
The next layer up is the “app”, be it a mobile or web app. How does the app facilitate the audio, the complementary mediums, and the various participants? How much will it cost to design, build, maintain, evolve? Is the app “just an app” or is it a platform, and if the latter, who are the content creators on the platform — primarily podcasters, or a wider swath of participants (human, animal, virtual agents and digital intelligences, etc)?
You see where I’m going with this. There’s plenty of possibility in the audio and podcasting space, no doubt. It’s just the usual goblins lurking around the bend: business model, scale, technical sophistication, funding and cost, critical mass of adoption/usage, etc.
I also distinguish these medium layers from the content created in and for them. There is a spectrum here from NPR to Joe Rogan. The content is how the medium is used: what is said or done, and for what purpose or end. Conversation? Community? Public Good? Profit? A “good business”? etc.
So back to the quote and my tl;dr. The medium is digital audio. Podcasting is content. Podcasting is near and dear to many people’s hearts, and part of that passion entails keeping podcasting available through certain approaches to Free, Open, etc. This is good and important, but overall, audio is a wide, wide space where a lot of experiences will unfold.
Scaling the medium, audio itself, is largely about being creative with audio’s possibilities, and realizing those possibilities in durable ways. Being a technologist, my answer here is ultimately innovation: the concerted effort to make things people want and use. Some of these ideas will be hard to make happen for time-varying technical reasons, others for business or capital reasons, others for lack of compelling content or being insufficiently differentiated, etc. Clubhouse is an example of such innovation succeeding so far, while Pocket Casts has an unclear future, but these are just a few chapters of this story, definitely not the last. Whether the rest comes from startups, “Big Tech”, public initiatives, or all of the above is TBD.
However, doing this without “degrading the user experience” is subjective, depending on what someone considers a degraded experience. Inserting ads into the audio stream is definitely not going to be the only game in town. Some experiences will be paid, others ad-supported, others funded publicly or through governments. I imagine there will be an ICO or NFT involved too… 🤷.
So there you have it. Just my personal thoughts.
Stratechery has a recent article on Clubhouse, which I’ll read soon, but I like to form my own opinions first.