Elevating Content Verticalization With Emotion-Aware AI: A Q&A With Vionlabs' Arash Pendari

Vionlabs makes full content libraries “discoverable, engaging, and revenue-ready” using its “emotion-aware” AI. It’s a company paying attention to the shift toward vertical and mobile-first viewing, including the micro-drama boom. Founder and creative director Arash Pendari writes in a blog post, “The [content] libraries are there. The content value is there. The audiences are there—scrolling, searching, spending more time in vertical formats than on Netflix. What is missing, for most incumbents, is the AI infrastructure to extract that value at the speed and scale the moment demands.” That’s where Vionlabs comes in. Content creators that work with Vionlabs get the verticalization of long-form content into chapters and micro-episodes, preview clips for every title that can be used for social media and promotion, scene-level metadata that surfaces content to match viewers’ moods, and non-intrusive ad breaks with contextual targeting.
I spoke with Pendari about his thoughts on the use of AI in the streaming industry and how Vionlabs’ platform works.

Arash Pendari, Founder & Product Evangelist, Vionlabs
Brandi Scardilli: How has Vionlabs’ use of AI evolved as the streaming industry changes? How does Vionlabs test and refine its AI to ensure it is performing as the company intends?
Arash Pendari: When we started Vionlabs, the core problem we were solving was helping streaming platforms understand their content at scale—not just metadata, but what is actually happening visually and emotionally in a piece of content, frame by frame. Early on, that meant building computer vision models that could analyze mood, pacing, and visual energy to power things like better content discovery and smarter trailer generation.
As the streaming industry matured, the challenges shifted. We went from helping platforms with a few thousand titles to working with libraries of hundreds of thousands of hours of content. At the same time, consumption behavior changed dramatically. Audiences increasingly watch on mobile, in short-form, in vertical. That pushed us to evolve from pure analysis into transformation: not just understanding content, but intelligently adapting it for new contexts and formats.
On the refinement side, we work in close collaboration with our clients. Every content library presents unique challenges: different production eras, different cinematographic styles, different genres. We run continuous evaluation pipelines where outputs are scored against both automated quality benchmarks and human review. Over time, the feedback loops from real deployments are what make the models sharper. You can build a strong model in a lab, but the real refinement happens when you’re processing content at scale and edge cases surface that you never anticipated.
AI’s primary value proposition is based on speed and cost and accomplishing a task that would take too long and cost too much if it had to be done by humans, but AI output still needs humans to check it for accuracy. What does “human-in-the-loop” quality control look like for vertical content library conversion? How much time and expense does that typically add to the process?
Human-in-the-loop is not an afterthought for us; it’s built into the workflow architecture from the start. We believe that creative tasks like this need tools that give creative teams better control of the models and let them scale their creativity. AI handles the scale and consistency that would be impossible for humans to achieve, and humans handle the judgment calls that require contextual or cultural understanding the model hasn’t fully learned yet. You can scale your creativity with machines but can only perfect it with humans.
In practice, our QC process works on a tiered basis. The AI processes the content and flags scenes where its confidence score falls below a defined threshold, using signals like character tracking, dialogue detection, and relevance scoring to determine the cropping and framing of vertical videos. Those flagged segments are then routed to human reviewers who can accept, adjust, or override the AI’s decisions about what type of scene, mood, or characters to prioritize.
This means the human effort is concentrated where it actually matters, rather than spread thin across thousands of scenes that the model handles well. The result is that the overhead is significantly lower than a fully manual process, while still maintaining editorial quality standards that clients trust. And importantly, every human correction feeds back into model training, so the share of content that requires review shrinks over time.
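The tiered routing Pendari describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Vionlabs' implementation: the `Scene` type, the `route_for_review` function, and the 0.8 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    scene_id: str
    confidence: float  # hypothetical model confidence in its own crop, 0.0 to 1.0


def route_for_review(scenes, threshold=0.8):
    """Split scenes into auto-approved and flagged-for-human-review lists.

    The threshold value is an assumption for illustration only.
    """
    auto_approved, flagged = [], []
    for scene in scenes:
        if scene.confidence < threshold:
            flagged.append(scene)  # below threshold: send to a human reviewer
        else:
            auto_approved.append(scene)  # at or above threshold: accept as-is
    return auto_approved, flagged


scenes = [Scene("s1", 0.95), Scene("s2", 0.42), Scene("s3", 0.80)]
auto_approved, flagged = route_for_review(scenes)
print([s.scene_id for s in flagged])  # ['s2']
```

Only the low-confidence scene reaches a reviewer here, which is the sense in which human effort is concentrated rather than spread across every scene.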
What reference points in a piece of content is Vionlabs’ AI trained to identify? Are there particular cues? When those don’t exist, how does the AI adjust?
Our AI looks at content through several simultaneous lenses. On the visual side, it’s tracking face detection and positioning, body language, dominant subject movement, depth of field, shot composition, and camera motion. On the audio side, it’s analyzing dialogue patterns, sound design cues, and music, because audio is often one of the most reliable signals for scene structure and emotional weight. And on the text side, it’s processing dialogue, keywords, and narrative cues, all aligned on a single timeline so the AI experiences the content the way a viewer does. Beyond those perceptual signals, the model is trained to understand editorial grammar, the logic of how shots are assembled into scenes and scenes into sequences. A cut to a close-up typically signals something emotionally important. A wide establishing shot signals a transition. These patterns hold across most narrative filmmaking.
The interesting challenge is when those conventions break down—experimental cinema, certain documentary styles, or older archive content shot with different grammar entirely. In those cases, the model leans more heavily on contextual inference: If the strong primary cues are absent, it looks at surrounding context, pacing patterns, and secondary signals to make the best compositional decision. It’s less confident in those moments, which is why our QC pipeline flags them for human review. We’d rather surface uncertainty than paper over it.
Are there certain types of streaming content that Vionlabs finds easier or more difficult to convert to vertical? For example, how does filmmaking with unconventional framing, less camera movement, or not much dialogue affect the transition to vertical? How do group scenes and action set pieces affect it?
Content that was produced with clear subject hierarchy and active camera work tends to convert most cleanly. Dialogue-driven drama, interview formats, sports highlights—these give the AI strong, consistent signals about where the meaningful action is happening in the frame.
The harder cases are exactly the ones you’d expect. Films with deliberately unconventional framing, such as compositions where the director has intentionally placed subjects in the periphery for artistic effect, are genuinely challenging, because respecting that intention while making the content functional in a 9:16 crop requires real editorial judgment. Static, wide-shot-heavy films can also be difficult; there’s less motion to track and the compositional logic can be more ambiguous.
Group scenes and action set pieces present a different kind of challenge, not because the AI can’t identify what’s happening, but because there’s often no single correct crop. In an ensemble conversation with four people in frame, any one of them might be the focal point depending on the narrative context. And in complex action sequences, the relevant action may be distributed across the entire frame simultaneously. These are cases where story awareness matters most, understanding not just what’s visually prominent but what’s narratively important in that specific moment.
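To make the 9:16 constraint concrete, here is a small sketch of the geometry once a focal point has been chosen. It is an assumption-laden illustration, not the company's method: the `vertical_crop` function and its parameters are hypothetical, and it deliberately ignores the hard part discussed above, which is deciding where the focal point should be.

```python
def vertical_crop(frame_w, frame_h, focal_x, aspect_w=9, aspect_h=16):
    """Return (left, top, width, height) of a full-height vertical crop.

    The crop is centered on focal_x (a pixel column, assumed to be chosen
    elsewhere) and clamped so it never extends past the frame edges.
    """
    # Widest crop with the target aspect ratio that fits the frame height.
    crop_w = min(frame_w, frame_h * aspect_w // aspect_h)
    left = int(focal_x) - crop_w // 2
    # Clamp so the crop window stays inside the frame.
    left = max(0, min(left, frame_w - crop_w))
    return left, 0, crop_w, frame_h


print(vertical_crop(1920, 1080, 960))   # (657, 0, 607, 1080)
print(vertical_crop(1920, 1080, 100))   # (0, 0, 607, 1080)
print(vertical_crop(1920, 1080, 1900))  # (1313, 0, 607, 1080)
```

A 1920x1080 frame yields a window only 607 pixels wide, less than a third of the original width, which is why a subject placed at the periphery or an ensemble spread across the frame cannot all survive a single crop.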
What makes Vionlabs’ AI “emotion-aware” (and, as I’ve also seen referenced, “story-aware”)? What are some ways Vionlabs is training it to recognize narrative structures, scene functions, and emotional arcs? Why does this distinction matter for viewers?
Part of what drives this for us is that we are huge film nerds. We have spent a lot of time watching movies and thinking about why certain scenes hit differently than others, and the answer is almost never just about what’s on screen. It’s about what the filmmaker is building toward. That obsession with storytelling is a big part of why Vionlabs went deep on this problem rather than stopping at visual analysis like most of the industry did.
Most AI that processes video treats it as a sequence of independent frames or short clips. It can tell you what’s in a frame, but it has no understanding of what that frame means within the story it’s part of. That’s a fundamental limitation if your goal is to make intelligent editorial decisions. Our approach is to build models that understand video at the level of scenes and sequences, not just frames. That means training on large volumes of narrative content with labels that capture function (Is this scene establishing context, building tension, or resolving conflict?) and track emotional valence (What is the audience meant to feel here?). The model learns to recognize the signals that filmmakers use to communicate these things: shot length, camera distance, lighting, music, pacing, dialogue rhythm, character dynamics.
The distinction matters for viewers because the right crop for a scene isn’t just about where the faces are; it’s about what the scene is doing. A tense standoff and a comedic reaction shot might have identical visual compositions, but they deserve different treatment. Story awareness is what allows us to make those distinctions at scale, rather than reducing every scene to the same compositional logic. The viewer may never consciously notice the difference, but they feel it. A vertical clip that respects the emotional intent of a scene holds attention; one that doesn’t feels subtly wrong even if nothing is technically out of frame.
The companies that will win the next decade in entertainment are not necessarily the ones with the biggest libraries. They will be the ones that understand their libraries best and can translate that understanding into every viewer-facing touchpoint: recommendations, thumbnails, previews, and yes, vertical clips. Story awareness is what makes that possible at scale.

Are there any particular content libraries that exemplify Vionlabs’ iterative process? Are there client examples you can discuss?
Our most instructive projects have come from working with deep, diverse libraries at real scale, and we’re fortunate to work with some of the most significant names in the industry. Our clients include Paramount, Hulu, Plex, Pluto TV, Canela Media, and Deutsche Telekom, among others. That’s a deliberately varied mix: major subscription streamers, ad-supported platforms, Latino content specialists, and large telco operators. Each presents a genuinely different challenge.
Plex is a good example of the iterative dynamic at work. They have an enormous catalog spanning decades, genres, and production styles, including a significant volume of content that arrived with minimal or inconsistent metadata. Getting our models to perform consistently across that range—everything from studio releases to independent film to classic television—required real iteration. You encounter edge cases at scale that you simply don’t anticipate from a smaller sample, and each round of refinement makes the model sharper across the board.
The same applies on the international side. Working with Deutsche Telekom means dealing with content that spans different cinematographic traditions and production eras across 99-plus languages. That breadth is actually one of the most valuable things for model development, because it forces the AI to generalize intelligently rather than overfit to a single content style.
What’s the workflow a client can expect to experience when converting their content? Are clients involved in QC review? How granular is their involvement? Are there instances where clients have looked at a converted scene or show and said, “The focal point of this scene is out of frame”?
The workflow is designed to be as low-friction as possible on the client side while still keeping them meaningfully in the loop. At the start of an engagement, we spend time understanding the client’s editorial standards and brand sensibility: what quality bar they’re holding us to, whether there are content categories that need special handling, and what their platform’s audience expects. From there, we ingest the content library and the AI begins processing. Clients typically get access to a review interface where they can sample outputs across different content types, spot-check flagged scenes, and provide feedback. Some clients want deep involvement in QC; they’ll review a statistically meaningful sample of every content category before sign-off. Others prefer a lighter touch and rely on our internal QC thresholds. We accommodate both.
To the specific question about scenes being out of frame: Yes, that happens, and it’s valuable when it does. When a client comes back and says, “In this scene, the storytelling depends on the eyeline between these two characters and your crop breaks that,” that’s exactly the kind of domain-specific feedback that makes the model smarter. It’s not a failure; it’s the process working. Those corrections get fed back into training and that category of error becomes rarer with each iteration.
What are the most common ways or platforms clients use the converted vertical content?
The dominant use case right now is social media—Reels, TikToks, YouTube Shorts—where streaming platforms are trying to drive discovery and engagement with audiences who spend significant time in vertical environments. A 90-second vertical clip from a series can perform very differently in that context than a traditional landscape trailer, and platforms like Paramount and Plex are increasingly sophisticated about testing and optimizing that.
Beyond social, we’re seeing vertical content deployed within streaming apps themselves for mobile-native browsing experiences. The idea is that a user scrolling through content on their phone shouldn’t have to tilt their device or squint at a letterboxed preview. And there’s growing interest in vertical as a format for highlights and recap content, particularly in sports and anime, where the consumption context is often a second screen or a commute.
One of the most exciting emerging use cases is micro-dramas. This is a format that originated in China and has grown into a global phenomenon: serialized stories of typically 60 to 90 seconds per episode, shot natively in vertical, designed for binge consumption on mobile. The global market is projected to reach between $11 and $14 billion by the end of 2026, and Hollywood is paying serious attention, with Fox Entertainment, for example, committing to producing over 200 vertical series for the format. For existing content libraries, vertical conversion is part of how traditional platforms are exploring whether their IP can travel into this new format, adapting scenes and sequences into the pacing and framing conventions that micro-drama audiences expect.
The broader trend is that vertical is no longer just a social media consideration. It’s becoming a first-class format that streaming platforms need to support across the content life cycle, from promotion to playback. When you’re working with platforms like Plex and Pluto TV that serve audiences across mobile and connected TV simultaneously, the ability to deliver the right format for the right context at scale stops being a nice-to-have and becomes core infrastructure.