Tags
understanding

Threads and library pages tagged with

understanding

UI-UG: A Unified MLLM for UI Understanding and Generation

Although Multimodal Large Language Models (MLLMs) have been widely applied across domains, they are still facing challenges in domain-specific tasks, such as User Interface (UI) understanding accuracy and UI generation quality. In this paper, we introduce UI-UG (a unified MLLM for UI Understanding and Generation), integrating both capabilities. For understanding tasks, we employ Supervised Fine-tuning (SFT) combined with Group Relative Policy Optimization (GRPO) to enhance fine-grained understanding on the modern complex UI data. For generation tasks, we further use Direct Preference Optimization (DPO) to make our model generate human-preferred UIs. In addition, we propose an industrially effective workflow, including the design of an LLM-friendly domain-specific language (DSL), training strategies, rendering processes, and evaluation metrics. In experiments, our model achieves state-of-the-art (SOTA) performance on understanding tasks, outperforming both larger general-purpose MLLMs and similarly-sized UI-specialized models. Our model is also on par with these larger MLLMs in UI generation performance at a fraction of the computational cost. We also demonstrate that integrating understanding and generation tasks can improve accuracy and quality for both tasks. Code and Model: https://github.com/neovateai/UI-UG

discord-link-aggregator-cpl0's avatar

Discord Link Aggregator

@discord-link-aggregator-cpl0


View profile
Copy link
  Report member
9d
Library

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Screen user interfaces (UIs) and infographics, sharing similar visual language and design principles, play important roles in human communication and human-machine interaction. We introduce ScreenAI, a vision-language model that specializes in UI and infographics understanding. Our model improves upon the PaLI architecture with the flexible patching strategy of pix2struct and is trained on a unique mixture of datasets. At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements. We use these text annotations to describe screens to Large Language Models and automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale. We run ablation studies to demonstrate the impact of these design choices. At only 5B parameters, ScreenAI achieves new state-of-the-artresults on UI- and infographics-based tasks (Multi-page DocVQA, WebSRC, MoTIF and Widget Captioning), and new best-in-class performance on others (Chart QA, DocVQA, and InfographicVQA) compared to models of similar size. Finally, we release three new datasets: one focused on the screen annotation task and two others focused on question answering.

discord-link-aggregator-cpl0's avatar

Discord Link Aggregator

@discord-link-aggregator-cpl0


View profile
Copy link
  Report member
9d
Library
 Search Ask

Makeroom

RegisterLogin

Discussion

General
Tech

Library

Chevron Right Icon
Design
Resources
Websites
Articles
Chevron Right Icon
Web development
Cool Libraries
Tools
Resources
Papers and Studies
Articles
Language Models
Tech and Systems
Chevron Right Icon
Computers
Chevron Right Icon
Windows Tools and Modding
Windhawk
Raycast for Windows
Rainmeter
Plasma 6.7
Haiku: BeOS-Inspired Open-Source OS
Chevron Right Icon
Random fun stuff
Esoteric File Systems
Cool websites
Chevron Right Icon
Friends
Unity - Cheaterman's Bar
Chevron Right Icon
Storyden
Selfh.st
OpenAlternative
Microlaunch
Peerlist
Glama.ai
AlternativeTo
Brandfetch
Dokploy
PitchHut
Piefed Social
 Collections Links Members Roles

Makeroom

Icon

A small rag-tag assortment of makers, engineers and designers sharing mentoring, support and projects to work on at any stage in their career.

Join our Discord server!


Welcome to the Makeroom installation of Storyden!

This acts as a live demo of Storyden's forum and library software. On this site you'll find a curated collection of web and design resources as well as anything our members share.

Feel free to participate, this may be a demo but it's never wiped. That being said, Storyden is in active development and we encourage you to experiment respectfully as well as report any security issues you find to @Southclaws or by opening an issue.

Have an amazing day!

powered by storyden

Login