Papers and Studies

A collection of interesting papers, studies, research and academic work shared by Makeroom members.

Not all of this is reviewed or guaranteed to be perfect, but it serves as a curated collection of interesting and thought-provoking works.

Inverse Occam's razor

Scientists have long preferred the simplest possible explanation of their data. More recently, a worrying trend to favor complex interpretations has taken hold because they are perceived as more impactful.

Billion-Parameter Theories

We assumed good theories are small. But the minimum viable compression of a complex system might be billions of parameters large.

Experimental Evidence of Massive-Scale Emotional Contagion via Social Networks

https://www.pnas.org/doi/10.1073/pnas.1404212111

EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression

Multimodal Large Language Models have demonstrated exceptional performance in UI2Code tasks, significantly enhancing website development efficiency. However, these tasks incur substantially higher computational overhead than traditional code generation due to the large number of input image tokens and extensive output code tokens required. Our comprehensive study identifies significant redundancies in both image and code tokens that exacerbate computational complexity and hinder focus on key UI elements, resulting in excessively lengthy and often invalid HTML files. We propose EfficientUICoder, a compression framework for efficient UI code generation with three key components. First, Element and Layout-aware Token Compression preserves essential UI information by detecting element regions and constructing UI element trees. Second, Region-aware Token Refinement leverages attention scores to discard low-attention tokens from selected regions while integrating high-attention tokens from unselected regions. Third, Adaptive Duplicate Token Suppression dynamically reduces repetitive generation by tracking HTML/CSS structure frequencies and applying exponential penalties. Extensive experiments show EfficientUICoder achieves a 55%-60% compression ratio without compromising webpage quality and delivers superior efficiency improvements: reducing computational cost by 44.9%, generated tokens by 41.4%, prefill time by 46.6%, and inference time by 48.8% on 34B-level MLLMs. Code is available at https://github.com/WebPAI/EfficientUICoder.

Figma2Code: Automating Multimodal Design to Code in the Wild

Front-end development constitutes a substantial portion of software engineering, yet converting design mockups into production-ready User Interface (UI) code remains tedious and costly. While recent work has explored automating this process with Multimodal Large Language Models (MLLMs), existing approaches typically rely solely on design images. As a result, they must infer complex UI details from images alone, often leading to degraded results. In real-world development workflows, however, design mockups are usually delivered as Figma files, a widely used tool for front-end design, that embed rich multimodal information (e.g., metadata and assets) essential for generating high-quality UI. To bridge this gap, we introduce Figma2Code, a new task that advances design-to-code into a multimodal setting and aims to automate design-to-code in the wild. Specifically, we collect paired design images and their corresponding metadata files from the Figma community. We then apply a series of processing operations, including rule-based filtering, human- and MLLM-based annotation and screening, and metadata refinement. This process yields 3,055 samples, from which designers curate a balanced dataset of 213 high-quality cases. Using this dataset, we benchmark ten state-of-the-art open-source and proprietary MLLMs. Our results show that while proprietary models achieve superior visual fidelity, they remain limited in layout responsiveness and code maintainability. Further experiments across modalities and ablation studies corroborate this limitation, partly due to models' tendency to directly map primitive visual attributes from Figma metadata.

UI-UG: A Unified MLLM for UI Understanding and Generation

Although Multimodal Large Language Models (MLLMs) have been widely applied across domains, they are still facing challenges in domain-specific tasks, such as User Interface (UI) understanding accuracy and UI generation quality. In this paper, we introduce UI-UG (a unified MLLM for UI Understanding and Generation), integrating both capabilities. For understanding tasks, we employ Supervised Fine-tuning (SFT) combined with Group Relative Policy Optimization (GRPO) to enhance fine-grained understanding on the modern complex UI data. For generation tasks, we further use Direct Preference Optimization (DPO) to make our model generate human-preferred UIs. In addition, we propose an industrially effective workflow, including the design of an LLM-friendly domain-specific language (DSL), training strategies, rendering processes, and evaluation metrics. In experiments, our model achieves state-of-the-art (SOTA) performance on understanding tasks, outperforming both larger general-purpose MLLMs and similarly-sized UI-specialized models. Our model is also on par with these larger MLLMs in UI generation performance at a fraction of the computational cost. We also demonstrate that integrating understanding and generation tasks can improve accuracy and quality for both tasks. Code and Model: https://github.com/neovateai/UI-UG

UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback

Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing…

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Design2Code is the first real-world benchmark for automated front-end engineering from visual designs. Researchers manually...

https://aclanthology.org/2025.naacl-long.199.pdf

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Screen user interfaces (UIs) and infographics, sharing similar visual language and design principles, play important roles in human communication and human-machine interaction. We introduce ScreenAI, a vision-language model that specializes in UI and infographics understanding. Our model improves upon the PaLI architecture with the flexible patching strategy of pix2struct and is trained on a unique mixture of datasets. At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements. We use these text annotations to describe screens to Large Language Models and automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale. We run ablation studies to demonstrate the impact of these design choices. At only 5B parameters, ScreenAI achieves new state-of-the-art results on UI- and infographics-based tasks (Multi-page DocVQA, WebSRC, MoTIF and Widget Captioning), and new best-in-class performance on others (Chart QA, DocVQA, and InfographicVQA) compared to models of similar size. Finally, we release three new datasets: one focused on the screen annotation task and two others focused on question answering.

SpecifyUI: Supporting Iterative UI Design Intent Expression through Structured Specifications and Generative AI

Large language models (LLMs) promise to accelerate UI design, yet current tools struggle with two fundamentals: externalizing designers' intent and controlling iterative change. We introduce SPEC, a structured, parameterized, hierarchical intermediate representation that exposes UI elements as controllable parameters. Building on SPEC, we present SpecifyUI, an interactive system that extracts SPEC from UI references via region segmentation and vision-language models, composes UIs across multiple sources, and supports targeted edits at global, regional, and component levels. A multi-agent generator renders SPEC into high-fidelity designs, closing the loop between intent expression and controllable generation. Quantitative experiments show SPEC-based generation more faithfully captures reference intent than prompt-based baselines. In a user study with 16 professional designers, SpecifyUI significantly outperformed Stitch on intent alignment, design quality, controllability, and overall experience in human-AI co-creation. Our results position SPEC as a specification-driven paradigm that shifts LLM-assisted design from one-shot prompting to iterative, collaborative workflows.

LLMs are Bayesian, in Expectation, not in Realization

Large language models (LLMs) exhibit a striking ability to learn and adapt to new tasks on the fly—commonly referred to as in...

Innovating for the Real World

Innovation has long been a driving force behind societal progress, but in today's interconnected and rapidly changing world, the...

The Diffusion Dilemma

AI Tools Slowed Down Experienced Developers

The integration of advanced AI tools into software development workflows has been heralded as a major productivity boost, yet...

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design

Text-to-image models now generate graphic design at production scale, yet their supervision still comes primarily from photo-style preference datasets with a single overall verdict per comparison. Designers evaluate designs along several distinct axes (e.g., typography, layout, color harmony) that a single preference label collapses. We release TASTE (Typography, Aesthetics, Spatial, Tone, Etc.), a multi-dimensional preference dataset in which two disjoint cohorts of five professional designers each ranked outputs from four current text-to-image models across nine criteria along with per-image hallucination flags. We pair the dataset with two contributions. First, a criterion-agnostic signal-validation framework based on Kendall's τ, majority-vote probability, and Condorcet cycles against exact iid-uniform nulls; the analysis reveals significant but moderate designer agreement, with every TASTE criterion rejecting the random-rater null. Second, we benchmark preference models on TASTE and find that off-the-shelf VLM judges and dedicated T2I scorers fail to reach majority agreement with the designer panel, while a small MLP head trained directly on TASTE substantially narrows the gap to the single-rater ceiling, setting a baseline for future TASTE-trained preference models.

S-Path-RAG: Semantic-Aware Shortest-Path RAG for Knowledge Graphs

A framework for multi-hop question answering over large knowledge graphs using semantic-aware path retrieval combined with...

S-Path-RAG: Semantic-Aware Shortest-Path Retrieval Augmented Generation for Multi-Hop Knowledge Graph Question Answering

Is the iPhone Birth Control?

An NBER working paper examining the causal relationship between smartphone adoption and birth rates using AT&T's 2007-2011 carrier monopoly as a natural experiment.

Is the iPhone Birth Control? Causal Evidence from AT&T’s 2007–2011 Carrier Monopoly

The role of large language models in UI/UX design: A systematic literature review

This systematic literature review examines the role of large language models (LLMs) in UI/UX design, synthesizing findings from 38 peer-reviewed studies published between 2022 and 2025. We identify key LLMs in use, including GPT-4, Gemini, and PaLM, and map their integration across the design lifecycle, from ideation to evaluation. Common practices include prompt engineering, human-in-the-loop workflows, and multimodal input. While LLMs are reshaping design processes, challenges such as hallucination, prompt instability, and limited explainability persist. Our findings highlight LLMs as emerging collaborators in design, and we propose directions for the ethical, inclusive, and effective integration of these technologies.

Towards a Working Definition of Designing Generative User Interfaces

Generative UI is transforming interface design by facilitating AI-driven collaborative workflows between designers and computational systems. This study establishes a working definition of Generative UI through a multi-method qualitative approach, integrating insights from a systematic literature review of 127 publications, expert interviews with 18 participants, and analyses of 12 case studies. Our findings identify five core themes that position Generative UI as an iterative and co-creative process. We highlight emerging design models, including hybrid creation, curation-based workflows, and AI-assisted refinement strategies. Additionally, we examine ethical challenges, evaluation criteria, and interaction models that shape the field. By proposing a conceptual foundation, this study advances both theoretical discourse and practical implementation, guiding future HCI research toward responsible and effective generative UI design practices.

Human-AI Co-Creation: A Framework for Collaborative Design in Intelligent Systems

As artificial intelligence (AI) continues to evolve from a back-end computational tool into an interactive, generative collaborator, its integration into early-stage design processes demands a rethinking of traditional workflows in human-centered design. This paper explores the emergent paradigm of human-AI co-creation, where AI is not merely used for automation or efficiency gains, but actively participates in ideation, visual conceptualization, and decision-making. Specifically, we investigate the use of large language models (LLMs) like GPT-4 and multimodal diffusion models such as Stable Diffusion as creative agents that engage designers in iterative cycles of proposal, critique, and revision.

Generative AI for Product Design: Getting the Right Design and the Design Right

Generative AI (GenAI) models excel in their ability to recognize patterns in existing data and generate new and unexpected content. Recent advances have motivated applications of GenAI tools (e.g., Stable Diffusion, ChatGPT) to professional practice across industries, including product design. While these generative capabilities may seem enticing on the surface, certain barriers limit their practical application for real-world use in industry settings. In this position paper, we articulate and situate these barriers within two phases of the product design process, namely "getting the right design" and "getting the design right," and propose a research agenda to stimulate discussions around opportunities for realizing the full potential of GenAI tools in product design.
