Search Ask

Makeroom

RegisterLogin

Discussion

General
Tech

Library

Chevron Right Icon
Design
Resources
Websites
Reve: Reimagine Reality
Chevron Right Icon
Web development
Cool Libraries
Tools
Resources
Papers and Studies
Articles
Language Models
Tech and Systems
Chevron Right Icon
Computers
Chevron Right Icon
Windows Tools and Modding
Windhawk
Raycast for Windows
Rainmeter
Haiku: BeOS-Inspired Open-Source OS
Chevron Right Icon
Random fun stuff
Esoteric File Systems
Cool websites
Chevron Right Icon
Friends
Unity - Cheaterman's Bar
Chevron Right Icon
Storyden
Selfh.st
OpenAlternative
Microlaunch
Peerlist
Glama.ai
AlternativeTo
Brandfetch
Dokploy
PitchHut
Piefed Social
 Collections Links Members Roles

Makeroom

Icon

A small rag-tag assortment of makers, engineers and designers sharing mentoring, support and projects to work on at any stage in their career.

Join our Discord server!


Welcome to the Makeroom installation of Storyden!

This acts as a live demo of Storyden's forum and library software. On this site you'll find a curated collection of web and design resources as well as anything our members share.

Feel free to participate, this may be a demo but it's never wiped. That being said, Storyden is in active development and we encourage you to experiment respectfully as well as report any security issues you find to @Southclaws or by opening an issue.

Have an amazing day!

powered by storyden

Login
Library
papers-and-studies

No versions or drafts yet.

Papers and Studies

A collection of interesting papers, studies, research and academic work shared by Makeroom members.

Not all of this is reviewed or guaranteed to be perfect, but serve as a curated collection of interesting and thought provoking works.

Inverse Occam's razor
Scientists have long preferred the simplest possible explanation of their data. More re-cently, a worrying trend to favor complex interpretations has taken hold because they are perceived as more impactful.
Inverse Occam's razor
Billion-Parameter Theories
We assumed good theories are small. But the minimum viable compression of a complex system might be billions of parameters large.
Billion-Parameter Theories
Experimental Evidence of Massive-Scale Emotional Contagion via Social Networks
https://www.pnas.org/doi/10.1073/pnas.1404212111
EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression
Multimodal Large Language Models have demonstrated exceptional performance in UI2Code tasks, significantly enhancing website development efficiency. However, these tasks incur substantially higher computational overhead than traditional code generation due to the large number of input image tokens and extensive output code tokens required. Our comprehensive study identifies significant redundancies in both image and code tokens that exacerbate computational complexity and hinder focus on key UI elements, resulting in excessively lengthy and often invalid HTML files. We propose EfficientUICoder, a compression framework for efficient UI code generation with three key components. First, Element and Layout-aware Token Compression preserves essential UI information by detecting element regions and constructing UI element trees. Second, Region-aware Token Refinement leverages attention scores to discard low-attention tokens from selected regions while integrating high-attention tokens from unselected regions. Third, Adaptive Duplicate Token Suppression dynamically reduces repetitive generation by tracking HTML/CSS structure frequencies and applying exponential penalties. Extensive experiments show EfficientUICoderachieves a 55%-60% compression ratio without compromising webpage quality and delivers superior efficiency improvements: reducing computational cost by 44.9%, generated tokens by 41.4%, prefill time by 46.6%, and inference time by 48.8% on 34B-level MLLMs. Code is available at https://github.com/WebPAI/EfficientUICoder.
EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression
Figma2Code: Automating Multimodal Design to Code in the Wild
Front-end development constitutes a substantial portion of software engineering, yet converting design mockups into production-ready User Interface (UI) code remains tedious and costly. While recent work has explored automating this process with Multimodal Large Language Models (MLLMs), existing approaches typically rely solely on design images. As a result, they must infer complex UI details from images alone, often leading to degraded results. In real-world development workflows, however, design mockups are usually delivered as Figma files, a widely used tool for front-end design, that embed rich multimodal information (e.g., metadata and assets) essential for generating high-quality UI. To bridge this gap, we introduce Figma2Code, a new task that advances design-to-code into a multimodal setting and aims to automate design-to-code in the wild. Specifically, we collect paired design images and their corresponding metadata files from the Figma community. We then apply a series of processing operations, including rule-based filtering, human- and MLLM-based annotation and screening, and metadata refinement. This process yields 3,055 samples, from which designers curate a balanced dataset of 213 high-quality cases. Using this dataset, we benchmark ten state-of-the-art open-source and proprietary MLLMs. Our results show that while proprietary models achieve superior visual fidelity, they remain limited in layout responsiveness and code maintainability. Further experiments across modalities and ablation studies corroborate this limitation, partly due to models' tendency to directly map primitive visual attributes from Figma metadata.
Figma2Code: Automating Multimodal Design to Code in the Wild
UI-UG: A Unified MLLM for UI Understanding and Generation
Although Multimodal Large Language Models (MLLMs) have been widely applied across domains, they are still facing challenges in domain-specific tasks, such as User Interface (UI) understanding accuracy and UI generation quality. In this paper, we introduce UI-UG (a unified MLLM for UI Understanding and Generation), integrating both capabilities. For understanding tasks, we employ Supervised Fine-tuning (SFT) combined with Group Relative Policy Optimization (GRPO) to enhance fine-grained understanding on the modern complex UI data. For generation tasks, we further use Direct Preference Optimization (DPO) to make our model generate human-preferred UIs. In addition, we propose an industrially effective workflow, including the design of an LLM-friendly domain-specific language (DSL), training strategies, rendering processes, and evaluation metrics. In experiments, our model achieves state-of-the-art (SOTA) performance on understanding tasks, outperforming both larger general-purpose MLLMs and similarly-sized UI-specialized models. Our model is also on par with these larger MLLMs in UI generation performance at a fraction of the computational cost. We also demonstrate that integrating understanding and generation tasks can improve accuracy and quality for both tasks. Code and Model: https://github.com/neovateai/UI-UG
UI-UG: A Unified MLLM for UI Understanding and Generation
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback
Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing…
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering
Design2Code is the first real-world benchmark for automated front-end engineering from visual designs. Researchers manually...
https://aclanthology.org/2025.naacl-long.199.pdf
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Screen user interfaces (UIs) and infographics, sharing similar visual language and design principles, play important roles in human communication and human-machine interaction. We introduce ScreenAI, a vision-language model that specializes in UI and infographics understanding. Our model improves upon the PaLI architecture with the flexible patching strategy of pix2struct and is trained on a unique mixture of datasets. At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements. We use these text annotations to describe screens to Large Language Models and automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale. We run ablation studies to demonstrate the impact of these design choices. At only 5B parameters, ScreenAI achieves new state-of-the-artresults on UI- and infographics-based tasks (Multi-page DocVQA, WebSRC, MoTIF and Widget Captioning), and new best-in-class performance on others (Chart QA, DocVQA, and InfographicVQA) compared to models of similar size. Finally, we release three new datasets: one focused on the screen annotation task and two others focused on question answering.
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
SpecifyUI: Supporting Iterative UI Design Intent Expression through Structured Specifications and Generative AI
Large language models (LLMs) promise to accelerate UI design, yet current tools struggle with two fundamentals: externalizing designers' intent and controlling iterative change. We introduce SPEC, a structured, parameterized, hierarchical intermediate representation that exposes UI elements as controllable parameters. Building on SPEC, we present SpecifyUI, an interactive system that extracts SPEC from UI references via region segmentation and vision-language models, composes UIs across multiple sources, and supports targeted edits at global, regional, and component levels. A multi-agent generator renders SPEC into high-fidelity designs, closing the loop between intent expression and controllable generation. Quantitative experiments show SPEC-based generation more faithfully captures reference intent than prompt-based baselines. In a user study with 16 professional designers, SpecifyUI significantly outperformed Stitch on intent alignment, design quality, controllability, and overall experience in human-AI co-creation. Our results position SPEC as a specification-driven paradigm that shifts LLM-assisted design from one-shot prompting to iterative, collaborative workflows.
SpecifyUI: Supporting Iterative UI Design Intent Expression through Structured Specifications and Generative AI
LLMs are Bayesian, in Expectation, not in Realization
Large language models (LLMs) exhibit a striking ability to learn and adapt to new tasks on the fly—commonly referred to as in...
LLMs are Bayesian, in Expectation, not in Realization
Innovating for the Real World
Innovation has long been a driving force behind societal progress, but in today's interconnected and rapidly changing world, the...
The Diffusion Dilemma
AI Tools Slowed Down Experienced Developers
The integration of advanced AI tools into software development workflows has been heralded as a major productivity boost, yet...
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
The role of large language models in UI/UX design: A systematic literature review
This systematic literature review examines the role of large language models (LLMs) in UI/UX design, synthesizing findings from 38 peer-reviewed studies published between 2022 and 2025. We identify key LLMs in use, including GPT-4, Gemini, and PaLM, and map their integration across the design lifecycle, from ideation to evaluation. Common practices include prompt engineering, human-in-the-loop workflows, and multimodal input. While LLMs are reshaping design processes, challenges such as hallucination, prompt instability, and limited explainability persist. Our findings highlight LLMs as emerging collaborators in design, and we propose directions for the ethical, inclusive, and effective integration of these technologies.
The role of large language models in UI/UX design: A systematic literature review
Towards a Working Definition of Designing Generative User Interfaces
Generative UI is transforming interface design by facilitating AI-driven collaborative workflows between designers and computational systems. This study establishes a working definition of Generative UI through a multi-method qualitative approach, integrating insights from a systematic literature review of 127 publications, expert interviews with 18 participants, and analyses of 12 case studies. Our findings identify five core themes that position Generative UI as an iterative and co-creative process. We highlight emerging design models, including hybrid creation, curation-based workflows, and AI-assisted refinement strategies. Additionally, we examine ethical challenges, evaluation criteria, and interaction models that shape the field. By proposing a conceptual foundation, this study advances both theoretical discourse and practical implementation, guiding future HCI research toward responsible and effective generative UI design practices.
Towards a Working Definition of Designing Generative User Interfaces
Human-AI Co-Creation: A Framework for Collaborative Design in Intelligent Systems
As artificial intelligence (AI) continues to evolve from a back-end computational tool into an interactive, generative collaborator, its integration into early-stage design processes demands a rethinking of traditional workflows in human-centered design. This paper explores the emergent paradigm of human-AI co-creation, where AI is not merely used for automation or efficiency gains, but actively participates in ideation, visual conceptualization, and decision-making. Specifically, we investigate the use of large language models (LLMs) like GPT-4 and multimodal diffusion models such as Stable Diffusion as creative agents that engage designers in iterative cycles of proposal, critique, and revision.
Human-AI Co-Creation: A Framework for Collaborative Design in Intelligent Systems
Generative AI for Product Design: Getting the Right Design and the Design Right
Generative AI (GenAI) models excel in their ability to recognize patterns in existing data and generate new and unexpected content. Recent advances have motivated applications of GenAI tools (e.g., Stable Diffusion, ChatGPT) to professional practice across industries, including product design. While these generative capabilities may seem enticing on the surface, certain barriers limit their practical application for real-world use in industry settings. In this position paper, we articulate and situate these barriers within two phases of the product design process, namely "getting the right design" and "getting the design right," and propose a research agenda to stimulate discussions around opportunities for realizing the full potential of GenAI tools in product design.
Generative AI for Product Design: Getting the Right Design and the Design Right