Graphic Design Layout


Graphics layout concerns about the arrangement of graphic design elements to form a structure that can achieve some objectives. While the layout may refer to a 2D page, a 2D image, a 2D/3D object, or a 3D scene, the objectives of the layout may include visual aesthetics and functionality. The arrangement of graphic design elements is typically via the adjustment of their properties, such as location, orientation, color, size and shape. In this project, we aim at developing automatic/interactive tools for synthesizing novel layouts. In particular, we are interested in learning the design rules through the data-driven approach, using machine learning or deep learning techniques. We can then use the learned layout features to synthesize novel layouts. Concurrent to this project, we are also investigating how visual saliency may affect graphic design layout. (See also our work on image saliency.)


Language-based Photo Color Adjustment for Graphic Designs [paper] [suppl] [video] [code] [dataset]

Zhenwei Wang, Nanuxan Zhan, Gerhard Hancke, and Rynson Lau

ACM Trans. on Graphics (presented at SIGGRAPH 2023), 42(4), 2023

Language-based photo recoloring of a graphic design. Given a graphic design containing an inserted photo (a), our model recolors the photo automatically according to the given language-based instruction. To facilitate the expression of various user intentions, our model supports multi-granularity instructions for describing the source color(s) (selected from the design elements) and the region(s) to be modified (selected from the photo). For example, the user may provide a coarse-grained instruction ( background in (b)) to refer to multiple source colors or a fine-grained instruction ( yellow shape in (c), i.e., using the yellow shape at the top left) to specify a source color. For visualization, we highlight the source colors predicted by our model at the bottom (colors inside the circles).

Abstract. Adjusting the photo color to associate with some design elements is an essential way for a graphic design to effectively deliver its message and make it aesthetically pleasing. However, existing tools and previous works face a dilemma between the ease of use and level of expressiveness. To this end, we introduce an interactive language-based approach for photo recoloring, which provides an intuitive system that can assist both experts and novices on graphic design. Given a graphic design containing a photo that needs to be recolored, our model can predict the source colors and the target regions, and then recolor the target regions with the source colors based on the given language-based instruction. The multi-granularity of the instruction allows diverse user intentions. The proposed novel task faces several unique challenges, including: 1) color accuracy for recoloring with exactly the same color from the target design element as specified by the user; 2) multi-granularity instructions for parsing instructions correctly to generate a specific result or multiple plausible ones; and 3) locality for recoloring in semantically meaningful local regions to preserve original image semantics. To address these challenges, we propose a model called LangRecol with two main components: the language-based source color prediction module and the semantic-palette-based photo recoloring module. We also introduce an approach for generating a synthetic graphic design dataset with instructions to enable model training. We evaluate our model via extensive experiments and user studies. We also discuss several practical applications, showing the effectiveness and practicality of our approach.

Design Order Guided Visual Note Optimization [paper]

Xiaotian Qiao, Ying Cao, and Rynson Lau

IEEE Trans. on Visualization and Computer Graphics, 29(9):3922-3936, 2023

Layout optimization results. For each example, given an input visual note (left), we predict its grid-based global design order (upper middle) and element-wise design order (lower middle). Based on the predicated design order, the layout of the visual note is automatically optimized such that it is easier for readers to follow along (right).

Abstract. With the goal of making contents easy to understand, memorize and share, a clear and easy-to-follow layout is important for visual notes. Unfortunately, since visual notes are often taken by the designers in real time while watching a video or listening to a presentation, the contents are usually not carefully structured, resulting in layouts that may be difficult for others to follow. In this paper, we address this problem by proposing a novel approach to automatically optimize the layouts of visual notes. Our approach predicts the design order of a visual note and then warps the contents along the predicted design order such that the visual note can be easier to follow and understand. At the core of our approach is a learning-based framework to reason about the element-wise design orders of visual notes. In particular, we first propose a hierarchical LSTM-based architecture to predict a grid-based design order of the visual note, based on the graphical and textual information. We then derive the element-wise order from the grid-based prediction. Such an idea allows our network to be weakly-supervised, i.e., making it possible to predict dense grid-based orders from visual notes with only coarse annotations. We evaluate the effectiveness of our approach on visual notes with diverse content densities and layouts. The results show that our network can predict plausible design orders for various types of visual notes and our approach can effectively optimize their layouts in order for them to be easier to follow.

Selective Region-based Photo Color Adjustment for Graphic Designs [paper] [suppl] [video] [code] [dataset]

Nanxuan Zhao, Quanlong Zheng, Jing Liao, Ying Cao, Hanspeter Pfister, and Rynson Lau

ACM Trans. on Graphics (presented at SIGGRAPH 2021), 40(2), 2021

Photo color adjustment results in the context of graphic designs. When inserting a photo into a graphic design (see the input designs on the left), our model can automatically predict modifiable regions and recolor these regions with the target colors (see the color bars at the bottom of the input designs) to form the output design (see our designs on the right). We show results generated by our model with a single target color in the first row and with multiple target colors in the second row. We can see that our method can suggest appropriate regions for recoloring to the target colors, such that the resulting images still look natural with the original object semantics preserved and the resulting designs look visually more harmonious. Our model is also able to provide multiple suggestions for the user to choose from (see the right-most example in the bottom row).

Abstract. When adding a photo onto a graphic design, professional graphic designers often adjust its colors based on some target colors obtained from the brand or product to make the entire design more memorable to audiences and establish a consistent brand identity. However, adjusting the colors of a photo in the context of a graphic design is a difficult task, with two major challenges: (1) Locality: the color is often adjusted locally to preserve the semantics and atmosphere of the original image; (2) Naturalness: the modified region needs to be carefully chosen and recolored to obtain a semantically valid and visually natural result. To address these challenges, we propose a learning-based approach to photo color adjustment for graphic designs, which maps an input photo along with the target colors to a recolored result. Our method decomposes the color adjustment process into two successive stages: modifiable region selection and target color propagation. The first stage aims to solve the core, challenging problem of which local image region(s) should be adjusted, which requires not only a common sense of colors appearing in our visual world but also understanding of subtle visual design heuristics. To this end, we capitalize on both natural photos and graphic designs to train a region selection network, which detects the most likely regions to be adjusted to the target colors. The second stage trains a recoloring network to naturally propagate the target colors in the detected regions. Through extensive experiments and a user study, we demonstrate the effectiveness of our selective region-based photo recoloring framework.

ICONATE: Automatic Compound Icon Generation and Ideation [paper] [suppl] [code] [IconVoc152]

Nanxuan Zhao, Nam Wook Kim, Laura Mariah Herman, Hanspeter Pfister, Rynson Lau, Jose Echevarria, and Zoya Bylinskii

Proc. ACM SIGCHI, April 2020

Abstract. Compound icons are prevalent on signs, webpages, and infographics, effectively conveying complex and abstract concepts, such as "no smoking" and "health insurance", with simple graphical representations. However, designing such icons requires experience and creativity, in order to efficiently navigate the semantics, space, and style features of icons. In this paper, we aim to automate the process of generating icons given compound concepts, to facilitate rapid compound icon creation and ideation. Informed by ethnographic interviews with professional icon designers, we have developed ICONATE, a novel system that automatically generates compound icons based on textual queries and allows users to explore and customize the generated icons. At the core of ICONATE is a computational pipeline that automatically finds commonly used icons for sub-concepts and arranges them according to inferred conventions. To enable the pipeline, we collected a new dataset, Compicon1k, consisting of 1000 compound icons annotated with semantic labels (i.e., concepts). Through user studies, we have demonstrated that our tool is able to automate or accelerate the compound icon design process for both novices and professionals.

Content-aware Generative Modeling of Graphic Design Layouts [paper] [suppl] [code] [dataset]

Xinru Zheng*, Xiaotian Qiao*, Ying Cao, and Rynson Lau (* joint first authors)

ACM Trans. on Graphics (Proc. ACM SIGGRAPH 2019), 38(4), July 2019

Abstract. Layout is fundamental to graphic designs. For visual attractiveness and efficient communication of messages and ideas, graphic design layouts often have great variation, driven by the contents to be presented. In this paper, we study the problem of content-aware graphic design layout generation. We propose a deep generative model for graphic design layouts that is able to synthesize layout designs based on the visual and textual semantics of user inputs. Unlike previous approaches that are oblivious to the input contents and rely on heuristic criteria, our model captures the effect of visual and textual contents on layouts, and implicitly learns complex layout structure variations from data without the use of any heuristic rules. To train our model, we build a large-scale magazine layout dataset with fine-grained layout annotations and keyword labeling. Experimental results show that our model can synthesize high-quality layouts based on the visual semantics of input images and keyword-based summary of input text. We also demonstrate that our model internally learns powerful features that capture the subtle interaction between contents and layouts, which are useful for layout-aware design retrieval.

ButtonTips: Design Web Buttons with Suggestions (Oral) [paper]

Dawei Liu, Ying Cao, Rynson Lau, and Antoni Chan

Proc. IEEE ICME, July 2019

Fig. 2. Overview of our method. Given a web design being edited without buttons (a), the button presence prediction step proposes a set of candidate regions (orange boxes) that roughly contains buttons, and the button layout prediction step will then provide a button layout (blue box) for each candidate region. After the user selects a button layout, the color selection step will automatically select suitable color for it (c). Best viewed in color.

Abstract. Buttons are fundamental in web design. An effective button is important for higher click-through and conversion rates. However, designing effective buttons can be challenging for novices. This paper presents a novel interactive method to aid the button design process by making design suggestions. Our method proceeds in three steps: 1) button presence prediction, 2) button layout suggestion and 3) button color selection. We investigate two distinct but complementary interfaces for button design suggestion: 1) region selection interface, where the button will appear in a user-specific region; 2) element selection interface, where the button will be associated with a user-selected element. We compare our method with an existing website building tool, and show that for novice designers, both interfaces require significantly less manual efforts, and produce significantly better button design, as evaluated by professional web designers.

Tell Me Where I Am: Object-level Scene Context Prediction (Oral) [paper] [suppl] [code]

Xiaotian Qiao, Quanlong Zheng, Ying Cao, and Rynson Lau

Proc. IEEE CVPR, June 2019

Given a partial scene layout or a sketch as input, our method is able to generate a complete scene layout and further synthesize a realistic full scene image.

Abstract. Contextual information has been shown to be effective in helping solve various image understanding tasks. Previous works have focused on the extraction of contextual information from an image and use it to infer the properties of some object(s) in the image. In this paper, we consider an inverse problem of how to hallucinate missing contextual information from the properties of a few standalone objects. We refer to it as scene context prediction. This problem is difficult as it requires an extensive knowledge of complex and diverse relationships among different objects in natural scenes. We propose a convolutional neural network, which takes as input the properties (i.e., category, shape, and position) of a few standalone objects to predict an object-level scene layout that compactly encodes the semantics and structure of the scene context where the given objects are. Our quantitative experiments and user studies show that our model can generate more plausible scene context than the baseline approach. We demonstrate that our model allows for the synthesis of realistic scene images from just partial scene layouts and internally learns useful features for scene recognition.

Modeling Fonts in Context: Font Prediction on Web Design [paper] [suppl] [CTXFont-dataset]

Nanxuan Zhao, Ying Cao, and Rynson Lau

Computer Graphics Forum (Proc. Pacific Graphics 2018), Oct. 2018

Abstract. Web designers often carefully select fonts to fit the context of a web design to make the design look aesthetically pleasing and effective in communication. However, selecting proper fonts for a web design is a tedious and time-consuming task, as each font has many properties, such as font face, color, and size, resulting in a very large search space. In this paper, we aim to model fonts in context, by studying a novel and challenging problem of predicting fonts that match a given web design. To this end, we propose a novel, multi-task deep neural network to jointly predict font face, color and size for each text element on a web design, by considering multi-scale visual features and semantic tags of the web design. To train our model, we have collected a CTXFont dataset, which consists of 1k professional web designs, with labeled font properties. Experiments show that our model outperforms the baseline methods, achieving promising qualitative and quantitative results on the font selection task. We also demonstrate the usefulness of our method in a font selection task via a user study.

Task-driven Webpage Saliency [paper] [suppl]

Quanlong Zheng, Jianbo Jiao, Ying Cao, and Rynson Lau

Proc. ECCV, Sept. 2018

Given an input webpage (a), our model can predict a different saliency map under a different task, e.g., information browsing (b), form filling (c) and shopping (d).

Abstract. In this paper, we present an end-to-end learning framework for predicting task-driven visual saliency on webpages. Given a webpage, we propose a convolutional neural network to predict where people look at it under different task conditions. Inspired by the observation that given a specific task, human attention is strongly correlated with certain semantic components on a webpage (e.g., images, buttons and input boxes), our network explicitly disentangles saliency prediction into two independent sub-tasks: task-specific attention shift prediction and task-free saliency prediction. The task-specific branch estimates task-driven attention shift over a webpage from its semantic components, while the task-free branch infers visual saliency induced by visual features of the webpage. The outputs of the two branches are combined to produce the final prediction. Such a task decomposition framework allows us to efficiently learn our model from a small-scale task-driven saliency dataset with sparse labels (captured under a single task condition). Experimental results show that our method outperforms the baselines and prior works, achieving state-of-the-art performance on a newly collected benchmark dataset for task-driven webpage saliency detection.

What Characterizes Personalities of Graphic Designs? [paper] [suppl] [video] [code] [dataset]

Nanxuan Zhao, Ying Cao, and Rynson Lau

ACM Trans. on Graphics (Proc. ACM SIGGRAPH 2018), 37(4), Aug. 2018

Abstract: Graphic designers often manipulate the overall look and feel of their designs to convey certain personalities (e.g., cute, mysterious and romantic) to impress potential audiences and achieve business goals. However, understanding the factors that determine the personality of a design is challenging, as a graphic design is often a result of thousands of decisions on numerous factors, such as font, color, image, and layout. In this paper, we aim to answer the question of what characterizes the personality of a graphic design. To this end, we propose a deep learning framework for exploring the effects of various design factors on the perceived personalities of graphic designs. Our framework learns a convolutional neural network (called personality scoring network) to estimate the personality scores of graphic designs by ranking the crawled web data. Our personality scoring network automatically learns a visual representation that captures the semantics necessary to predict graphic design personality. With our personality scoring network, we systematically and quantitatively investigate how various design factors (e.g., color, font, and layout) affect design personality across different scales (from pixels, regions to elements). We also demonstrate a number of practical application scenarios of our network, including element-level design suggestion and example-based personality transfer.

Directing User Attention via Visual Flow on Web Designs [paper] [suppl] [video] [models] [dataset]

Xufang Pang*, Ying Cao*, Rynson Lau, and Antoni Chan (* joint first authors)

ACM Trans. on Graphics (Proc. ACM SIGGRAPH Asia 2016), 35(6), Article 240, Dec. 2016

US Patent 11,275,596 B2 (Publication Date: Mar 15, 2022)

Abstract: We present a novel approach that allows web designers to easily direct user attention via visual flow on web designs. By collecting and analyzing users' eye gaze data on real-world webpages under the task-driven condition, we build two user attention models that characterize user attention patterns between a pair of page components. These models enable a novel web design interaction for designers to easily create a visual flow to guide users' eyes (i.e., direct user attention along a given path) through a web design with minimal effort. In particular, given an existing web design as well as a designer-specified path over a subset of page components, our approach automatically optimizes the web design so that the resulting design can direct users' attention to move along the input path. We have tested our approach on various web designs of different categories. Results show that our approach can effectively guide user attention through the web design according to the designer's high-level specification.

Look Over Here: Attention-Directing Composition of Manga Elements [paper] [suppl] [video]

Ying Cao, Rynson Lau, and Antoni Chan

ACM Trans. on Graphics (Proc. ACM SIGGRAPH 2014), 33(4), Article 94, Aug. 2014

Abstract: Picture subjects and text balloons are basic elements in comics, working together to propel the story forward. Japanese comics artists often leverage a carefully designed composition of subjects and balloons (generally referred to as panel elements) to provide a continuous and fluid reading experience. However, such a composition is hard to produce for people without the required experience and knowledge. In this paper, we propose an approach for novices to synthesize a composition of panel elements that can effectively guide the reader's attention to convey the story. Our primary contribution is a probabilistic graphical model that describes the relationships among the artist's guiding path, the panel elements, and the viewer attention, which can be effectively learned from a small set of existing manga pages. We show that the proposed approach can measurably improve the readability, visual appeal, and communication of the story of the resulting pages, as compared to an existing method. We also demonstrate that the proposed approach enables novice users to create higher-quality compositions with less time, compared with commercially available programs.

Structured Mechanical Collage [paper] [video] [more results]

Zhe Huang, Jiang Wang, Hongbo Fu, and Rynson Lau

IEEE Trans. on Visualization and Computer Graphics, 20(7):1076-1082, July 2014

Abstract: We present a method to build 3D structured mechanical collages consisting of numerous elements from the database given artist-designed proxy models. The construction is guided by some graphic design principles, namely unity, variety and contrast. Our results are visually more pleasing than previous works as confirmed by a user study.

Automatic Stylistic Manga Layout [paper] [video] [more results]

Ying Cao, Antoni Chan, and Rynson Lau

ACM Trans. on Graphics (Proc. ACM SIGGRAPH Asia 2012), 31(6), Article 141, Nov. 2012

Abstract: Manga layout is a core component in manga production, characterized by its unique styles. However, stylistic manga layouts are difficult for novices to produce as it requires hands-on experience and domain knowledge. In this paper, we propose an approach to automatically generate a stylistic manga layout from a set of input artworks with user-specified semantics, thus allowing less-experienced users to create high-quality manga layouts with minimal efforts. We first introduce three parametric style models that encode the unique stylistic aspects of manga layouts, including layout structure, panel importance, and panel shape. Next, we propose a two-stage approach to generate a manga layout: 1) an initial layout is created that best fits the input artworks and layout structure model, according to a generative probabilistic framework; 2) the layout and artwork geometries are jointly refined using an efficient optimization procedure, resulting in a professional-looking manga layout. Through a user study, we demonstrate that our approach enables novice users to easily and quickly produce higher-quality layouts that exhibit realistic manga styles, when compared to a commercially-available manual layout tool.

Last updated in November 2023