Given a large repository of geotagged imagery, we seek to automatically find visual elements, e. g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering approach able to take into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically-informed image retrieval.
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
We propose a new method to quickly and accurately predict human pose -the 3D positions of body joints-from a single depth image, without depending on information from preceding frames. Our approach is strongly rooted in current object recognition strategies. By designing an intermediate representation in terms of body parts, the difficult pose estimation problem is transformed into a simpler per-pixel classification problem, for which efficient machine learning techniques exist. By using computer graphics to synthesize a very large dataset of training image pairs, one can train a classifier that estimates body part labels from test images invariant to pose, body shape, clothing, and other irrelevances. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs in under 5ms on the Xbox 360. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
We describe a new vector-based primitive for creating smooth-shaded images, called the diffusion curve. A diffusion curve partitions the space through which it is drawn, defining different colors on either side. These colors may vary smoothly along the curve. In addition, the sharpness of the color transition from one side of the curve to the other can be controlled. Given a set of diffusion curves, the final image is constructed by solving a Poisson equation whose constraints are specified by the set of gradients across all diffusion curves. Like all vector-based primitives, diffusion curves conveniently support a variety of operations, including geometry-based editing, keyframe animation, and ready stylization. Moreover, their representation is compact and inherently resolution-independent. We describe a GPU-based implementation for rendering images defined by a set of diffusion curves in realtime. We then demonstrate an interactive drawing system for allowing artists to create artworks using diffusion curves, either by drawing the curves in a freehand style, or by tracing existing imagery. The system is simple and intuitive: we show results created by artists after just a few minutes of instruction. Furthermore, we describe a completely automatic conversion process for taking an image and turning it into a set of diffusion curves that closely approximate the original image content.
What can you do with a million images? In this paper we present a new image completion algorithm powered by a huge database of photographs gathered from the Web. The algorithm patches up holes in images by finding similar image regions in the database that are not only seamless but also semantically valid. Our chief insight is that while the space of images is effectively infinite, the space of semantically differentiable scenes is actually not that large. For many image completion tasks we are able to find similar scenes which contain image fragments that will convincingly complete the image. Our algorithm is entirely data-driven, requiring no annotations or labelling by the user. Unlike existing image completion methods, our algorithm can generate a diverse set of results for each input image and we allow users to select among them. We demonstrate the superiority of our algorithm over existing image completion approaches.
This paper presents the results of a study in which artists made line drawings intended to convey specific 3D shapes. The study was designed so that drawings Could be registered with rendered images of 3D models. supporting an analysis of how well the locations of the artists' lines correlate with other artists', with current computer graphics line definitions, and with the underlying differential properties of the 3D surface. Lines drawn by artists in this study largely overlapped one another (75% are within 1 mm of another line), particularly along the occluding contours of the object. Most lines that do not overlap contours overlap large gradients of the image intensity, and correlate strongly with predictions made by recent line drawing algorithms in computer graphics. 14% were not well described by any of the local properties considered in this study. The result of our work is a publicly available data set of aligned drawings, an analysis of where lines appear in that data set based on local properties of 3D models, and algorithms to predict where artists will draw lines for new scenes.
We develop a method for reliable simulation of elastica in complex contact scenarios. Our focus is on firmly establishing three parameter-independent guarantees: that simulations of well-posed problems (a) have no interpenetrations, (b) obey causality, momentum- and energy-conservation laws, and (c) complete in finite time. We achieve these guarantees through a novel synthesis of asynchronous variational integrators, kinetic data structures, and a discretization of the contact barrier potential by an infinite sum of nested quadratic potentials. In a series of two- and three-dimensional examples, we illustrate that this method more easily handles challenging problems involving complex contact geometries, sharp features, and sliding during extremely tight contact.