Tuesday 2 April 2013

Hypothesising and problem finding

Visualising relationships could afford administrators/data analysts the following:

- deciding whom to target a promotion at

- retrospection on the effects of a particular promotion, e.g. if Creative Edinburgh is trying to promote sculpture in Scotland, they can see how the network changes before and after the promotion, how effective the promotion is, and what the rate of conversion is (see the sketch after this list)

- another advantage of visualisation is that you can "see" the network structure: what does the network look like beyond the statistical measures - is it sparsely connected or are the connections dense, who are the network's influencers, how central are they, and where do they sit in the network? Seeing the network shape clearly in a single snapshot reduces the user's cognitive load (eyes beat memory).
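
A minimal sketch of the before/after comparison, assuming we have the member network as two edge lists and a made-up figure for how many people the promotion reached (the names, numbers and the use of networkx are all illustrative assumptions):

    # Sketch: compare two snapshots of a member network taken before and
    # after a promotion. Edge lists and reach figure are hypothetical.
    import networkx as nx

    before = nx.Graph([("ann", "bob"), ("bob", "cat")])
    after = nx.Graph([("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
                      ("dan", "eve"), ("ann", "cat")])

    for label, g in [("before", before), ("after", after)]:
        print(label, "nodes:", g.number_of_nodes(),
              "edges:", g.number_of_edges(),
              "density:", round(nx.density(g), 3))

    # A crude "conversion rate": newly joined members divided by the
    # (assumed) number of people the promotion reached.
    new_members = set(after.nodes()) - set(before.nodes())
    promotion_reach = 10  # hypothetical reach of the promotion
    print("conversion rate:", len(new_members) / promotion_reach)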

Influence Targeting

So where is this all going?  It’s hard to tell, but if I had to guess I would say that we’ll end up with an approach that looks a lot like the marketing tactics of the past.
Generally speaking, we target not because we are sure that we have the right people, but because we want to exclude those who are least likely to be valuable.

Influencer Marketing, as increasingly practised in a commercial context, comprises four main activities:
  • Identifying influencers, and ranking them in order of importance.
  • Marketing to influencers, to increase awareness of the firm within the influencer community.
  • Marketing through influencers, using influencers to increase market awareness of the firm amongst target markets.
  • Marketing with influencers, turning influencers into advocates of the firm.

Showing network properties?
Degree Centrality: how many links each person of interest has to the rest of the network.
Betweenness Centrality: how often a member lies on the shortest paths between other members, i.e. their brokerage position in the network.
Closeness Centrality: how close a member is to all other members, based on the average distance between that member and everyone else in the network.
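
As a quick illustration, all three measures can be computed with networkx; the toy graph below is invented purely for the example:

    # Toy graph to illustrate the three centrality measures listed above.
    import networkx as nx

    g = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])

    print(nx.degree_centrality(g))       # links per node, normalised by n - 1
    print(nx.betweenness_centrality(g))  # share of shortest paths passing through each node
    print(nx.closeness_centrality(g))    # inverse of the average distance to all other nodes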

User fitness model?
In complex network theory, the fitness model is a model of the evolution of a network: how the links between nodes change over time depends on the fitness of nodes. Fitter nodes attract more links at the expense of less fit nodes.
It has been used to model the network structure of the World Wide Web.
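
A rough simulation in the spirit of such a fitness model (a sketch under simplifying assumptions, not a faithful implementation of any particular paper's model): each arriving node attaches to an existing node with probability proportional to degree × fitness, so fitter nodes accumulate links faster.

    # Fitness-model sketch: attachment probability ~ degree * fitness.
    import random

    random.seed(1)
    fitness = {0: random.random(), 1: random.random()}
    degree = {0: 1, 1: 1}            # start from a single link 0 -- 1
    edges = [(0, 1)]

    for new in range(2, 200):
        nodes = list(degree)
        weights = [degree[n] * fitness[n] for n in nodes]
        target = random.choices(nodes, weights=weights)[0]
        fitness[new] = random.random()
        degree[new] = 1
        degree[target] += 1
        edges.append((new, target))

    # The highest-degree nodes tend to combine early arrival with high fitness.
    top = sorted(degree, key=degree.get, reverse=True)[:5]
    print([(n, degree[n], round(fitness[n], 2)) for n in top])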

Monday 1 April 2013

Computer-based visualization systems provide visual representations of datasets intended to help people carry out some task better (Tamara Munzner).

The questions that ensue from this definition are:

- What are visual representations?
- What are datasets?
- How does the system help people?
- What sorts of tasks do people carry out? (problem definition)

I'm especially interested in task-oriented design and want to learn more about how we best determine the problem and validate it.

The model of the data visualisation design process that I'm currently studying is one by Tamara Munzner.

According to the proposed model there are four steps to the design process, with validations associated with each step. The model takes a top-down approach where decisions made at the top level affect the lower levels.

The 4 steps are:
1. Characterize the tasks and data in the vocabulary of the problem domain.
2. Abstract into operations and data types.
3. Design visual encoding and interaction techniques.
4. Create algorithms to execute these techniques.

Determining which evaluation methodology is appropriate to validate each of these different kinds of design choices is important for visualisation design.

Four categories of validity threats are proposed ('they' being the users and 'you' the designer):
1. wrong problem: they don't do that 
2. wrong abstraction: you are showing them the wrong thing
3. wrong encoding/iteration: the way you show it doesn't work
4. wrong algorithm: your code is too slow.

Some general design guidelines:
  1. Be aware of limitations on human perception and cognition (Stevens' power law).
  2. Choose appropriate marks and visual channels for representing different data categories. This requires us to understand the properties of the data, the properties of the image, and the rules for mapping data to image.
  3. Be aware of the data-ink ratio and avoid chart junk.
  4. Allow/encourage exploration.
  5. Reduce clutter.
Example case studies
What makes us happy

http://prcweb.co.uk/lab/what-makes-us-happy/

Here the author allows users to explore the data by clicking on different economic/social measures such as GDP. It makes good use of colour and area to denote the different countries, and there is good separability between data items (countries). When you hover over a circle the country name is displayed. The limitation is that there is no way of comparing different social measures, e.g. comparing working hours with income: the data points don't move vertically since GDP is fixed, so going back and forth between two social measures and trying to work out how two or more countries differ is not very apparent.

One needs to hover over a circle to know which country it represents, and there is no labelling that can be switched on and off.
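
A small sketch of the same kind of encoding with always-on labels, done in matplotlib; every number below is made up purely for illustration:

    # Made-up data: position = measure vs. happiness, area = population, labels always on.
    import matplotlib.pyplot as plt

    countries = ["UK", "France", "Norway"]
    gdp_per_capita = [39, 40, 75]     # hypothetical, thousands USD
    happiness = [6.9, 6.6, 7.6]       # hypothetical score
    population = [67, 68, 5]          # hypothetical, millions -> circle area

    fig, ax = plt.subplots()
    ax.scatter(gdp_per_capita, happiness, s=[p * 10 for p in population], alpha=0.6)
    for x, y, name in zip(gdp_per_capita, happiness, countries):
        ax.annotate(name, (x, y))     # labels visible without hovering
    ax.set_xlabel("GDP per capita")
    ax.set_ylabel("Happiness")
    plt.show()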

Parallel Sets
http://www.jasondavies.com/parallel-sets/


Parallel Sets (Parsets) is a visualisation technique for categorical data. At first glance it's not so easy to make sense of the graph; further analysis and interaction are needed to fully understand the information. It is a powerful representation, especially if you want to compare categorical data. The horizontal bars in the visualisation show the absolute frequency of how often each category occurred: in this example, the top line shows the split between survived and perished, and you can see clearly that more perished than survived. Each top-level category can be moved up and down, and secondary categories, e.g. male and female, can be moved horizontally on the graph.

We can see at a glance that the relative proportion of surviving women is far greater than that of the men. As for children, it becomes clearer when we drag the Age dimension up: around half the children survived. This is proportionally less than the women but more than the men.
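
The proportions the ribbons encode can be sanity-checked with a simple cross-tabulation; the sketch below uses pandas on a tiny made-up sample rather than the real Titanic data:

    # Cross-tabulate two categorical dimensions, as Parallel Sets does visually.
    import pandas as pd

    df = pd.DataFrame({
        "sex":      ["female", "female", "male", "male", "male", "female"],
        "survived": ["yes", "yes", "no", "no", "yes", "no"],
    })

    print(pd.crosstab(df["sex"], df["survived"]))                     # absolute frequencies (ribbon widths)
    print(pd.crosstab(df["sex"], df["survived"], normalize="index"))  # proportion surviving within each sex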


Parallel sets - age dimension at the top
Co-occurrence matrix



This matrix diagram visualizes character co-occurrences in Victor Hugo’s Les Misérables. Each colored cell represents two characters that appeared in the same chapter; darker cells indicate characters that co-occurred more frequently. The matrix can be re-ordered using a clustering algorithm.
The effectiveness of a matrix diagram is heavily dependent on the order of rows and columns: if related nodes are placed close to each other, it is easier to identify clusters and bridges. This type of diagram can be extended with manual reordering of rows and columns, and expanding or collapsing of clusters, to allow deeper exploration. One could also produce a 3D visualisation, but then one needs to be aware of occlusion, interaction complexity, perspective distortion and text legibility. Questions of 'when' are most effectively conveyed through a temporal channel such as animation.
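
A sketch of that reordering using hierarchical clustering from scipy; the small matrix below is invented and merely stands in for the real chapter co-occurrence counts:

    # Reorder a co-occurrence matrix so frequently co-occurring characters sit together.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, leaves_list
    from scipy.spatial.distance import squareform

    chars = ["Valjean", "Javert", "Cosette", "Marius"]
    cooc = np.array([
        [0, 5, 8, 3],
        [5, 0, 1, 1],
        [8, 1, 0, 6],
        [3, 1, 6, 0],
    ])

    # Turn co-occurrence counts into distances (more co-occurrence = closer),
    # then cluster and read off the leaf order.
    dist = cooc.max() - cooc
    np.fill_diagonal(dist, 0)
    order = leaves_list(linkage(squareform(dist), method="average"))

    reordered = cooc[np.ix_(order, order)]   # same order applied to rows and columns
    print([chars[i] for i in order])
    print(reordered)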

Friday 15 March 2013

Challenges
  1. There is a human in the loop - perception, cognition and culture all matter.
  2. How much control/interaction to offer - more control makes the tool more challenging to learn and can make it harder to find patterns and comprehend the structure of the network; less control means less discovery through exploration and a weaker analytical process.
  3. Too much data - too much data produces clutter, strains cognition and may cause information loss. How do we display a huge amount of information clearly without distorting or hiding it?
  4. Algorithm - needs to be fast enough
  5. Evaluation - evaluate in terms of efficiency, how easy the tool is to learn, and whether it helps users make decisions.
  6. Dare I say, make the invisible visible.

Solutions

Solution 1. Human visual processing involves two levels: automatic (sensory) processing and controlled processing [1][2]. Automatic processing works on visual properties such as position and colour; it is highly parallel, but limited in power. Controlled processing works on abstract encodings such as text; it has powerful operations, but is limited in capacity. The distinction between these two types of processing is important for visual design.

Understand Data
Three main categories
  • Nominal
  • Ordered
  • Quantitative
Understand graphical representations and their properties
1. Marks
 (Point, Line, Area, Surface, Volume)

2. Know your visual channel types and ranks
 
 3. Effects on human perception
  - Accuracy (Stevens' psychological power law)
  - Discriminability
  - Separability vs Integrality
  - Dangers of depth
  - Data/Ink ratio
  - categorical colour constraints
  - power of the plane
  - resolution beats immersion
  - eyes beat memory

 4. Visual encoding - which data is best suited to which representation
  - Effectiveness principle: encode the most important attributes with the highest-ranked channels [Mackinlay 86]
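
A toy sketch of the effectiveness principle: the ranking below is a simplified reading of Mackinlay's channel ordering (not his exact table), and the helper simply assigns the highest-ranked free channel to each attribute in order of importance.

    # Simplified channel rankings per data type (illustrative, not Mackinlay's exact table).
    channel_rank = {
        "quantitative": ["position", "length", "angle", "area", "colour saturation"],
        "ordered":      ["position", "density", "colour saturation", "colour hue"],
        "nominal":      ["position", "colour hue", "shape", "texture"],
    }

    def assign_channels(attrs):
        """attrs: list of (name, data_type) pairs, ordered by importance."""
        used, assignment = set(), {}
        for name, dtype in attrs:
            channel = next(c for c in channel_rank[dtype] if c not in used)
            used.add(channel)
            assignment[name] = channel
        return assignment

    # The most important attribute gets the highest-ranked channel (position).
    print(assign_channels([("GDP", "quantitative"), ("country", "nominal")]))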

Understand design process
Beyond encoding, we need to understand the problem we are trying to solve (what to show), the abstraction (what to represent), the encoding and the algorithm, and we need validation at each step.

 Identify problem
   - provide novel capabilities
   - speed up existing workflow

Abstraction - abstract from domain specific to generic
   - operations => sorting, filtering, browsing, comparing, finding trends/outliers, characterising distributions, finding correlations
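
These generic operations translate directly into code; here is a small pandas sketch over a made-up table:

    # Generic operations on a tiny invented table: sort, filter, outliers, correlation.
    import pandas as pd

    df = pd.DataFrame({"member": ["a", "b", "c", "d"],
                       "followers": [10, 12, 11, 95],
                       "posts": [3, 4, 3, 30]})

    print(df.sort_values("followers"))            # sorting
    print(df[df["followers"] > 11])               # filtering

    med = df["followers"].median()
    mad = (df["followers"] - med).abs().median()
    print(df[(df["followers"] - med).abs() > 3 * mad])   # crude outlier check
    print(df["followers"].corr(df["posts"]))      # finding correlations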

Encoding

Algorithm

Solution 2. Use SYFD (Systematic Yet Flexible Discovery)
http://hcil2.cs.umd.edu/trs/2007-26/2007-26.pdf

Solution 3. Use the visual analytics mantra - analyse first, show the important, zoom/filter and analyse further, details on demand.
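
A toy sketch of that mantra as a pipeline over a made-up event table with pandas: aggregate first, surface the top items, filter down to them, then pull up details for one of them.

    # Analyse first -> show the important -> zoom/filter -> details on demand.
    import pandas as pd

    events = pd.DataFrame({"member": ["a", "a", "b", "c", "c", "c"],
                           "event":  ["post", "reply", "post", "post", "reply", "share"]})

    summary = events.groupby("member").size().sort_values(ascending=False)  # analyse first
    print(summary.head(2))                                                  # show the important
    active = events[events["member"].isin(summary.head(2).index)]           # zoom / filter
    print(active[active["member"] == "c"])                                  # details on demand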

[1] R. M. Shiffrin and W. Schneider, “Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory,” Psychological Review, vol. 84, pp. 127–190, 1977.

[2] E. R. Tufte, The Visual Display of Quantitative Information. Graphics Press, 1983.