Meaningful, row-wise summary function for lists and data frames

summary_colorDF(
  object,
  numformat = "quantiles",
  digits = 3,
  width = getOption("width")
)

# S3 method for colorDF
summary(object, ...)

Arguments

object

a data frame (possibly a color data frame)

numformat

format of the summary for numerical values. One of "quantiles" (the default), "mean" or "graphics"

digits

number of significant digits to show (default: 3)

width

width of the summary table in characters

...

further arguments passed to summary_colorDF

Value

A colorful data frame of class colorDF containing useful information on a data frame-like object.

Details

While this function is a summary method for objects of the colorDF class, it can also be applied to any other data frame-like object.

The summary table has five columns and as many rows as there are columns in the summarized data frame (or elements in a list). The first four columns contain, respectively, the column name, the column class (abbreviated as in tibbles), the number of missing values (NAs) and the number of unique values. The contents of the fifth column depend on the column class and column type as follows:

  • first, any lists are unlisted

  • numeric columns (including integers) are summarized (see below)

  • for character vectors and factors, if all values are unique or missing (NA), then this is stated explicitly

  • otherwise, for character vectors and factors, the values will be listed, starting with the most frequent. The list will be shortened to fit the screen.
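The bookkeeping behind the first four columns and the frequency listing can be approximated in base R. The following is a minimal sketch, not the package's implementation: the helper names describe_columns and list_by_frequency are hypothetical, classes are not abbreviated as in tibbles, and counting NA as a unique value is an assumption.

```r
# Hypothetical helper: one row per column (or list element) of `df`.
# Assumption: NA counts as a unique value here; colorDF may differ.
describe_columns <- function(df) {
  data.frame(
    Col    = names(df),
    Class  = vapply(df, function(x) class(x)[1], character(1)),
    NAs    = vapply(df, function(x) sum(is.na(x)), integer(1)),
    unique = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: list the values of a character vector or factor,
# most frequent first, as "value: count" pairs.
list_by_frequency <- function(x) {
  freq <- sort(table(x), decreasing = TRUE)
  paste(sprintf("%s: %d", names(freq), as.integer(freq)), collapse = ", ")
}

describe_columns(iris)
list_by_frequency(iris$Species)
```

Unlike the real summary, this sketch does not shorten the frequency list to fit the screen.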

For numeric columns, the quantiles 0 (minimum), .25, .50 (median), .75 and 1 (maximum) are shown by default. The following alternatives can be selected with the numformat argument:

  • "mean": mean +- standard deviation

  • "graphics": a graphical summary. Note that all numerical columns will be scaled with the same parameter, so this option makes sense only if the numerical columns are comparable. The graphics summary looks like this: ---| + |---- and corresponds to a regular box plot, indicating the extremes and the three quartiles (- ... - indicates the data range, |...| the interquartile range and '+' stands for the median).
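To make the three numeric formats concrete, the sketch below reproduces the underlying numbers with base R only. It is an illustration under stated assumptions rather than colorDF's own code: the variable names and the plot width of 40 characters are hypothetical, and the string layout is simply modelled on the example output.

```r
# Assumption: plain base R, not the package's internals.
x <- iris$Sepal.Length

# "quantiles": minimum, the three quartiles and the maximum.
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
q_txt <- format(signif(q, 3))
quantile_summary <- sprintf("%s [%s <%s> %s] %s",
                            q_txt[1], q_txt[2], q_txt[3], q_txt[4], q_txt[5])

# "mean": mean and standard deviation, to 3 significant digits.
mean_summary <- sprintf("%s +- %s", signif(mean(x), 3), signif(sd(x), 3))

# "graphics": every numeric column is mapped onto ONE common scale,
# so box positions are comparable across rows.
num <- iris[vapply(iris, is.numeric, logical(1))]
rng <- range(unlist(num), na.rm = TRUE)  # global minimum and maximum
width <- 40                              # hypothetical plot width in characters
positions <- vapply(num, function(col) {
  qs <- quantile(col, probs = c(0, 0.25, 0.5, 0.75, 1),
                 names = FALSE, na.rm = TRUE)
  # character position of each quantile on the shared axis
  round(1 + (qs - rng[1]) / diff(rng) * (width - 1))
}, numeric(5))

quantile_summary  # "4.3 [5.1 <5.8> 6.4] 7.9"
mean_summary
positions         # one column per variable, rows = the five quantiles
```

Because the scale is shared, the column with the globally smallest value starts at position 1 and the one with the globally largest value ends at the last position, which is why this format is informative only for columns measured in comparable units.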

summary_colorDF is the exported version of this function. It can be called directly when converting an object to a colorDF is not desirable.

Examples

summary(colorDF(iris))
#> # Color data frame (class colorDF) 5 x 5:
#>  │Col         │Class│NAs  │unique│Summary                                  
#><chr>       <chr><int><int> <chr>                                    
#> 1Sepal.Length<dbl>│    0│    35│4.3 [5.1 <5.8> 6.4] 7.9                  
#> 2Sepal.Width <dbl>│    0│    23│2.0 [2.8 <3.0> 3.3] 4.4                  
#> 3Petal.Length<dbl>│    0│    43│1.00 [1.60 <4.35> 5.10] 6.90             
#> 4Petal.Width <dbl>│    0│    22│0.1 [0.3 <1.3> 1.8] 2.5                  
#> 5Species     <fct>│    0│     3│setosa: 50, versicolor: 50, virginica: 50
summary_colorDF(iris)
#> # Color data frame (class colorDF) 5 x 5:
#>  │Col         │Class│NAs  │unique│Summary                                  
#><chr>       <chr><int><int> <chr>                                    
#> 1Sepal.Length<dbl>│    0│    35│4.3 [5.1 <5.8> 6.4] 7.9                  
#> 2Sepal.Width <dbl>│    0│    23│2.0 [2.8 <3.0> 3.3] 4.4                  
#> 3Petal.Length<dbl>│    0│    43│1.00 [1.60 <4.35> 5.10] 6.90             
#> 4Petal.Width <dbl>│    0│    22│0.1 [0.3 <1.3> 1.8] 2.5                  
#> 5Species     <fct>│    0│     3│setosa: 50, versicolor: 50, virginica: 50
summary_colorDF(iris, numformat="g")
#> # Color data frame (class colorDF) 5 x 5:
#>  │Col         │Class│NAs  │unique│Summary                                   
#><chr>       <chr><int><int> <chr>                                     
#> 1Sepal.Length<dbl>│    0│    35│                      ╾───┤   +  ├───────╼
#> 2Sepal.Width <dbl>│    0│    23│          ╾───┤+ ├─────╼                  
#> 3Petal.Length<dbl>│    0│    43│     ╾──┤             +   ├─────────╼     
#> 4Petal.Width <dbl>│    0│    22│╾┤    +  ├───╼                            
#> 5Species     <fct>│    0│     3│setosa: 50, versicolor: 50, virginica: 50 
if(require(dplyr) && require(tidyr)) {
  starwars %>% summary_colorDF

  ## A summary of iris data by species
  iris %>% 
    mutate(row=rep(1:50, 3)) %>% 
    gather(key="parameter", value="Size", 1:4)  %>%
    mutate(pa.sp=paste(parameter, Species, sep=".")) %>% 
    select(row, pa.sp, Size) %>% 
    spread(key=pa.sp, value=Size) %>% 
    select(-row) %>%
    summary_colorDF(numformat="g")
}
#> Loading required package: tidyr
#> # Color data frame (class colorDF) 5 x 12:
#>   │Col                    │Class│NAs  │unique│Summary                        
#><chr>                  <chr><int><int> <chr>                          
#>  1Petal.Length.setosa    <dbl>│    0│     9│   ╾─+├╼                       
#>  2Petal.Length.versicolor<dbl>│    0│    19│           ╾───┤+├─╼           
#>  3Petal.Length.virginica <dbl>│    0│    20│                 ╾─┤ +├───╼    
#>  4Petal.Width.setosa     <dbl>│    0│     6│+├╼                            
#>  5Petal.Width.versicolor <dbl>│    0│     9│   ╾┤+─╼                       
#>  6Petal.Width.virginica  <dbl>│    0│    12│     ╾─+├╼                     
#>  7Sepal.Length.setosa    <dbl>│    0│    15│                ╾─┤+├─╼        
#>  8Sepal.Length.versicolor<dbl>│    0│    21│                  ╾──┤+ ├──╼   
#>  9Sepal.Length.virginica <dbl>│    0│    21│                  ╾─────┤+├───╼
#> 10Sepal.Width.setosa     <dbl>│    0│    16│        ╾───┤+├──╼             
#> 11Sepal.Width.versicolor <dbl>│    0│    14│       ╾─┤+├─╼                 
#> 12Sepal.Width.virginica  <dbl>│    0│    13│        ╾─┤+├─╼