5 Useful R Commands for Working with Big Data



1. View a large object (for example, a big object loaded from an RData file, here called object) in a pager, similar to less on the command line.

page(object, method="print")

2. List the top n objects in memory by size (set n first, for example n <- 10).

head(sort(sapply(ls(envir=globalenv()), function(x) object.size(get(x, envir=globalenv()))), decreasing=TRUE), n)
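
The same idea can be wrapped in a small function so it is easier to reuse. This is only a sketch: the name top_objects and its arguments are made up for illustration, and the demo runs in a scratch environment rather than your real workspace.

```r
# Hypothetical helper (not from any package): sizes in bytes of the
# n largest objects in an environment, largest first.
top_objects <- function(n = 5, envir = globalenv()) {
  obj_names <- ls(envir = envir)
  if (length(obj_names) == 0) return(numeric(0))
  sizes <- vapply(
    obj_names,
    function(x) as.numeric(object.size(get(x, envir = envir))),
    numeric(1)
  )
  head(sort(sizes, decreasing = TRUE), n)
}

# Demo in a scratch environment with one small and one large object
demo_env <- new.env()
assign("small", 1:10, envir = demo_env)
assign("large", numeric(1e6), envir = demo_env)

top <- top_objects(n = 1, envir = demo_env)
print(names(top))

# object.size() results can be formatted in friendlier units
print(format(object.size(get("large", envir = demo_env)), units = "MB"))
```

Using head() instead of [1:n] avoids NA entries when there are fewer than n objects.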

3. Remove all objects except those named in a keep list. Useful for freeing memory held by objects you no longer need. Note that keep itself is removed too, since it is not in its own list.

rm(list=setdiff(ls(), keep))


> ls()
[1] "bar"  "foo"  "rm1"  "rm2"  "rm3" 
> keep <- c("foo", "bar")
> setdiff(ls(), keep)
[1] "keep" "rm1"  "rm2"  "rm3" 
> rm(list=setdiff(ls(), keep))
> ls()
[1] "bar" "foo"

4. Use sqldf. The sqldf package lets you store and query your data.frames (for example, one called df) with SQL, backed by a real database (SQLite by default). This can reduce the memory footprint and speed up some operations. See the package documentation for more info.

library(sqldf)
sqldf("select * from df")
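
As a slightly fuller sketch, here is sqldf run against the built-in mtcars data set instead of df. It assumes the sqldf package is installed (the block is skipped otherwise). By default sqldf uses an in-memory SQLite database; passing dbname = tempfile() asks it to use a temporary on-disk database instead, which may help when memory is tight.

```r
# Assumes the sqldf package is installed; skipped otherwise.
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)

  # Default: mtcars is copied into a temporary in-memory SQLite database
  by_cyl <- sqldf("select cyl, avg(mpg) as mean_mpg
                   from mtcars group by cyl order by cyl")
  print(by_cyl)

  # dbname = tempfile() uses a temporary on-disk database instead
  heavy <- sqldf("select count(*) as n from mtcars where wt > 3.5",
                 dbname = tempfile())
  print(heavy)
}
```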

5. Save your current workspace with save.image()! Very useful for resuming analysis elsewhere or sharing workspaces with other people; just point to a shared directory and have them resume with the same objects you were using.
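
A minimal sketch of saving and restoring a workspace. The file name analysis.RData is a placeholder, and tempdir() stands in for whatever shared directory you would actually use.

```r
# Placeholder path: swap tempdir() for your shared directory
workspace_file <- file.path(tempdir(), "analysis.RData")

x <- 1:3

# Write every object in the global environment to one file
save.image(file = workspace_file)

# Later, or on another machine: load into a separate environment
# so the restored objects do not overwrite anything already defined
restored <- new.env()
load(workspace_file, envir = restored)
print(ls(restored))
```

Calling load(workspace_file) with no envir argument restores the objects straight into the current workspace, which is what you want when simply resuming where you left off.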


So, what do you think?