Reducing Web Query Download Size
Internet Explorer 6.0 cannot download files larger than 2 gigabytes; newer versions of IE and other browsers can. You can still reduce the size of a file that covers all companies or stocks by splitting the extract into blocks of fewer than 10 years each. In addition, the CSV (comma-separated) output format in compressed form (.zip or .gz) typically creates the smallest files.
The other option is to connect to WRDS from UNIX. We have various example programs that will help, and you can even use SAS Connect, which treats WRDS as a remote server while you work in PC SAS.
Still, downloading large datasets defeats part of the purpose of storing large sets of commonly used data on WRDS. Large extracts more than double the storage needs (on WRDS and at your location, plus temporary storage space) and tie up ftp transfer lines.
We suggest instead that you use the data as it sits on WRDS and process what you need as you need it using SAS programs. For example, PC SAS Connect allows you to share data processing tasks between our UNIX server and your PC and download only the results.
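As a rough illustration of that division of labor, the sketch below summarizes a dataset on the server and downloads only the summary table. It is a minimal example, not an official WRDS program: the host name and port in the %let line are placeholders to be replaced with the connection details WRDS provides, and the dataset crsp.msf with variables permno and ret is assumed purely for illustration.

```sas
/* Sketch of a SAS Connect session started from PC SAS.               */
/* Assumption: host name and port below are placeholders.             */
%let wrds = your.wrds.host 4016;
options comamid=TCP remote=wrds;
signon username=_prompt_;          /* prompts for your WRDS login */

rsubmit;                           /* statements below run on the WRDS server */
  /* Assumed example: one summary row per permno from crsp.msf */
  proc means data=crsp.msf noprint nway;
    class permno;
    var ret;
    output out=work.ret_summary mean=mean_ret n=n_obs;
  run;

  /* Copy only the small summary table to the local WORK library */
  proc download data=work.ret_summary out=work.ret_summary;
  run;
endrsubmit;

signoff;
```

Only work.ret_summary crosses the network; the full dataset stays on the server.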
If you still want to extract and create large data files, here are some general guidelines:
- Output from a web query will not be saved and accessible on the WRDS system if it exceeds 2 GB, and in some web queries (such as TAQ and Thomson Reuters) the 'select all' option is not offered.
- You could pull out a large portion or even all of the data by writing and running a simple SAS export program, but note that you may have problems finding a place to store large sets. (Your WRDS /projects space is only 250MB in size, so a larger file would need to be stored temporarily on /sastemp.) Below is one generic way to extract data into a compressed text file in comma-separated value (CSV) format:
filename datout PIPE 'gzip -c > xdata.csv.gz';
proc export data=xdata outfile=datout dbms=csv replace;
run;
* Note: code above must be invoked using the noterminal option;
* i.e., 'sas -noterminal program_name &';
Here 'xdata' is your input dataset name; it could be a two-level name such as crsp.msf.
These filename statements also work:
filename datout 'whatever.csv';
filename datout PIPE 'zip -p > xdata.zip';
The first is uncompressed. The second uses regular zip, but the file stored inside the archive gets the ambiguous name '-'.
We suggest gzip as the easiest format for ftp transfer.
IMPORTANT: If running a SAS program that uses either PROC IMPORT or PROC EXPORT on our UNIX system, you must use the -noterminal option (sas -noterminal program_name &) or you will get a runtime error and need to terminate the job.
Also, you may need to output to your projects directory if the file is larger than 1000K:
filename datout PIPE 'gzip -c > /projects/school/i=user/xdata.csv.gz';
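The suggestion near the top of this page to split an extract into blocks of fewer than 10 years can be combined with the gzip pipe. The following is only a sketch under stated assumptions: the macro name export_block and the output file names are hypothetical, and the dataset crsp.msf is assumed to carry a SAS date variable called date. As with any PROC EXPORT job on the UNIX system, run it with the -noterminal option.

```sas
/* Sketch: write one block of years to its own gzipped CSV file.       */
/* Assumptions: crsp.msf has a SAS date variable named date; the macro */
/* name and the output file names are hypothetical.                    */
%macro export_block(first_year, last_year);
  /* Double quotes so the macro variables resolve in the file name */
  filename datout PIPE "gzip -c > msf_&first_year._&last_year..csv.gz";

  proc export
    data=crsp.msf(where=(&first_year <= year(date) <= &last_year))
    outfile=datout dbms=csv replace;
  run;

  filename datout clear;
%mend export_block;

/* Two blocks of nine years each */
%export_block(1990, 1998);
%export_block(1999, 2007);
```

Each call produces a separate compressed file, so the blocks can be transferred and processed one at a time.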
For an alternative SAS transport data format (xpt), the PROC CPORT code is even simpler:
filename datout PIPE 'gzip -c > xdata.xpt.gz';
proc cport data=xdata file=datout;
run;
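A transport file written by PROC CPORT is read back with PROC CIMPORT, not with the XPORT libname engine. The lines below are a minimal sketch of that step on the receiving machine, assuming the file has already been transferred and uncompressed (for example with gunzip) into the current directory; the fileref name datin is arbitrary.

```sas
/* Sketch: restore the dataset from the uncompressed transport file.  */
/* Assumption: xdata.xpt sits in the current working directory.       */
filename datin 'xdata.xpt';

proc cimport infile=datin library=work;
run;
```

After this runs, the dataset should be available as work.xdata.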