Wednesday, 17 August 2016

Introduction to Amazon S3 with Java and REST


Amazon Simple Storage Service (S3) is a service from Amazon that lets you store files in reliable remote storage at a very competitive price, and it is becoming very popular. Companies use S3 to store their customers' photos and videos, to back up their own data, and more. S3 provides both SOAP and REST APIs; this article focuses on using the S3 REST API with the Java programming language.

S3 Basics

S3 handles objects and buckets. An object corresponds to a stored file. Each object has an identifier, an owner, and permissions. Objects are stored in a bucket. A bucket has a unique name that must comply with internet domain naming rules. Once you have an AWS (Amazon Web Services) account, you can create up to 100 buckets associated with that account. An object is addressed by a URL of the form http://s3.amazonaws.com/<bucket>/<object identifier>. The object identifier is a filename or a filename with a relative path (e.g., myalbum/august/photo21.jpg). With this naming scheme, S3 storage can appear as a regular file system with folders and subfolders. Notice that the bucket name can also be the hostname in the URL, so your object could also be addressed by http://<bucket>.s3.amazonaws.com/<object identifier>.
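The two addressing styles can be sketched as a pair of helper functions. This is a minimal sketch, assuming the classic global endpoint s3.amazonaws.com and object identifiers that need no URL-encoding; the class and method names are hypothetical.

```java
public class S3Urls {
  // Path-style addressing: the bucket is the first segment of the URL path.
  static String pathStyle(String bucket, String key) {
    return "http://s3.amazonaws.com/" + bucket + "/" + key;
  }

  // Virtual-hosted addressing: the bucket becomes part of the host name,
  // which is why bucket names must follow internet domain naming rules.
  static String virtualHosted(String bucket, String key) {
    return "http://" + bucket + ".s3.amazonaws.com/" + key;
  }

  public static void main(String[] args) {
    // The key is assumed to be URL-safe already; no encoding is applied.
    System.out.println(pathStyle("onjava", "myalbum/august/photo21.jpg"));
    System.out.println(virtualHosted("onjava", "myalbum/august/photo21.jpg"));
  }
}
```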

S3 REST Security

Access to S3 REST resources is secured. This matters not just for protecting your own data, but also because customers are billed according to how their S3 buckets and objects are used. Each AWS customer is assigned an AWSSecretKey, which is identified by an AWSAccessKeyID. The secret key must be kept private; it is used to digitally sign REST requests. S3 security features are:

  • Authentication: Requests include the AWSAccessKeyID
  • Authorization: An Access Control List (ACL) can be applied to each resource
  • Integrity: Requests are digitally signed with the AWSSecretKey
  • Confidentiality: S3 is available over HTTPS as well as HTTP
  • Non-repudiation: Requests are timestamped (combined with integrity, this provides proof of the transaction)

The signing algorithm is HMAC-SHA1 (Hash-based Message Authentication Code using SHA-1). In Java, a String can be signed as follows:

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

private SecretKeySpec signingKey = null;
private Mac mac = null;

// This method converts AWSSecretKey into a crypto instance.
public void setKey(String AWSSecretKey) throws Exception {
  mac = Mac.getInstance("HmacSHA1");
  byte[] keyBytes = AWSSecretKey.getBytes(StandardCharsets.UTF_8);
  signingKey = new SecretKeySpec(keyBytes, "HmacSHA1");
  // The Mac must be initialized with the key before it can sign anything.
  mac.init(signingKey);
}

// This method creates the S3 signature for a given String.
public String sign(String data) throws Exception {
  byte[] signBytes = mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
  // The signed bytes must be Base64 encoded.
  return Base64.getEncoder().encodeToString(signBytes);
}

The AWSAccessKeyID and the signature are passed in the Authorization HTTP header like this:

Authorization: AWS <AWSAccessKeyID>:<Signature>

The signature is computed over a string built from the following information, joined by newline characters in this order (empty Content-MD5 and Content-Type values still keep their place):

  • HTTP method name (PUT, GET, DELETE, etc.)
  • Content-MD5, if any
  • Content-Type, if any (e.g., text/plain)
  • GMT timestamp of the request (the Date header), formatted as EEE, dd MMM yyyy HH:mm:ss GMT
  • x-amz-* headers, if any (e.g., "x-amz-acl" for ACL)
  • URI path such as /mybucket/myobjectid
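A minimal sketch of assembling the string to sign and the Authorization header value, assuming the HMAC-SHA1 scheme described above. The class name StringToSign and its methods are hypothetical; for simplicity it assumes one value per x-amz-* header and a resource path that is already in its final form (sub-resources such as ?acl are not handled).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class StringToSign {
  // Method, Content-MD5, Content-Type and Date, each followed by a newline,
  // then the x-amz-* headers, then the resource.
  static String build(String method, String contentMD5, String contentType,
                      String date, Map<String, String> headers, String resource) {
    StringBuilder sb = new StringBuilder();
    sb.append(method).append('\n');
    sb.append(contentMD5).append('\n');
    sb.append(contentType).append('\n');
    sb.append(date).append('\n');
    // Keep only x-amz-* headers, lower-cased and sorted by name (TreeMap).
    Map<String, String> amz = new TreeMap<>();
    for (Map.Entry<String, String> e : headers.entrySet()) {
      String name = e.getKey().toLowerCase(Locale.ROOT);
      if (name.startsWith("x-amz-")) {
        amz.put(name, e.getValue().trim());
      }
    }
    for (Map.Entry<String, String> e : amz.entrySet()) {
      sb.append(e.getKey()).append(':').append(e.getValue()).append('\n');
    }
    // The resource comes last, with no trailing newline.
    sb.append(resource);
    return sb.toString();
  }

  // HMAC-SHA1 of the string to sign, Base64 encoded.
  static String sign(String secretKey, String stringToSign) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
    byte[] raw = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
    return Base64.getEncoder().encodeToString(raw);
  }

  // Value of the Authorization header: "AWS <AWSAccessKeyID>:<Signature>".
  static String authorization(String accessKeyId, String signature) {
    return "AWS " + accessKeyId + ":" + signature;
  }
}
```

For the bucket creation request shown next, build("PUT", "", "", "Sun, 05 Aug 2007 15:33:59 GMT", an empty map, "/onjava") yields "PUT\n\n\nSun, 05 Aug 2007 15:33:59 GMT\n/onjava".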

Here is a sample of a successful S3 REST request and response that create the "onjava" bucket:

PUT /onjava HTTP/1.1
Content-Length: 0
User-Agent: jClientUpload
Date: Sun, 05 Aug 2007 15:33:59 GMT
Authorization: AWS 15B4D3461F177624206A:YFhSWKDg3qDnGbV7JCnkfdz/IHY=

HTTP/1.1 200 OK
x-amz-id-2: tILPE8NBqoQ2Xn9BaddGf/YlLCSiwrKP+OQOpbi5zazMQ3pC56KQgGk
x-amz-request-id: 676918167DFF7F8C
Date: Sun, 05 Aug 2007 15:30:28 GMT
Location: /onjava
Content-Length: 0
Server: AmazonS3

Notice the discrepancy between the request and response timestamps? The request Date is later than the response Date. This is because the response Date comes from the Amazon S3 server's clock, while the request Date comes from the client's clock. If the difference between the request timestamp and the S3 server's time is too large (more than 15 minutes), S3 returns a RequestTimeTooSkewed error. This is another important feature of S3 security: you can't roll your clock far forward or back to make things appear to happen when they didn't.
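To see how the check works, here is a small sketch that formats an HTTP Date header in GMT and measures the skew between two such dates. The 15-minute threshold mirrors the window S3 documents; the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class ClockSkew {
  // Tolerance S3 applies between the request Date and its own clock.
  static final long MAX_SKEW_MILLIS = 15L * 60 * 1000;

  // HTTP date pattern, always rendered in GMT regardless of the local time zone.
  static SimpleDateFormat httpDateFormat() {
    SimpleDateFormat df = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
    df.setTimeZone(TimeZone.getTimeZone("GMT"));
    return df;
  }

  // Absolute difference between two HTTP dates, in milliseconds.
  static long skewMillis(String requestDate, String serverDate) throws ParseException {
    SimpleDateFormat df = httpDateFormat();
    return Math.abs(df.parse(requestDate).getTime() - df.parse(serverDate).getTime());
  }

  static boolean tooSkewed(String requestDate, String serverDate) throws ParseException {
    return skewMillis(requestDate, serverDate) > MAX_SKEW_MILLIS;
  }
}
```

With the two dates from the sample exchange above, skewMillis returns 211000 (3 minutes 31 seconds), well inside the window.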

Note: Thanks to ACLs, an AWS user can grant read access to objects to anyone (anonymous users). In that case signing is not required, and objects can be addressed (especially for download) with a browser. This means S3 can also be used as a hosting service to serve HTML pages, images, videos, and applets; S3 even allows granting time-limited access to objects.
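Time-limited access relies on query string authentication: instead of an Authorization header, the URL carries AWSAccessKeyId, Expires and Signature parameters, and the timestamp slot of the string to sign holds the Expires value (seconds since the epoch). The sketch below assumes the HMAC-SHA1 signing used throughout this article, the s3.amazonaws.com endpoint, and object keys that need no URL-encoding; the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class PresignedUrl {
  // Builds a GET URL that stays valid until expiresEpochSeconds (UTC seconds).
  static String presign(String accessKeyId, String secretKey, String bucket,
                        String key, long expiresEpochSeconds) throws Exception {
    String resource = "/" + bucket + "/" + key;
    // Same layout as a signed request, with Expires in place of the Date.
    String stringToSign = "GET\n\n\n" + expiresEpochSeconds + "\n" + resource;
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
    String signature = Base64.getEncoder().encodeToString(
        mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8)));
    // The Base64 signature may contain '+', '/' and '=', so it must be URL-encoded.
    return "http://s3.amazonaws.com" + resource
        + "?AWSAccessKeyId=" + URLEncoder.encode(accessKeyId, "UTF-8")
        + "&Expires=" + expiresEpochSeconds
        + "&Signature=" + URLEncoder.encode(signature, "UTF-8");
  }
}
```

Anyone holding the URL can download the object until the expiry time, without knowing the AWSSecretKey.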

Creating a Bucket

The code below details the Java implementation of the "onjava" S3 bucket creation. It relies on java.net for HTTP, java.text for date formatting, and java.util for timestamping. All these packages are included in J2SE; no external library is needed to talk to the S3 REST interface. First, the code generates the String to sign, then it opens the HTTP REST connection with the required headers. Finally, it issues the request to the S3 web server.

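import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Assumes a keyId field holding your AWSAccessKeyID, plus sign() and getS3ErrorCode().
public void createBucket() throws Exception {
  // S3 timestamp pattern; the formatter must render the time in GMT.
  String fmt = "EEE, dd MMM yyyy HH:mm:ss ";
  SimpleDateFormat df = new SimpleDateFormat(fmt, Locale.US);
  df.setTimeZone(TimeZone.getTimeZone("GMT"));

  // Data needed for signature
  String method = "PUT";
  String contentMD5 = "";
  String contentType = "";
  String date = df.format(new Date()) + "GMT";
  String bucket = "/onjava";

  // Generate signature: the fields are joined with newlines, in this order.
  StringBuffer buf = new StringBuffer();
  buf.append(method).append("\n");
  buf.append(contentMD5).append("\n");
  buf.append(contentType).append("\n");
  buf.append(date).append("\n");
  buf.append(bucket);
  String signature = sign(buf.toString());

  // Connection to the S3 REST endpoint (path-style addressing)
  URL url = new URL("http", "s3.amazonaws.com", 80, bucket);
  HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
  httpConn.setRequestMethod(method);
  httpConn.setRequestProperty("Date", date);
  // Content-Length is a restricted header that HttpURLConnection may ignore;
  // fixed-length streaming with an empty body sends "Content-Length: 0" instead.
  httpConn.setDoOutput(true);
  httpConn.setFixedLengthStreamingMode(0);
  String AWSAuth = "AWS " + keyId + ":" + signature;
  httpConn.setRequestProperty("Authorization", AWSAuth);
  // Send the HTTP PUT request with its empty body.
  httpConn.getOutputStream().close();
  int statusCode = httpConn.getResponseCode();
  if ((statusCode / 100) != 2) {
    // Deal with S3 error stream.
    InputStream in = httpConn.getErrorStream();
    String errorStr = getS3ErrorCode(in);
    throw new IOException("S3 error " + statusCode + ": " + errorStr);
  }
}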

Dealing with REST Errors

All HTTP 2xx response status codes indicate success; 3xx, 4xx, and 5xx codes report some kind of error. Details of the error are available in the HTTP response body as an XML document. REST error responses are defined in the S3 Developer Guide. For instance, an attempt to create a bucket that already exists will return:

HTTP/1.1 409 Conflict
x-amz-request-id: 64202856E5A76A9D
x-amz-id-2: cUKZpqUBR/RuwDVq+3vsO9mMNvdvlh+Xt1dEaW5MJZiL
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Sun, 05 Aug 2007 15:57:11 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
  <Message>The named bucket you tried to create already exists</Message>

The Code element is the interesting value in the XML document; generally, it can be mapped to an error message for the end user. It can be extracted by parsing the XML stream with the SAXParserFactory, SAXParser, and DefaultHandler classes from the javax.xml.parsers and org.xml.sax.helpers packages. You instantiate a SAX parser, then implement an S3ErrorHandler that collects the text of the Code element when notified by the SAX parser. Finally, the S3 error code is returned as a String:

import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public String getS3ErrorCode(InputStream doc) throws Exception {
  SAXParserFactory parserfactory = SAXParserFactory.newInstance();
  SAXParser xmlparser = parserfactory.newSAXParser();
  S3ErrorHandler handler = new S3ErrorHandler();
  xmlparser.parse(doc, handler);
  return handler.getErrorCode();
}

// This inner class implements a SAX handler that collects the text of <Code>.
class S3ErrorHandler extends DefaultHandler {
  private StringBuffer code = new StringBuffer();
  private boolean append = false;

  @Override
  public void startElement(String uri, String ln, String qn, Attributes atts) {
    if (qn.equalsIgnoreCase("Code")) append = true;
  }

  @Override
  public void endElement(String uri, String ln, String qn) {
    if (qn.equalsIgnoreCase("Code")) append = false;
  }

  @Override
  public void characters(char[] ch, int s, int length) {
    if (append) code.append(new String(ch, s, length));
  }

  public String getErrorCode() {
    return code.toString();
  }
}

A list of all error codes is provided in the S3 Developer Guide. You're now able to create a bucket on Amazon S3 and deal with errors.
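Because S3 error documents are small, a DOM parse is a reasonable alternative to the SAX handler; this sketch swaps SAX for DOM (javax.xml.parsers.DocumentBuilder) and reads the first Code element directly. The class name is hypothetical, and the error body in main is illustrative: its Code value, BucketAlreadyExists, is an assumed example taken from S3's documented error codes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class S3ErrorDom {
  // Returns the text of the first <Code> element, or null if there is none.
  static String errorCode(InputStream doc) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document xml = builder.parse(doc);
    NodeList codes = xml.getElementsByTagName("Code");
    return codes.getLength() > 0 ? codes.item(0).getTextContent() : null;
  }

  public static void main(String[] args) throws Exception {
    // Illustrative error body (assumed Code value).
    String body = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<Error><Code>BucketAlreadyExists</Code>"
        + "<Message>The named bucket you tried to create already exists</Message></Error>";
    System.out.println(errorCode(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8))));
  }
}
```

SAX never builds a tree, which matters for large documents such as bucket listings; for a one-element lookup in an error body, DOM trades that for shorter code.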

File Uploading

Upload and download operations require more attention: S3 storage is unlimited, but a single object transfer is limited to 5 GB. An optional Content-MD5 check is supported to make sure the transfer has not been corrupted, although computing the MD5 of a 5 GB file takes some time even on fast hardware.
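The Content-MD5 header carries the Base64 encoding of the binary MD5 digest of the body (as defined in RFC 1864), not the usual 32-character hex string. Here is a minimal sketch that computes it by reading the data in chunks, so a large file is never loaded at once; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class ContentMd5 {
  // Returns the Content-MD5 value: Base64 of the raw 16-byte MD5 digest.
  // The stream is consumed in 8 KB chunks, so memory use stays constant.
  static String contentMd5(InputStream in) throws IOException, NoSuchAlgorithmException {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    byte[] buffer = new byte[8192];
    int n;
    while ((n = in.read(buffer)) != -1) {
      md5.update(buffer, 0, n);
    }
    return Base64.getEncoder().encodeToString(md5.digest());
  }
}
```

The same value goes both into the Content-MD5 request header and into the Content-MD5 slot of the string to sign. With this approach the file is read twice (once to hash, once to upload), which is the price of knowing the digest before the request starts.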

S3 stores the uploaded object only if the transfer completes successfully. If a network issue occurs, the file has to be uploaded again from the start: S3 doesn't support resuming transfers or partially updating an object's content. That's one of the limits of the first "S" (Simple) in S3, but the simplicity also makes the API much easier to use.

When performing a file transfer with S3, you are responsible for streaming the objects. A good implementation always streams objects, because otherwise the whole object is held in Java's heap; with S3's 5 GB limit on an object, you could quickly run into an OutOfMemoryError.
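The JDK's HttpURLConnection buffers the entire request body in memory by default so that it can compute Content-Length; setFixedLengthStreamingMode(long) (Java 7 and later) disables that buffering when the length is known in advance. This sketch assumes the Date and Authorization values are computed beforehand, as in createBucket(), with an empty Content-Type in the string to sign (no Content-Type header is set). StreamingUpload, copy, and putFile are hypothetical names.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StreamingUpload {
  // Copies in to out through a fixed 8 KB buffer, so memory use stays constant
  // whatever the object size. Returns the number of bytes copied.
  static long copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[8192];
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n);
      total += n;
    }
    return total;
  }

  // PUTs a local file to an S3 object URL without loading it into the heap.
  // date and authorization are assumed to be precomputed (hypothetical inputs).
  static int putFile(File file, URL url, String date, String authorization) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    // Known length: stream the body instead of buffering it to compute Content-Length.
    conn.setFixedLengthStreamingMode(file.length());
    conn.setRequestProperty("Date", date);
    conn.setRequestProperty("Authorization", authorization);
    try (InputStream in = new FileInputStream(file);
         OutputStream out = conn.getOutputStream()) {
      copy(in, out);
    }
    return conn.getResponseCode();
  }
}
```

putFile returns only the HTTP status code; on a non-2xx code, the error body can be read from getErrorStream() and parsed as in createBucket().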

Beyond This Example

Many other operations are available through the S3 APIs:

  • List buckets and objects
  • Delete buckets and objects
  • Upload and download objects
  • Add meta-data to objects
  • Apply permissions
  • Monitor traffic and get statistics (still a beta API)

Adding custom meta-data to an object is an interesting feature. For example, when uploading a video file, you could add "author," "title," and "location" properties, and retrieve them later with a GET or HEAD request on the object. Statistics on buckets (IP address, referrer, bytes transferred, time to process, etc.) are also useful for monitoring traffic.
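User metadata travels as HTTP headers whose names start with x-amz-meta-; on upload these headers belong to the x-amz-* section of the string to sign, and S3 returns them in the response headers of a GET or HEAD on the object. Here is a minimal sketch of the conversion in both directions; UserMetadata and its methods are hypothetical, and property names are lower-cased on the way out as an assumed convention.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class UserMetadata {
  static final String PREFIX = "x-amz-meta-";

  // Turns {"author": "Ann"} into {"x-amz-meta-author": "Ann"} for a PUT request.
  static Map<String, String> toHeaders(Map<String, String> properties) {
    Map<String, String> headers = new HashMap<>();
    for (Map.Entry<String, String> e : properties.entrySet()) {
      headers.put(PREFIX + e.getKey().toLowerCase(Locale.ROOT), e.getValue());
    }
    return headers;
  }

  // Extracts custom properties from response headers, such as the Map returned
  // by HttpURLConnection.getHeaderFields() after a GET or HEAD on the object.
  static Map<String, String> fromHeaders(Map<String, List<String>> headers) {
    Map<String, String> properties = new HashMap<>();
    for (Map.Entry<String, List<String>> e : headers.entrySet()) {
      String name = e.getKey();
      // getHeaderFields() uses a null key for the HTTP status line.
      if (name == null || e.getValue().isEmpty()) continue;
      String lower = name.toLowerCase(Locale.ROOT);
      if (lower.startsWith(PREFIX)) {
        properties.put(lower.substring(PREFIX.length()), e.getValue().get(0));
      }
    }
    return properties;
  }
}
```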