Which storage type is best for unstructured data such as pictures and videos?
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can use Amazon S3 to store and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides management features so that you can optimize, organize, and configure access to your data to meet your specific business, organizational, and compliance requirements. The following are some examples of Amazon S3 benefits.
Durability, availability, and scalability
Amazon S3 was built from the ground up to deliver 99.999999999% (11 9s) of data durability. With Amazon S3, your objects are redundantly stored on multiple devices across a minimum of three Availability Zones (AZs) in an Amazon S3 Region. Amazon S3 is designed to sustain concurrent device failures by quickly detecting and repairing any lost redundancy, and it also regularly verifies the integrity of your data using checksums.
Security and compliance
Amazon S3 protects your data with security, compliance, and audit capabilities. Amazon S3 is secure by default. Upon creation, only you have access to the Amazon S3 buckets that you create, and you have complete control over who has access to your data. Amazon S3 supports user authentication to control access to data. You can use access control mechanisms such as bucket policies to selectively grant permissions to users and groups of users. Additionally, S3 maintains compliance programs, such as PCI DSS, HIPAA/HITECH, FedRAMP, SEC Rule 17a-4, EU Data Protection Directive, and FISMA, to help you meet regulatory requirements. AWS also supports numerous auditing capabilities to monitor access requests to your Amazon S3 resources.
Flexible management
AWS offers the most flexible set of storage management and administration capabilities. Storage administrators can classify, report, and visualize data usage trends to reduce costs and improve service levels. Objects can be tagged with unique, customizable metadata so you can see and control storage consumption, cost, and security separately for each workload. The S3 Inventory tool delivers scheduled reports about objects and their metadata for maintenance, compliance, or analytics operations. Amazon S3 can also analyze object access patterns to build lifecycle policies that automate tiering, deletion, and retention.
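To make the idea of a lifecycle policy that automates tiering and deletion more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not an AWS recommendation: the `LifecycleRule` class, the `videos/` prefix, and the 30/90/365-day thresholds are all hypothetical. The dictionary it produces follows the general shape of an S3 lifecycle configuration, and `state_at` only simulates locally where an object of a given age would sit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecycleRule:
    """Hypothetical helper describing one tiering-and-expiry rule."""
    prefix: str
    transitions: tuple  # (age_in_days, storage_class) pairs
    expire_after_days: int

    def to_config(self) -> dict:
        """Render the rule in the general shape of an S3 lifecycle configuration."""
        return {
            "Rules": [{
                "ID": f"tier-{self.prefix.strip('/')}",
                "Filter": {"Prefix": self.prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": storage_class}
                    for days, storage_class in self.transitions
                ],
                "Expiration": {"Days": self.expire_after_days},
            }]
        }

    def state_at(self, age_days: int) -> str:
        """Simulate locally which tier an object of this age would be in."""
        if age_days >= self.expire_after_days:
            return "EXPIRED"
        current = "STANDARD"
        for days, storage_class in sorted(self.transitions):
            if age_days >= days:
                current = storage_class
        return current


# Assumed thresholds, chosen only for illustration.
example_rule = LifecycleRule(
    prefix="videos/",
    transitions=((30, "STANDARD_IA"), (90, "GLACIER")),
    expire_after_days=365,
)

for age in (10, 45, 120, 400):
    print(age, example_rule.state_at(age))
```

A dictionary of this shape is the kind of value one would hand to an SDK call that sets a bucket's lifecycle configuration; the sketch stops short of calling AWS so that it runs without credentials.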
Finally, because Amazon S3 works with AWS Lambda, customers can log activities, define alerts, and invoke workflows, all without managing any additional infrastructure.
Cost-effective storage classes
Amazon S3 offers a range of storage classes that you can choose from based on the data access, resiliency, and cost requirements of your workloads. Amazon S3 storage classes are purpose-built to provide the lowest cost storage for different access patterns. You pay only for what you use. The rate you’re charged depends on the size of your objects, how long you stored the objects during the month, and your chosen storage class.
Efficient analytics
Amazon S3 lets customers run sophisticated analytics on their data in place, without requiring them to extract and move the data to a separate analytics database. Customers with knowledge of SQL can use Amazon Athena to analyze vast amounts of unstructured data in Amazon S3 on demand. With Amazon Redshift Spectrum, customers can run sophisticated analytics against exabytes of data in Amazon S3 and run queries that span both the data they have in Amazon S3 and the data in their Amazon Redshift data warehouses.
Largest community of customers and partners
AWS has millions of active customers and tens of thousands of partners globally. Customers across virtually every industry and of every size, including startups, enterprises, and public sector organizations, are running every imaginable use case on AWS. The AWS Partner Network (APN) includes thousands of systems integrators who specialize in AWS services and tens of thousands of independent software vendors (ISVs) who adapt their technology to work on AWS.
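As a rough illustration of the pricing model described under cost-effective storage classes (object size, time stored during the month, and storage class), the following Python sketch computes a prorated monthly storage charge. Everything numeric here is an assumption: the class names and per-GB prices are hypothetical placeholders, not actual AWS prices, and the 744-hour month is simply a 31-day month. Request, retrieval, and transfer charges are ignored.

```python
# Hypothetical prices in USD per GB-month, for illustration only.
# These are NOT actual AWS prices or storage class names.
PRICE_PER_GB_MONTH = {
    "frequent-access": 0.020,
    "infrequent-access": 0.010,
    "archive": 0.004,
}

HOURS_IN_MONTH = 744  # assumption: a 31-day billing month


def monthly_storage_cost(size_gb: float, hours_stored: float, storage_class: str) -> float:
    """Prorate storage by the fraction of the month the data was stored."""
    if not 0 <= hours_stored <= HOURS_IN_MONTH:
        raise ValueError("hours_stored must fall within the billing month")
    gb_months = size_gb * hours_stored / HOURS_IN_MONTH
    return gb_months * PRICE_PER_GB_MONTH[storage_class]


# 100 GB of video kept for half of the month in each hypothetical class.
for name in PRICE_PER_GB_MONTH:
    print(name, round(monthly_storage_cost(100, 372, name), 4))
```

The point of the sketch is the shape of the calculation: the same data stored for the same duration costs less in a colder class, which is why matching the class to the access pattern matters.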
Choose a big data storage technology in Azure
This topic compares options for data storage for big data solutions: specifically, data storage for bulk data ingestion and batch processing, as opposed to analytical data stores or real-time streaming ingestion.
What are your options when choosing data storage in Azure?
There are several options for ingesting data into Azure, depending on your needs.
File storage: Azure Storage blobs, Azure Data Lake Store
NoSQL databases: Azure Cosmos DB, HBase on HDInsight
Analytical databases: Azure Data Explorer
Azure Storage blobs
Azure Storage is a managed storage service that is highly available, secure, durable, scalable, and redundant. Microsoft takes care of maintenance and handles critical problems for you. Azure Storage is the most ubiquitous storage solution Azure provides, due to the number of services and tools that can be used with it.
There are various Azure Storage services you can use to store data. The most flexible option for storing blobs from a number of data sources is Blob storage. Blobs are basically files. They can hold pictures, documents, HTML files, virtual hard disks (VHDs), big data such as logs, database backups, and pretty much anything else. Blobs are stored in containers, which are similar to folders. A container provides a grouping of a set of blobs. A storage account can contain an unlimited number of containers, and a container can store an unlimited number of blobs.
Azure Storage is a good choice for big data and analytics solutions, because of its flexibility, high availability, and low cost. It provides hot, cool, and archive storage tiers for different use cases. For more information, see Azure Blob Storage: Hot, cool, and archive storage tiers.
Azure Blob storage can be accessed from Hadoop (available through HDInsight). HDInsight can use a blob container in Azure Storage as the default file system for the cluster. Through a Hadoop distributed file system (HDFS) interface provided by a WASB driver, the full set of components in HDInsight can operate directly on structured or unstructured data stored as blobs. Azure Blob storage can also be accessed via Azure Synapse Analytics using its PolyBase feature.
Other features that make Azure Storage a good choice are:
Azure Data Lake Store
Azure Data Lake Store is an enterprise-wide hyperscale repository for big data analytic workloads. Data Lake enables you to capture data of any size, type, and ingestion speed in one single secure location for operational and exploratory analytics.
Data Lake Store does not impose any limits on account sizes, file sizes, or the amount of data that can be stored in a data lake. Data is stored durably by making multiple copies, and there is no limit on the duration of time that the data can be stored in the Data Lake. In addition to making multiple copies of files to guard against any unexpected failures, Data Lake spreads parts of a file over a number of individual storage servers. This improves the read throughput when reading the file in parallel for performing data analytics.
Data Lake Store can be accessed from Hadoop (available through HDInsight) using the WebHDFS-compatible REST APIs. You may consider using this as an alternative to Azure Storage when your individual or combined file sizes exceed that which is supported by Azure Storage. However, there are performance tuning guidelines you should follow when using Data Lake Store as your primary storage for an HDInsight cluster, with specific guidelines for Spark, Hive, MapReduce, and Storm. Also, be sure to check Data Lake Store's regional availability, because it is not available in as many regions as Azure Storage, and it needs to be located in the same region as your HDInsight cluster.
Coupled with Azure Data Lake Analytics, Data Lake Store is specifically designed to enable analytics on the stored data and is tuned for performance for data analytics scenarios. Data Lake Store can also be accessed via Azure Synapse using its PolyBase feature.
Azure Cosmos DB
Azure Cosmos DB is Microsoft's globally distributed multi-model database.
Azure Cosmos DB guarantees single-digit-millisecond latencies at the 99th percentile anywhere in the world, offers multiple well-defined consistency models to fine-tune performance, and guarantees high availability with multi-homing capabilities. Azure Cosmos DB is schema-agnostic. It automatically indexes all the data without requiring you to deal with schema and index management. It's also multi-model, natively supporting document, key-value, graph, and column-family data models. Azure Cosmos DB features:
HBase on HDInsight
Apache HBase is an open-source, NoSQL database that is built on Hadoop and modeled after Google BigTable. HBase provides random access and strong consistency for large amounts of unstructured and semi-structured data in a schemaless database organized by column families.
Data is stored in the rows of a table, and data within a row is grouped by column family. HBase is schemaless in the sense that neither the columns nor the type of data stored in them need to be defined before using them. The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop ecosystem.
The HDInsight implementation leverages the scale-out architecture of HBase to provide automatic sharding of tables, strong consistency for reads and writes, and automatic failover. Performance is enhanced by in-memory caching for reads and high-throughput streaming for writes. In most cases, you'll want to create the HBase cluster inside a virtual network so other HDInsight clusters and applications can directly access the tables.
Azure Data Explorer
Azure Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. It helps you handle the many data streams emitted by modern software so you can collect, store, and analyze data. Azure Data Explorer is ideal for analyzing large volumes of diverse data from any data source, such as websites, applications, IoT devices, and more. This data is used for diagnostics, monitoring, reporting, machine learning, and additional analytics capabilities. Azure Data Explorer makes it simple to ingest this data and enables you to do complex ad hoc queries on the data in seconds. Azure Data Explorer can be linearly scaled out for increasing ingestion and query processing throughput.
An Azure Data Explorer cluster can be deployed to a Virtual Network for enabling private networks.
Key selection criteria
To narrow the choices, start by answering these questions:
Capability matrix
The following tables summarize the key differences in capabilities.
File storage capabilities
NoSQL database capabilities
Analytical database capabilities
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal author:
What kind of storage is best suited to handle unstructured data?
Isilon: built to tame unstructured data
As the #1 family of scale-out network-attached storage systems in the industry, the Isilon distributed file system products provide a highly effective and cost-efficient way to manage unstructured data.
Which storage can store unstructured data?
You can store and manage unstructured data at scale by using NetApp® StorageGRID® technology for secure, durable object storage for private and public clouds. With StorageGRID, you can build a massive (multilocation) single namespace, and you can also integrate a unique information lifecycle policy into that data.
What is the best way to store videos?
1. pCloud. pCloud is the best choice for video storage for personal users. ...
2. Sync.com. Sync.com is the most secure option for online video storage. ...
3. Icedrive. Icedrive is another excellent option for online video storage. ...
4. Google Drive. Google Drive is a great option for online collaboration. ...
5. IDrive. ...
6. MEGA. ...
7. Dropbox. ...
8. Box.
What is unstructured storage?
Unstructured data is information that is not arranged according to a preset data model or schema, and therefore does not fit naturally into the tables of a traditional relational database (RDBMS). Text and multimedia are two common types of unstructured content.