Data Processing and Searching Solutions

 

Our core business specialty is building Data Processing and Searching Solutions.

Over the years we have built a variety of document processing applications. We retrieve documents from many sources, extract and modify their content, generate new content, and index everything so it can be retrieved later through complex search mechanisms.

 

Document backup and search solution

Overview

A high-volume client-server application for document backup and analysis.

Problem

Most backup/sync solutions are simple: they do not let users view, search, or categorize the backed-up content. Our application backs up desktop computers, mail accounts, GDocs accounts, Twitter, and Facebook, then analyzes and indexes all the data so the user can search and categorize it. It can restore file versions as of a chosen date, and can download documents, or send them by mail, in a converted format such as PDF.
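Restoring a file's version at a chosen date, as described above, amounts to looking up the newest backup taken at or before that instant. A minimal Java sketch of the idea, using a sorted map; the class and method names here are illustrative, not from the actual product:

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

// Maps each backup timestamp to a stored version identifier for one file.
class VersionHistory {
    private final TreeMap<Instant, String> versions = new TreeMap<>();

    void record(Instant backedUpAt, String versionId) {
        versions.put(backedUpAt, versionId);
    }

    // Returns the id of the newest version taken at or before the chosen
    // date, or null if the file did not yet exist at that time.
    String versionAt(Instant chosenDate) {
        Map.Entry<Instant, String> e = versions.floorEntry(chosenDate);
        return e == null ? null : e.getValue();
    }
}
```

`TreeMap.floorEntry` gives the lookup in O(log n) per file, which matters when version histories are long.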

Challenge

Implement a scalable architecture that can process thousands of files per second, handle billions of files, store them in a safe, secure, and redundant way, extract data from them, index that data for querying, and track versions for billions of files. The architecture scales across any number of machines. We also implemented a native client application for multiple operating systems that monitors files on disk, backs them up, and, when a file is versioned, sends only the file differences to save user bandwidth.
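Sending only file differences can be sketched as block-level checksumming: split the file into fixed-size blocks and upload only the blocks whose checksum changed since the last backup. The Java sketch below is a simplified illustration of that idea, not the customized rsync implementation itself (real rsync additionally handles shifted content with rolling checksums):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Splits data into fixed-size blocks and returns the indices of blocks
// whose checksum differs from the previously backed-up copy, so only
// those blocks need to be uploaded.
class BlockDiff {
    static final int BLOCK_SIZE = 4; // tiny for the example; real use ~64 KiB

    static long checksum(byte[] data, int from, int to) {
        CRC32 crc = new CRC32();
        crc.update(data, from, to - from);
        return crc.getValue();
    }

    static List<Integer> changedBlocks(byte[] oldData, byte[] newData) {
        List<Integer> changed = new ArrayList<>();
        int blocks = (newData.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        for (int i = 0; i < blocks; i++) {
            int from = i * BLOCK_SIZE;
            int to = Math.min(from + BLOCK_SIZE, newData.length);
            boolean same = to <= oldData.length
                && checksum(oldData, from, to) == checksum(newData, from, to);
            if (!same) changed.add(i);
        }
        return changed;
    }
}
```

On an unchanged file this returns an empty list, so nothing crosses the wire beyond the checksum exchange.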

Technical details

Stateless client-server architecture.

  • Server side: a Java solution with Wicket for the interface, Solr as the indexing engine, Hadoop for storage, Postgres as the database, and load balancers in front of them. All components have isolated functionality, which lets us add more processing instances when needed.
  • High throughput from optimized parallel processing: our test system has 6 instances and can process ~1000 documents/second. Highly redundant.
  • The Solr indexing engine sustains ~100000 inserts/updates per second without locking, thanks to a specialized high-capacity RAM index buffer.
  • Hosted in a Profitbricks datacenter.
  • Client side: a C++ application built with Qt for Mac and Windows that backs up files, monitors changes, and restores file versions at the date chosen by the user. File transfer is optimized to send only differences (a customized rsync implementation).
  • We also implemented a scalable C++ file transport server that can handle thousands of clients.
  • File storage is a custom NFS distributed disk storage. After the launch we will start implementing a higher-volume solution such as a Hadoop or JackRabbit implementation.
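The "add more processing instances when needed" idea applies within a single machine as well: a fixed pool of workers drains a stream of incoming documents in parallel, and capacity grows by adding threads or whole instances. A minimal, hypothetical Java sketch (the real pipeline's extraction and indexing steps are reduced to a placeholder):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Each worker takes one document, extracts its content, and hands it to
// the indexer; here the work is reduced to counting processed documents.
class ParallelProcessor {
    private final ExecutorService pool;
    private final AtomicInteger processed = new AtomicInteger();

    ParallelProcessor(int workers) {
        pool = Executors.newFixedThreadPool(workers);
    }

    void submit(String document) {
        pool.submit(() -> {
            // Placeholder for extraction + indexing of one document.
            processed.incrementAndGet();
        });
    }

    // Stops accepting work, waits for in-flight documents, returns the count.
    int drain() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

Because the workers share no mutable state beyond the atomic counter, throughput scales close to linearly with the worker count until I/O or the indexer becomes the bottleneck.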

Things we did

  • Software Architecture
  • Coding
  • User Interface Design
 
 
 