The Vertoda Blog

A Weblog about Technology, Software and the Vertoda Framework

Installing Ubuntu Server 10.10 as a Virtual Machine using VMWare

Posted by martcon on January 20, 2011

Before you can install programs on Ubuntu Linux residing on a VMWare Virtual Machine, you need to install VMWare Tools. In turn, you must first install the gcc compiler and the kernel headers for your running kernel, as follows:

sudo apt-get install build-essential linux-headers-$(uname -r)

sudo apt-get install gcc

 

If you have no Internet access the above commands will give an error, so you need to add the installation CD as a package source as follows:

sudo apt-cdrom add

sudo apt-get update

sudo apt-get install build-essential

Sometimes when installing VMWare Tools you may encounter the following error:

The directory of kernel headers (version @@VMWARE@@UTS_RELEASE) does not match your running kernel (version 2.6.35-22-generic-pae). Even if the module were to compile successfully, it would not load into the running kernel.

This error can be overcome by editing the version.h header file. In Ubuntu Server 10.10 this is located under /lib/modules/2.6.35-22-generic-pae/build/include. Add the following line to this header file, using the same kernel version that appears in the error message:

#define UTS_RELEASE "2.6.35-22-generic-pae"

The command uname -r gives the version you should use.

Posted in Development Tips

Fusion Charts & Internet Explorer: Refresh Issue

Posted by martcon on January 4, 2011

When using a web page containing a Fusion Chart in Internet Explorer, you may sometimes encounter an issue where old data appears to be cached and displayed on the chart. This happens because the browser reuses its cached copy of the data file rather than requesting it again. It is easily overcome by appending the current time to the dataURL attribute, so that each request has a unique URL, as follows:

dataURL=./Data.xml?CurrDateTime=$cur_time

The current time can be accessed using a language such as PHP or JavaScript. In the above example we used PHP to access the current time and assigned this time to a variable:

$cur_time = time();

The full tag for the chart is as follows. It is shown as it would appear inside a PHP double-quoted string, which is why the quotes are escaped and $cur_time is used directly:

<object classid=\"clsid:d27cdb6e-ae6d-11cf-96b8-444553540000\"
codebase=\"http://fpdownload.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=8,0,0,0\"
width=\"1000\" height=\"300\" id=\"Trend\" >
<param name=\"movie\" value=\"./FCF_MSLine.swf\" />
<param name=\"FlashVars\" value=\"&dataURL=./Data.xml?CurrDateTime=$cur_time&chartWidth=1000&chartHeight=300\" />
<param name=\"quality\" value=\"high\" />
<embed src=\"./FCF_MSLine.swf\" flashVars=\"&dataURL=./Data.xml?CurrDateTime=$cur_time&chartWidth=1000&chartHeight=300\" quality=\"high\"
width=\"1000\" height=\"300\" name=\"Line\" type=\"application/x-shockwave-flash\"
pluginspage=\"http://www.macromedia.com/go/getflashplayer\" />
</object>
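The same cache-busting idea can be sketched in JavaScript for pages that build the data URL on the client. This is only a minimal sketch: withCacheBuster is a hypothetical helper name of our own and is not part of Fusion Charts.

```javascript
// Hypothetical helper (not part of Fusion Charts): appends a timestamp
// parameter so that every request URL is unique and bypasses the cache.
function withCacheBuster(url, nowMs) {
  // Use '?' for the first query parameter and '&' for any later ones.
  var separator = url.indexOf('?') === -1 ? '?' : '&';
  return url + separator + 'CurrDateTime=' + nowMs;
}

// new Date().getTime() plays the role of the PHP time() call,
// except that it returns milliseconds rather than seconds.
var dataUrl = withCacheBuster('./Data.xml', new Date().getTime());
```

If the resulting URL already contains an & and is passed through FlashVars, it would likely need to be URL-encoded first (for example with encodeURIComponent) so that the & is not treated as a separator between Flash variables.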

Posted in Development Tips

Installing Documentum 6.6 Developer Edition

Posted by martcon on December 22, 2010

When installing Documentum you may encounter the following error:

“DiPADbInstall failed! – Runtime execution failed with child process”

This occurs if you have Microsoft .NET Framework 4.0 installed on your system. Install .NET 2.0 instead and the product will install successfully. As a precautionary measure you should also install Java JRE 1.5 instead of Java 6.

Posted in Development Tips

JavaScript Date.parse()

Posted by martcon on August 10, 2010

When parsing dates in JavaScript it should be noted that Date.parse() (note the lowercase 'p', as JavaScript is case-sensitive) is not guaranteed to recognise strings with dates in the form dd-mmm-yyyy, as the handling of such strings varies between browsers. One workaround is to first replace the '-' in the string with a space as follows:

// Replace '-' with spaces. Note that /-/g ensures that ALL '-' in the string are replaced.

strToParse = strToParse.replace(/-/g, ' ');

// Display the parsed date as the number of milliseconds since 1 January 1970 UTC.

alert(Date.parse(strToParse));
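Where the browser's handling of the string cannot be relied upon at all, an alternative is to split the dd-mmm-yyyy string and build the date explicitly. The following is a minimal sketch under that assumption; parseDdMmmYyyy and MONTHS are hypothetical names of our own, and only English three-letter month abbreviations are handled.

```javascript
// Hypothetical helper: parses "dd-mmm-yyyy" (e.g. "10-Aug-2010") explicitly,
// without relying on how a given browser's Date.parse() treats the string.
var MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
              'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

function parseDdMmmYyyy(str) {
  var parts = str.split('-');
  if (parts.length !== 3) {
    return null;
  }
  var day = parseInt(parts[0], 10);
  var month = MONTHS.indexOf(parts[1].toLowerCase());
  var year = parseInt(parts[2], 10);
  if (isNaN(day) || isNaN(year) || month === -1) {
    return null; // not in the expected format
  }
  // Months are zero-based in the Date constructor; the result is in local time.
  return new Date(year, month, day);
}

var parsed = parseDdMmmYyyy('10-Aug-2010');
```

The sketch does not check that the day is valid for the month, so a value such as 31-Feb-2010 would roll over into March.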

Posted in Development Tips

What is a Smart Object?

Posted by martcon on August 3, 2010

In the context of computer networks and pervasive computing, the term 'smart object' is frequently used. However, it can be difficult to define this term. In essence, a smart object is any computing device that can provide real-time data on an ecosystem such as a natural environment or a city. Smart Objects can be roughly categorised as sensors (including wireless sensors), meters (including smart meters), RFID tags and GPS devices. Their primary function is to enrich the quality and quantity of data for an ecosystem. In practice, this richer data can be used to reduce consumption and improve efficiency.

Posted in Data Communications & Networking, Green IT

Nature of Cloud Computing & Smart Objects

Posted by martcon on July 12, 2010

The on-demand, self-service, pay-by-use nature of cloud computing is essentially an extension of established trends in computing. From an enterprise perspective, the on-demand nature of cloud computing helps to support the performance and capacity aspects of service-level objectives while the self-service nature of cloud computing allows organizations to create elastic environments that expand and contract based on the workload and target performance parameters. Furthermore, the pay-by-use nature of cloud computing may take the form of equipment leases that guarantee a minimum level of service from a cloud provider.

Virtualization is a key feature of this model as it enables organisations to easily and rapidly create copies of existing environments to support development, testing and deployment. Often multiple virtual machines are involved. The cost is minimal because these systems and environments can coexist on the same servers as production environments and consume few resources.

Likewise, new applications can be developed and deployed in new virtual machines on existing servers, opened up for use on the Internet, and scaled if the application is successful in the marketplace. This lightweight deployment model has already led to a “Darwinistic” approach to business development as beta versions of software are made public and it is the market that effectively decides which applications will succeed or fail. The latter applications will be retired while the former will be scaled.

Cloud computing extends this trend through automation. Instead of negotiating with an IT organization for resources on which to deploy an application, a compute cloud is a self-service proposition where compute cycles can be purchased and a web interface or Application Programming Interface (API) is used to create virtual machines and establish network relationships between them. Therefore, instead of requiring a long-term contract for services with an IT organization or a service provider, clouds work on a pay-by-use, or pay-by-the-sip model where an application may exist to run a job for a few minutes or hours, or it may exist to provide services to customers on a long-term basis.

Compute clouds are built as if applications are temporary, and billing is based on resource consumption, for example, CPU hours used, volumes of data moved or gigabytes of data stored. The ability to use and pay for only the resources used shifts the risk of how much infrastructure to purchase from the organization developing the application to the cloud provider. It also shifts the responsibility for architectural decisions from application architects to developers. This shift can increase risk, which must be managed by enterprises that have processes in place for a reason. System, network, and storage architects need to factor this risk into cloud computing designs.

So how does this model impact smart networks and smart objects? Given that many of these networks may be ad hoc or spontaneous in nature, it follows that the processing of data, and hence the consumption of resources, will vary over time and from network to network. The pay-as-you-use model for computing is hence very beneficial for these networks as the number of devices and the demand on computing resources will vary. This may apply even where the number of devices does not vary, as data processing requirements will change over time, or even over the course of a day, for networks such as the smart grid.
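The benefit of pay-by-use billing for a workload that varies over the day can be illustrated with a small calculation. This is only a sketch: the hourly rate and the demand profile are made-up figures, and it assumes the same hourly rate applies in both cases, which is a simplification.

```javascript
// All figures are hypothetical and chosen only for illustration.
var CENTS_PER_SERVER_HOUR = 10;

// Number of servers needed in each of the 24 hours of one day.
var serversNeededPerHour = [1, 1, 1, 1, 1, 2, 4, 6, 8, 8, 8, 8,
                            8, 8, 8, 8, 6, 4, 2, 2, 1, 1, 1, 1];

// Pay-by-use: billed only for the server-hours actually consumed.
function payByUseCost(demand, centsPerServerHour) {
  var serverHours = 0;
  for (var i = 0; i < demand.length; i++) {
    serverHours += demand[i];
  }
  return serverHours * centsPerServerHour;
}

// Fixed capacity: enough servers for the peak hour, paid for around the clock.
function fixedCapacityCost(demand, centsPerServerHour) {
  var peak = Math.max.apply(null, demand);
  return peak * demand.length * centsPerServerHour;
}

var elastic = payByUseCost(serversNeededPerHour, CENTS_PER_SERVER_HOUR);
var fixed = fixedCapacityCost(serversNeededPerHour, CENTS_PER_SERVER_HOUR);
```

With these made-up numbers the pay-by-use total is 990 cents (99 server-hours) against 1920 cents for capacity sized to the peak of 8 servers.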

Posted in Data Communications & Networking, Green IT

Exporting to CSV using PHP

Posted by martcon on July 5, 2010

Often, users may wish to export data from a web page to a CSV file. The following outlines how this can be achieved using PHP:

<?php

// Set up the data. \n moves the data to the next row while commas split the data into cells.
$data = "\nTest Data 1, Test Data 2\n";

// Set up the raw HTTP headers for diverting the output to a CSV file.
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"myfile.csv\"");

/* The output is now diverted to the CSV file when we echo the data. Users will be asked to open or save the CSV file using the browser's download mechanism. */
echo $data;

?>
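Because commas split the data into cells, a value that itself contains a comma, a double quote or a line break has to be quoted before it is written out. The sketch below shows the usual quoting convention in JavaScript; toCsvField and toCsvRow are hypothetical names of our own. In PHP, the built-in fputcsv() function applies comparable quoting.

```javascript
// Quote a single value if it contains a comma, a double quote or a line break.
// Embedded double quotes are doubled, following the common CSV convention.
function toCsvField(value) {
  var text = String(value);
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Join the quoted fields of one row with commas.
function toCsvRow(values) {
  var fields = [];
  for (var i = 0; i < values.length; i++) {
    fields.push(toCsvField(values[i]));
  }
  return fields.join(',');
}

var row = toCsvRow(['Test Data 1', 'Smith, John', 'He said "hi"']);
```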

Posted in Development Tips

Virtual Machines, Virtualization and Smart Objects

Posted by martcon on June 2, 2010

In recent years, Virtual Machines (VMs) have become a standard mechanism for deployment. Essentially, a VM is a software implementation of a computer. The VM executes programs and, for all intents and purposes, acts like a physical machine. This definition encompasses a broad cross-section of technology, but the two best-known categories of VM are the runtimes behind programming languages, such as the Python interpreter and the Java Virtual Machine, and the situation where one instance of an operating system (OS), along with one or more applications, runs in an isolated partition within a computer. These two categories are referred to as Process and System (or Hardware) VMs respectively: a System VM provides a complete system platform that supports the execution of a complete OS, while a Process VM is designed to run a single program, which means that it supports a single process. The cardinal point regarding VMs is that the software running inside the VM is limited to the resources and abstractions provided by the VM. In other words, it cannot break out of its virtual world.

The use of a VM for programming languages essentially means that software implemented using the language can run on any OS. This Process (or Application) VM runs as a normal application inside an OS and supports a single process. The VM is created when the process is started and is destroyed when it exits. The Process VM provides a platform-independent programming environment that abstracts and isolates details of the underlying hardware or OS and allows the program to execute in the same way on any platform. A high-level programming language translator called an interpreter is used to implement a Process VM.
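To make the idea of a Process VM more concrete, the following toy interpreter runs a small program written for an invented stack machine. It is only a sketch: the four instructions (PUSH, ADD, MUL, PRINT) are hypothetical and are far simpler than real Java bytecode, but the principle is the same, in that the program only ever sees the abstractions the VM provides.

```javascript
// Toy Process VM: interprets a program for a hypothetical stack machine.
function runProgram(program) {
  var stack = [];   // the VM's only working storage
  var output = [];  // values "printed" by the program
  var a, b;
  for (var pc = 0; pc < program.length; pc++) {
    var instr = program[pc];
    switch (instr.op) {
      case 'PUSH':
        stack.push(instr.value);
        break;
      case 'ADD':
        b = stack.pop();
        a = stack.pop();
        stack.push(a + b);
        break;
      case 'MUL':
        b = stack.pop();
        a = stack.pop();
        stack.push(a * b);
        break;
      case 'PRINT':
        output.push(stack.pop());
        break;
      default:
        throw new Error('Unknown instruction: ' + instr.op);
    }
  }
  return output;
}

// Computes (2 + 3) * 4 and prints the result.
var program = [
  { op: 'PUSH', value: 2 }, { op: 'PUSH', value: 3 }, { op: 'ADD' },
  { op: 'PUSH', value: 4 }, { op: 'MUL' }, { op: 'PRINT' }
];
var result = runProgram(program);
```

The same program array would produce the same result on any platform that has an implementation of runProgram, which is the platform independence described above in miniature.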

The best-known Process VM is that provided by Sun Microsystems for the Java programming language. The Java VM (JVM) is the runtime engine of the Java platform. This allows any program written in Java, or indeed in any programming language compiled into Java bytecode, to run on any computer that has a native JVM. In effect, this means that a Java program developed and compiled using the JVM for Microsoft Windows can run under the JVM for Linux, UNIX, Mac OS and other OS, with very few exceptions such as the rare occasion where hardware such as a Serial Port is interfaced. Similarly, Python is an interpreted programming language that requires a VM for runtime execution while Microsoft's .NET Framework runs a Process VM called the Common Language Runtime (CLR).

The ability of different OS to run in the same computer at the same time by using a System VM prevents applications interfering with each other. This is much more flexible than a dual-boot or multi-boot environment where the user has to choose the OS that they are using at the start. With a System VM, by contrast, all VMs run simultaneously. This System VM architecture allows the sharing of the underlying physical machine resources between the different VMs running their own OS. The OS in each VM partition are called guest OS and communicate with the hardware via a control program called a VM Monitor (VMM) or Hypervisor. The guest OS do not have to be the same, meaning that Microsoft Windows and Linux can run in their own VMs on the same computer. The need and desire to run multiple OS was the original motivation for VMs.

A VMM is the control software that creates the VM environment in a computer. Normally, the OS is the master control program for a computer, managing the execution of all applications and acting as an interface between the software applications and the hardware. As you would expect, the OS has the highest privilege level in the computer. In a VM environment, on the other hand, the VMM becomes the master control program and has the highest privilege level. The VMM manages the guest OS or applications. In the case of the former, each guest OS manages its own applications as it would in a non-VM environment, with the key difference that it has been isolated in the computer by the VM. Together with its applications, each guest OS is considered to be a VM. A VMM can run on the bare hardware or on top of an OS. The former is known as a Type I or Native VM while the latter is known as a Type II or Hosted VM.

The principal value of a System VM is that multiple OS environments and applications can co-exist on the same computer in isolation from each other. The VMs can provide an instruction set architecture (ISA, i.e. the specification of the machine language instructions that the computer follows) that is somewhat different from that of the real machine, while the System VM architecture facilitates application provisioning, maintenance, high availability and disaster recovery. System VMs are frequently used to consolidate servers, where different applications and OS that previously had their own individual server machines to avoid clashes now run in separate VMs on the same physical computer. This is known as Quality of Service (QoS) isolation. The one caveat to note is that a VM is less efficient than a real machine for accessing hardware components as it does so indirectly. This issue also applies to Process VMs and is one reason why it is more efficient to use programming languages such as C++ to interface with hardware and the OS than it is with Java.

As noted, VMs have become a standard mechanism for deployment in recent years. The fundamental building block is a software image, which is a copy of the state of a computer system stored in a file. Virtual Machine Images are simply software images installed onto a VM. The other mechanism used is the virtual appliance. This is based on the concept of a software appliance, which is itself one or more applications combined with a customized OS (known as Just Enough Operating System, or JeOS) to fit the needs of the application(s). Virtual appliances are software appliances designed for deployment in a VM. These VMs include software that is partially or fully configured to perform a specific task such as a Web or database server.

The flexibility provided by VMs has been further enhanced by Virtualization. As the name implies, this is the creation of a virtual version of an artefact such as an OS (as previously discussed), a server, a storage device or a network. We alluded to server virtualization earlier when discussing consolidating servers. In essence, this is the masking of resources such as the number and identity of the physical machines, the processors and OS from users. This frees the users from having to manage server resources while increasing the sharing and utilization of same, and also provides the ability to expand server resources while hiding the details of this expansion from users.

Multiple storage devices on a network can also be combined into what appears to be a single storage device that can be managed centrally. This is known as storage virtualization and is frequently used in Storage Area Networks (SANs). The final category of Virtualization that we will consider is Network Virtualization which is a mechanism for combining the available resources in a network by splitting the bandwidth into distinct channels each of which is independent from the others. Each channel can then be assigned to a particular server or device in real-time. This disguises the complexity of the network by separating it into manageable parts.

Virtualization, then, abstracts the hardware to the point where software stacks can be deployed and redeployed without being tied to a specific physical computer server. Virtualization enables a dynamic data centre where servers provide a pool of resources that are harnessed when required. The relationship of applications to compute, storage and network resources will then change dynamically so as to meet both workload and business demands.

This effectively means that application deployment is decoupled from server deployment. Applications can thus be deployed and scaled rapidly without having to first procure physical servers. The prevalent abstraction facilitating this is the Virtual Machine. The VM has become the primary mechanism or unit of deployment as it is the least-common denominator interface between providers of services and system developers. Sun Microsystems claims that using a VM as a deployment object is sufficient for 80% of application usage and assists in the rapid deployment and scaling of applications. The use of Virtual Appliances further enhances the ability to create and deploy applications rapidly. It is this combination of Virtual Machines and Appliances as standard deployment objects that is one of the key features of Cloud Computing.

One of the better known VM products is VMWare (http://www.vmware.com). Citrix (http://www.citrix.com) also provides a comprehensive range of Virtualization software while Oracle VM VirtualBox (http://www.virtualbox.org) is an open source virtualization product.

In tandem with Cloud Computing, Smart Objects and Smart Infrastructure are driving new IT projects. The concept of the Virtual Machine is a useful one for smart objects solutions such as Vertoda. System VMs can be used to encapsulate and isolate smart object data capture and storage mechanisms for different smart ecosystems. In other words, one server can host VMs for several smart networks. The smart objects themselves can run on Process VMs. For example, Sun SPOT Wireless Sensors are built on the Squawk Java VM. The corollary to this is the use of virtualization to process the data for different smart objects. An ecosystem may be made up of wireless sensors, smart meters, RFID and GPS and may require intensive processing. Using Virtualization a unified presentation of the data is possible even though processing may mandate the use of several servers.

The use of the VM as a deployment object also facilitates Smart Infrastructure solutions. It is possible that such solutions will be deployed in several locations. In such a situation, a Virtual Appliance consisting of data capture, organisation and storage software such as the Vertoda Framework would be easy to install quickly at each location. Finally, it is worth noting the potential for the use of Network Virtualization for smart ecosystems. Given the range and number of devices and users in pervasive computing networks, a strategy of virtualizing the network on a per-user, device category or location basis makes perfect sense.

Posted in Data Communications & Networking, Green IT

Parallelization & Smart Objects

Posted by martcon on May 17, 2010

In our previous blog, we discussed Horizontal Scaling and how scaling across multiple computer servers is a key feature of Cloud Computing and has potential benefits for smart objects and smart networks. Another concept that goes hand in hand with horizontal scaling is parallelization. With the advent of Cloud Computing, the scale and implementation of parallelization have changed. Parallelization can increase the speed of software operations or reduce response time. Within a single server, Vertical Scaling can be used on symmetric multiprocessors by spawning multiple program threads.

However, as the Sun Microsystems White Paper on Cloud Computing Architecture points out, vertical scaling only has as much parallel processing capability as the server has processors (or cores), or at least as many cores as have been purchased and allocated to a particular Virtual Machine (VM). This limit matters because today's computing environments are shifting towards x86-architecture servers with two or four processor sockets (i.e. the physical connectors on the motherboard that each hold one processor). It is for this reason that parallelization should be considered on a more macro scale than our previous description: software that can use parallelization across many servers can scale to potentially thousands of servers. This offers far greater potential for scalability than was possible with symmetric multiprocessing.

In the traditional physical world of computing, parallelization has frequently been implemented using load balancers or content switches that distribute incoming requests from software programs across a number of servers. Similarly, parallelization in a cloud computing world can be implemented with a load balancing application or a content switch, in this case distributing incoming requests across a number of virtual machines. In both scenarios, applications can be designed to recruit additional resources to accommodate workload spikes.
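The simplest way to distribute requests across a pool is round-robin, where each new request goes to the next machine in turn. The sketch below illustrates this under stated assumptions: createRoundRobinBalancer and the VM names are hypothetical, and a real load balancer would also take health checks and current load into account.

```javascript
// Hypothetical round-robin balancer over a pool of (virtual) machines.
function createRoundRobinBalancer(initialServers) {
  var servers = initialServers.slice(); // copy so the caller's array is untouched
  var next = 0;
  return {
    // Return the server that should handle the next request.
    pick: function () {
      var server = servers[next];
      next = (next + 1) % servers.length;
      return server;
    },
    // Recruit an additional server, e.g. during a workload spike.
    add: function (server) {
      servers.push(server);
    },
    size: function () {
      return servers.length;
    }
  };
}

var balancer = createRoundRobinBalancer(['vm-1', 'vm-2', 'vm-3']);
var firstFour = [balancer.pick(), balancer.pick(), balancer.pick(), balancer.pick()];
balancer.add('vm-4');
```

Because each request is handed to whichever machine is next, this approach works best when the servers are stateless and any of them can handle any request.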

The classic example of parallelization with load balancing is a number of stateless web servers (i.e. a server that treats each request as an independent transaction that is unrelated to any other request) where the incoming workload is distributed across a pool of servers. Of course, there are many other ways to use parallelization in Cloud Computing environments. For example, a Cloud Computing application that uses a significant amount of CPU time to process user data might use a scheduler to receive jobs from users. The scheduler then places the data into a repository and starts a new VM for each job and hands the VM a token that allows it to retrieve the data from the repository. When the VM has completed its task it passes a token back to the scheduler that allows it to pass the completed project back to the user and then terminates.

Applications can be parallelized only to the extent that their data can be partitioned so that independent systems can operate on it in parallel. Any credible application architecture should include a plan for dividing and conquering data. The partitioning of data has a significant impact on the volume of data transferred over networks. There are several examples of parallelization that leverage data partitioning. We have previously discussed Hadoop (http://hadoop.apache.org). As noted previously, this is an implementation of the MapReduce design pattern which is itself an implementation of the master/workers parallelization design pattern. Database sharding, which we discussed previously, can be accomplished through a range of partitioning techniques including vertical partitioning (i.e. partitioning by database table column), range-based partitioning (e.g. by date) and directory-based partitioning (i.e. partitioning by distinct domains). The approach taken really depends on how the data is to be used.
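Range-based partitioning by date can be sketched as a simple lookup from a record's date to the partition that holds it. The shard names and date ranges below are hypothetical and only illustrate the technique.

```javascript
// Hypothetical layout: each shard holds records for a half-open date range
// [from, to). Dates are ISO yyyy-mm-dd strings, which sort in date order
// when compared as plain text.
var DATE_SHARDS = [
  { name: 'shard-2009', from: '2009-01-01', to: '2010-01-01' },
  { name: 'shard-2010', from: '2010-01-01', to: '2011-01-01' }
];

// Return the name of the shard responsible for the given date,
// or null if no shard covers it.
function shardForDate(isoDate) {
  for (var i = 0; i < DATE_SHARDS.length; i++) {
    var shard = DATE_SHARDS[i];
    if (isoDate >= shard.from && isoDate < shard.to) {
      return shard.name;
    }
  }
  return null;
}

var target = shardForDate('2010-05-17');
```

Queries restricted to a date range then only need to touch the shards whose ranges overlap it, which is one way partitioning reduces the volume of data moved over the network.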

Parallelization is also being used in the finance industry. Major financial institutions have refactored their fraud detection algorithms so that what was once more a batch data-mining operation where patterns and trends were detected from large data sets now runs on a large number of systems in parallel and provides real-time analysis of incoming data. Some High Performance Computing (HPC) applications that deal with three-dimensional data have been designed so that the state of one cubic volume of a gas, liquid or solid can be calculated for time t by one process. This means that the state of the one cube is passed onto the parallel processes representing eight adjoining cubes and the state is calculated for time t+1.

The argument for the use of parallelization is therefore clear. The data management of smart objects and smart networks would also benefit from the adoption of a parallelization strategy as the volume of data and the conversion of that data into meaningful information may necessitate the use of parallelization techniques. The myriad of devices and the lack of standardization in packet formats and data transmission may lead to many different types of data packet listeners and data capture and interpretation software being needed. Consider the example of a system that captures data from a wireless sensor network (WSN) and a smart grid. The smart grid may transfer data to the system using 2.5G or 3G telecommunications while the WSN may transfer data using Zigbee. The packets would be in different formats, would contain different data and would require different software to capture and translate the packets. When one factors in the different Operating Systems (TinyOS, Contiki or indeed none in many cases) and Programming Languages (nesC, C++, Java among others) used, it is clear that bespoke software would be required for the different smart objects, be they sensors, smart meters, GPS readers or RFID tags. These data capture modules would ideally run in parallel so that data could be captured from these devices simultaneously thus providing a richer snapshot of the condition and activities taking place within the environment or infrastructure being monitored.
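The need for different capture software per device type can be sketched as a set of decoders, one per transport, that each translate their own packet format into a common record. Everything here is an assumption made for illustration: the two packet formats, the field names and the decodePacket function are invented and do not describe real Zigbee or smart meter payloads.

```javascript
// Hypothetical decoders, one per transport, each producing the same record shape.
var decoders = {
  // Assumed WSN format: "nodeId;temperatureInCelsius"
  zigbee: function (raw) {
    var parts = raw.split(';');
    return { source: 'wsn', deviceId: parts[0], value: parseFloat(parts[1]), unit: 'C' };
  },
  // Assumed smart meter format: JSON text with "meter" and "kwh" fields
  cellular: function (raw) {
    var msg = JSON.parse(raw);
    return { source: 'smart-grid', deviceId: msg.meter, value: msg.kwh, unit: 'kWh' };
  }
};

// Route a raw packet to the decoder registered for its transport.
function decodePacket(transport, raw) {
  if (!Object.prototype.hasOwnProperty.call(decoders, transport)) {
    throw new Error('No decoder registered for transport: ' + transport);
  }
  return decoders[transport](raw);
}

var sensorReading = decodePacket('zigbee', 'node-7;21.5');
var meterReading = decodePacket('cellular', '{"meter":"m-42","kwh":3.2}');
```

Since each decoder is independent of the others, each could in principle run in its own process or VM so that the different device types are handled in parallel.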

Partitioning strategies could also play a key role in conjunction with parallelization for the data management of smart objects. Smart networks (or smart dust) could comprise tens of thousands of computing devices. By adopting a mechanism by which data could be organised and partitioned by group, location or date captured, data could be distributed horizontally across the Cloud. Similarly, partitioning could be undertaken on a vertical basis where database table columns could be split logically.

Like the other aspects of Cloud Computing that we have discussed in previous blogs, parallelization is another technique that is helping to make Cloud Computing an enabling technology for the data management of smart objects. Vertoda provides data management and middleware that can be used in the Cloud to organize and store smart object data. We are also developing a platform that will greatly enhance the ability to capture data from the myriad of smart objects and manage this data both in Cloud and Enterprise Computing environments.

Posted in Data Communications & Networking

Horizontal Scaling & Smart Objects

Posted by martcon on May 13, 2010

Traditionally, software architects and developers would have expected their applications to run on a powerful server. Recently, however, the trend towards horizontal scaling has been increasing. Rather than expecting applications to run on highly scalable servers, developers have been redesigning (or refactoring) Information Systems and software applications so that they can scale horizontally across a number of computer servers. This refactoring of applications is not a trivial task as both the applications and the data captured, managed and stored by these applications must be designed so that both processing and data can be broken down into smaller chunks. It is this existing architectural trend that has been a key factor propelling the adoption of cloud computing.

There are examples of horizontal scaling in High Performance Computing (HPC), Database Management Systems, CPU-intensive processing and Data-intensive processing.  Horizontal scaling had been used for HPC workloads long before the advent of cloud computing in a Grid Computing framework. Developers have refactored applications to achieve the distribution of HPC workloads across bare-metal compute grids. HPC has been used in many scientific applications. For example, scientists have broken down data for applications such as 3-D climate modelling so that it can be spread across a large number of servers. Grid computing is a predecessor to cloud computing as it uses tools to provision and manage multiple racks of physical servers so that they can all work together to solve a problem. As HPC is extremely demanding in terms of compute power, interprocess communication and input-output (I/O), such workloads would be most suitable for Clouds that provide Infrastructure As A Service (IaaS). Access to bare-metal servers or Type I Virtual Machines (VM) that provide more direct access to I/O devices would be specific examples.

As a sidebar we will define the terms bare-metal and Type I VMs. Bare-metal refers to the underlying physical architecture of a computer or server. Running an Operating System on bare-metal refers to running an unmodified version of the OS on the physical hardware. A Type I or Native VM refers to the scenario where the software layer that provides the virtualization for the VM runs on the bare hardware. Given that many HPC applications leverage the hardware directly for purposes of speed, it is clear that Bare-metal servers or Type I VMs would be suited for such applications.

Database Management Systems can also be adapted to run in cloud computing environments. Database servers can be horizontally scaled and database tables can be partitioned across the servers. This technique is known as sharding and allows multiple instances of database software, be it Oracle, MySQL, SQL Server or any other type of database, to scale performance in a cloud computing environment. Rather than accessing a single, central database, applications now access one of the many database instances depending on which shard contains the requested data.
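One common way for an application to decide which shard contains the requested data is to hash the record's key and take the result modulo the number of shards. The sketch below is a minimal illustration of that idea; the hash function, the key format and the shard count are assumptions and are not taken from any particular database product.

```javascript
// Simple deterministic string hash (illustrative only, not cryptographic).
function hashKey(key) {
  var hash = 0;
  for (var i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) % 1000003;
  }
  return hash;
}

// Map a record key to one of shardCount database instances.
function shardIndex(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Hypothetical usage: route a sensor's readings to one of four shards.
var SHARD_COUNT = 4;
var index = shardIndex('sensor-0017', SHARD_COUNT);
```

The same key always maps to the same shard, so reads and writes for a given record go to the same database instance. A drawback of this simple scheme is that changing the number of shards changes the mapping for most keys.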

CPU-intensive applications are also good candidates for horizontal scaling. Applications that perform intensive tasks such as frame rendering (the process of transforming logical objects such as points, lines etc. into physical representations) can create a separate VM to render each frame rather than creating a new programming thread, thus enhancing performance. Horizontal scaling is also suitable for data-intensive processing as large amounts of data can be processed and the results coalesced by a coordinating process. For example, Hadoop (http://hadoop.apache.org/), which we discussed in a previous blog, is an open source implementation of the MapReduce framework for processing huge datasets using multiple computers.
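The MapReduce pattern itself can be sketched in a few lines using the classic word-count example. This is a single-process illustration of the pattern only, not the Hadoop API: in a real deployment each chunk would be mapped on a different machine and the results coalesced by a coordinating process. The function names and sample text are our own.

```javascript
// Map step: turn one chunk of text into [word, 1] pairs.
function mapChunk(chunk) {
  var pairs = [];
  var words = chunk.toLowerCase().split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== '') {
      pairs.push([words[i], 1]);
    }
  }
  return pairs;
}

// Reduce step: sum the counts emitted for each distinct word.
function reducePairs(pairs) {
  var counts = Object.create(null); // no inherited keys such as "constructor"
  for (var i = 0; i < pairs.length; i++) {
    var word = pairs[i][0];
    counts[word] = (counts[word] || 0) + pairs[i][1];
  }
  return counts;
}

// Each chunk stands in for the portion of the data held by one worker.
var chunks = ['smart objects send data', 'smart meters send readings'];
var allPairs = [];
for (var c = 0; c < chunks.length; c++) {
  allPairs = allPairs.concat(mapChunk(chunks[c]));
}
var wordCounts = reducePairs(allPairs);
```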

The question we will ask in this blog is what role Horizontal Scaling can play for Smart Objects and Smart Networks. As we have illustrated in previous blogs, smart objects provide a rich new pool of real-time or near real-time data. This data will need to be processed and stored. Frequently, the data will need to be converted to meaningful information. For example, applications processing Wireless Sensor Network (WSN) data may have to apply complex mathematical formulae to convert measurements from engineering units to a more meaningful metric. When multiple instances of such data are arriving at a system every second, it is clear that such applications may be both CPU and data intensive and would benefit from a horizontal scaling strategy. Similar logic also applies to storing smart object data. It may be difficult to predict the data storage requirements for smart networks at the outset. Rather than running the risk of a single database server becoming full, the data can be categorised and distributed across the cloud.
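As a small illustration of converting a raw measurement into a more meaningful metric, the sketch below turns an analogue-to-digital converter (ADC) count into a temperature. The sensor characteristics are entirely hypothetical (a 10-bit ADC, a 3.3 V reference and a sensor that outputs 0.5 V at 0 °C plus 10 mV per degree); real sensors each have their own formula, often a far more complex one.

```javascript
// Hypothetical sensor characteristics, assumed only for this example.
var ADC_MAX_COUNT = 1023;       // 10-bit converter
var REFERENCE_VOLTS = 3.3;      // voltage represented by the maximum count
var OFFSET_VOLTS = 0.5;         // sensor output at 0 degrees Celsius
var VOLTS_PER_DEGREE = 0.01;    // 10 mV per degree Celsius

// Convert a raw ADC count into degrees Celsius.
function adcToCelsius(rawCount) {
  var volts = (rawCount / ADC_MAX_COUNT) * REFERENCE_VOLTS;
  return (volts - OFFSET_VOLTS) / VOLTS_PER_DEGREE;
}

// Convert a batch of readings, as a data capture module might do every second.
function convertBatch(rawCounts) {
  var result = [];
  for (var i = 0; i < rawCounts.length; i++) {
    result.push(adcToCelsius(rawCounts[i]));
  }
  return result;
}

var temperatures = convertBatch([155, 220, 248]);
```

Each reading is converted independently of the others, so a batch like this can be split across as many servers as are needed, which is what makes this kind of workload suitable for horizontal scaling.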

Intensive tasks that use data from smart networks, such as data mining, prediction formulation and pattern detection/analysis, would certainly be CPU and data intensive and would also be candidates for an HPC application. HPC could also be used for simulating and testing smart networks. One example is the use of HPC by the US Army Redstone Technical Test Center to test Ad-Hoc Wireless Sensor Networks. Given the role of smart objects in environmental monitoring and scientific applications (for example, biosensors to detect the presence of chemicals or other agents or to monitor human and animal health), one would expect the use of HPC to process smart object data to grow in the coming years.

Posted in Data Communications & Networking

 