Both compressed data and encrypted data look similar: they are a string of apparently random characters that seem to bear no relationship to the original data. But there are significant differences between the intent and the process of compression and encryption.
You compress data to make it smaller, which reduces storage space or transmission time. Because you want to retrieve the original data easily, compression algorithms are standardized and well known. Consider a ZIP file. A ZIP file can be expanded back into its original file(s) on almost any kind of computer system. In most cases, the receiving system needs no information beyond what is contained in the compressed file itself.
Compression algorithms work by finding strings of characters that are repeated within the data and replacing each occurrence with a much shorter string. If you had, for example, a long paper about George Washington, a simple compression algorithm might replace each occurrence of “George Washington” with “\gw\”, thus replacing 17 characters with just 4 each time. Compression algorithms can find many duplicated strings, such as page headers and footers, and fragments of words or numbers.
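To see repeated-string replacement at work, here is a minimal sketch using Python's standard zlib module, which implements DEFLATE, the method most ZIP files use. The sample sentence and the repeat count are made up for illustration; real documents will compress less dramatically.

```python
import zlib

# Hypothetical sample: one sentence mentioning "George Washington", repeated 200 times.
sentence = b"George Washington was the first president. "
text = sentence * 200

compressed = zlib.compress(text)

print(len(text), "bytes before compression")
print(len(compressed), "bytes after compression")

# Decompression needs nothing but the compressed bytes themselves.
restored = zlib.decompress(compressed)
print(restored == text)
```

Note that no key or side information is passed to decompress: everything needed to rebuild the text travels inside the compressed data.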
You encrypt data so that only certain people can access it. To decrypt the data, the receiver needs to know a secret key. Depending on the type of encryption and the length of the key, it can take the fastest computers anywhere from seconds to millions of years to decrypt the data by brute force. For any scheme more complicated than a simple character substitution (replace each “A” with “x”), the encryption process eliminates the duplicated strings. “George Washington” will most likely be encrypted into a different string at each occurrence.
Therefore, trying to compress encrypted data is just a waste of time. It can actually make the data bigger, since the compressed output carries some overhead to identify the type of compression and the other parameters needed to decompress it.
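The effect can be sketched in a few lines of Python. The cipher below is a toy written only for this illustration (an XOR against a SHA-256 counter keystream); it is an assumption of the example, not a real encryption scheme, and must not be used to protect anything. The key and sample text are likewise hypothetical.

```python
import hashlib
import zlib

def toy_stream_cipher(data: bytes, key: bytes) -> bytes:
    """Illustration only, NOT secure: XOR data with a SHA-256 counter keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)

key = b"hypothetical demo key"
name = b"George Washington"
sentence = name + b" was the first president. "
text = sentence * 200

ciphertext = toy_stream_cipher(text, key)

# The same name at two different positions encrypts to different bytes.
first = ciphertext[:len(name)]
second = ciphertext[len(sentence):len(sentence) + len(name)]
print("occurrences differ:", first != second)

# Compressing the plain text shrinks it; compressing the ciphertext does not.
print("original:", len(text))
print("compressed plain text:", len(zlib.compress(text)))
print("compressed ciphertext:", len(zlib.compress(ciphertext)))
```

Because the repeated strings are gone, the compressor finds nothing to replace, and its own bookkeeping bytes make the output slightly larger than the input.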
Some compression algorithms support some level of encryption. For example, when you create a ZIP file you can specify an encryption key. Many of these algorithms are very weak and subject to easy attack, plus you must send the key to the receiver by some means. I watched a coworker email an encrypted ZIP file to a partner, then send a follow-up email with the password. If the receiver’s email was compromised, then the cybercriminal just received the data and the key.
Both compression and encryption can take significant processing effort on each end. Usually it takes fewer resources to decompress data than to compress it. Since stored data needs to be compressed only once, when it is stored, and is often decompressed many times, this asymmetry is desirable.
Normally, encryption and decryption times are very close to each other on the same platform. Obviously, the actual times depend on the hardware characteristics of the platform.
You should always encrypt sensitive data, whether it is personal or financial data that is protected by regulations or laws, or proprietary information for a company or classified information for a country.
Whether you choose to compress data is a simple business decision: do you save enough money or data transmission time to justify the added cost of compressing and decompressing the data?
The last word:
If you need to compress and encrypt data, first compress the data, then encrypt it. That works and you get the full benefit of the compression. However, this ordering gives an attacker a way to attack the encryption.
As mentioned earlier, compressed data carries some overhead: a header in front of the compressed data. That header identifies the compression type and a number of parameters, and it has a fixed format. An attacker can determine the type of compression that an organization uses or accepts simply by trying different compression schemes and seeing which ones are accepted. The encryption then becomes easier to attack, because the attacker knows how the clear-text message starts.
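The fixed header is easy to observe. The sketch below uses Python's standard gzip and zlib modules on two made-up messages; the message contents are hypothetical, and the only point is that the leading bytes do not depend on what was compressed.

```python
import gzip
import zlib

# Two unrelated, hypothetical messages.
message_a = b"Quarterly sales figures: " + b"1234 " * 50
message_b = b"Meeting notes for Tuesday. " * 20

# Every gzip stream starts with the magic bytes 1f 8b, followed by the method byte.
gzip_a = gzip.compress(message_a)
gzip_b = gzip.compress(message_b)
print("gzip starts:", gzip_a[:3].hex(), gzip_b[:3].hex())

# zlib streams compressed with the same settings share the same two-byte header.
zlib_a = zlib.compress(message_a)
zlib_b = zlib.compress(message_b)
print("zlib starts:", zlib_a[:2].hex(), zlib_b[:2].hex())
```

Anyone who knows which format is in use therefore knows the first few bytes of the clear text before seeing a single message.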
Keep your sense of humor.