What is Data Processing?
Once the data has been collected, it has to be processed and reduced so that it can be analyzed by the researcher. The data collected through a study is usually in a raw form. This form of data often has errors and inconsistencies which are not relevant to the study. The raw data has to be transformed into a relevant set by the researcher through the processes of editing, coding, and tabulation. This stage is very important for effective research work, as processing the data reduces errors and bias, resulting in relevant and specific data which is appropriate for analysis.
Data processing is the primary stage of data analysis. It refers to setting up the collected data in such a format that it can be appropriately coded and analyzed by computer. Without processing, the data cannot be evaluated and communicated.
While processing the data, the researcher simplifies, conceptualizes, and transforms the selected data into a proper form as per the research objectives. This refining of data continues throughout the research process. Processing of data should be done in such a way that it gives the appropriate analytical outcomes, and the integrity of the original data is maintained.
The raw data obtained from the questionnaires must undergo preliminary preparation before they can be analyzed using statistical techniques. The quality of the results obtained from the statistical techniques and their subsequent interpretation depend to a great degree on how well the data were prepared and converted into a form suitable for analysis.
Definition of Data Processing
Here are definitions of data processing provided by various authors:
According to William H. Inmon:
"Data processing refers to the process of converting raw data into useful information through a series of operations such as sorting, filtering, summarizing, and aggregating."
Ralph Kimball, another influential figure in data warehousing:
"Data processing involves transforming raw data into organized, meaningful, and actionable information by applying various techniques such as data cleansing, integration, transformation, and loading into a data warehouse or data mart."
James Martin, an expert in information technology and software engineering:
"Data processing is the systematic series of operations performed on data to extract meaningful information, including data collection, data entry, data transformation, data storage, data retrieval, and data analysis."
In the book "Principles of Data Integration" by AnHai Doan, Alon Halevy, and Zachary Ives:
"Data processing involves transforming and manipulating data to extract, transform, and load it into a target system or format, ensuring that it is in a usable state for analysis, decision-making, or other purposes."
According to the International Organization for Standardization (ISO):
"A sequence of operations performed on data to extract information, derive new data, or summarize data in a usable form."
Features of Data Processing
Characteristics of data processing can be understood with the help of the following points:
1) Provides Accurate Data:
In data processing, a researcher checks the accuracy of the data. As a result, if data processing is done properly, the chances of errors in the collected data are greatly reduced. It also ensures that the quality of the data is maintained by rectifying errors and omissions once the data is processed.
2) Provides Comprehensive Data:
After going through data processing, the raw data transforms into meaningful information that can be better understood by the researcher and in turn can be analyzed in an efficient way.
3) Converts Data into a Suitable Format:
Data processing increases the efficiency of data by transforming it into a format that is computer-readable as well as comprehensible, due to which the analysis can be performed in less time.
4) Helps in Decision-Making:
Processing the data also helps in decision-making. Data processing operations transform the raw data into relevant information, which helps the managers to make important decisions and take necessary actions.
5) Provides Concise and Presentable Data:
Data processing makes the raw data concise, easy to read, and presentable. It becomes easy for the managers to comprehend the data if the data is processed, as it takes less time.
Stages of Data Processing
The stages of data processing can vary depending on the specific context and requirements of the data processing task. However, in a general sense, data processing typically involves the following stages or steps:
1) Data Collection:
This stage involves gathering raw data from various sources such as sensors, databases, files, or manual input. The data can be in different formats, including structured (e.g., databases), semi-structured (e.g., XML), or unstructured (e.g., text documents).
2) Data Preparation:
Once the raw data is collected, it needs to be prepared for processing. This stage may involve data cleaning, which includes removing errors, duplicates, or inconsistencies, as well as handling missing values. Data transformation and normalization may also be performed to ensure data uniformity and compatibility (a brief sketch of this stage follows this list of stages).
3) Data Entry:
In cases where data is not collected automatically, data entry may be required to input the data into a system or database. This stage involves manual data input and validation.
4) Data Processing:
This is the core stage where the actual processing of data takes place. It involves various operations such as sorting, filtering, aggregating, calculating, joining, or transforming the data to extract meaningful insights or derive new data.
5) Data Analysis:
Once the data is processed, it can be subjected to analysis to uncover patterns, trends, correlations, or other insights. Statistical analysis, data mining techniques, machine learning algorithms, or other analytical methods may be applied to gain a deeper understanding of the data.
6) Data Visualization:
This stage involves presenting the processed data in a visual format such as charts, graphs, dashboards, or reports. Data visualization aids in conveying the insights derived from the data in a more understandable and actionable manner.
7) Data Storage and Retrieval:
Processed and analyzed data may need to be stored for future reference or further analysis. It can be stored in databases, data warehouses, or other data storage systems. Retrieval mechanisms are put in place to efficiently access and retrieve the stored data when needed.
8) Data Dissemination:
In some cases, the processed data or derived insights need to be shared or disseminated to relevant stakeholders. This can involve generating reports, presentations, or interactive visualizations that communicate the findings effectively.
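To make the preparation and processing stages concrete, here is a minimal Python sketch, assuming a hypothetical survey file named survey.csv with 'age', 'gender', and 'monthly_spend' columns (the file name and column names are purely illustrative, not part of any prescribed method):

    import pandas as pd

    # Data collection: load the raw survey responses (file name is illustrative)
    raw = pd.read_csv("survey.csv")

    # Data preparation: remove duplicates, drop records missing key fields,
    # and fill missing spend values with the column median
    clean = raw.drop_duplicates()
    clean = clean.dropna(subset=["age", "gender"])
    clean["monthly_spend"] = clean["monthly_spend"].fillna(clean["monthly_spend"].median())

    # Data processing: aggregate the cleaned data into summary information
    summary = clean.groupby("gender")["monthly_spend"].agg(["count", "mean"])
    print(summary)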
Types of Data Processing
There are various types of data processing, each serving different purposes and requirements. Here are some common types of data processing:
1) Batch Processing:
Batch processing involves processing a large volume of data in batches or groups. In this type of processing, data is collected over a period of time and processed at a later stage altogether. Batch processing is typically used for non-real-time applications where immediate results are not required, such as generating reports, performing data backups, or running scheduled tasks.
2) Real-time Processing:
Real-time processing involves handling data as soon as it is generated or received, without any delay. This type of processing is used when immediate responses or actions are required based on incoming data. Real-time processing is commonly used in applications like financial trading systems, real-time monitoring, online transactions, or data streaming.
3) Online Transaction Processing (OLTP):
OLTP involves processing and managing transactional data in real-time. It is commonly used in databases or systems that handle numerous concurrent transactions, such as e-commerce platforms, banking systems, or airline reservation systems. OLTP focuses on maintaining data integrity, ensuring data consistency, and supporting high-speed transactional processing.
4) Online Analytical Processing (OLAP):
OLAP involves processing and analyzing large volumes of data to support complex analytical queries and multidimensional analysis. OLAP is commonly used for decision support systems, data warehouses, or business intelligence applications. OLAP allows users to perform ad-hoc queries, drill-down analysis, and generate reports for strategic decision-making.
5) Interactive Processing:
Interactive processing involves processing data in real-time or near real-time while maintaining an interactive user experience. This type of processing is commonly used in applications that require quick responses and user interaction, such as online gaming, web applications, or interactive dashboards.
6) Stream Processing:
Stream processing involves processing data continuously as it flows or streams in real-time. Stream processing is used for analyzing and deriving insights from data streams that are generated by various sources, such as sensors, social media feeds, or IoT devices. It enables real-time monitoring, anomaly detection, and event-driven processing.
7) Parallel Processing:
Parallel processing involves dividing a large task or dataset into smaller parts and processing them concurrently on multiple processors or computing resources. It helps in achieving faster processing speeds and handling computationally intensive tasks. Parallel processing is commonly used in tasks like big data analytics, scientific simulations, or image and video processing.
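The parallel-processing idea above can be illustrated with a small, self-contained Python sketch that splits a dataset into chunks and processes them concurrently using the standard concurrent.futures module; the squaring step is only a stand-in for any computationally intensive task:

    from concurrent.futures import ProcessPoolExecutor

    def process_chunk(chunk):
        # Placeholder for any computationally intensive operation
        return [x * x for x in chunk]

    if __name__ == "__main__":
        data = list(range(1_000_000))
        # Divide the dataset into four roughly equal parts
        chunks = [data[i::4] for i in range(4)]
        # Process the parts concurrently on separate worker processes
        with ProcessPoolExecutor(max_workers=4) as executor:
            results = list(executor.map(process_chunk, chunks))
        processed = [value for part in results for value in part]
        print(len(processed))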
Data Processing Operations
Data processing converts the raw and unorganized data into meaningful and organized information. Raw data can be in any form and structure which may not be useful for research. The solution to this problem is processing the data in various steps to make it more relevant and meaningful. Various data processing operations in research methodology are as follows:
1) Validation :
Validation is the process of determining, to the extent possible, whether a survey's interviews or observations were conducted correctly and are free of fraud or bias. In many data-collection approaches it is not always convenient to closely monitor the data-collection process; therefore, to facilitate validation, each respondent's name, address, and phone number may be recorded. While this information is not used for analysis, it does enable the validation process to be completed.
Validation refers to the process of thoroughly checking the collected data to ensure optimal quality levels. This step transforms raw data into validated data, which is then processed to produce the summary reports. It is far too easy to credit data with accuracy rather than sufficiently scrutinizing it and the methods by which it was acquired. The reality of many hurried data-gathering projects should put every researcher on guard. Marketing researchers have, at times, confirmed that one of their most serious concerns is the errors in survey data submitted to them by the research agencies they employ. This alarming concern may be substantially justified and makes validation a very important step.
Phases of Data Validation
Data validation has the following two phases:
i) Field Validation:
Field validation takes place in the actual field where the data are collected. The following aspects should be validated in the field:
a) Data Collection Process:
Data validation begins by thoroughly examining the data collection process in the field where it occurred. When interviewers are involved, this validation often occurs while the interviews are taking place. It is important for interviewers and individuals collecting the data to follow the process that has been outlined; if not, the results can be affected. For example, if a mall intercept study is being conducted and interviewers are requested to stop every 15th person and read the questions to the respondent, then it is important that every interviewer follows that procedure. Sample selection error can occur if some interviewers stop every 15th person while others stop individuals around the 15th person who they think will take the time to complete the survey; it might be the 13th person, the 18th person, or some other number.
b) Proper Screening:
Proper screening ensures that the respondents meet prescribed criteria, such as household income level, recent purchase of a specific product and brand, or even gender or age. For example, an interview procedure may require that only female heads of households with an annual household income of ₹25,000 or more be interviewed. In this case a validation callback would verify each of these factors. If a screening question is based on gender or participation in some type of lifestyle activity, such as fishing, the respondent's answer to the screening question can easily be compared against the respondent's profile to determine whether or not he or she really qualifies. Often panel members will be pre-screened so that only those who meet the criteria are invited to complete the survey, resulting in a superior sample.
c) Fraud:
The third area to be checked in the field may be the most difficult to detect, and that is fraud. With fraud, interviewers or data collectors falsify the data by completing the questionnaire themselves, or they may fill in questions the respondent left blank in order to complete the questionnaire. The latter fraudulent situation can occur when individuals are paid for completed surveys: it is easier and faster to fill in a few missing questions than to discard the questionnaire and find a new respondent. The former situation may occur when individuals responsible for data collection are under time and cost constraints. To ensure deadlines are met at the quoted price, it may be tempting to falsify the data rather than go back to the client and either admit they did not collect the number of surveys promised, or ask for additional time.
ii) Validation within the Firm:
The next component of the validation process is checking data completeness and data usability within the research firm. It is the crucial responsibility of the research firm or research department where the data are being tabulated and analyzed.
a) Data Completeness:
Data collection duties are often outsourced to firms that specialize in this process. Once the data are returned to the central office, an additional editing phase is typically undertaken. During this phase, the completeness and the usefulness of the data are examined. Surveys may contain incomplete information. Sometimes individuals unknowingly skip a question, while other times people refuse to answer one or more questions. Surveys missing answers to entire sections or pages are likely to be of little value to the firm and are typically eliminated at this stage on the basis of incompleteness. For example, if one of the survey objectives sought to determine whether significant differences in attitudes existed on the basis of gender, age, and education, then a survey lacking answers to these demographic questions is of no value. However, it must also be remembered that skip patterns often direct individuals away from answering questions that are not relevant. In this case, what appears at first glance to be an incomplete survey may in fact contain all the relevant information for that subject.
b) Data Usability:
Data usability is an evaluation by the investigator to determine whether the analytical data are of sufficient quality for the intended purpose and can be relied upon, with the appropriate degree of confidence, to support the conclusions that will be made using the data.
Guidelines for Data Validation
Some guidelines for the validation of data are given below:
i) Check-Backs:
Between 10 and 20 per cent of the respondents whose names appear on the questionnaires are telephoned to check that they were in fact interviewed. The respondents are randomly selected from the work of all interviewers involved in the project (a small sampling sketch follows these guidelines). The respondent is asked some of the questions raised in the interview, and the answers are checked against those recorded by the interviewer. The respondent is also asked to comment on the manner and behavior of the interviewer.
ii) Review the Questionnaire and Interviewing Instructions:
The questionnaire is checked to make sure that the respondent meets the sample requirements. If the interviewer was told to interview a woman between 20 and 25 years old but the respondent's classification data shows a 40-year-old man, that respondent should not have been interviewed.
iii) Evaluate the Reputation of the Interviewers:
Each interviewer's call sheet is checked to make sure that the interview was conducted according to the correct sampling procedures.
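As a rough illustration of the check-back guideline, the following Python sketch selects about 15 per cent of completed questionnaires at random for telephone verification; the respondent list and the exact percentage are hypothetical:

    import random

    # Hypothetical list of respondents recorded on completed questionnaires
    respondents = ["Respondent-{}".format(i) for i in range(1, 201)]

    # Randomly select roughly 15% of respondents for a validation call-back
    sample_size = max(1, round(0.15 * len(respondents)))
    check_back = random.sample(respondents, sample_size)

    for person in check_back:
        print("Call back and re-ask selected questions to:", person)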
2) Editing :
This is the second step in data processing. Raw data is subject to many errors and omissions which can occur during the data collection process. In editing, these errors are corrected so that readers do not get confused or misled. Editing converts the raw data into a presentable format so that further analysis and interpretation can be done efficiently.
Editing basically comprises inspecting, correcting, and modifying the unorganized data so that the data become relevant and meaningful. This in turn reduces the confusion which can arise from wrong and inaccurate data. It ensures that there are no omissions and errors, and that the data is in a readable form, which helps in maintaining the flow of information throughout the research process.
Essentials of Data Editing
Editing should be done keeping the following essentials in mind:
i) Completeness:
For effective editing, there should be no omissions. Therefore, it is essential that all the questions have been asked and the corresponding responses have been recorded. In case any data is missing, the researcher can either deduce the missing data based on other data in the questionnaire, or fill in the data from recall.
ii) Accuracy:
Data editing should be done keeping in mind that the recorded data should be accurate. Researchers must check the reliability of answers during the data collection itself, which is not possible every time. Accuracy of the responses can be estimated with the help of 'check questions' included in the questionnaire specifically for important data. Check questions can either directly reveal the ambiguity of a response or help the researcher deduce the correct response. Researchers can also infer the response with the help of other related questions in the questionnaire. Sometimes, researchers can also contact the respondents again to get the correct response.
iii) Consistency:
An important consideration while editing the data is maintaining consistency in responses. It should be checked that answers are given in the same manner in which the questions have been asked; in other words, a given question should be answered in a similar way by all respondents. Since inappropriate and inadequate answers create confusion and wrong interpretation, researchers should make sure that the responses lead to proper conclusions. A simple completeness and consistency check is sketched after this list.
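A minimal sketch of the completeness and consistency checks mentioned above, assuming each record is a simple dictionary of question identifiers and answers; the field names and the check-question pair are made up purely for illustration:

    # Question identifiers are illustrative, not from any particular questionnaire
    REQUIRED = ["q1_gender", "q2_age", "q3_income"]

    def edit_record(record):
        """Flag omissions and one simple inconsistency in a survey record."""
        problems = []
        # Completeness: every required question must have a recorded answer
        for field in REQUIRED:
            if record.get(field) in (None, ""):
                problems.append("missing answer for " + field)
        # Consistency: a check question should agree with the main answer
        if record.get("q3_income") == "none" and record.get("q3b_employed") == "yes":
            problems.append("income and employment answers are inconsistent")
        return problems

    print(edit_record({"q1_gender": "F", "q2_age": 34,
                       "q3_income": "none", "q3b_employed": "yes"}))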
Stages of Data Editing
Data editing can be performed in two stages:
i) Field Editing:
Field editing is performed during the data collection. All the responses collected are checked for errors and omissions. While collecting the data, the researchers instantly check the answers for accuracy, uniformity, and completeness. Field editing can be done in two ways:
a) By the Researcher:
Due to shortage of time, researchers, while taking interviews, note down the responses in the form of symbols or short notes. After completion of the interview, the researcher reviews the answers, corrects them if necessary, and completes the questionnaire by specifying the answer to each question.
b) By the Supervisor:
Another form of field editing is done by the supervisor of the team of interviewers appointed for collecting the data from a sample of respondents. The supervisor maintains quality of data by ensuring that all interviewers complete their task honestly. This is done by checking and reviewing the responses of the interviews and correcting the errors at the first stage itself.
ii) Office/Central Editing:
When all completed forms are brought to the central office, an individual or a team performs the editing activities on these forms. This process is called 'office editing' or 'central editing'. Office editing is much more accurate than field editing. It is particularly suitable for mail surveys, where field editing cannot be performed, unlike questionnaires and interview schedules administered in person.
3) Coding :
Coding is the process of converting the data into meaningful categories and then assigning symbols to each of these categories. These symbols are known as codes. The categories are formulated in such a way that they are mutually exclusive, i.e., an answer should belong to a single class only. The categories should also be exhaustive, which means that every response must be assigned to some class and none should be left uncategorized. The classes or categories should be uni-dimensional, which implies that a category should contain the same or similar responses.
For example, while coding the responses, the researcher assigns a code '1' to all male respondents and '2' to all female respondents. It can be seen that this process reduces the entire data of responses into two mutually exclusive classes of 'Male' and 'Female'. Also, the logic that governs the assignment of the codes, i.e., 'gender', is uniform. Coding is done to limit the data to a smaller set of finite classes, which can then be tabulated and analyzed. These classes are homogeneous in nature. For coding, it is essential to edit the data first, as it would be very difficult to derive meaning directly from the raw data.
Coding includes three basic activities, viz., formulating the categories, allocating answers to those categories, and finally assigning codes to those categories. Since the responses are reduced to limited categories, it becomes easy to analyze the data. The decisions regarding categorization should be made while constructing the questionnaire to make data collection easy and effective.
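A brief Python sketch of the coding activity, applying the gender codes from the example above ('1' for male, '2' for female) through a simple codebook; the response list is made up purely for illustration:

    # Codebook: each mutually exclusive category receives one code
    GENDER_CODES = {"Male": 1, "Female": 2}

    responses = ["Male", "Female", "Female", "Male", "Female"]

    # Assign a code to every edited response
    coded = [GENDER_CODES[answer] for answer in responses]
    print(coded)   # [1, 2, 2, 1, 2]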
Principles of Data Coding
Coding should be done keeping in mind the following essential principles:
i) Relevance:
This principle of coding implies that a coding class should be constructed according to the objective of the study being undertaken. The categories should be able to contain the data necessary to test the hypotheses; otherwise, formulating the categories would be useless.
ii) Exhaustiveness:
This principle states that all raw data should belong to some particular category. In other words, no data should remain uncategorized in the classes created by the researcher. Researchers often include a category like 'Other' to capture data which does not fit in any of the pre-formulated categories. Sometimes it is possible that new data is entered at later stages of research; in this case, the categories should be able to include the new data as well. If most of the data end up in the 'Other' category, then there is an error in the categorization process. This can be problematic, especially in the case of objective-type questions.
iii) Mutual Exclusiveness:
This means that a data point should belong to one class only. In other words, none of the classes should have any item in common. For example, if a question is asked, 'Which games do you play: Cricket, Football, Badminton, Chess?', many respondents will choose more than two options. Hence, while coding the responses, the data will not be mutually exclusive. This problem can be solved by providing operational definitions.
One of the functions of an operational definition is to define categories whose elements are mutually exclusive. In this case, operational definitions of the games can classify them under 'indoor games' and 'outdoor games', which will clarify the responses. To specify the categories in a better way, operational definitions can be written along with the categories. This clarifies the situation and reduces confusion about categorizing.
iv) Uni-Dimensional:
This principle implies that there should be a single concept for a class. In other words, a category should be defined using a single dimension only. If a category is defined in terms of more than one dimension, then it will not remain mutually exclusive unless the cells are defined using combined dimensions. For example, if a data set defines the occupations as manager, salesman, teacher, engineer, and artist, then locating an unemployed teacher in the defined categories is difficult. It can be located if the categories are defined with combined dimensions like employed teacher, unemployed teacher, employed engineer, etc.
Procedure of Data Coding
Coding is done through the following stages:
i) Identifying Open Coding:
In this stage, the researcher tries to identify the possible open categories in which to locate the data. Formulating the open categories is done by constantly comparing the data to incidents, and incidents to categories, till an optimum limit is reached. Generally, researchers prefer a maximum of 10 categories, but the number of categories depends upon the information that needs to be explored, based upon the research objective.
ii) Axial Coding:
Once the open-category formation has been completed, the researcher moves on to the next stage, i.e., axial coding. In this stage, a coding paradigm is developed. This is done by choosing a particular category, out of the possible open categories, as a 'core category'. The researcher then reviews various aspects of data collection and forms some categories related to the core category. He tries to identify the phenomenon, the causal factors, the intervening variables, the strategies, and the potential limiting factors. This complete information is displayed with the help of blocks and arrows, which then helps to develop the coding paradigm.
iii) Selective Coding:
The final stage of coding process is selective coding. In this stage, the researcher develops a theory regarding the coding process by inter-relating the coding categories presented in coding paradigm. The paradigm is refined and presented as a model or theory of the complete process. This theory or model includes propositions that indicate the possible concepts to be explored and tested in future.
4) Classification :
Classification is the process of creating the homogeneous classes into which the edited and coded data can be grouped on the basis of their common characteristics. By classifying the data in different groups, meaningful results can be extracted. The complete data set is divided into predefined groups in such a way that none of the data is left unclassified.
For example, a researcher can classify a group of 100 respondents into smokers (60) and non-smokers (40). From this classification, it can be seen that all respondents in each category are either smokers or non-smokers, and hence each category is homogeneous in the sense that its members share the same habit.
Principles in Classification of Data
The following principles should be followed for effective classification of data:
i) Unambiguous Classification:
Classifying the data should reduce ambiguity. This objective can be achieved by formulating classes that are homogeneous in nature, based upon the common characteristics of the data. The categories should be defined clearly so that they do not confuse or mislead the researcher. For example, in the classification of smoking habit, formulating only the two classes 'smoker' and 'non-smoker' can confuse the researcher when locating the habit of an 'occasional smoker'.
ii) Single Classification Principle:
In classification, a class should be defined based on a single dimension at a time. If more than one dimension is taken into account for one category, it may lead to confusion. Hence, it should be kept in mind that a category should contain only one dimension. This helps the researchers to analyze and interpret easily. For example, a population can be characterized by many dimensions, such as age, geographic area, gender, occupation, religion, etc. Here, a different category should be formulated for each dimension, such as age, geographic area, gender, etc., respectively.
iii) Mutually Exclusive Categories:
Each data item should belong to one class only. There should be no data which belongs to more than one class.
iv) Mutually Exhaustive Categories:
The entire data should be covered in the defined classification categories. No data should be left out in the classification process and none of the responses should be omitted during classification.
v) Action-Oriented:
Classification of data should be carried-out based on the objectives of the research. Therefore, it implies that classification of data and the number of classes to be formulated are based on the objective of the research under study.
vi) Distinctive Categories:
Sub-categories in a classification should also be distinctive and different enough from each other so that they are suitable for different sets of problems.
vii) Relevant for Research Project:
Classification should be done with the aim of studying and analyzing the research problem more accurately, as classification done without any objective will be useless for the research.
Types of Data Classification
Classification is of the following types:
i) Classification According to Attributes:
Here the classification is done on the basis of some common descriptive characteristics or attributes. These are qualitative in nature, as they cannot be measured, but their existence can be observed clearly. Examples of these attributes are beauty, honesty, truthfulness, etc. The data obtained through this classification are called 'statistics of attributes'.
Classification by attributes can be done using either one attribute or more than one attribute. In the former case, the classification is done on the basis of one attribute and only two categories are created, one for the data having that attribute and the other for the data not having it. This is called 'simple classification'. In the latter case, the researcher classifies the data based on more than one attribute, and for 'n' dichotomous attributes the number of ultimate classes is 2^n. This is called 'manifold classification'. It can be represented as,
Total Number of Categories = 2^n,
where 'n' = number of attributes.
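For instance, classifying respondents by three dichotomous attributes, say gender (male/female), smoking habit (smoker/non-smoker), and literacy (literate/illiterate), gives 2^3 = 8 ultimate classes.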
ii) Classification According to Class-Intervals:
Data can also be classified on the basis of numerical characteristics. These characteristics are quantitative in nature and hence are measured with the help of statistical units. Here, the characteristics are classified on the basis of defined class-intervals. Examples of such characteristics are weight, height, age, etc. These characteristics are called 'statistics of variables'.
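A short Python sketch of classification according to class-intervals, grouping a hypothetical set of ages into defined intervals (the intervals and data are illustrative only):

    import pandas as pd

    ages = pd.Series([18, 23, 35, 41, 52, 29, 67, 44])

    # Classify the ages into the defined class-intervals
    groups = pd.cut(ages, bins=[0, 20, 40, 60, 80],
                    labels=["0-20", "21-40", "41-60", "61-80"])
    print(groups.value_counts().sort_index())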
5) Tabulation :
Tabulation (tabular representation of data) is a method of presenting the data in the form of tables so that the results can be interpreted easily. The purpose of tabulation is to present the maximum possible information in the minimum possible space. In tabulation, the data are represented in a compact form in such a way that the quality and utility of the information is not lost.
Tabulation is the final stage of data processing which provides the data for further analysis. It serves the purpose of presenting the data to the client in an aesthetic and easily understandable format. Tabulation is a kind of visual representation that aids the researchers in further analysis and interpretation, as well as in the presentation of data at the final stage.
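As a small illustration, the Python sketch below builds a simple one-way table and a two-way (cross) table from hypothetical smoking data; the figures are invented for the example only:

    import pandas as pd

    df = pd.DataFrame({
        "gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
        "smoker": ["Yes", "No", "No", "Yes", "Yes", "No"],
    })

    # One-way (simple) table: classification by a single characteristic
    print(df["smoker"].value_counts())

    # Two-way table: rows and columns classify the data by two characteristics
    print(pd.crosstab(df["gender"], df["smoker"], margins=True))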
Types of Data Tabulation (Tables)
Tables can be categorized as follows:
i) Based on the Number of Characteristics Used:
These tables are categorized based on the number of characteristics used to classify the data, and are of two broad types, simple and complex:
a) Simple Table or One-Way Table:
In a one-way table, the classification is made on the basis of only one characteristic. Since it is the simplest form of table, it is also called a "simple table".
b) Complex Tables:
When a table is prepared on the basis of more than one characteristic, it is known as a "complex table". In these tables, more than one classification is represented. In such higher-level tables, both the rows and the columns are used; here, the columns represent the categories of the item and the rows represent the values of the item.
c) Two-Way Table:
Two-way tables are made with two characteristics to classify the data. They are also called "double tabulation". Two-way tables are constructed with the help of one-way tables. It is much easier to analyze a two-way table compared to a one-way table.
d) Three-Way Table:
When a table is constructed with the help of three inter-related characteristics, it is known as a "three-way table" or "treble tabulation". By showing three characteristics at a time, a three-way table conveys more information regarding a particular phenomenon than one-way and two-way tables.
e) Multiple-Way Table:
A multiple-way table is constructed in the same way as one-way, two-way, and three-way tables are constructed. But the difference here is that there are multiple characteristics which are used to tabulate the data. Usually these types of tables are constructed while collecting census data. These are also known as "manifold tables".
ii) Based on the Purpose of Study:
On the basis of purpose, tables are of following types:
a) General Purpose Table:
General purpose tables represent a large quantity of data at a time. They act as storage for a huge amount of information in a compact form. A general purpose table is usually a very big table, as it tries to accommodate all the information in a single table. It is also called a "repository table" or "reference table", as it helps the researchers in taking references for information. Census tables and appendices are some examples of general purpose tables.
b) Special Purpose Table:
These tables are comparatively small, as they present a smaller amount of information that is specifically related to the research objective. These tables are also called "summary tables" or "text tables". They help in comparative data analysis related to the specific research problem.
Rules of Data Tabulation
The rules regarding the structure of a table, and the general rules (or precautions, or points to be kept in mind), are as follows:
i) Rules Regarding the Table Structure:
a) Table Number:
When a number of tables are constructed, serial numbers of the tables should be given to each table.
b) Title:
The title of the table should be clear and precise indicating "what, where, and when" of the data in that order.
c) Number of Rows and Columns:
The number of rows and columns depends on the nature of the data. Descriptions of rows, columns, sub-rows, and sub-columns should be well defined. Columns and rows should be numbered when it is desired to facilitate reference to specific parts of a table.
d) Captions and Stubs:
The headings of the various columns and rows (often called captions and stubs respectively) should not consist of more lines or words than the bare minimum required.
e) Ruling and Spacing:
Horizontal and vertical lines should be drawn to separate adjacent rows and columns. Different colors or lines of different dimensions (i.e., light, heavy, etc.) should be used to differentiate sub-rows and sub-columns from the main rows and main columns. Appropriate space must be given between the figures.
f) Size of Columns:
The size (breadth) of the columns should be according to the information to be written in those columns.
g) Arrangement of Items:
The placement of items in the rows or columns should be, if possible, in natural order or according to their importance.
h) Units and Derivatives:
The unit of measurement used should be clearly stated, such as 'prices in rupees', 'population in crores', etc.
i) Explanatory Notes:
To make certain points clear, explanatory notes (footnotes or headnotes) should be incorporated directly in the body of the table, or below or above the table. If the information given in the table is not self-explanatory, sufficient information should be given as footnotes.
j) Source:
The source of information must be mentioned just below the table, particularly in the case of secondary data. Such notes give: a) the name of the person/institution, b) the place and other details, and c) an idea about their reliability.
k) Totals:
The sub-totals for each separate classification and a general total for all combined classes should be given at the bottom and/or right side of the figures whose totals are taken.
ii) General Rules:
a) Attractive Shape:
The table should be neat and attractive. The size of the table should be neither too big nor too small.
b) Simplicity:
A table should be simple and self-explanatory and according to the object of statistical investigation.
c) Place of Approximation:
If approximation is done for the figures written in the table, it should be mentioned.
d) Free from Irrelevant Data:
A table must be free from all types of irrelevant data.
e) Use of Circle or Box, etc.:
Certain figures which are to be emphasized should be set in a distinctive type, placed in a circle or box, or set between thick lines.
f) Miscellaneous Columns:
A table should have a miscellaneous column or remarks column for information which cannot be grouped under the classification scheme.
g) Non-availability of the Data:
If some information is not available due to one reason or another, a line (-), 'not available' (N.A.), or a cross (x) should be used, along with a clarification of these symbols.
Significance of Data Tabulation
Significance of tabulation can be understood with the help of following points:
i) Simplification of Complicated Data:
The biggest advantage of tabulation is that it simplifies the complicated data and presents it in a comprehensive format for the reader. Tabulated data is easy to understand and interpret. It reduces ambiguity which helps in data analysis and concluding the findings.
ii) Helps in Comparing Data:
Since the data is presented in a very systematic manner in tables, therefore researchers can easily compare data and draw inferences simultaneously. It makes the data analysis comparatively easy to carry out.
iii) Relevant and Unambiguous Presentation:
Tabulation helps the researchers to present the data correctly. It eliminates the repetition and redundancy of the data and shows the relevant information only.
iv) Important for Data Analysis:
Tabulation is the intermediate step between data collection and analysis. Analysis cannot be performed on raw data, which makes it necessary for the data to be tabulated in a systematic and comprehensive manner. Once the data is tabulated, it is possible to proceed for further analysis.
v) Gives a Bird's Eye View:
The data, once arranged in a suitable form, give the condition of the situation at a glance, i.e., a bird's eye view.
vi) Gives Overview:
Tabulating the data gives the reader an overview about the data without getting into the details of its collection process. It acts as a summary by providing all information in a very systematic and compact form.
vii) Detecting Missing Data and Omissions:
Tabulation also provides a chance to detect missing data and omissions. Thus, constructing tables brings greater accuracy to the analysis.
viii) Maximum Representation of Data:
Tabulation reduces the huge size of data by representing it in a minimum possible space. It increases the efficiency of research by facilitating the researcher to draw graphs and charts on the basis of tables.
Advantages of Data Processing
1) Efficient Decision Making:
Data processing enables organizations to gather, organize, and analyze vast amounts of data quickly and accurately. This leads to more informed and data-driven decision making, as decision-makers have access to relevant and timely information.
2) Improved Accuracy:
Automated data processing reduces the risk of human error that may occur during manual data entry and analysis. By using algorithms and automated tools, data processing ensures consistency, accuracy, and reliability of information, leading to higher quality outcomes.
3) Cost Reduction:
Data processing can help organizations optimize their operations and reduce costs. By analyzing data related to processes, resource allocation, and efficiency, companies can identify areas for improvement, eliminate redundancies, and make more informed decisions regarding resource allocation, leading to cost savings.
4) Enhanced Productivity:
Automated data processing eliminates repetitive manual tasks, allowing employees to focus on more value-added activities. This can lead to increased productivity and efficiency, as well as freeing up resources that can be utilized in other critical areas of the organization.
5) Competitive Advantage:
Effective data processing enables organizations to gain a competitive edge in the market. By analyzing data on customer behavior, market trends, and competitor activities, businesses can identify opportunities, develop targeted strategies, and make data-driven decisions that can give them an advantage over their competitors.
6) Improved Customer Insights:
Data processing helps organizations gain a deeper understanding of their customers. By analyzing customer data, such as purchase history, preferences, and feedback, businesses can personalize their offerings, improve customer satisfaction, and enhance the overall customer experience.
7) Risk Mitigation:
Data processing plays a crucial role in risk management. By analyzing historical and real-time data, organizations can identify potential risks and take proactive measures to mitigate them. This can include detecting fraudulent activities, predicting market fluctuations, or identifying operational weaknesses.
8) Innovation and New Opportunities:
Data processing can uncover hidden patterns, correlations, and trends within large datasets, leading to new insights and innovative ideas. By exploring and analyzing data, organizations can discover new business opportunities, develop innovative products and services, and drive growth.
9) Enhanced Planning and Forecasting:
Accurate data processing allows organizations to make more precise forecasts and predictions. By analyzing historical data and using advanced algorithms, businesses can anticipate future trends, demand patterns, and market fluctuations. This enables better resource planning, inventory management, and proactive decision-making.
10) Compliance and Regulatory Requirements:
Data processing helps organizations meet compliance and regulatory requirements. By processing and storing data securely, maintaining data integrity, and providing audit trails, businesses can ensure they adhere to legal and industry-specific regulations, avoiding penalties and reputational damage.
Disadvantages of Data Processing
1) Cost:
Implementing data processing systems can be expensive. It requires significant investment in hardware, software, and skilled personnel. Additionally, the maintenance and upgrade costs can add up over time.
2) Complexity:
Data processing involves complex algorithms, data structures, and programming languages. Developing and maintaining data processing systems requires a high level of expertise. It can be challenging for organizations without sufficient technical knowledge and resources.
3) Data Quality Issues:
Data processing heavily relies on the quality of input data. If the data is incomplete, inaccurate, or inconsistent, it can lead to faulty results and unreliable insights. Data cleansing and preprocessing become crucial to ensure accurate data processing outcomes.
4) Privacy and Security Risks:
Data processing involves storing and manipulating sensitive information, which increases the risk of privacy breaches and data security threats. Unauthorized access, data leaks, and cyber-attacks are significant concerns. Adequate security measures must be in place to protect the data throughout the processing pipeline.
5) Time-Consuming:
Processing large datasets can be time-consuming, particularly when complex algorithms and computations are involved. The time taken for data collection, cleansing, transformation, and analysis can delay decision-making processes.
6) Dependency on Technology:
Data processing relies heavily on technology infrastructure and software tools. If there are technical failures, system crashes, or compatibility issues, the processing tasks may be disrupted, causing delays and potential data loss.
7) Bias and Interpretation Challenges:
Data processing may introduce biases or misinterpretations. The algorithms used in data processing can be influenced by inherent biases in the data itself or the assumptions made during the processing stage. This can result in skewed or inaccurate results, leading to flawed decision-making.
8) Lack of Human Context:
Data processing primarily focuses on quantitative analysis, often lacking the human context and qualitative insights that can be obtained through human judgment and experience. Human intuition and critical thinking can provide valuable perspectives that may not be captured through automated data processing alone.