Big Data Issue and Challenge: Lack of Skills, Learning Big Data through Open Courseware

Abstract— This paper attempts to offer a broad definition of Big Data and its various characteristics, and to highlight and differentiate Data Science and Big Data in the context of the growth of various data formats produced by large companies and governments. The paper's primary focus is to define the lack of Big Data skills as an issue and challenge; in particular, it presents current findings on the skills lacking in Data Science and Big Data. As a solution, the paper offers online professional college degrees and MOOC (Massive Open Online Course) courses. A distinguishing feature of this paper is that it describes various MOOC courses and resources that may help people become Big Data experts and Data Scientists, and so overcome the Big Data issue of lacking skills.

I. INTRODUCTION

Big data is data that is expensive to extract, transform and load (ETL) for decision making in an enterprise. Its challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.
“Big Data is any data that is expensive to manage and hard to extract value from” – Michael Franklin (University of California, Berkeley)
“Big data has entered a world beyond just the static data that is collected” – TDWI
“Big data is an overwhelming amount of data; its challenges are the volume of data, its unstructured form and confidentiality” – Adobe at TDWI
“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” – Dan Ariely
“I’m a data janitor. That’s the sexiest job of the 21st century. It’s very flattering, but it’s also a little baffling”
– Josh Wills, a senior director of data science at Cloudera
“Given enough data, everything is statistically significant” – Douglas Merrill

Characteristics of Big Data:

Big Data is currently defined by the following five Vs and one C:
Volume: The amount (size) of data available for processing. This characteristic determines whether data counts as Big Data at all.
Variety: The different formats of data in an enterprise, such as sensor data, flat files, XML files, documents, binary data, relational databases, etc.
Velocity: The speed at which data is generated and processed.
Veracity: The quality and accuracy of data, for example whether it contains missing or noisy values.
Variability: The inconsistency the data shows at times, which hampers the ability to handle and manage the data effectively.
Complexity: Processing the entire data set is a complex task with Big Data: the data must be connected and correlated for accurate decision making, and complexity may also arise while pre-processing the data.
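The Veracity and Complexity points above usually become the first practical step of a Big Data project: checking records for missing or noisy values before any analysis. The following is a minimal Python sketch, assuming records arrive as dictionaries and that a value counts as "noisy" when it falls outside a caller-supplied valid range; the function name, field name and range are hypothetical illustrations, not part of any particular tool.

```python
from typing import Any, Dict, List, Tuple


def profile_veracity(
    records: List[Dict[str, Any]],
    valid_ranges: Dict[str, Tuple[float, float]],
) -> Dict[str, Dict[str, int]]:
    """Count missing and out-of-range (noisy) values per field.

    Assumption: a value is noisy when it lies outside [low, high] for its field.
    """
    report = {field: {"missing": 0, "noisy": 0} for field in valid_ranges}
    for record in records:
        for field, (low, high) in valid_ranges.items():
            value = record.get(field)
            if value is None:
                # An absent key and an explicit None both count as missing.
                report[field]["missing"] += 1
            elif not (low <= value <= high):
                # Present, but outside the plausible range for this field.
                report[field]["noisy"] += 1
    return report


if __name__ == "__main__":
    # Hypothetical sensor readings; "temp_c" is assumed valid in [-40, 60].
    readings = [
        {"sensor": "s1", "temp_c": 21.5},
        {"sensor": "s2", "temp_c": None},
        {"sensor": "s3", "temp_c": 999.0},
        {"sensor": "s4"},
    ]
    print(profile_veracity(readings, {"temp_c": (-40.0, 60.0)}))
```

At Big Data scale the same per-record check would run in parallel over partitions of the data, with the per-field counts summed afterwards.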

II. DATA SCIENCE

Data Science is the field in which data is acquired, processed and manipulated. Data science combines Engineering, Mathematics, Data Warehousing, Data Mining, Database Systems, Machine Learning, Artificial Intelligence, Algorithms, Programming and Statistics. Technological development has exposed a staggering growth of data, and as data keeps growing, more data science skill is required to handle that growth and the scale of new types of data from sensors, social media, website logs, clickstreams, etc.
In other words, data science can be broken down into four essential parts:
Mining Data: Collecting and formatting various types of data based on a pattern mechanism.
Statistics: Analyzing the gathered information.
Interpret: Representing or visualizing the results in the form of presentations, charts, graphs and reports.
Leverage: Studying the implications and applications of the data, the tools and technologies for handling it, and interaction with and prediction from the data.
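To make the four parts concrete, here is a toy Python sketch that walks one small data set through Mining, Statistics, Interpret and Leverage. The "user,seconds" log format, the function names and the deliberately naive "next value equals the mean" prediction are illustrative assumptions, not a method described in this paper.

```python
import statistics
from typing import Dict, List


def mine(raw_lines: List[str]) -> List[float]:
    """Mining Data: collect raw 'user,seconds' log lines and format them as numbers."""
    values = []
    for line in raw_lines:
        parts = line.strip().split(",")
        # Keep only well-formed lines whose second field is numeric.
        if len(parts) == 2:
            try:
                values.append(float(parts[1]))
            except ValueError:
                continue
    return values


def analyze(values: List[float]) -> Dict[str, float]:
    """Statistics: summarize the gathered values (needs at least one value)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def interpret(summary: Dict[str, float]) -> str:
    """Interpret: turn the summary into a short text report."""
    return ", ".join(f"{key}={value:.2f}" for key, value in summary.items())


def leverage(summary: Dict[str, float]) -> float:
    """Leverage: a naive prediction that the next session lasts the mean time."""
    return summary["mean"]


if __name__ == "__main__":
    log = ["alice,30", "bob,45", "carol,not-a-number", "dave,60", "broken line"]
    summary = analyze(mine(log))
    print(interpret(summary))
    print("predicted next session:", leverage(summary))
```

In a real project each stage would be replaced by far richer tooling (ETL jobs, statistical models, dashboards), but the hand-offs between stages stay the same.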

<Image 1>

Figure 1. Defining Data Science

“I worry that the Data Scientist role is like the mythical “Webmaster” of the 90s: master of all trades.” – Aaron Kimball, CTO, WibiData.
What data science tells us [1]:
• If you are a DBA, you need to learn to deal with unstructured data.
• If you are a Statistician, you need to learn to deal with data that doesn’t fit in memory.
• If you are a Software Engineer, you need to learn Statistical Modelling and how to communicate results.
• If you are a Business Analyst, you need to learn about Algorithms and tradeoffs at scale.
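For the Statistician, "data that doesn’t fit in memory" usually means computing statistics in a single streaming pass instead of loading everything at once. A minimal Python sketch using Welford's online algorithm for the mean and sample variance; the file name in the usage comment is hypothetical.

```python
import math
from typing import Iterable, Tuple


def streaming_mean_variance(values: Iterable[float]) -> Tuple[int, float, float]:
    """Welford's online algorithm: one pass, constant memory.

    Returns (count, mean, sample variance); the variance is 0.0 for fewer than 2 values.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the already-updated mean
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


if __name__ == "__main__":
    # A generator keeps memory constant; for a huge file one could write
    #   values = (float(line) for line in open("huge_numbers.txt"))  # hypothetical file
    values = (float(i) for i in range(1, 1_000_001))
    n, mean, var = streaming_mean_variance(values)
    print(n, mean, math.sqrt(var))
```

Unlike the textbook formula that subtracts the square of the mean from the mean of the squares, Welford's update stays numerically stable over very long streams.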

III. DIFFERENTIATING BIG DATA AND DATA SCIENCE

Data science is a field that brings together areas such as Engineering, Mathematics, Data Warehousing, Data Mining, Database Systems, Machine Learning, Artificial Intelligence, Algorithms, Programming and Statistics. Big Data, by contrast, covers the modern applications and technologies used to manage and process such data: it includes data sets whose size and type make them impractical to process and analyse with traditional database technologies.
Data science is a very impressive field in the 21st century, and a person who has the above skills is known as a “Data Scientist”. A good Data Scientist understands the importance of:
• Searching the web for information.
• Algorithmic strategizing.
• Vectorized operations.
• Knowing the latest tools and technologies for handling data.
• Being proficient in data mining, statistics, mathematics and artificial intelligence.
“I keep saying that the sexy job in the next 10 years will be statisticians. And I’m not kidding.” – Hal Varian, chief economist at Google.

IV. ISSUE: LACK OF BIG DATA SKILLS

By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as
1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions –
McKinsey Global Institute.
46% of organizations cite inadequate staffing or skills for big data analytics – TDWI Research.
More than three-fourths of 169 executives surveyed say staffing and training issues are the greatest obstacles to making
the most of big data – Ventana Research.
While the majority of executives (58%) believe finding the right technology is the biggest challenge their companies face
in analyzing data, the majority (56%) of IT decision-makers charged with implementing Big Data programs believe
finding the right staff is a bigger challenge than finding the right technology – Avanade [2].
83% of data scientists surveyed felt that new technology would increase the demand for data scientists, and 64% believe
that it will outpace the supply of available talent – EMC [3].
Today there is a shortage of trained Big Data technology experts, in addition to a shortage of analytics experts. This labor
supply constraint will act as an inhibitor of adoption and use of Big Data technologies, and it will also encourage vendors
to deliver Big Data technologies as cloud-based solutions – IDC [4].

There will be a 24% increase in demand for professionals with management analysis skills over the next eight years. The
need for this specialized talent is being fuelled by an increased use of business analytics by companies to better
understand the explosion of data – U.S. Bureau of Labor Statistics [5].

<Image2>

Figure 2. Increasing Demand and Lack of Big Data Skills.
Source: http://www.edureka.in/blog/increasing-demand-for-hadoop-and-nosql-skills/

V. SOLUTION

A. Online Degrees and Open Courseware

Big Data is now a buzzword, and the rapid growth of data science has already generated a big demand for data scientists, which is drawing people to pursue data science degrees.
Recently, UC Berkeley started to offer an online data science degree costing $60,000 [6]. In the past few years, as data scientist has become the “sexiest” job of the century, other top universities, such as Northwestern and New York University, have moved into this area.
MIT also offers an online Big Data course (“Tackling the Challenges of Big Data”) for $545 [7].
All of the above online programs are expensive, so people who are not in a strong financial position cannot afford them. On the other hand, data scientist is the sexiest job of the 21st century [8], so many people want to become Big Data experts and data scientists; Gartner says Big Data creates big jobs, with 4.4 million IT jobs globally to support Big Data by 2015 [9].
This situation may leave people confused about which path is best. The solution is open courseware (MOOCs): researchers and students believe that MOOCs (Massive Open Online Courses) are a golden boon for research literacy, and open courseware can help people become Big Data experts and data scientists. Just as the high-cost online degrees from Berkeley and MIT require computer science knowledge, MOOCs also require some computer science background.
Who offers Open Courseware programs?
• edX.org
• Coursera.org
• Udacity.com
etc.

These three are the most popular portals for completing open courseware programs taught by knowledgeable professors at top universities around the world, such as Harvard University, Stanford University, MIT and IIT.

B. Most Valuable Big Data / Data Science Open Courseware

From Udacity.com
1. Intro to Hadoop and MapReduce: How to Process Big Data, by Cloudera [10].
2. Machine Learning: Supervised Learning (Conversations on Analyzing Data) [11].
3. Machine Learning: Unsupervised Learning (Conversations on Analyzing Data) [12].
4. Machine Learning: Reinforcement Learning (Conversations on Analyzing Data) [13].
5. Intro to Data Science: Learn What It Takes to Become a Data Scientist [14].
From Coursera.org
1. Introduction to Data Science by University of Washington [15].
2. The Caltech-JPL Summer School on Big Data Analytics [16].
3. Big Data Science with the BD2K-LINCS Data Coordination and Integration Center [17].
4. Web Intelligence and Big Data by IIITD and IITD [18].
5. Statistics One by Princeton University [19].
6. Algorithms: Design and Analysis, Part 1 by Stanford University [20].
7. Machine Learning by Stanford University [21].
8. Probabilistic Graphical Models by Stanford University [22].
From Edx.org
1. Introduction to Big Data with Apache Spark by UC Berkeley [23].
2. Introduction to Linear Models and Matrix Algebra by Harvard University [24].
3. Introduction to Probability – The Science of Uncertainty by MIT [25].
4. Introduction to Metrics for Smart Cities by IEEE (IEEEx) [26].
5. Applications of Linear Algebra Part 1 by Davidson College [27].
6. Advanced Statistics for the Life Sciences by Harvard University [28].
7. Statistics and R for the Life Sciences by Harvard University [29].
8. Introduction to Computational Thinking and Data Science by MIT [30].
9. Scalable Machine Learning by UC Berkeley [31].
10. Wiretaps to Big Data: Privacy and Surveillance in the Age of Interconnection by Cornell University [32].

C. Other

1. Data Analysis Learning Path by MySlideRule [33].
2. Learn Data Science by LearnDS [34].
3. Learn R with DataCamp [35].
4. Learn R with Data Science Central [36].
5. Learn R with Cyclismo [37].
6. Learn R with Code School [38].
7. Learn R with swirl (SwirlStats) [39].

D. Books
1. Data Integration for Dummies (a Wiley brand), by Brian Underdahl, Informatica.
2. The Data Analytics Handbook: Researchers + Academics, by Brian Liou.
3. The Data Analytics Handbook: CEOs + Managers, by Brian Liou.
4. The Data Analytics Handbook: Data Analysts + Data Scientists, by Brian Liou.
5. Big Data Imperatives, Apress, by Soumendra Mohanty.
6. Mahout in Action (data mining with MapReduce).
7. Big Data Imperatives: Enterprise Big Data Warehouse, BI Implementations and Analytics, by Soumendra Mohanty, Madhu Jagadeesh and Harsha Srivatsa.
8. Big Data: Challenges and Opportunities, Infosys Labs Briefings.
9. Hadoop: The Definitive Guide, by Tom White, O’Reilly.
10. Hadoop in Action, by Chuck Lam, Manning.
11. Planning for Big Data: A CIO’s Handbook to the Changing Data Landscape, by the O’Reilly Radar Team.

E. YouTube Channels
1. TDWI – TDWI is your source for in-depth education and research on all things data.
2. EuroPython – EuroPython isn’t exactly a conference; it’s a chance to hang out with friends that you haven’t even
met yet.
3. EMC Academic Alliance – EMC Academic Alliance-Technology Curriculum
4. edureka! – Edureka provides online training courses for BigData and Hadoop, Hadoop Admin, Cassandra, Data
Science, Cloud Computing, Android Development.
5. CS50 – CS50 is a free online class introducing students to the basics of computer science. CS50 is taught by
David Malan of Harvard University.
6. Cloudera, Inc. – Cloudera Inc. is an American software company that provides Apache Hadoop-based software, support and services, and training to business customers.
7. bipublisher – Oracle BI Publisher is a reporting solution to author, manage and deliver all your reports and documents more easily and faster than traditional reporting tools.
8. Hortonworks – Hortonworks develops, distributes and supports a 100% open source distribution of Apache Hadoop for the enterprise, and also provides training, support and services.
9. Tech Gig – India’s Most Passionate Technology Community, Learn & stay updated on your skills, compete with
fellow techies and showcase your expertise to the community.

VI. REFERENCES

[1] Bill Howe, Introduction to Data Science, University of Washington. [Online]. Available: https://www.coursera.org/course/datasci
[2] Ease Big Data Hiring Pain with Cascading, CIO. [Online]. Available: www.cio.com/article/…/ease-big-data-hiring-pain-with-cascading.html
[3] Intent to Plan: M.S. in Analytics and M.S. in Data Science, Dakota State University and South Dakota State University. [Online]. Available: www.sdbor.edu/theboard/agenda/2013/December/CommA/III_A.pdf
[4] IDC Releases First Worldwide Big Data Technology and Services Market Forecast, Shows Big Data as the Next Essential Capability and a Foundation for the Intelligent Economy. [Online]. Available: http://www.businesswire.com/news/home/20120307005036/en/IDC-Releases-Worldwide-Big-Data-Technology-Services
[5] Analytics Goes to the Head of the Class, Northwestern University News. [Online]. Available: http://www.northwestern.edu/newscenter/stories/2011/12/ibm-analytics-masters.html
[6] [Online]. Available: http://datascience.berkeley.edu/
[7] [Online]. Available: https://mitprofessionalx.edx.org/courses/MITProfessionalX/6.BDX/2T2014/about#overview
[8] [Online]. Available: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
[9] [Online]. Available: http://www.gartner.com/newsroom/id/2207915
[10] Intro to Hadoop and MapReduce, Udacity. [Online]. Available: https://www.udacity.com/course/ud617
[11] Machine Learning: Supervised Learning, Udacity. [Online]. Available: https://www.udacity.com/course/ud675
[12] Machine Learning: Unsupervised Learning, Udacity. [Online]. Available: https://www.udacity.com/course/ud741
[13] Machine Learning: Reinforcement Learning, Udacity. [Online]. Available: https://www.udacity.com/course/ud820
[14] Intro to Data Science, Udacity. [Online]. Available: https://www.udacity.com/course/ud359
[15] Introduction to Data Science, Coursera. [Online]. Available: https://www.coursera.org/course/datasci
[16] The Caltech-JPL Summer School on Big Data Analytics, Coursera. [Online]. Available: https://www.coursera.org/course/bigdataschool
[17] Big Data Science with the BD2K-LINCS Data Coordination and Integration Center, Coursera. [Online]. Available: https://www.coursera.org/course/bd2klincs
[18] Web Intelligence and Big Data, Coursera. [Online]. Available: https://www.coursera.org/course/bigdata
[19] Statistics One, Coursera. [Online]. Available: https://www.coursera.org/course/stats1
[20] Algorithms: Design and Analysis, Part 1, Coursera. [Online]. Available: https://www.coursera.org/course/algo
[21] Machine Learning, Stanford University, Coursera.
[22] Probabilistic Graphical Models, Coursera. [Online]. Available: https://www.coursera.org/course/pgm
[23] Introduction to Big Data with Apache Spark, edX. [Online]. Available: https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x#.VNYdJEeUcrI
[24] Introduction to Linear Models and Matrix Algebra, edX. [Online]. Available: https://www.edx.org/course/introduction-linear-models-matrix-harvardx-ph525-2x#.VNWKuUeUdOI
[25] Introduction to Probability – The Science of Uncertainty, edX. [Online]. Available: https://www.edx.org/course/introduction-probability-science-mitx-6-041x-0#.VNWK30eUdOI
[26] Introduction to Metrics for Smart Cities, edX. [Online]. Available: https://www.edx.org/course/introduction-metrics-smart-cities-ieeex-scmtx-1x#.VNWLCEeUdOI
[27] Applications of Linear Algebra Part 1, edX. [Online]. Available: https://www.edx.org/course/applications-linear-algebra-part-1-davidsonx-d003x-1#.VNWLXUeUdOI
[28] Advanced Statistics for the Life Sciences, edX. [Online]. Available: https://www.edx.org/course/advanced-statistics-life-sciences-harvardx-ph525-3x#.VNWLh0eUdOI
[29] Statistics and R for the Life Sciences, edX. [Online]. Available: https://www.edx.org/course/statistics-r-life-sciences-harvardx-ph525-1x#.VNWLiUeUdOI
[30] Introduction to Computational Thinking and Data Science, edX. [Online]. Available: https://www.edx.org/course/introduction-computational-thinking-data-mitx-6-00-2x-0#.VNWMF0eUdOI
[31] Scalable Machine Learning, edX. [Online]. Available: https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x
[32] Wiretaps to Big Data: Privacy and Surveillance in the Age of Interconnection, edX. [Online]. Available: https://www.edx.org/course/wiretaps-big-data-privacy-surveillance-cornellx-engri1280x#.VNWNcEeUcrI
[33] [Online]. Available: https://www.mysliderule.com/learning-paths/data-analysis/
[34] [Online]. Available: http://learnds.com/
[35] [Online]. Available: https://www.datacamp.com/
[36] [Online]. Available: http://www.datasciencecentral.com/profiles/blogs/r-tutorial-for-beginners-a-quick-start-up-kit
[37] [Online]. Available: http://www.cyclismo.org/tutorial/R/
[38] [Online]. Available: http://tryr.codeschool.com/
[39] [Online]. Available: http://swirlstats.com/
[40] Increasing demand for ‘Hadoop and NOSQL Skills’. [Online]. Available: http://www.edureka.in/blog/increasing-demand-for-hadoop-and-nosql-skills/

 

 

 

Apache Hadoop Installation on Ubuntu

What is Apache Hadoop?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
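The "simple programming model" mentioned above is MapReduce: a map function emits key/value pairs, the framework groups (shuffles) them by key, and a reduce function combines each group. The pure-Python word count below only simulates those three phases inside one process to illustrate the model. It does not use Hadoop's own API (which is Java, with Hadoop Streaming for other languages), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value by its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Big data needs big skills", "data science needs data"]
    print(word_count(docs))
```

On a real cluster, Hadoop runs many map and reduce tasks in parallel on different machines and performs the shuffle over the network; the logic of each phase stays this simple.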

For more: http://hadoop.apache.org/

Hadoop Single Node Cluster Installation

Apache Hadoop Installation Steps

Hadoop runs faster on Linux than on Windows, so we choose Ubuntu as our Hadoop platform.
1. Make sure Ubuntu is installed on your machine or on a virtual machine.
2. The Hadoop framework is written in Java, so Java must be installed on your Ubuntu machine.

2.1 Installing the Java JDK
Log in as root:
sudo -s
Update the package index:
sudo apt-get update
Install the Java Runtime:
sudo apt-get install default-jre
Install the Java JDK (OpenJDK 1.7 or newer):
sudo apt-get install default-jdk
Check the Java version:
java -version

2.2 Adding a Dedicated Hadoop User
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser

2.3 Install Secure Shell

Install SSH and rsync:
sudo apt-get install ssh
Check that the SSH client and server are installed:
which ssh
/usr/bin/ssh
which sshd
/usr/sbin/sshd
sudo apt-get install rsync
Create and set up SSH keys (passphraseless SSH)
To enable password-less login, generate a new SSH key with an empty passphrase.
As the Hadoop user:
ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

OR

su hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

OpenSSH 7.0 and later disable DSA keys by default, so the RSA variant is the safer choice.

Then test the login with

ssh localhost

If it still asks for a password, restrict the permissions of the key file with chmod 0600 ~/.ssh/authorized_keys and try again.

2.4 Install Hadoop

Download Hadoop from a mirror:
wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Extract the archive:
tar xvzf hadoop-2.6.0.tar.gz
Move the extracted directory to its install location and make the Hadoop user its owner:
sudo mv hadoop-2.6.0 /usr/local/hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop

2.4 Editing the Hadoop Configuration Files

Make sure that the JAVA_HOME environment variable is set on your operating system, for example:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
(adjust the path to the JDK actually installed on your machine; ls /usr/lib/jvm lists the available ones), and that HADOOP_INSTALL points to your Hadoop root directory, which is /usr/local/hadoop after the move in the previous step:
export HADOOP_INSTALL=/usr/local/hadoop
You will need to edit the following files:
I. ~/.bashrc
II. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
III. /usr/local/hadoop/etc/hadoop/core-site.xml
IV. /usr/local/hadoop/etc/hadoop/yarn-site.xml
V. /usr/local/hadoop/etc/hadoop/mapred-site.xml
VI. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
To edit the Apache Hadoop configuration files you can use any text editor, such as vi, pico or nano; here we use nano.

1. ~/.bashrc
nano ~/.bashrc
Add the following lines at the end of the file:

#APACHE HADOOP ENVIRONMENT VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
#APACHE HADOOP ENVIRONMENT VARIABLES END

Save the file and exit nano with CTRL + X (answer Y when asked to save).
Reload the environment with the command

source ~/.bashrc

Now, it is best practice to work from /usr/local/hadoop/etc/hadoop/, because all the configuration files are located there:
cd /usr/local/hadoop/etc/hadoop

<Image 1>

Figure 1: Location of the Hadoop configuration files

2. hadoop-env.sh

nano hadoop-env.sh
Set JAVA_HOME to your JDK path:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Save and quit.

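3. core-site.xml

nano core-site.xml

Add the following block inside the <configuration> tag:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
(fs.default.name, the older name of this property, is deprecated in Hadoop 2 but still accepted.)
Save and quit.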

4. yarn-site.xml

nano yarn-site.xml

Add following block inside configuration tag
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Save and quit.

5. mapred-site.xml

Copy the template:
cp mapred-site.xml.template mapred-site.xml
Modify the file
nano mapred-site.xml
Add the following block to the configuration tag
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Save and quit the file

6. hdfs-site.xml

Create the following two directories for Hadoop storage:
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
Then edit the file:
nano hdfs-site.xml
Add the following block to the configuration tag
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
Save and quit.
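A typo in one of these XML files is a common reason for the Hadoop daemons failing to start, so it can help to read the files back and check the values. A small Python sketch using only the standard library's xml.etree.ElementTree; it assumes the files follow the configuration/property/name/value layout used above, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def _props_from_root(root: ET.Element) -> Dict[str, str]:
    """Collect {name: value} from the <property> children of a <configuration> root."""
    if root.tag != "configuration":
        raise ValueError(f"expected <configuration> root, got <{root.tag}>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is None or value is None:
            raise ValueError("every <property> needs both <name> and <value>")
        props[name.strip()] = value.strip()
    return props


def parse_hadoop_config(xml_text: str) -> Dict[str, str]:
    """Parse a *-site.xml document given as a string."""
    return _props_from_root(ET.fromstring(xml_text))


def parse_hadoop_config_file(path: str) -> Dict[str, str]:
    """Parse a *-site.xml file from disk."""
    return _props_from_root(ET.parse(path).getroot())


if __name__ == "__main__":
    sample = """<configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>"""
    print(parse_hadoop_config(sample))
    # To check a real file instead (path from this tutorial):
    # print(parse_hadoop_config_file("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"))
```

This only checks that the XML is well formed and lists the properties; it cannot tell whether Hadoop recognizes a property name, so compare the output against the values in this section.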

2.5 Change the Folder Permissions

Make the Hadoop user the owner of both folders:
sudo chown -R hduser:hadoop /usr/local/hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop_store
Grant the owner full access (avoid chmod 777, which would make the folders writable by every user on the machine):
sudo chmod -R 755 /usr/local/hadoop
sudo chmod -R 755 /usr/local/hadoop_store

2.5.1 Format the New Hadoop File System
Run this as the Hadoop user (hduser), not as root:

hdfs namenode -format

2.5.2 Start Hadoop
Run these as the Hadoop user (hduser), not as root:

start-dfs.sh
start-yarn.sh

OR, to start all daemons at once (deprecated in Hadoop 2 in favor of the two scripts above):
start-all.sh

2.5.3  Web View

ResourceManager – http://localhost:8088/
NameNode – http://localhost:50070/

Download PDF version: https://goo.gl/0JN6S5

 

Opportunities in Machine Learning, Deep Neural Networks, Data Mining and Data Science

When all the big companies are talking about AI (Artificial Intelligence) based products, why shouldn't students get a final-semester internship in AI? This is the question we will address in this post.

The following applications may help you relate yourself to AI: Computer Vision, Deep Learning, Image / Video Processing, Neural Networks, Data Analytics and Machine Learning.

Some use cases:

  1. Capture a picture of a piece of clothing, and the app will show you the shopping websites where it is available!
  2. Capture a picture of your foot, and the shopping app automatically measures your foot size and recommends a product!
  3. Upload a full-length photo and choose from different dresses; the app will show how each dress fits you and how cool you will look in it!
    Now you can change different parts of the dress and merge some parts with other dresses. You can customize one dress from five dresses! After customization, the app will recommend other dresses based on your customized dress!

Why do e-commerce companies such as Flipkart, Jabong, etc. still lack these features? Why shouldn't you start research and development in this area?

Well, you should. There are immense opportunities in this area. It may lead to a great job offer or even funding!

The following companies are working in the same area:

Oops! Machine Learning, Neural Networks, Deep Learning and Computer Vision are 40-50-year-old technologies! Why have people suddenly started talking about them now?

Ans: Previously, high-end hardware was not readily available; now hardware prices have dropped and it is easy to get. For example, if you want 100 GB of RAM and a good processor, you can rent a machine on AWS, use it via the cloud, and that's it! GPUs also help a lot now when processing very large batch jobs. And don't worry about setting everything up yourself: for experiments and research you can use 1) NVIDIA's CUDA API and 2) the cuDNN (CUDA Deep Neural Network) library.

That’s it! Amazing! Cool!

93% of data has been generated in the last two years, so professionals in data analytics, ETL, data cleaning and data prediction will obviously be in high demand; and if you combine these skills and automate data mining with machine learning, you will have the sexiest product!

Previously, great tools such as NoSQL databases and the Apache ecosystem (Hadoop, Spark, Kafka, etc.) were not available. Now open-source technologies provide ready-made libraries to use in your product.

IoT (Internet of Things) has arrived: everybody is using sensors in cars, wearables, healthcare and in-house devices (for example, a fan runs only when a person is under it and stops otherwise), which can save energy.

Cloud technologies are easily available. For example, you can connect an Arduino to Google Cloud and send the Arduino's data there. Once you have the data, you can do everything you want with it: prediction, regression, data cleaning, plotting and charting, storing it in a database, or using it to send alerts to users via mobile or email. All of this has become easy.
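As a small illustration of the "doing prediction, doing regression" step once sensor data has reached the cloud, here is ordinary least-squares fitting of a straight line in plain Python. The hourly temperature readings are made-up sample values, not real Arduino data, and the function name is hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    hours = [0.0, 1.0, 2.0, 3.0, 4.0]
    temps = [20.1, 21.0, 21.9, 23.1, 23.9]  # hypothetical sensor readings
    slope, intercept = fit_line(hours, temps)
    print(f"predicted temperature at hour 6: {slope * 6 + intercept:.2f}")
```

A fitted trend like this could then drive the alerting mentioned above, for example notifying a user when the predicted value crosses a threshold.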

Que: Dude! People just talk about these technologies; I haven’t seen any demo product in India!
Ans: Nope! People are not only talking about these technologies but actually working on them and competing with companies like Google (Nest) and Samsung SmartThings. Leaf Technology LLC, a start-up from IIT-B, has made its own smart home product called “Air”.

Que: Holy cow! I want to learn IoT (Internet of Things) at any cost! What should I do?
Ans: Learn it through OpenCourseWare (MIT OCW), do research and build a product! Nurture ideas and implement them in your product! And you are lucky: IIT Bombay is going to host “Asia’s largest Techfest” on its campus, and there will be an “International IoT Conference”.

Que: Can we directly start learning all the above technologies?
Ans: Yes, but you should know some Linear Algebra, Calculus, etc.
Que: Oh! Shit. Isn’t this what we learned in Sem-2 (Advanced Mathematics) and Sem-4 (Statistics)?
Ans: Yes, but we didn’t really care about these subjects back then, and we were not aware that they could be the very foundation of our careers.

Que: But now what?

Ans: No worries! The world is always open; learn from OCW:
Linear Algebra – Foundations to Frontiers
Applications of Linear Algebra Part 1
Applications of Linear Algebra Part 2

Yeah! Super cool! We have completed these courses, and it feels like now we can solve real-life problems and build great products with novel ideas!

Now I know Programming + Linear Algebra, and I feel like a “Hacker”!


Que: I have learned Linear Algebra and Data Mining (Logistic Regression, Logistic Classification, Clustering, Recommendation, Document Retrieval, Deep Neural Networks), and I also know programming and Android. Now what?
Ans: Awesome, mate! Now you are Superman! You can build your app using all the above methods and help the community by making the sexiest product!

Que: Dude! We have got a lot of traffic; our app is unable to handle that much data!
Ans: Did you forget? It is a Big Data problem: scalability. You need large-scale machine learning algorithms to analyze the data.
Que: It looks confusing!
Ans: Use GraphLab (now Dato, free for education), Apache Flink, Apache Spark MLlib or H2O.ai (formerly 0xdata).

You can also use machine learning as a service, for example:

1. IBM Watson Developer Cloud
2. Amazon Machine Learning
3. Microsoft Azure Machine Learning
4. Salesforce Apex Machine Learning
5. WSO2 Machine Learning

Really marvelous, super cool!

Oh man… We are bored here in Kachchh, Gujarat: we are using great technologies, but nobody else knows what we are doing. How can we share our novel ideas with others, and vice versa?

No problem! There are conferences and workshops where you can meet other people working in your area:

  1. NCVPRIPG-2015 (National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics), Dec 16-19, at IIT Patna, India.
  2. IEEE Workshop on Computational Intelligence: Theories, Applications and Future Directions, Dec 14-17, at IIT Kanpur.
  3. CVIP-2016 (International Conference on Computer Vision and Image Processing), Feb 26-28, 2016, at IIT Roorkee, India.
  4. iKDD: ACM India Special Interest Group on Knowledge Discovery and Data Mining, March 13-16, 2016, at Pune.
  5. Xerox Research Centre India (XRCI), January 21-22, 2016.
  6. FIRE (Forum for Information Retrieval Evaluation): http://fire.irsi.res.in/fire/home
  7. Agile India 2016 | 14 – 21 March 2016 | Chancery Pavilion, Bangalore

Kind suggestion: Don’t forget that your alumni shared their experience and knowledge with you. Now it’s your turn to share your experience and guide your juniors, the freshman students.

Go ahead, take a step!

Que: What?
Ans: Do I have to answer this too? Obviously not.
But let’s answer it anyway.
1. Form a group of students with expertise in different areas from above.
2. Host a meet-up at our Department of CS, Kachchh University, and invite other freshman students too.
3. Host training sessions / workshops for other freshman students, including tutorial sessions.
4. And that’s it! That is the entire cycle.
5. Have fun!!

Oh! Super cool, now we feel like heroes!


India: Major Intangible Achievements

Independence Day Greetings

While there are endless negative discussions about what India COULD do or what has NOT been done, people generally shy away from talking about India’s major intangible achievements. In this post, I try to highlight four areas in which India has, in my opinion, achieved enormous success:

Education: I went to see the Harappan ruins at Dholavira sometime in 2010. Unknowingly, we drove past the main site and reached a small hamlet of fewer than ten families living in small huts. The road ended there because the Rann of Kutch begins beyond that point. I was astonished to see a one-room cement concrete building there! My astonishment turned into pride when I realized that it was a school! We don’t know whether anyone from that school will make it to the top, but the wheels have already been set in motion!

Equality of Opportunity: Our society has been an ill-organized one. We are known to discriminate against people on the basis of birth. We judge people by their caste and not by the content of their character! But amid all this chaos there is a mysterious order that gives people opportunities to do tasks they were not, from the point of view of vicious racists, born to do! There was a man born into a very poor Muslim family in a small town on the southern coast. In his childhood he sold newspapers for a living. He was simply not born to do great things, and he had no resources to do them, but he became an eminent scientist and later the President of India, and the most loved one too! Yes, I am talking about Dr. Kalam!

Faith in Democracy: I was deputed to the sensitive village of Shikarpur (Shikārpur) in Rapar tehsil (Kutch) during the 2013 assembly elections. My booth was guarded by a platoon of the Border Security Force armed with Beretta rifles. The commanding officer of the platoon and I both expected some difficulties during polling. Early on polling day, we ran out of the forms that must be filled in when a blind voter wants to cast a vote. At least a hundred voters over sixty-five who could not see properly turned up to vote that day. While suspicious minds may say it could have been forced voting, I observed that it was their faith in the system that brought them from bed to the booth!

Tolerance: At least 80 districts across 10 states are affected by the Naxalite-Maoist insurgency. The insurgent organizations have been declared terrorist organizations by the government. Surprisingly, though, the government has never gone beyond selective use of force to stop them! India is a military superpower, yet the nation’s resolve not to use that power against these organizations shows its tolerance. Most people regard Naxals as misguided and, in the words of former Home Minister Shivraj Patil, “they are our children.”

Happy Independence Day!

(Image: Santa Banta Design Lab)

Shaping a Career – II

In the previous post, we discussed the dilemma of choosing a job and the basic essentials of a job. Let’s talk about another important topic here: what should you do if you land a non-programming job?

As a computer science graduate, you are often asked for help with issues ranging from a slow PC to downloading an app on Android! But that is social life; what if it happens to you in professional life?

As a computer science student, you learnt programming languages like Java, C and C++, along with other important subjects like databases, web design and what not! Now, after graduating, what happens when you are made to sit in an office doing calculations in Excel and applying special effects to PowerPoint slides? How do you handle it? How do you put your talent to use in a situation totally different from what you imagined?

So it is quite possible, and it happens in many cases, that after graduating you land a non-programming job. In such cases, my first advice is to adjust and adapt to the environment. Every job needs some basic virtues: morality, honesty, integrity and dedication. If you have these virtues, other things come sooner or later. So adopt these qualities first! Then focus on the other skills that particular job requires. For example, if you work as a lecturer, you must improve your communication skills.

No one can be great at every kind of work; there is usually a single field that one can master. If you ask a sportsman to handle the economy or manage the national budget, he can’t! An economist can’t play like a sportsman! But at the same time, a sportsman can manage his own finances and an economist can play a game for fun. The same applies to you.

You can succeed in any profession, but it takes time. You have to struggle, you have to survive, you have to deal with the obstacles! When you are doing a non-programming job, the biggest challenge you face is adapting to the job. Second, show the company that, as a computer expert, you are a special asset, and how you can help given the opportunity.

So if you are doing a non-programming job, enjoy it while you look out for better opportunities!

“People will love you. People will hate you. And none of it will have anything to do with you” – Abraham Hicks

Shaping a Career – I

I would like to talk to you about the most important thing you study for. In theory it is knowledge: you study to gain knowledge. In practice, however, we all study to earn an income through a job or a business.

All of you are on the verge of completing your education and will start your professional careers in a year or two. I would like to take this opportunity to share a picture of today’s corporate and government world and bust some myths you may have in your mind.

First and foremost, there is no perfect job. When selecting a job there are three parameters: first, the place of the job; second, the work environment; and third, the salary.

It is very tough to get all three in one job, so you have to choose what you want from your life.

This decision is a tricky one! For example, many of you want a job only in the programming field but are not willing to move to Ahmedabad, or vice versa. So you need to decide where your commitment lies and think it through. When you select a job, you are committing to be there for at least 8-10 years, if not with the same organization then in the same field. You can’t switch fields every 4-5 years. Your career is where you will spend your prime years, from age 25 to 60: for 35 years, more than 8 hours a day, you will be doing that work. That is why you need to focus and try to select the best career for yourself.

In the next post I will write about government jobs vs. private jobs. Thank you for reading.

“Sometimes there is no next time, no time-outs, no second chances. Sometimes it’s now or never.” – Alan Bennett

A Dive in Python

Hello guys! It’s a great pleasure to write my first blog post on CS Voice. In this post I introduce one of the most popular programming languages, and the one I use regularly as a programmer: Python.

As Wikipedia says, “Python is a widely used general-purpose, high-level programming language.” Compared with languages such as Java, C++ and the .NET languages, I find Python very easy and handy. It is open source, interpreted, interactive and object-oriented.

Python is used by some of the largest Internet sites on earth, like Reddit, Dropbox and YouTube. The popular Python web framework Django powers both Instagram and Pinterest! Large organizations that make use of Python include Google, Yahoo!, CERN and NASA. And yes, it is widely used in scientific research and engineering applications too.

Python is currently at version 3.x. If you use a Linux distribution such as Ubuntu (my personal favorite), you don’t need to install anything: it is already there. On Windows, however, you need to download and install it separately.

To see how easy Python is, look at the following Hello World program and compare it with the equivalent program in C, C++ or Java:

print("Hello World")

Yes! It takes only one line. Isn’t that simple? Short, easy to read and easy to understand: that’s what Python is! Popular editors and IDEs for writing Python code include PyCharm, Vim, Sublime Text and Eclipse.
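To go one small step beyond Hello World, here is a minimal sketch of the “object-oriented” side of Python mentioned above. The class `Greeter` and its `greet` method are hypothetical names chosen only for illustration, and the f-string syntax assumes Python 3.6 or later.

```python
# A minimal sketch (hypothetical example): a tiny class showing that
# Python is object-oriented while the syntax stays short and readable.
class Greeter:
    """Builds greeting strings; Greeter is an illustrative name, not a library class."""

    def __init__(self, greeting="Hello"):
        # Store the greeting word on the instance
        self.greeting = greeting

    def greet(self, name):
        # f-strings (Python 3.6+) place values directly inside the text
        return f"{self.greeting}, {name}!"


if __name__ == "__main__":
    greeter = Greeter()
    for name in ["World", "CS Voice"]:
        print(greeter.greet(name))  # prints "Hello, World!" then "Hello, CS Voice!"
```

Note that there are no type declarations or braces: indentation alone marks where the class and each method begin and end.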

Python is used in our neighbourhood as well: companies in Gujarat that work with Python include IQR Consultancy, Biztech IT Consultancy, BrowseInfo, N-Tech Technologies, Inextrix Technologies, Odoo India and many more.

In the upcoming article I shall talk more about Python! Meanwhile, go Google it!

Please give your feedback, and do write something for this blog too!

Hello world!

Welcome to CS Voice! This blog serves as a common platform for students, faculty and guests to voice their thoughts. We will post regularly, addressing a variety of topics ranging from programming tutorials and technical how-tos to informative notes and inspiring stories!

Authors can send their posts for publishing by email to cs.voice@kutchuni.edu.in. Please see the author guidelines before sending a post!