
Basics of R

R is an open-source programming language used primarily for statistical analysis.

R can be downloaded for free from https://www.r-project.org/

Its most popular IDE (integrated development environment) is RStudio (free download: https://posit.co/download/rstudio-desktop/).
VS Code (free download: https://code.visualstudio.com/download) can also be used to run Jupyter notebooks with an R kernel.
This document was prepared as a Jupyter notebook with an R kernel in VS Code.

Datatypes in R

Basic Data Types

Numeric: Represents real numbers (e.g., 10.5, 3.14).
Integer: Represents whole numbers (e.g., 10L, -3L). The L suffix explicitly defines an integer.
Logical: Represents Boolean values (TRUE or FALSE).
Character: Represents text or strings (e.g., "Hello", "R").
Complex: Represents complex numbers (e.g., 3 + 2i).
NULL: Represents an empty or null object.
NA: Represents missing or undefined data.
NaN: Represents "Not a Number" (e.g., 0/0).
Inf: Represents infinity (e.g., 1/0).
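
As a quick check of the list above, the following sketch asks R for the class of a few literal values and tests the special values with their dedicated functions. The variable names (type_of, special) are illustrative only and do not appear elsewhere in this notebook.

```r
# class() reports the type of a value; the names on the left are just labels
type_of = c(
  numeric   = class(10.5),
  integer   = class(10L),
  logical   = class(TRUE),
  character = class("Hello"),
  complex   = class(3 + 2i)
)
print(type_of)

# NULL, NA, NaN and Inf are special values, each with its own test function
special = c(
  null_is_null = is.null(NULL),
  na_is_na     = is.na(NA),
  nan_is_nan   = is.nan(0 / 0),
  inf_is_inf   = is.infinite(1 / 0)
)
print(special)  # every entry is TRUE
```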

Data Structures

Vector: A sequence of elements of the same data type → c(1, 2, 3)
List: A collection of elements of different data types → list(1, "a", TRUE)
Matrix: A two-dimensional array of elements of the same data type → matrix(1:6, nrow=2)
Array: A multi-dimensional extension of a matrix → array(1:12, dim=c(2, 3, 2))
Data Frame: A table-like structure where columns can be of different data types → data.frame(name=c("Alice", "Bob"), age=c(25, 30))
Factor: Represents categorical data with fixed levels → factor(c("Male", "Female", "Male"))
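
The structures listed above can be built and inspected as in the sketch below. The object names (v, l, m, arr, df, f, mixed) are chosen for illustration only. The last line shows what "same data type" means for a vector: mixing types in c() silently converts everything to one common type.

```r
v   = c(1, 2, 3)
l   = list(1, "a", TRUE)
m   = matrix(1:6, nrow = 2)
arr = array(1:12, dim = c(2, 3, 2))
df  = data.frame(name = c("Alice", "Bob"), age = c(25, 30))
f   = factor(c("Male", "Female", "Male"))

length(v)    # 3 elements
dim(m)       # 2 rows, 3 columns
dim(arr)     # 2 x 3 x 2
nrow(df)     # 2 rows
levels(f)    # "Female" "Male" (levels are sorted alphabetically by default)

# A vector holds one type only, so mixed input is coerced to character
mixed = c(1, "a", TRUE)
class(mixed) # "character"
```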

Assignment operators (= vs <-)

By convention:
<- is the standard assignment operator in R.
= can also be used for assignment, but its main role is naming arguments in function calls.
At the top level, = and <- behave the same way: both create or update a variable in the current environment. <<- is different: it assigns to an existing variable in an enclosing environment and, if none is found, creates the variable in the global environment.
Although using = for assignment is generally discouraged to avoid confusion, it does not necessarily cause a problem. I find it rather straightforward to use.

In [1]:

x <- 10
y = 20 
print(x)
print(y)

[1] 10
[1] 20

In this session, = is used as the assignment operator unless <- or <<- is explicitly required.
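
One case where <<- is explicitly required is changing a variable from inside a function. The sketch below is a minimal illustration; the names counter, increment and local_only are hypothetical and used only here.

```r
counter = 0

# <<- updates the 'counter' that already exists outside the function
increment = function() {
  counter <<- counter + 1
}

# = inside a function creates a separate local variable instead
local_only = function() {
  counter = 100
  counter
}

increment()
increment()
local_result = local_only()

print(counter)       # 2: changed twice by increment()
print(local_result)  # 100: the local copy, the outer counter is untouched
```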

Variable Names and Assignment

If you are using a Jupyter notebook, press Shift+Enter to run the active cell.
If you are using RStudio, press Ctrl+Enter to run the current line, or select some code and press Ctrl+Enter to run the selection.

In [2]:

a=1
b="ram"
c=1.0
print(a)
print(b)
print(c)

[1] 1
[1] "ram"
[1] 1

The beauty of interactive programming is that print() is not necessary: typing a variable name on its own displays its value.
In RStudio, the output appears in the Console window.

In [3]:

a
b
c

'ram'

Let's get started!

In [4]:

#creating vectors and operating on them (the most common task)
jp=c(3.8,4.5,4.6,4.2,3.9,4.7,4.9)
mt=c(3.7,4.2,4.5,4.0,3.4,4.5,4.7)

It should be noted that any content on a single line after # is treated as a comment and will not be considered part of the code.

In [5]:

#calculation of mean
mean(jp)
mean(mt)

4.37142857142857

4.14285714285714
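
To see what mean() computes, the sketch below reproduces it by hand as the sum of the values divided by their count. The name manual_mean is illustrative only.

```r
jp = c(3.8, 4.5, 4.6, 4.2, 3.9, 4.7, 4.9)

# Arithmetic mean: total of the values divided by how many there are
manual_mean = sum(jp) / length(jp)

all.equal(manual_mean, mean(jp))  # TRUE
```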

In [6]:

#calculation of standard deviation
sd(jp)

0.415187851918806
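
sd() returns the sample standard deviation, which divides by n - 1 rather than n. The sketch below recomputes it step by step; the names n and manual_sd are illustrative only.

```r
jp = c(3.8, 4.5, 4.6, 4.2, 3.9, 4.7, 4.9)
n  = length(jp)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by (n - 1)
manual_sd = sqrt(sum((jp - mean(jp))^2) / (n - 1))

all.equal(manual_sd, sd(jp))  # TRUE
```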

In [7]:

#correlation
cor(jp,mt)

0.959665503616133
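
By default cor() computes the Pearson correlation, which is the covariance of the two vectors divided by the product of their standard deviations. A minimal sketch (manual_cor is an illustrative name):

```r
jp = c(3.8, 4.5, 4.6, 4.2, 3.9, 4.7, 4.9)
mt = c(3.7, 4.2, 4.5, 4.0, 3.4, 4.5, 4.7)

# Pearson correlation = covariance / (sd of jp * sd of mt)
manual_cor = cov(jp, mt) / (sd(jp) * sd(mt))

all.equal(manual_cor, cor(jp, mt))  # TRUE
```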

In [8]:

#creation of Dataframe from vector
js=data.frame(jp,mt)
js

A data.frame: 7 × 2
jp	mt
<dbl>	<dbl>
3.8	3.7
4.5	4.2
4.6	4.5
4.2	4.0
3.9	3.4
4.7	4.5
4.9	4.7

In [9]:

#summary() summarizes the data. It can be used with a vector as well as a data frame
summary(js)

       jp              mt       
 Min.   :3.800   Min.   :3.400  
 1st Qu.:4.050   1st Qu.:3.850  
 Median :4.500   Median :4.200  
 Mean   :4.371   Mean   :4.143  
 3rd Qu.:4.650   3rd Qu.:4.500  
 Max.   :4.900   Max.   :4.700

In [10]:

#individual columns of dataframe can be accessed using dataframename$columnname
js$jp
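
Besides $, a column can also be reached with double brackets or with [row, column] indexing. The sketch below rebuilds js so that it runs on its own; the result names (col_dollar, col_bracket, col_index, row2, cell) are illustrative only.

```r
jp = c(3.8, 4.5, 4.6, 4.2, 3.9, 4.7, 4.9)
mt = c(3.7, 4.2, 4.5, 4.0, 3.4, 4.5, 4.7)
js = data.frame(jp, mt)

col_dollar  = js$jp       # column by name with $
col_bracket = js[["jp"]]  # same column with double brackets
col_index   = js[, 1]     # same column by position

identical(col_dollar, col_bracket)  # TRUE

row2 = js[2, ]      # second row, returned as a one-row data frame
cell = js[2, "mt"]  # single value: row 2 of column mt, i.e. 4.2
```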

In [11]:

#individual element of vector can be accessed with vectorname[position]
jp[2] #provides second element of jp

4.5
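
Square brackets accept more than a single position. As a sketch, the following selects a range, drops an element with a negative index, and filters by a condition; the result names are illustrative only.

```r
jp = c(3.8, 4.5, 4.6, 4.2, 3.9, 4.7, 4.9)

first_three = jp[1:3]       # positions 1 to 3: 3.8 4.5 4.6
picked      = jp[c(1, 3)]   # positions 1 and 3: 3.8 4.6
without_1st = jp[-1]        # everything except the first element
high        = jp[jp > 4.5]  # values greater than 4.5: 4.6 4.7 4.9
```

Note that R counts positions from 1, not from 0.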

In [12]:

#head(dataframe) will give first 6 rows of the dataframe
head(js)

A data.frame: 6 × 2
	jp	mt
	<dbl>	<dbl>
1	3.8	3.7
2	4.5	4.2
3	4.6	4.5
4	4.2	4.0
5	3.9	3.4
6	4.7	4.5

In [13]:

#head(dataframe, n) will fetch first n rows instead
head(js,2)

A data.frame: 2 × 2
	jp	mt
	<dbl>	<dbl>
1	3.8	3.7
2	4.5	4.2
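
The counterpart of head() is tail(), which returns the last rows. Together with nrow(), ncol() and dim() it gives a quick overview of a data frame. A small sketch, rebuilding js so that it runs on its own (last_two is an illustrative name):

```r
jp = c(3.8, 4.5, 4.6, 4.2, 3.9, 4.7, 4.9)
mt = c(3.7, 4.2, 4.5, 4.0, 3.4, 4.5, 4.7)
js = data.frame(jp, mt)

last_two = tail(js, 2)  # rows 6 and 7
print(last_two)

nrow(js)  # 7 rows
ncol(js)  # 2 columns
dim(js)   # both at once: 7 2
```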

Logical operators

syntax	operator name	description
&	and	Returns TRUE if both conditions are TRUE, otherwise FALSE
|	or	Returns TRUE if at least one condition is TRUE
!	not	Returns TRUE if the condition is FALSE
==	equal	Returns TRUE if both values are equal


In [14]:

x=5

is.numeric(x) #TRUE
is.numeric(x)& x<0 #FALSE
is.numeric(x)& x<10 #TRUE
is.numeric(x)| x<0 #TRUE
!is.character(x) #TRUE

TRUE

FALSE

TRUE
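
These operators also work element by element on whole vectors, which is how they are mostly used in data analysis. The sketch below uses an illustrative age vector; the result names are hypothetical as well.

```r
age = c(25, 32, 43, 34, 52, 29)

in_thirties  = age >= 30 & age < 40  # FALSE TRUE FALSE TRUE FALSE FALSE
young_or_old = age < 30 | age > 50   # TRUE FALSE FALSE FALSE TRUE TRUE
is_34        = age == 34             # TRUE only at the fourth position

any(is_34)        # TRUE: at least one element matches
all(age > 20)     # TRUE: every element matches
sum(in_thirties)  # 2: TRUE counts as 1, so sum() counts the matches
age[in_thirties]  # 32 34: a logical vector can be used to filter
```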

Lifehack: pressing Ctrl+Space while writing code forces the IDE to show suggestions if they are not already showing. For example:

In [15]:

isTRUE(FALSE) #obviously

FALSE

In [16]:

# Basic Plotting in R
hhno=c(1,2,3,4,5,6)
age=c(25,32,43,34,52,29)
salary=c(18000,23400,54321,34000,65000,21000)
gender=c('M','F','F','M','F','M')
surveydata=data.frame(hhno,age,salary,gender) #creation of dataframe from vector

#Basic plot
plot(age,salary)

[Plot: scatter plot of salary (y-axis) against age (x-axis)]
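
The basic plot can be made easier to read with axis labels, a title and colours. The following is one possible sketch, not the only way to do it; the label texts, the colour choice and the name point_colors are assumptions made for illustration.

```r
age    = c(25, 32, 43, 34, 52, 29)
salary = c(18000, 23400, 54321, 34000, 65000, 21000)
gender = c('M', 'F', 'F', 'M', 'F', 'M')

# One colour per point, chosen from the gender vector
point_colors = ifelse(gender == 'M', "blue", "red")

plot(age, salary,
     xlab = "Age", ylab = "Salary", main = "Salary vs age",
     col = point_colors, pch = 19)  # pch = 19 draws filled circles
legend("topleft", legend = c("M", "F"), col = c("blue", "red"), pch = 19)
```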

In [17]:

#read data from external sources
dataurl="https://raw.githubusercontent.com/madhuko/madhuko.github.io/refs/heads/main/datasets/R/employee.csv"
employee=read.csv(dataurl)
head(employee)

A data.frame: 6 × 9
	id	gender	educ	jobcat	salary	salbegin	jobtime	prevexp	minority
	<int>	<chr>	<int>	<int>	<int>	<int>	<int>	<int>	<int>
1	1	m	15	3	57000	27000	98	144	0
2	2	m	16	1	40200	18750	98	36	0
3	3	f	12	1	21450	12000	98	381	0
4	4	f	8	1	21900	13200	98	190	0
5	5	m	15	1	45000	21000	98	138	0
6	6	m	15	1	32100	13500	98	67	0
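
read.csv() accepts a local file path in the same way as a URL, and it can also parse CSV text directly through its text argument, which is handy for trying things out without a network connection. The sketch below uses three rows and three columns copied from the output above; the names csv_text and small are illustrative only.

```r
# A tiny CSV held in a string: first line is the header
csv_text = "id,gender,salary
1,m,57000
2,m,40200
3,f,21450"

small = read.csv(text = csv_text)

str(small)   # shows each column with its type
nrow(small)  # 3 rows
```

In R 4.0 and later, text columns such as gender are read as character by default rather than as factors.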

In [18]:

#table() creates frequency tables from vectors or factors. It counts the occurrences of each unique value.
table(employee$gender)

  f   m 
216 258

In [19]:

# prop.table() converts the counts in a table to proportions
table1=table(employee$gender)
prop.table(table1)

        f         m 
0.4556962 0.5443038
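
Proportions are often easier to read as percentages. The sketch below rebuilds a gender vector with the same counts as above (216 f and 258 m) so that it runs without downloading the data; the names gender_vec, counts and percent are illustrative only.

```r
# Recreate the counts shown above without the original file
gender_vec = rep(c("f", "m"), times = c(216, 258))

counts  = table(gender_vec)
percent = round(100 * prop.table(counts), 1)  # proportions as percentages

print(counts)
print(percent)  # f 45.6, m 54.4
```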

In [20]:

min(employee$salary)
max(employee$salary)
mean(employee$salary)
summary(employee$salary)

15750

135000

34419.5675105485

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  15750   24000   28875   34420   36938  135000
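
A common next step is to compute a statistic per group. One way to do this in base R is tapply(), sketched below on the small survey vectors from the plotting example so that it runs on its own; the name mean_by_gender is illustrative only.

```r
salary = c(18000, 23400, 54321, 34000, 65000, 21000)
gender = c('M', 'F', 'F', 'M', 'F', 'M')

# tapply(values, groups, function): apply the function within each group
mean_by_gender = tapply(salary, gender, mean)
print(mean_by_gender)

# quantile() returns the cut points that summary() reports
quantile(salary, probs = c(0.25, 0.5, 0.75))
```

The same call would work on the employee data as tapply(employee$salary, employee$gender, mean).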

R Basics Part 1