R to Python: Just the basics

tidyverse

python

This is me learning the snake language

Published

Jul 23, 2025

The job hunt is in full effect and I am pushing myself to do more and more things in Python out of fear that that there will be no R option at any future job. I feel as if I have a relatively okay handle on Polars and am comfortable enough to do data cleaning when prompted. However, one of the pain points with me is that I generally have a low level of understanding of various things in Python. I am going to use this blog post as a living notebook on Python. I am bascially just going to work through the ‘Zero to Python Textbook’ and as much as possible translate it to equivalent R syntax. Right now when writing in Python I first have to translate things to R and then back again to make sure to make it make sense.

Operators

So there are only so many ways to compare things in both languages or combine things. For the most part these are very much the same. However there are some key differences to be aware of. For

R
Python

# polynomial 
3^2

[1] 9

## floor division 
5 %/% 2

[1] 2

## remainder division

5 %% 2

[1] 1

# polynomial 
3**2

# floor division 
5//2

# remainder division 

5 % 2

Assignments

Object assignment definitely differs between both languages as an R user I am a bit of an anomly and use = for assignment in R so that was never that big of a transition for me. There are a variety of assignment operators that appear in the wild that I have really only use but don’t know the uses for them. The most common one is += but you can do the same thing with any other mathematical operation. Basically the left part will tell python to do math stuff and then assign it so lets look at that

x = 1 

x+= 1 

x *= 2 

x**=2

x /= 2

x //= 2

x

4.0

The streets say that the := or walrus operator was somewhat of a controversial inclusion which seems weird but basically you are expressesing a value while also assigning it

x = (y := 2) + 2**2

x

Comparisions

Python has the same basic comparision operators just using words rather than %operator% however

Python
R

1 == 1

True

1 != 1

False

1 in range(1, 10)

True

12 not in range(1,10)

True

1 == 1

[1] TRUE

1 != 1

[1] FALSE

1 %in% 1:10

[1] TRUE

!12 %in% 1:10

[1] TRUE

Loops

One of the things that is incredibly infuriating with python for me is that the indentation thing never made anysense. Why should we care about it other than making our code look pretty. The problem is that python uses indentation to denote code blocks in lieu of using {} like R or some general purpose programming languages like Java. So when we are looping over things in R we do

for(i in 1:5){

print(i^2)
 
}

[1] 1
[1] 4
[1] 9
[1] 16
[1] 25

This works becuase R isn’t relying on the spacing to tell it what code is in the loop. Whereas in python if we did

for i in range(1,5):
print(i**2)

expected an indented block after 'for' statement on line 1 (<string>, line 2)

We get an error. So if we make it a little bit more complicated we are using spacing to tell python what is inside a loop, function definition, etc

for i in range(1,6):
    i = i * 3
    if i%2 == 0:
        print(f'{i} is even')
    else:
        print(f'{i} is odd')

3 is odd
6 is even
9 is odd
12 is even
15 is odd

One thing I have never really understood is when and how to use else if so we should learn how to do this. I think if’s are fairly straight forward if something is true do it. Same with else if the if statment isn’t met doing something else. The best way I can explain it to myself is that else if is colloquilly more equivelent to or if

vals = [1,3,4,5,56,7,78,9,7]

for i in vals:
    if i % 2 == 0:
        print(f'{i} is even')
    elif i == 5:
        print(f'{i} triggered the elif')
    else:
        print(f'{i} is odd')

1 is odd
3 is odd
4 is even
5 triggered the elif
56 is even
7 is odd
78 is even
9 is odd
7 is odd

Data types

So there are two kinds of data types primitives and containers

Table 1

Data Type	Explainer	Example
Primitives	Not divisible. So you cannot make a numeric data type into a smaller unit	2 is a numeric
Containers	Divisible so you can break off elements into smaller bits	['hello', 'world']: is a list

So to make the example for Table 1 a bit more concrete we can just define a simple list with a mix of primitive data types and just index the list.

examp_list = ['Hello World', 2332342342342342] 

type(examp_list[0])

<class 'str'>

type(examp_list[1])

<class 'int'>

I am going to skip the section on the various types of primitive data types because they are kind of the same in R. The one exception is that base R doesn’t come with f-strings basked in out of the box. Otherwise there are only so many ways to bake the primitive data types. So I will more of my time on containers.

Tuples and Lists

So a tuple is a container that stores an ordered collection of items of the same or different primitives but is not mutable. So lets define a tuple and a list. Both have the same indexing syntax so you can index do regular and negative indexing.

tp = (1, '2', 3, 2*2)

type(tp)

<class 'tuple'>

lt = [1,'2', 3, 2 *2 ]

type(lt)

<class 'list'>

print('This is the first element of the  tuple', tp[0], 'this is the first element of the  list', lt[0])

This is the first element of the  tuple 1 this is the first element of the  list 1

print('this is the last element of the tuple', tp[-1], 'this is the last element of the list', lt[-1])

this is the last element of the tuple 4 this is the last element of the list 4

Additionally you can use slicing to grab a range of elements. One thing that feels weird as an R user is you can some interesting things like example 2

slice_one = lt[1:4]

slice_two = lt[1:2:4]

print(slice_two)

['2']

However, one of the major differences is if we wanted to change the underlying object. You can change the elements of a list but you cannot change the elements of a tuple

tp[1] = 2

TypeError: 'tuple' object does not support item assignment

lt[1] = 2

lt

[1, 2, 3, 4]

So this will update the second element of the this. If we wanted to add things to a list we can simply do


lt.append(5)

If we wanted to remove items from a list we would simply do


del lt[0]

The interesting thing about python is that lists are not neccessarily equivelent as vectors in R but we can do stuff we would normally would do with vectors

R
Python

vec_one = c(1:10)
vec_two = c(11:20)


sum(c(vec_one, vec_two))

[1] 210

vec_one = list(range(1,11))

vec_two = list(range(11,21))

sum(vec_one + vec_two)

Dictionaries

Dictionaries in Python hold key values pairs. Which as an R user is a little bit foreign since we don’t neccessarily have something that is exactly equivelent. The closest equivelent I could think of would be a named list or a named vector. But that isn’t neccessarily the same thing. One of the nice things about dicts is that you can reference things by the key, but something that is a bit weird is that you can’t really do it by index position. This is likely for a good reason, but just not someting I am used to. However, if you wanted like the first element of the first key you would just index it like a list since well the value of it is a list.

my_dict = {'fruits':['apples', 'pears'], 'numbers':[1,2,3]}

my_dict['fruits']

['apples', 'pears']

my_dict['numbers']

[1, 2, 3]

my_dict['fruits'][0]

'apples'

So this definetly matters when we go and thing about iterating things. Since we have to use different syntaxes. So if you wanted to print out the all the items in a list then you could do this.

for i in lt:
    print(i)

however in a dictionary you only get the keys and not the values which was what I was looking for. You would have to do something like this.

for key, value in my_dict.items():
    print(key, value)

fruits ['apples', 'pears']
numbers [1, 2, 3]

This also matters when you want to add things or delete things. If we did something like this we are just overwriting the existing dictionary.


my_dict['fruits'] = 'mango'

my_dict['numbers'] = 100

If we wanted to actually add things without overwriting an existing dictionary you have lots of options which I will cover in the next sections. However we can start adding new key value pairs like this

my_dict['Cities'] = ['Atlanta','New York City', 'San Francisco']


my_dict

{'fruits': 'mango', 'numbers': 100, 'Cities': ['Atlanta', 'New York City', 'San Francisco']}

You can also update the dictionary using update

my_dict.update({'States': ['Georgia', 'New York', 'California']})

print(my_dict)

{'fruits': 'mango', 'numbers': 100, 'Cities': ['Atlanta', 'New York City', 'San Francisco'], 'States': ['Georgia', 'New York', 'California']}

More efficient appending

List Compression

I am skipping ahead a little bit but I wanted to learn this since I have only ever implemented but don’t have a full understanding of what is going on and when and why to use it. So lets say I wanted to make a new list and fill it with its square. The R user in me would do something like this

numbs <- list(1, 2, 3, 4, 5,6,7,8,9,10)


numbs = lapply(numbs, \(x) x * x)

numbs

[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9

[[4]]
[1] 16

[[5]]
[1] 25

[[6]]
[1] 36

[[7]]
[1] 49

[[8]]
[1] 64

[[9]]
[1] 81

[[10]]
[1] 100

You could do a similar thing in python.

numbs = []

for i in range(10):
    numbs.append(i * i)


numbs

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

One of the problems that you can run into is that for a lot of stuff growing a list can take awhile. In R that is why we tend to prefer using functions along with lapply or map over for loops. So if we wanted to convert some temperatures from farenheit to celsius we would generally prefer to write a function and then apply it to a list rather than use a for loop to do this. Python has a few more tricks up its sleeve to accomplish this. If we wanted a straight forward translation from the tidyverse to python we could do.

def temp_converter(temp):
     return (temp-32) *5/9


temp_list = [32, 212, 100]


c = map(temp_converter, temp_list)

list(c)

[0.0, 100.0, 37.77777777777778]

this is totally fine! But there are some unncessary intermediate steps and really just more me trying to force it into my tiny little functional programming mind. Instead we can use list comprehesion to speed this process up and is more in line with python.

c = [temp_converter(temp_list) for temp_list in temp_list]

c

[0.0, 100.0, 37.77777777777778]

One of the benefits of this is that you can add control flows to really quickly and really flexible change elements of a list. So lets say that some celcius that leaks into our little list. Normally we would want to address this leakage, but for pedagocical purposes lets just add control flows.

temp_list = [32, 212, 100, 0] 


c = [temp_converter(temp_list) for temp_list in temp_list if temp_list not in [0,100]]


c

[0.0, 100.0]

Updating dictionaries

So in the last section we learned that updating dictionaries is a bit more delicate. One big thing that you have to keep in mind is the types within the dictionary. So our dictionary is really just two little lists with a dictionary trench coat. So we have to use list appending. So lets do that.


new_dict = {'fruits': ['watermelons', 'strawberries'],  'numbers' : [4,5,6]}

new_dict['fruits'].append('cherry')

new_dict['numbers'].append(7)

This is fine but not neccessarily the most efficient way to do things outside of canned examples. The most likely case is that we have a new dictionary to help us update things.


update_dict = {'fruits': ['mangos', 'rasberry', 'jackfruit'], 'numbers':[2,4,5,6,7]}

new_dict['fruits'].extend(update_dict['fruits'])

Since our dictionaries hold lists we could also theoretically use list comprehesion like this.

new_dict['numbers'] = [temp_converter(num) for num in new_dict['numbers']]

new_dict

{'fruits': ['watermelons', 'strawberries', 'cherry', 'mangos', 'rasberry', 'jackfruit'], 'numbers': [-15.555555555555555, -15.0, -14.444444444444445, -13.88888888888889]}

We can also combine dictionaries using the | operator

josh_vals = {'Name': 'Josh Allen', 'Location' : 'Atlanta', 'job': 'Grad Student'}

georgia_vals = {'Nickname': 'Peach State', 'mascot': 'White Tailed Dear', 'Power 5 Schools': ['UGA', 'Georgia Tech']}

josh_vals | georgia_vals

{'Name': 'Josh Allen', 'Location': 'Atlanta', 'job': 'Grad Student', 'Nickname': 'Peach State', 'mascot': 'White Tailed Dear', 'Power 5 Schools': ['UGA', 'Georgia Tech']}

We can also modify in place using a special operator

josh_vals |= georgia_vals

If we wanted to keep it within a Python framework we can use something called dictionary compression were we can do something like this


squares_dict = {i: i**2 for i in range(1,11)}

Or if we had somethign like this

my_dict = {'United States' : 'Washington DC', 'France' : 'Paris', 'Italy' : 'Rome'}

{capital: country for country, capital in my_dict.items()}

{'Washington DC': 'United States', 'Paris': 'France', 'Rome': 'Italy'}

Object Orientation

So we went into like basic basics. However, now we are going to spend some time on object orientation. One thing that is really confusing right now is that a lot of stuff that people end up doing in python is that they use object oriented programming to do the heavy lifting. While R has this it is a lot more common in replication files to either code literally everything in various scripts or use a more functional oriented approach and targets. The “problem” is that when we transfer this approach over to Python is clunky and you are going to look like a weirdo to your colleagues.

Creating Objects

So there are reserved stuff in Python that denote the creation of an object.

dt = pl.DataFrame({'': ['__new__', '__init__', 'classes', 'instances'], 'What it Does': ['Responsible for creating instances of a class. Takes class as the first argument and any other arguments passed to the class constructor', 'Responsible for initializing the state of the new object, and to perform any other neccessary setup', 'defines functions called methods which outline behavior and actions that an object created from the class can perform with its data', 'an object that is built from a class and contains real data']})


GT(dt)

	What it Does
__new__	Responsible for creating instances of a class. Takes class as the first argument and any other arguments passed to the class constructor
__init__	Responsible for initializing the state of the new object, and to perform any other neccessary setup
classes	defines functions called methods which outline behavior and actions that an object created from the class can perform with its data
instances	an object that is built from a class and contains real data

This is a bit dense to get through without examples. So lets create some objects. We can create a list of attributes that define a dog. We could define a dictionary like this


dog = {'name': 'Spot', 'age': '7', 'breed': 'Husky', 'color': ['Black', 'White']}

This is for sure fine for simple applications but is a little cumbersome as our code base grows. Defining classes allows us to good comparisions while allowing us a lot more flexibility. So lets start by defining a class

1class Dog:
2    def __init__(self, name, age, breed,  color):
3        self.name = name
        self.age = age
        self.breed = breed
        self.color = color

1: Defines the name of the class
2: Defines the attributes of the class
3: Creates an attribute that assigns the value of name to the parameter

The self.blah bits created in __init__ are attributes or as they are formally called instances that we can add.

mel = Dog(name = 'Melonie', age = 6, breed = 'Mixed', color = ['Brown', 'White'])

mel.age

Now we can update the values of mel dynamically.

mel.age = 6 * 7

mel.age

We can now make the dog class a little bit more robust by adding class methods by definining little functions inside of the class.


class Dog:
    latin_name = 'Canis familiaris'
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age
    def description(self):
        return f'{self.name} is {self.age} years old'
    def dog_year(self):
        return f'{self.name} is {self.age * 7} in dog years'
    def tricks(self, tricks):
        return f'{self.name} knows {tricks}'

We can also make child classes which inherit properties and methods from parent classes. There are a lot of different kinds of inheritances. At a super technical level they definitely differ in important ways but from a mile high view they all come down to how many classes the child class can inherit from.

class Dog: 
    def __init__(self, name):
        self.name = name
    def display_name(self):
        print(f"This dog's name is {self.name}")
class labs(Dog): # single inheritance 
    def sound(self):
        print("labs are sweet but a little dopey")

class guide_dog(labs): # mulitlevel inheritance guide_dog inherits the lab class which also inherits 
                        # the dog class
    def guide(self):
        print(f"Labs are really trainable and make great guide dog. {self.name} will be a great guide dog")

class friendly:
    def greet(self):
        print('Most dogs are friendly')

lab = labs(name = 'Air Bud')


lab.display_name()

This dog's name is Air Bud

guide = guide_dog('Vision')

guide.display_name()

This dog's name is Vision

guide.guide()

Labs are really trainable and make great guide dog. Vision will be a great guide dog

Object oriented programming in python is a lot easier than doing stuff with pure functions. However, without guard rails than we aren’t practicing it safely.

class Dog:
    latin_name = 'Canis familiaris'
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age
    def description(self):
        return f'{self.name} is {self.age} years old'
    def dog_year(self):
        return f'{self.name} is {self.age * 7} in dog years'
    def tricks(self, tricks):
        return f'{self.name} knows {tricks}'

pass_string = Dog(name = 'Clifford', age = 'This is a string')

pass_string.dog_year()

'Clifford is This is a stringThis is a stringThis is a stringThis is a stringThis is a stringThis is a stringThis is a string in dog years'

Which creates weird results and we can’t rely on future you or a another member of the team to look through every file to figure out what the input should be.

Class, instance, and static methods

One way we can do this is enforcing types in our functions. There are several flavors which can be used to enforce various behaviours

Instance methods

Instance methods are the most common and are really just functions defined in the body of the class. However they take on a different flavor when we are using them inside the body of the class

class Dog:
    def __init__(self, age: int, name:str):
        if not isinstance(name, str):
            raise ValueError("Name must be a string")
        self.name = name
        if not isinstance(age, int):
            raise ValueError("Age must be an integer")
        self.age  = age
    def description(self, age:int, name:str):
        print(f"{self.name} is {self.age} old")


doggo = Dog(name = 7, age = 'This is a string')

ValueError: Name must be a string

Now our class is a little bit more robust we can’t start passing things off that we shouldn’t. One thing that we can do is actually bypass the the name/age checks

dog = Dog(age=5, name="Buddy")

dog.description(age = 'five', name = 123)

Buddy is 5 old

One thing that you will notice is that the description doesn’t care about these violations. Which is not ideal so to make this even more robust we can add some fairly simple checks

class Dog:
    def __init__(self, name:str, age:int):
        if not isinstance(name, str):
            raise ValueError("name should be a string")
        self.name = name
        if not isinstance(age, int) or age <= 0:
            raise ValueError("age should be an integer or be greater than 1")
        self.age = age
    def description(self, name, age):
        if age != self.age or name != self.name:
            raise ValueError("Provided age or name does not match dog attributes")
        print(f"{self.name} is {self.age} years old")


dog = Dog(age=5, name="Buddy")

dog.description(age = 'five', name = 123)

ValueError: Provided age or name does not match dog attributes