trimming white spaces, trimming unwanted text in R

Posted on: November 18, 2017 | Posted by: Guy Manova

Trim spaces and other unwanted text in R

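This tutorial illustrates how to trim spaces and other unwanted text in R, and how this compares with Excel's TRIM function, which works somewhat differently: TRIM also collapses repeated spaces between words, whereas R's trimws() only strips whitespace from the start and end of a string.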

[Image: trimming spaces in Excel]

# pre-session options: clear the workspace

rm(list = ls())
# getwd()
# setwd("C:/Users/your preferred path")  # folder where the CSV files below are written

####### a simple case first, which works the same in Excel and in R #######
####### trimming extra white spaces from start and end #####################

stringVec <- c(" David Johnson", " Jason Gardner ", "Emma Ferdinand ")
write.csv(as.data.frame(stringVec), "stringVec.csv")  # save a copy to open in Excel
trimws(stringVec)  # removes leading and trailing whitespace
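trimws() also takes a `which` argument ("both", the default, "left" or "right"), so you can strip just one end of each string. A small sketch using the same sample vector:

```r
# Same sample vector as above: leading and/or trailing spaces
stringVec <- c(" David Johnson", " Jason Gardner ", "Emma Ferdinand ")

# which = "both" is the default: strip whitespace from both ends
both_ends <- trimws(stringVec, which = "both")
# -> c("David Johnson", "Jason Gardner", "Emma Ferdinand")

# which = "left" strips only leading whitespace
left_only <- trimws(stringVec, which = "left")
# -> c("David Johnson", "Jason Gardner ", "Emma Ferdinand ")

# which = "right" strips only trailing whitespace
right_only <- trimws(stringVec, which = "right")
# -> c(" David Johnson", " Jason Gardner", "Emma Ferdinand")

print(both_ends)
print(left_only)
print(right_only)
```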

########## when the extra white space is in the middle, trimws() won't remove it ####
########## (unlike Excel's TRIM, which collapses it to a single space) ###########

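# same names, but with repeated spaces between (and around) the words
stringVec2 <- c("  David   Johnson", " Jason    Gardner  ", "Emma  Ferdinand   ")
write.csv(as.data.frame(stringVec2), "stringVec2.csv")
trimws(stringVec2)  # the ends are trimmed, but the inner runs of spaces remain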

# trimws() only looks at the ends of each string, so instead we turn to a regular expression
# ("regex", a big topic which I do not cover here).
# The pattern \\s+ matches a run of one or more whitespace characters; replacing each run with
# a single space and then trimming the ends removes the repeated spaces.

stringVec2 <- trimws(gsub("\\s+", " ", stringVec2))
stringVec2
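If the goal is to mimic Excel's TRIM as closely as possible, note that TRIM only deals with the ordinary space character, while \\s+ also matches tabs and newlines. The sketch below wraps the idea in a helper; `excel_trim()` is a hypothetical name (not a base R function), and the `whitespace` argument of trimws() assumes R 3.6.0 or later.

```r
# Hypothetical helper, not part of base R: an approximation of Excel's TRIM.
# It matches only the plain space " " (not \\s), so tabs and newlines are kept.
excel_trim <- function(x) {
  # collapse every run of one or more spaces into a single space
  collapsed <- gsub(" +", " ", x)
  # strip the remaining leading/trailing spaces (`whitespace` needs R >= 3.6.0)
  trimws(collapsed, which = "both", whitespace = " ")
}

stringVec2 <- c("  David   Johnson", " Jason    Gardner  ", "Emma  Ferdinand   ")
cleaned <- excel_trim(stringVec2)
print(cleaned)
# -> c("David Johnson", "Jason Gardner", "Emma Ferdinand")

# Tabs are left alone, unlike with gsub("\\s+", " ", ...)
tab_kept <- excel_trim("a\t\tb")
# -> "a\t\tb"
```

Which version you want depends on the data: \\s+ is the safer choice for messy text pasted from elsewhere, while the space-only version keeps deliberate tabs intact.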

# Although there are other ways to do this, this is also a very convenient way to trim other
# unwanted characters, as long as you know what they are. Here we'll get rid of all leading,
# trailing and repeated underscores:

stringVec3 <- c("_David_Johnson", "_Jason____Gardner ", "Emma_Ferdinand__")

# We'll replace each run of underscores with a space, trim, and then turn the remaining
# space delimiters back into underscores:

# "_+" matches one or more consecutive underscores (the underscore needs no escaping)
stringVec3 <- gsub(" ", "_", trimws(gsub("_+", " ", stringVec3)))
stringVec3
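One side effect of the round trip through spaces is that any space already inside a value also ends up as an underscore. A possible alternative, assuming R 3.6.0 or later, widens the `whitespace` character class of trimws() so it strips spaces and underscores from the ends, then collapses repeated underscores with gsub(). The name `stringVec3_alt` and the extra "Mary Ann" value are illustrative only.

```r
stringVec3 <- c("_David_Johnson", "_Jason____Gardner ", "Emma_Ferdinand__")

# Strip leading/trailing underscores AND spaces in one call by widening what
# trimws() treats as "whitespace" (the `whitespace` argument needs R >= 3.6.0)
ends_trimmed <- trimws(stringVec3, which = "both", whitespace = "[ _]")

# Collapse any remaining run of underscores to a single underscore
stringVec3_alt <- gsub("_+", "_", ends_trimmed)
print(stringVec3_alt)
# -> c("David_Johnson", "Jason_Gardner", "Emma_Ferdinand")

# Illustrative value: a space inside the name survives this version
mary <- gsub("_+", "_", trimws("__Mary Ann__Smith_", whitespace = "[ _]"))
# -> "Mary Ann_Smith"
```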

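# The same steps with stringr; dplyr is loaded here for the %>% pipe
library(dplyr)
library(stringr)

# start again from the raw vector
stringVec3 <- c("_David_Johnson", "_Jason____Gardner ", "Emma_Ferdinand__")

stringVec3 <- stringVec3 %>%
  str_replace_all("_+", " ") %>%  # runs of underscores -> single space
  str_trim(side = "both") %>%     # drop leading/trailing spaces
  str_replace_all(" ", "_")       # spaces back to underscores
stringVec3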
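Newer releases of stringr (from version 1.3.0, which came out after this post) also include str_squish(), which does the trim-and-collapse job from the stringVec2 example in a single call. A short sketch, reusing a sample vector with repeated inner spaces:

```r
library(stringr)

stringVec2 <- c("  David   Johnson", " Jason    Gardner  ", "Emma  Ferdinand   ")

# str_squish() trims both ends and collapses every internal whitespace run
# to a single space, i.e. the same job as trimws(gsub("\\s+", " ", x))
squished <- str_squish(stringVec2)
print(squished)
# -> c("David Johnson", "Jason Gardner", "Emma Ferdinand")

# For these plain-ASCII inputs it agrees with the base R version
same_as_base <- identical(squished, trimws(gsub("\\s+", " ", stringVec2)))
# -> TRUE
```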
