Read Csv With Variable Names as Column Headers Matlab
Reading and Writing CSV Files
Overview
Teaching: 30 min
Exercises: 0 min
Questions
How do I read data from a CSV file into R?
How do I write data to a CSV file?
Objectives
Read in a .csv, and explore the arguments of the csv reader.
Write the altered data set to a new .csv, and explore the arguments.
The most common way that scientists store data is in Excel spreadsheets. While there are R packages designed to access data from Excel spreadsheets (e.g., gdata, RODBC, XLConnect, xlsx, RExcel), users often find it easier to save their spreadsheets in comma-separated values files (CSV) and then use R's built-in functionality to read and manipulate the data. In this short lesson, we'll learn how to read data from a .csv and write to a new .csv, and explore the arguments that allow you to read and write the data correctly for your needs.
Read a .csv and Explore the Arguments
Let's start by opening a .csv file containing information on the speeds at which cars of different colors were clocked in 45 mph zones in the four-corners states (car-speeds.csv). We will use the built-in read.csv(...) function call, which reads the data in as a data frame, and assign the data frame to a variable (using <-) so that it is stored in R's memory. Then we will explore some of the basic arguments that can be supplied to the function. First, open the RStudio project containing the scripts and data you were working on in episode 'Analyzing Patient Data'.

# Import the data and look at the first six rows
carSpeeds <- read.csv(file = 'data/car-speeds.csv')
head(carSpeeds)

  Color Speed     State
1  Blue    32 NewMexico
2   Red    45   Arizona
3  Blue    35  Colorado
4 White    34   Arizona
5   Red    25   Arizona
6  Blue    41   Arizona

Changing Delimiters
The default delimiter of the read.csv() function is a comma, but you can use other delimiters by supplying the 'sep' argument to the function (e.g., typing sep = ';' allows a semicolon-separated file to be correctly imported; see ?read.csv for more information on this and other options for working with different file types).
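
As a small, self-contained sketch of the sep argument: the file below is a temporary one written just for illustration (it is not part of the lesson data), and the name `semi` is made up for this example.

```r
# Write a tiny semicolon-separated file to a temporary location
# (hypothetical example data, not the lesson's car-speeds.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Color;Speed;State",
             "Blue;32;NewMexico",
             "Red;45;Arizona"), tmp)

# With the default sep = ',' each line would be read as one single column;
# sep = ';' splits the fields into three columns
semi <- read.csv(file = tmp, sep = ";")
str(semi)
```

R also provides read.csv2(), whose defaults are sep = ';' and dec = ',', which can be convenient for files exported with those conventions.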
The call above will import the data, but we have not taken advantage of several handy arguments that can be helpful in loading the data in the format we want. Let's explore some of these arguments.
The default for read.csv(...) is to set the header argument to TRUE. This means that the first row of values in the .csv is set as header information (column names). If your data set does not have a header, set the header argument to FALSE:

# The first row of the data without setting the header argument:
carSpeeds[1, ]

  Color Speed     State
1  Blue    32 NewMexico

# The first row of the data if the header argument is set to FALSE:
carSpeeds <- read.csv(file = 'data/car-speeds.csv', header = FALSE)
carSpeeds[1, ]

     V1    V2    V3
1 Color Speed State

Clearly this is not the desired behavior for this data set, but it may be useful if you have a dataset without headers.
The stringsAsFactors Argument
In older versions of R (prior to 4.0) this was perhaps the most important argument in read.csv(), particularly if you were working with categorical data. This is because the default behavior of R was to convert character strings into factors, which may make it difficult to do such things as replace values. It is important to be aware of this behaviour, which we will demonstrate. For example, let's say we find out that the data collector was color blind, and accidentally recorded green cars as being blue. In order to correct the data set, let's replace 'Blue' with 'Green' in the $Color column:

# Here we will use R's `ifelse` function, in which we provide the test phrase,
# the outcome if the result of the test is 'TRUE', and the outcome if the
# result is 'FALSE'. We will also assign the results to the Color column,
# using '<-'

# First - reload the data with a header
carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = TRUE)

carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
carSpeeds$Color

  [1] "Green" "1" "Green" "5" "4" "Green" "Green" "2" "5"
 [10] "4" "4" "5" "Green" "Green" "2" "4" "Green" "Green"
 [19] "5" "Green" "Green" "Green" "4" "Green" "4" "4" "4"
 [28] "4" "5" "Green" "4" "5" "2" "4" "2" "2"
 [37] "Green" "4" "2" "4" "2" "2" "4" "4" "5"
 [46] "2" "Green" "4" "4" "2" "2" "4" "5" "4"
 [55] "Green" "Green" "2" "Green" "5" "2" "4" "Green" "Green"
 [64] "5" "2" "4" "4" "2" "Green" "5" "Green" "4"
 [73] "5" "5" "Green" "Green" "Green" "Green" "Green" "5" "2"
 [82] "Green" "5" "2" "2" "4" "4" "5" "5" "5"
 [91] "5" "4" "4" "4" "5" "2" "5" "2" "2"
[100] "5"

What happened?!? It looks like 'Blue' was replaced with 'Green', but every other color was turned into a number (as a character string, given the quote marks before and after). This is because the colors of the cars were loaded as factors, and the factor level was reported following replacement.
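
To make the mechanism concrete, here is a minimal sketch on a four-element factor (the values and the names `colors` and `replaced` are illustrative, not taken from the lesson data). A factor stores integer codes plus a set of labels, and ifelse() hands back the codes for the entries it does not replace.

```r
# A small factor standing in for the Color column (illustrative values)
colors <- factor(c("Blue", "Red", "White", "Blue"))

levels(colors)      # the labels: "Blue" "Red" "White"
as.integer(colors)  # the integer codes stored underneath: 1 2 3 1

# ifelse() drops the factor attributes of its 'no' argument, so the entries
# that are not replaced come back as their integer codes, which are then
# coerced to character alongside 'Green'
replaced <- ifelse(colors == "Blue", "Green", colors)
replaced            # "Green" "2" "3" "Green"
```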
To see the internal structure, we can use another function, str(). In this case, the data frame's internal structure includes the format of each column, which is what we are interested in. str() will be reviewed a little more in the lesson Data Types and Structures.

# Reload the data with a header (the previous ifelse call modifies attributes)
carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = TRUE)
str(carSpeeds)

'data.frame': 100 obs. of 3 variables:
 $ Color: Factor w/ 5 levels " Red","Black",..: 3 1 3 5 4 3 3 2 5 4 ...
 $ Speed: int 32 45 35 34 25 41 34 29 31 26 ...
 $ State: Factor w/ 4 levels "Arizona","Colorado",..: 3 1 2 1 1 1 3 2 1 2 ...

We can see that the $Color and $State columns are factors and $Speed is a numeric column.
Now, let's load the dataset using stringsAsFactors=FALSE, and see what happens when we try to replace 'Blue' with 'Green' in the $Color column:

carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = FALSE)
str(carSpeeds)

'data.frame': 100 obs. of 3 variables:
 $ Color: chr "Blue" " Red" "Blue" "White" ...
 $ Speed: int 32 45 35 34 25 41 34 29 31 26 ...
 $ State: chr "NewMexico" "Arizona" "Colorado" "Arizona" ...

carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
carSpeeds$Color

  [1] "Green" " Red" "Green" "White" "Red" "Green" "Green" "Black" "White"
 [10] "Red" "Red" "White" "Green" "Green" "Black" "Red" "Green" "Green"
 [19] "White" "Green" "Green" "Green" "Red" "Green" "Red" "Red" "Red"
 [28] "Red" "White" "Green" "Red" "White" "Black" "Red" "Black" "Black"
 [37] "Green" "Red" "Black" "Red" "Black" "Black" "Red" "Red" "White"
 [46] "Black" "Green" "Red" "Red" "Black" "Black" "Red" "White" "Red"
 [55] "Green" "Green" "Black" "Green" "White" "Black" "Red" "Green" "Green"
 [64] "White" "Black" "Red" "Red" "Black" "Green" "White" "Green" "Red"
 [73] "White" "White" "Green" "Green" "Green" "Green" "Green" "White" "Black"
 [82] "Green" "White" "Black" "Black" "Red" "Red" "White" "White" "White"
 [91] "White" "Red" "Red" "Red" "White" "Black" "White" "Black" "Black"
[100] "White"

That's better! And we can see how the data is now read as character instead of factor. From R version 4.0 onwards we do not have to specify stringsAsFactors=FALSE; this is the default behavior.
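
Because the default changed between R versions, a script that must behave the same everywhere can state the argument explicitly. The following is a minimal sketch on a temporary file created for illustration; the names `as_chr` and `as_fct` are made up for this example.

```r
# A tiny temporary file standing in for the lesson data (illustrative only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Color,Speed", "Blue,32", "Red,45"), tmp)

# TRUE on R 4.0.0 or later, where strings are no longer converted by default
getRversion() >= "4.0.0"

# Stating the argument explicitly gives the same result on any R version
as_chr <- read.csv(file = tmp, stringsAsFactors = FALSE)
as_fct <- read.csv(file = tmp, stringsAsFactors = TRUE)

class(as_chr$Color)  # "character"
class(as_fct$Color)  # "factor"
```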
The as.is Argument
This is an extension of the stringsAsFactors argument, but gives you control over individual columns. For example, if we want the colors of cars imported as strings, but we want the names of the states imported as factors, we would load the data set as:

carSpeeds <- read.csv(file = 'data/car-speeds.csv', as.is = 1)
# Note, the 1 applies as.is to the first column only

Now we can see that if we try to replace 'Blue' with 'Green' in the $Color column everything looks fine, while trying to replace 'Arizona' with 'Ohio' in the $State column returns the factor numbers for the names of states that we haven't replaced:

str(carSpeeds)

'data.frame': 100 obs. of 3 variables:
 $ Color: chr "Blue" " Red" "Blue" "White" ...
 $ Speed: int 32 45 35 34 25 41 34 29 31 26 ...
 $ State: Factor w/ 4 levels "Arizona","Colorado",..: 3 1 2 1 1 1 3 2 1 2 ...

carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
carSpeeds$Color

  [1] "Green" " Red" "Green" "White" "Red" "Green" "Green" "Black" "White"
 [10] "Red" "Red" "White" "Green" "Green" "Black" "Red" "Green" "Green"
 [19] "White" "Green" "Green" "Green" "Red" "Green" "Red" "Red" "Red"
 [28] "Red" "White" "Green" "Red" "White" "Black" "Red" "Black" "Black"
 [37] "Green" "Red" "Black" "Red" "Black" "Black" "Red" "Red" "White"
 [46] "Black" "Green" "Red" "Red" "Black" "Black" "Red" "White" "Red"
 [55] "Green" "Green" "Black" "Green" "White" "Black" "Red" "Green" "Green"
 [64] "White" "Black" "Red" "Red" "Black" "Green" "White" "Green" "Red"
 [73] "White" "White" "Green" "Green" "Green" "Green" "Green" "White" "Black"
 [82] "Green" "White" "Black" "Black" "Red" "Red" "White" "White" "White"
 [91] "White" "Red" "Red" "Red" "White" "Black" "White" "Black" "Black"
[100] "White"

carSpeeds$State <- ifelse(carSpeeds$State == 'Arizona', 'Ohio', carSpeeds$State)
carSpeeds$State

  [1] "3" "Ohio" "2" "Ohio" "Ohio" "Ohio" "3" "2" "Ohio" "2"
 [11] "4" "4" "4" "4" "4" "3" "Ohio" "3" "Ohio" "4"
 [21] "4" "4" "3" "2" "2" "3" "2" "4" "2" "4"
 [31] "3" "2" "2" "4" "2" "2" "3" "Ohio" "4" "2"
 [41] "2" "3" "Ohio" "4" "Ohio" "2" "3" "3" "3" "2"
 [51] "Ohio" "4" "4" "Ohio" "3" "2" "4" "2" "4" "4"
 [61] "4" "2" "3" "2" "3" "2" "3" "Ohio" "3" "4"
 [71] "4" "2" "Ohio" "4" "2" "2" "2" "Ohio" "3" "Ohio"
 [81] "4" "2" "2" "Ohio" "Ohio" "Ohio" "4" "Ohio" "4" "4"
 [91] "4" "Ohio" "Ohio" "3" "2" "2" "4" "3" "Ohio" "4"

We can see that the $Color column is a character while $State is a factor.
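
According to ?read.table, as.is also accepts column names instead of positions. A minimal sketch, assuming a temporary file with the same three column names as the lesson data (the file contents and the name `cars` are illustrative):

```r
# A tiny temporary file with the same column names as the lesson data
tmp <- tempfile(fileext = ".csv")
writeLines(c("Color,Speed,State",
             "Blue,32,NewMexico",
             "Red,45,Arizona"), tmp)

# Keep Color as character by naming the column; the remaining string
# column (State) is still converted to a factor
cars <- read.csv(file = tmp, as.is = "Color")
str(cars)
```

Naming the column can be more robust than a position if the column order in the file ever changes.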
Updating Values in a Factor
Suppose we want to keep the colors of cars as factors for some other operations we want to perform. Write code for replacing 'Blue' with 'Green' in the $Color column of the cars dataset without importing the data with stringsAsFactors=FALSE.
Solution

carSpeeds <- read.csv(file = 'data/car-speeds.csv')
# Replace 'Blue' with 'Green' in cars$Color without using the stringsAsFactors
# or as.is arguments
carSpeeds$Color <- ifelse(as.character(carSpeeds$Color) == 'Blue',
                          'Green',
                          as.character(carSpeeds$Color))
# Convert colors back to factors
carSpeeds$Color <- as.factor(carSpeeds$Color)
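
An alternative that is not part of the lesson's solution is to rename the factor level itself, which avoids the round trip through character. This is only a sketch on an illustrative factor; the name `colors` is made up for the example.

```r
# An illustrative factor standing in for carSpeeds$Color
colors <- factor(c("Blue", "Red", "Blue", "White"))

# Rename the level in place; every element carrying that level follows
levels(colors)[levels(colors) == "Blue"] <- "Green"

colors          # Green Red Green White
levels(colors)  # the old label "Blue" is gone
```

The result is still a factor, so no as.factor() call is needed afterwards.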
The strip.white Argument
It is not uncommon for mistakes to have been made when the data were recorded, for example a space (whitespace) may have been inserted before a data value. By default this whitespace will be kept in the R environment, such that ' Red' will be recognized as a different value than 'Red'. In order to avoid this type of error, use the strip.white argument. Let's see how this works by checking for the unique values in the $Color column of our dataset:
Here, the data recorder added a space before the color of the car in one of the cells:

# We use the built-in unique() function to extract the unique colors in our dataset
unique(carSpeeds$Color)

[1] Green  Red  White Red   Black
Levels:  Red Black Green Red White

Oops, we see two values for red cars.
Let's try again, this time importing the data using the strip.white argument. Note - this argument must be accompanied by the sep argument, by which we indicate the type of delimiter in the file (the comma for most .csv files)

carSpeeds <- read.csv(file = 'data/car-speeds.csv',
                      stringsAsFactors = FALSE,
                      strip.white = TRUE,
                      sep = ',')
unique(carSpeeds$Color)

[1] "Blue" "Red" "White" "Black"
That's better!
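
The same effect can be reproduced without the lesson data. The sketch below writes a small temporary file containing one value with a leading space; the file contents and the names `raw` and `stripped` are assumptions made for illustration.

```r
# A tiny temporary file where one color was recorded with a leading space
tmp <- tempfile(fileext = ".csv")
writeLines(c("Color,Speed",
             "Blue,32",
             " Red,45",
             "Red,25"), tmp)

# Default import: ' Red' and 'Red' are kept as two different values
raw <- read.csv(file = tmp)
unique(as.character(raw$Color))

# With strip.white = TRUE the leading space is removed on import
stripped <- read.csv(file = tmp, strip.white = TRUE, sep = ",")
unique(as.character(stripped$Color))
```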
Specify Missing Data When Loading
It is common for data sets to have missing values, or mistakes. The convention for recording missing values often depends on the individual who collected the data and can be recorded as n.a., --, or empty cells " ". R recognises the reserved character string NA as a missing value, but not some of the examples above. Let's say the inflammation scale in the data set we used before, inflammation-01.csv, actually starts at 1 for no inflammation and the zero values (0) were a missed observation. Looking at the ?read.csv help page, is there an argument we could use to ensure all zeros (0) are read in as NA? Perhaps the car-speeds.csv data contains mistakes and the person measuring the car speeds could not accurately distinguish between "Black" and "Blue" cars. Is there a way to specify more than one 'string', such as "Black" and "Blue", to be replaced by NA?
Solution

read.csv(file = "data/inflammation-01.csv", na.strings = "0")

or, in car-speeds.csv, use a character vector for multiple values.

read.csv(file = 'data/car-speeds.csv', na.strings = c("Black", "Blue"))
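
The same argument handles the other missing-value conventions mentioned above. Here is a minimal sketch on a temporary file written for illustration; the contents and the name `cars` are assumptions, not lesson data.

```r
# A tiny temporary file mixing several missing-value conventions
tmp <- tempfile(fileext = ".csv")
writeLines(c("Color,Speed",
             "Blue,32",
             "n.a.,45",
             "Red,--",
             "White,NA"), tmp)

# List every marker that should be treated as missing; "NA" is included
# explicitly because supplying na.strings replaces the default
cars <- read.csv(file = tmp, na.strings = c("NA", "n.a.", "--"))
cars
is.na(cars)
```

Once the markers are read as NA, the Speed column is numeric rather than character.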
Write a New .csv and Explore the Arguments
After altering our cars dataset by replacing 'Blue' with 'Green' in the $Color column, we now want to save the output. There are several arguments for the write.csv(...) function call, a few of which are particularly important for how the data are exported. Let's explore these now.

# Export the data. The write.csv() function requires a minimum of two
# arguments, the data to be saved and the name of the output file.
write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv')

If you open the file, you'll see that it has header names, because the data had headers within R, but that there are numbers in the first column.
The row.names Argument
This argument allows us to set the names of the rows in the output data file. R's default for this argument is TRUE, and since it does not know what else to name the rows for the cars data set, it resorts to using row numbers. To correct this, we can set row.names to FALSE:

write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv', row.names = FALSE)

Now we see that the row numbers are no longer written to the first column of the file.
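
The difference can be checked without opening the file in another program. The sketch below uses a two-row data frame and temporary files, all of which are illustrative stand-ins for the lesson data.

```r
# A two-row stand-in for the cars data set (illustrative values)
cars <- data.frame(Color = c("Blue", "Red"), Speed = c(32L, 45L))

with_rows    <- tempfile(fileext = ".csv")
without_rows <- tempfile(fileext = ".csv")

write.csv(cars, file = with_rows)                        # default row.names = TRUE
write.csv(cars, file = without_rows, row.names = FALSE)

# Print the raw text of each file
readLines(with_rows)     # first field of every line is the row number
readLines(without_rows)  # only the Color and Speed fields
```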
Setting Column Names
There is also a col.names argument, which can be used to set the column names for a data set without headers. If the data set already has headers (e.g., we used the header = TRUE argument when importing the data) then a col.names argument will be ignored.
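
On the import side, a minimal sketch of supplying names for a file that has no header row. The temporary file, its contents, and the name `cars` are assumptions made for illustration.

```r
# A tiny temporary file without a header row (illustrative values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Blue,32,NewMexico",
             "Red,45,Arizona"), tmp)

# header = FALSE stops the first record being used as names;
# col.names supplies the names instead of the default V1, V2, V3
cars <- read.csv(file = tmp, header = FALSE,
                 col.names = c("Color", "Speed", "State"))
names(cars)
```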
The na Argument
There are times when we want to specify certain values for NAs in the data set (e.g., we are going to pass the data to a program that only accepts -9999 as a nodata value). In this case, we want to set the NA value of our output file to the desired value, using the na argument. Let's see how this works:

# First, replace the speed in the 3rd row with NA, by using an index (square
# brackets to indicate the position of the value we want to replace)
carSpeeds$Speed[3] <- NA
head(carSpeeds)

  Color Speed     State
1  Blue    32 NewMexico
2   Red    45   Arizona
3  Blue    NA  Colorado
4 White    34   Arizona
5   Red    25   Arizona
6  Blue    41   Arizona

write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv', row.names = FALSE)

Now we'll set NA to -9999 when we write the new .csv file:

# Note - the na argument requires a string input
write.csv(carSpeeds,
          file = 'data/car-speeds-cleaned.csv',
          row.names = FALSE,
          na = '-9999')

And we see that the missing speed in the third row is written as -9999.
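
A quick way to confirm this, and to read the value back as missing, is sketched below with a three-row data frame and a temporary file. Both are illustrative stand-ins, and the names `out` and `back` are made up for the example.

```r
# A three-row stand-in for the cars data set with one missing speed
cars <- data.frame(Color = c("Blue", "Red", "Blue"),
                   Speed = c(32L, 45L, NA))

out <- tempfile(fileext = ".csv")
write.csv(cars, file = out, row.names = FALSE, na = "-9999")

# The raw text shows -9999 where the NA was
readLines(out)

# Reading the file back with a matching na.strings restores the NA
back <- read.csv(file = out, na.strings = "-9999")
back$Speed
```

Pairing na on export with na.strings on import keeps the placeholder from being mistaken for a real measurement.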
Key Points
Import data from a .csv file using the read.csv(...) function.
Understand some of the key arguments available for importing the data properly, including header, stringsAsFactors, as.is, and strip.white.
Write data to a new .csv file using the write.csv(...) function.
Understand some of the key arguments available for exporting the data properly, such as row.names, col.names, and na.
Source: https://swcarpentry.github.io/r-novice-inflammation/11-supp-read-write-csv/