
Read Csv With Variable Names as Column Headers Matlab

Reading and Writing CSV Files

Overview

Teaching: 30 min
Exercises: 0 min

Questions

  • How do I read data from a CSV file into R?

  • How do I write data to a CSV file?

Objectives

  • Read in a .csv, and explore the arguments of the csv reader.

  • Write the altered data set to a new .csv, and explore the arguments.

The most common way that scientists store data is in Excel spreadsheets. While there are R packages designed to access data from Excel spreadsheets (e.g., gdata, RODBC, XLConnect, xlsx, RExcel), users often find it easier to save their spreadsheets in comma-separated values files (CSV) and then use R's built in functionality to read and manipulate the data. In this short lesson, we'll learn how to read data from a .csv and write to a new .csv, and explore the arguments that allow you to read and write the data correctly for your needs.

Read a .csv and Explore the Arguments

Let's start by opening a .csv file containing information on the speeds at which cars of different colors were clocked in 45 mph zones in the four-corners states (CarSpeeds.csv). We will use the built in read.csv(...) function call, which reads the data in as a data frame, and assign the data frame to a variable (using <-) so that it is stored in R's memory. Then we will explore some of the basic arguments that can be supplied to the function. First, open the RStudio project containing the scripts and data you were working on in episode 'Analyzing Patient Data'.

    # Import the data and look at the first six rows
    carSpeeds <- read.csv(file = 'data/car-speeds.csv')
    head(carSpeeds)

      Color Speed     State
    1  Blue    32 NewMexico
    2   Red    45   Arizona
    3  Blue    35  Colorado
    4 White    34   Arizona
    5   Red    25   Arizona
    6  Blue    41   Arizona

Changing Delimiters

The default delimiter of the read.csv() function is a comma, but you can use other delimiters by supplying the 'sep' argument to the function (e.g., typing sep = ';' allows a semi-colon separated file to be correctly imported - see ?read.csv() for more information on this and other options for working with different file types).
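To make the effect of sep concrete, here is a minimal sketch. It does not use the lesson's data: the temporary file and its three columns are made up for illustration, and the variable names tmp, wrong and semi are assumptions.

```r
# Hypothetical example data: a header plus two rows, separated by semicolons.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Color;Speed;State",
             "Blue;32;NewMexico",
             "Red;45;Arizona"), tmp)

# With the default sep = ',' each line is read as one single column.
wrong <- read.csv(file = tmp)
ncol(wrong)

# Supplying sep = ';' splits the fields as intended.
semi <- read.csv(file = tmp, sep = ';')
names(semi)
semi$Speed
```

Comparing ncol(wrong) with ncol(semi) is a quick way to check whether the delimiter was guessed correctly.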

Our first read.csv() call will import the data, but we have not taken advantage of several handy arguments that can be helpful in loading the data in the format we want. Let's explore some of these arguments.

The default for read.csv(...) is to set the header argument to TRUE. This means that the first row of values in the .csv is set as header information (column names). If your data set does not have a header, set the header argument to FALSE:

    # The first row of the data without setting the header argument:
    carSpeeds[1, ]

      Color Speed     State
    1  Blue    32 NewMexico

    # The first row of the data if the header argument is set to FALSE:
    carSpeeds <- read.csv(file = 'data/car-speeds.csv', header = FALSE)
    carSpeeds[1, ]

         V1    V2    V3
    1 Color Speed State

Clearly this is not the desired behavior for this data set, but it may be useful if you have a dataset without headers.

The stringsAsFactors Argument

In older versions of R (prior to 4.0) this was perhaps the most important argument in read.csv(), particularly if you were working with categorical data. This is because the default behavior of R was to convert character strings into factors, which may make it difficult to do such things as replace values. It is important to be aware of this behaviour, which we will demonstrate. For example, let's say we find out that the data collector was color blind, and accidentally recorded green cars as being blue. In order to correct the data set, let's replace 'Blue' with 'Green' in the $Color column:

    # Here we will use R's `ifelse` function, in which we provide the test phrase,
    # the outcome if the result of the test is 'TRUE', and the outcome if the
    # result is 'FALSE'. We will also assign the results to the Color column,
    # using '<-'

    # First - reload the data with a header
    carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = TRUE)

    carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
    carSpeeds$Color

      [1] "Green" "1"     "Green" "5"     "4"     "Green" "Green" "2"     "5"
     [10] "4"     "4"     "5"     "Green" "Green" "2"     "4"     "Green" "Green"
     [19] "5"     "Green" "Green" "Green" "4"     "Green" "4"     "4"     "4"
     [28] "4"     "5"     "Green" "4"     "5"     "2"     "4"     "2"     "2"
     [37] "Green" "4"     "2"     "4"     "2"     "2"     "4"     "4"     "5"
     [46] "2"     "Green" "4"     "4"     "2"     "2"     "4"     "5"     "4"
     [55] "Green" "Green" "2"     "Green" "5"     "2"     "4"     "Green" "Green"
     [64] "5"     "2"     "4"     "4"     "2"     "Green" "5"     "Green" "4"
     [73] "5"     "5"     "Green" "Green" "Green" "Green" "Green" "5"     "2"
     [82] "Green" "5"     "2"     "2"     "4"     "4"     "5"     "5"     "5"
     [91] "5"     "4"     "4"     "4"     "5"     "2"     "5"     "2"     "2"
    [100] "5"

What happened?!? It looks like 'Blue' was replaced with 'Green', but every other color was turned into a number (as a character string, given the quote marks before and after). This is because the colors of the cars were loaded as factors, and the factor level was reported following replacement.
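The same effect can be reproduced on a small scale. The sketch below uses a made-up four-element factor (the name colors is an assumption, standing in for carSpeeds$Color) to show the integer codes that a factor stores and how ifelse() ends up returning them.

```r
# Hypothetical toy vector standing in for carSpeeds$Color.
colors <- factor(c("Blue", "Red", "Blue", "White"))

levels(colors)       # the labels, in sorted order
as.integer(colors)   # the integer code stored for each element

# ifelse() takes the 'no' values from the factor's integer codes,
# so the untouched colors come back as numbers stored as text.
codes_leaked <- ifelse(colors == "Blue", "Green", colors)
codes_leaked

# Converting to character first keeps the labels.
labels_kept <- ifelse(colors == "Blue", "Green", as.character(colors))
labels_kept
```

Wrapping the factor in as.character() before the replacement is one way around the problem when the data have already been loaded as factors.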

To see the internal structure, we can use another function, str(). In this case, the dataframe's internal structure includes the format of each column, which is what we are interested in. str() will be reviewed a little more in the lesson Data Types and Structures.

    # Reload the data with a header (the previous ifelse call modifies attributes)
    carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = TRUE)
    str(carSpeeds)

    'data.frame': 100 obs. of  3 variables:
     $ Color: Factor w/ 5 levels " Red","Black",..: 3 1 3 5 4 3 3 2 5 4 ...
     $ Speed: int  32 45 35 34 25 41 34 29 31 26 ...
     $ State: Factor w/ 4 levels "Arizona","Colorado",..: 3 1 2 1 1 1 3 2 1 2 ...

We can see that the $Color and $State columns are factors and $Speed is a numeric column.

Now, let's load the dataset using stringsAsFactors=FALSE, and see what happens when we try to replace 'Blue' with 'Green' in the $Color column:

    carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = FALSE)
    str(carSpeeds)

    'data.frame': 100 obs. of  3 variables:
     $ Color: chr  "Blue" " Red" "Blue" "White" ...
     $ Speed: int  32 45 35 34 25 41 34 29 31 26 ...
     $ State: chr  "NewMexico" "Arizona" "Colorado" "Arizona" ...

    carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
    carSpeeds$Color

      [1] "Green" " Red"  "Green" "White" "Red"   "Green" "Green" "Black" "White"
     [10] "Red"   "Red"   "White" "Green" "Green" "Black" "Red"   "Green" "Green"
     [19] "White" "Green" "Green" "Green" "Red"   "Green" "Red"   "Red"   "Red"
     [28] "Red"   "White" "Green" "Red"   "White" "Black" "Red"   "Black" "Black"
     [37] "Green" "Red"   "Black" "Red"   "Black" "Black" "Red"   "Red"   "White"
     [46] "Black" "Green" "Red"   "Red"   "Black" "Black" "Red"   "White" "Red"
     [55] "Green" "Green" "Black" "Green" "White" "Black" "Red"   "Green" "Green"
     [64] "White" "Black" "Red"   "Red"   "Black" "Green" "White" "Green" "Red"
     [73] "White" "White" "Green" "Green" "Green" "Green" "Green" "White" "Black"
     [82] "Green" "White" "Black" "Black" "Red"   "Red"   "White" "White" "White"
     [91] "White" "Red"   "Red"   "Red"   "White" "Black" "White" "Black" "Black"
    [100] "White"

That's better! And we can see how the data now is read as character instead of factor. From R version 4.0 onwards we do not have to specify stringsAsFactors=FALSE; this is the default behavior.

The as.is Argument

This is an extension of the stringsAsFactors argument, but gives you control over individual columns. For example, if we want the colors of cars imported as strings, but we want the names of the states imported as factors, we would load the data set as:

    carSpeeds <- read.csv(file = 'data/car-speeds.csv', as.is = 1)
    # Note, the 1 applies as.is to the first column only

Now we can see that if we try to replace 'Blue' with 'Green' in the $Color column everything looks fine, while trying to replace 'Arizona' with 'Ohio' in the $State column returns the factor numbers for the names of states that we haven't replaced:

    str(carSpeeds)

    'data.frame': 100 obs. of  3 variables:
     $ Color: chr  "Blue" " Red" "Blue" "White" ...
     $ Speed: int  32 45 35 34 25 41 34 29 31 26 ...
     $ State: Factor w/ 4 levels "Arizona","Colorado",..: 3 1 2 1 1 1 3 2 1 2 ...

    carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
    carSpeeds$Color

      [1] "Green" " Red"  "Green" "White" "Red"   "Green" "Green" "Black" "White"
     [10] "Red"   "Red"   "White" "Green" "Green" "Black" "Red"   "Green" "Green"
     [19] "White" "Green" "Green" "Green" "Red"   "Green" "Red"   "Red"   "Red"
     [28] "Red"   "White" "Green" "Red"   "White" "Black" "Red"   "Black" "Black"
     [37] "Green" "Red"   "Black" "Red"   "Black" "Black" "Red"   "Red"   "White"
     [46] "Black" "Green" "Red"   "Red"   "Black" "Black" "Red"   "White" "Red"
     [55] "Green" "Green" "Black" "Green" "White" "Black" "Red"   "Green" "Green"
     [64] "White" "Black" "Red"   "Red"   "Black" "Green" "White" "Green" "Red"
     [73] "White" "White" "Green" "Green" "Green" "Green" "Green" "White" "Black"
     [82] "Green" "White" "Black" "Black" "Red"   "Red"   "White" "White" "White"
     [91] "White" "Red"   "Red"   "Red"   "White" "Black" "White" "Black" "Black"
    [100] "White"

    carSpeeds$State <- ifelse(carSpeeds$State == 'Arizona', 'Ohio', carSpeeds$State)
    carSpeeds$State

     [1] "3"    "Ohio" "2"    "Ohio" "Ohio" "Ohio" "3"    "2"    "Ohio" "2"
    [11] "4"    "4"    "4"    "4"    "4"    "3"    "Ohio" "3"    "Ohio" "4"
    [21] "4"    "4"    "3"    "2"    "2"    "3"    "2"    "4"    "2"    "4"
    [31] "3"    "2"    "2"    "4"    "2"    "2"    "3"    "Ohio" "4"    "2"
    [41] "2"    "3"    "Ohio" "4"    "Ohio" "2"    "3"    "3"    "3"    "2"
    [51] "Ohio" "4"    "4"    "Ohio" "3"    "2"    "4"    "2"    "4"    "4"
    [61] "4"    "2"    "3"    "2"    "3"    "2"    "3"    "Ohio" "3"    "4"
    [71] "4"    "2"    "Ohio" "4"    "2"    "2"    "2"    "Ohio" "3"    "Ohio"
    [81] "4"    "2"    "2"    "Ohio" "Ohio" "Ohio" "4"    "Ohio" "4"    "4"
    [91] "4"    "Ohio" "Ohio" "3"    "2"    "2"    "4"    "3"    "Ohio" "4"

We can see that the $Color column is a character vector while $State is a factor.

Updating Values in a Factor

Suppose we want to keep the colors of cars as factors for some other operations we want to perform. Write code for replacing 'Blue' with 'Green' in the $Color column of the cars dataset without importing the data with stringsAsFactors=FALSE.

Solution

    carSpeeds <- read.csv(file = 'data/car-speeds.csv')
    # Replace 'Blue' with 'Green' in cars$Color without using the stringsAsFactors
    # or as.is arguments
    carSpeeds$Color <- ifelse(as.character(carSpeeds$Color) == 'Blue',
                              'Green',
                              as.character(carSpeeds$Color))
    # Convert colors back to factors
    carSpeeds$Color <- as.factor(carSpeeds$Color)

The strip.white Argument

It is not uncommon for mistakes to have been made when the data were recorded, for example a space (whitespace) may have been inserted before a data value. By default this whitespace will be kept in the R environment, such that ' Red' will be recognized as a different value than 'Red'. In order to avoid this type of error, use the strip.white argument. Let's see how this works by checking for the unique values in the $Color column of our dataset:

Here, the data recorder added a space before the color of the car in one of the cells:

    # We use the built-in unique() function to extract the unique colors in our dataset
    unique(carSpeeds$Color)

    [1] Green  Red  White Red   Black
    Levels:  Red Black Green Red White

Oops, we see two values for red cars.

Let's try again, this time importing the data using the strip.white argument. NOTE - this argument must be accompanied by the sep argument, by which we indicate the type of delimiter in the file (the comma for most .csv files)

    carSpeeds <- read.csv(
      file = 'data/car-speeds.csv',
      stringsAsFactors = FALSE,
      strip.white = TRUE,
      sep = ','
    )
    unique(carSpeeds$Color)
            [1] "Blue"  "Red"   "White" "Black"                      

That's better!
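For readers without the lesson's data at hand, the behaviour of strip.white can be sketched with inline text instead of a file. The three data rows below are invented for illustration, and csv_text, raw and clean are assumed names; the text argument is passed through read.csv() to read.table().

```r
# Hypothetical inline data: the second row has a stray space before 'Red'.
csv_text <- "Color,Speed\nBlue,32\n Red,45\nRed,25"

# Without strip.white the stray space survives, so ' Red' and 'Red' differ.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
unique(raw$Color)

# With strip.white = TRUE the leading space is dropped on import.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  strip.white = TRUE, sep = ',')
unique(clean$Color)
```

Comparing length(unique(...)) before and after is a cheap check for stray whitespace in a categorical column.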

Specify Missing Data When Loading

It is common for data sets to have missing values, or mistakes. The convention for recording missing values often depends on the individual who collected the data and can be recorded as n.a., --, or empty cells " ". R recognises the reserved character string NA as a missing value, but not some of the examples above. Let's say the inflammation scale in the data set we used earlier, inflammation-01.csv, actually starts at 1 for no inflammation and the zero values (0) were a missed observation. Looking at the ?read.csv help page, is there an argument we could use to ensure all zeros (0) are read in as NA? Perhaps the car-speeds.csv data also contain mistakes, and the person measuring the car speeds could not accurately distinguish between "Black" and "Blue" cars. Is there a way to specify more than one 'string', such as "Black" and "Blue", to be replaced by NA?

Solution

    read.csv(file = "data/inflammation-01.csv", na.strings = "0")

or, in car-speeds.csv, use a character vector for multiple values.

    read.csv(
      file = 'data/car-speeds.csv',
      na.strings = c("Black", "Blue")
    )

Write a New .csv and Explore the Arguments

After altering our cars dataset by replacing 'Blue' with 'Green' in the $Color column, we now want to save the output. There are several arguments for the write.csv(...) function call, a few of which are particularly important for how the data are exported. Let's explore these now.

    # Export the data. The write.csv() function requires a minimum of two
    # arguments, the data to be saved and the name of the output file.
    write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv')

If you open the file, you'll see that it has header names, because the data had headers within R, but that there are numbers in the first column.

csv written without row.names argument

The row.names Argument

This argument allows us to set the names of the rows in the output data file. R's default for this argument is TRUE, and since it does not know what else to name the rows for the cars data set, it resorts to using row numbers. To correct this, we can set row.names to FALSE:

    write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv', row.names = FALSE)

Now we see:

csv written with row.names argument

Setting Column Names

There is also a col.names argument, which can be used to set the column names for a data set without headers. If the data set already has headers (e.g., we used the header = TRUE argument when importing the data) then a col.names argument will be ignored.
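According to the R help page for write.table, write.csv() ignores attempts to change col.names and issues a warning. Two ways of controlling the header of an output file are sketched below, under assumptions: the two-row data frame, the new column names and the temporary files are all made up for illustration.

```r
# Hypothetical two-row data frame standing in for carSpeeds.
cars <- data.frame(Color = c("Blue", "Red"), Speed = c(32L, 45L))

# Option 1: rename the columns in R, then write as usual.
renamed <- cars
names(renamed) <- c("colour", "speed_mph")
out1 <- tempfile(fileext = ".csv")
write.csv(renamed, file = out1, row.names = FALSE)
readLines(out1)[1]

# Option 2: write.table() accepts a character vector for col.names.
out2 <- tempfile(fileext = ".csv")
write.table(cars, file = out2, sep = ",", row.names = FALSE,
            col.names = c("colour", "speed_mph"))
readLines(out2)[1]
```

Renaming in R keeps the data frame and the file in agreement, while the write.table() route leaves the data frame untouched; which is preferable depends on what happens to the data afterwards.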

The na Argument

There are times when we want to specify certain values for NAs in the data set (e.g., we are going to pass the data to a program that only accepts -9999 as a nodata value). In this case, we want to set the NA value of our output file to the desired value, using the na argument. Let's see how this works:

    # First, replace the speed in the 3rd row with NA, by using an index (square
    # brackets to indicate the position of the value we want to replace)
    carSpeeds$Speed[3] <- NA
    head(carSpeeds)

      Color Speed     State
    1  Blue    32 NewMexico
    2   Red    45   Arizona
    3  Blue    NA  Colorado
    4 White    34   Arizona
    5   Red    25   Arizona
    6  Blue    41   Arizona

    write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv', row.names = FALSE)

Now we'll set NA to -9999 when we write the new .csv file:

    # Note - the na argument requires a string input
    write.csv(carSpeeds,
              file = 'data/car-speeds-cleaned.csv',
              row.names = FALSE,
              na = '-9999')

And we see:

csv written with -9999 as NA
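The na argument of write.csv() pairs naturally with the na.strings argument of read.csv(). The round trip below is a minimal sketch on a made-up three-row data frame written to a temporary file; cars, out and back are assumed names, not part of the lesson's data.

```r
# Hypothetical three-row data frame with one missing speed.
cars <- data.frame(Color = c("Blue", "Red", "Blue"),
                   Speed = c(32L, 45L, NA))

out <- tempfile(fileext = ".csv")
write.csv(cars, file = out, row.names = FALSE, na = '-9999')

# The raw file holds -9999 where the NA was.
readLines(out)

# Reading it back with na.strings = '-9999' restores the NA.
back <- read.csv(file = out, na.strings = '-9999')
back$Speed
```

Without na.strings on the way back in, -9999 would be read as an ordinary speed and would distort any summary of the column.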

Key Points

  • Import data from a .csv file using the read.csv(...) function.

  • Understand some of the key arguments available for importing the data properly, including header, stringsAsFactors, as.is, and strip.white.

  • Write data to a new .csv file using the write.csv(...) function.

  • Understand some of the key arguments available for exporting the data properly, such as row.names, col.names, and na.


Source: https://swcarpentry.github.io/r-novice-inflammation/11-supp-read-write-csv/
