The file greti.csv in the GitHub repository contains biometric data of individual birds (as loaded below, it has 97 records from 80 distinct ring numbers).
Read it into R and examine it (e.g. with summary(), str(), names()).
The files cc_age.csv and cc_wing.csv contain different biometric data of the same individuals, but in two separate tables.
The file birdlist.Rdata in the GitHub repository contains more bird biometric data.
dat <- read.csv("greti.csv", header = TRUE, stringsAsFactors = TRUE)  # factors, as in the output below (the default before R 4.0.0)
head(dat)
## RING SPEC SEX WING WT
## 1 L555028 GRETI F 76 18.7
## 2 L555044 GRETI F 72 18.9
## 3 L555050 GRETI F 74 18.8
## 4 L555050 GRETI F 75 19.0
## 5 L555052 GRETI F 75 20.1
## 6 L555052 GRETI F 75 20.2
str(dat)
## 'data.frame': 97 obs. of 5 variables:
## $ RING: Factor w/ 80 levels "L555027","L555028",..: 2 3 5 5 6 6 8 8 9 13 ...
## $ SPEC: Factor w/ 1 level "GRETI": 1 1 1 1 1 1 1 1 1 1 ...
## $ SEX : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
## $ WING: int 76 72 74 75 75 75 73 72 76 74 ...
## $ WT : num 18.7 18.9 18.8 19 20.1 20.2 18.5 18.2 19.2 21.7 ...
summary(dat)
##       RING      SPEC     SEX        WING             WT
##  L555027: 3   GRETI:97   F:49   Min.   :70.00   Min.   :16.80
##  L555052: 3              M:48   1st Qu.:73.00   1st Qu.:18.50
##  L555054: 3                     Median :75.00   Median :19.00
##  L555050: 2                     Mean   :75.11   Mean   :19.04
##  L555056: 2                     3rd Qu.:77.00   3rd Qu.:19.70
##  L555100: 2                     Max.   :80.00   Max.   :21.70
##  (Other):82                     NA's   :5
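names() is listed above but not used in the solution. The sketch below uses a small made-up data frame (toy, with invented values, not the real greti.csv contents) to show names() together with a few base R functions that separate records from individuals. This matters here: str() reports 97 rows but only 80 ring numbers, so some birds were measured more than once.

```r
# Made-up stand-in for greti.csv (values invented for illustration)
toy <- data.frame(
  RING = c("A1", "A2", "A2", "A3", "A4"),
  SEX  = c("F", "F", "F", "M", "M"),
  WING = c(76, 72, NA, 75, 78),
  WT   = c(18.7, 18.9, 18.8, 19.5, 20.1)
)

names(toy)                  # column names: "RING" "SEX" "WING" "WT"
nrow(toy)                   # number of records (rows): 5
length(unique(toy$RING))    # number of distinct individuals: 4
colSums(is.na(toy))         # missing values per column (WING has 1)
table(toy$SEX)              # records per sex
```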
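To calculate the mean weight for each sex, we can use tapply(), which applies a function (here mean) to WT separately for each level of SEX:
sex_wt_mean <- tapply(dat$WT, dat$SEX, mean)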
sex_wt_mean
## F M
## 18.50612 19.59043
The wing length column WING contains missing values (summary() reports 5 NAs), so simply calculating the mean returns NA for both sexes:
sex_wing_mean <- tapply(dat$WING, dat$SEX, mean)
sex_wing_mean
## F M
## NA NA
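The NAs come from mean() itself: by default, a single missing value makes the whole result NA, while na.rm = TRUE drops missing values before averaging. A minimal check on a made-up vector:

```r
wing <- c(76, 72, NA, 74)    # made-up wing lengths with one missing value

mean(wing)                   # NA: the missing value propagates
mean(wing, na.rm = TRUE)     # 74: mean of 76, 72 and 74
sum(is.na(wing))             # 1 missing value
```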
To avoid this, we can define a custom “mean” function that ignores the NAs by passing na.rm = TRUE to mean():
mean_no_na <- function(x) {
  mean(x, na.rm = TRUE)  # drop NAs before averaging
}
We can now pass this function to tapply() to calculate the mean wing length for each sex:
sex_wing_mean <- tapply(dat$WING, dat$SEX, mean_no_na)
sex_wing_mean
## F M
## 73.57447 76.71111
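As an alternative to defining mean_no_na(), tapply() passes any extra arguments on to the function it applies (through its ... argument), so na.rm = TRUE can be given directly. A sketch on made-up vectors standing in for dat$WING and dat$SEX:

```r
wing <- c(76, 72, NA, 75, 78)          # made-up wing lengths
sex  <- c("F", "F", "F", "M", "M")     # made-up sex codes

# Extra arguments after FUN are forwarded to it: mean(x, na.rm = TRUE)
wing_means <- tapply(wing, sex, mean, na.rm = TRUE)
wing_means                              # F = 74, M = 76.5
```

On the real data, the equivalent call would be tapply(dat$WING, dat$SEX, mean, na.rm = TRUE).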
To add these sex-specific means to dat, so that each bird's row carries the means for its sex, one way is to use match() to look up the value for each bird's sex in the dat table. To be able to do this, we first convert sex_wing_mean and sex_wt_mean to data frames.
sex_wing_mean <- as.data.frame(sex_wing_mean)
sex_wt_mean <- as.data.frame(sex_wt_mean)
We can now use match() once for each mean, creating a new column for each. Two things to note here. First, we have to refer to the first (and only) column of each “means” table, using [,1]. Second, the “means” tables have no explicit “sex” column: the code for each sex is stored as the row name. So in the match we refer to the sex codes using the row.names() function.
dat$sex_wing_mean <- sex_wing_mean[,1][match(dat$SEX, row.names(sex_wing_mean))]
dat$sex_wt_mean <- sex_wt_mean[,1][match(dat$SEX, row.names(sex_wt_mean))]
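Converting to data frames is not strictly needed. tapply() returns a one-dimensional array whose names are the sex codes, so it can also be indexed by name, and base R's ave() returns a group statistic for every row directly. A sketch with made-up values comparing these with the match() approach:

```r
# Made-up inputs: the sex of four birds and their weights
SEX <- factor(c("F", "M", "F", "M"))
WT  <- c(18, 19, 20, 21)

sex_means <- tapply(WT, SEX, mean)     # F = 19, M = 20

# 1. match() against the names, as in the text (match() treats factors as character)
by_match <- as.vector(sex_means)[match(SEX, names(sex_means))]

# 2. Index the named array by the sex codes; as.character() matters, because
#    indexing with a factor would use its integer codes, not its labels
by_name <- as.vector(sex_means[as.character(SEX)])

# 3. ave() returns the group mean for every element of WT in one step
by_ave <- ave(WT, SEX)

by_match   # 19 20 19 20
```

With missing values, ave() needs the NA-ignoring function spelled out, e.g. ave(WT, SEX, FUN = function(v) mean(v, na.rm = TRUE)).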
cc_age <- read.csv("cc_age.csv", header = TRUE, stringsAsFactors = TRUE)
cc_wing <- read.csv("cc_wing.csv", header = TRUE, stringsAsFactors = TRUE)
We can examine the data sets using, e.g., str(). Among other things, this shows how many levels the factor ring_no has, i.e. how many different individuals each table contains (ring_no is only a factor if the file is read with stringsAsFactors = TRUE, the default before R 4.0.0). Note that there are fewer individuals in the cc_wing table. This means that we should match the cc_wing data into the cc_age table; if we did it the other way around, we would lose the individuals for which we have age data but no wing length.
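The check described above can also be done without relying on factor levels (which str() only shows when ring_no is a factor). A sketch with two made-up tables standing in for cc_age and cc_wing, counting individuals and listing those that have an age but no wing length:

```r
# Made-up stand-ins for cc_age and cc_wing
age_tab  <- data.frame(ring_no = c("R1", "R2", "R3", "R4"),
                       age     = c("3", "3J", "3", "4"))
wing_tab <- data.frame(ring_no     = c("R2", "R1", "R4"),
                       wing_length = c(62, 63, 59))

length(unique(age_tab$ring_no))    # 4 individuals with an age
length(unique(wing_tab$ring_no))   # 3 individuals with a wing length

# Individuals that would be lost if cc_age were matched into cc_wing
missing_wing <- setdiff(as.character(age_tab$ring_no),
                        as.character(wing_tab$ring_no))
missing_wing                       # "R3"
```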
cc_age$wing <- cc_wing$wing_length[match(cc_age$ring_no, cc_wing$ring_no)]
head(cc_age)
## ring_no species_name age wing
## 1 KPJ625 Chiffchaff 3 63
## 2 KPJ623 Chiffchaff 3 62
## 3 KPJ621 Chiffchaff 3 63
## 4 KPJ617 Chiffchaff 3J NA
## 5 KPJ613 Chiffchaff 3 56
## 6 KPJ601 Chiffchaff 3J 62
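The match() line above is effectively a left join: every row of cc_age is kept, and individuals without a wing length get NA. Base R's merge() with all.x = TRUE is a common alternative; note that by default (sort = TRUE) it sorts the result on the key column, whereas match() keeps the original row order. A sketch with made-up tables:

```r
# Made-up stand-ins for cc_age and cc_wing
age_tab  <- data.frame(ring_no = c("R4", "R1", "R3", "R2"),
                       age     = c("4", "3", "3", "3J"))
wing_tab <- data.frame(ring_no     = c("R2", "R1", "R4"),
                       wing_length = c(62, 63, 59))

# match(): look up each age_tab ring number in wing_tab, keep age_tab's row order
age_tab$wing <- wing_tab$wing_length[match(age_tab$ring_no, wing_tab$ring_no)]
age_tab$wing          # 59 63 NA 62

# merge(): left join on ring_no; rows come back sorted by ring_no
merged <- merge(age_tab[, c("ring_no", "age")], wing_tab,
                by = "ring_no", all.x = TRUE)
merged$wing_length    # 63 62 NA 59 (for R1, R2, R3, R4)
```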
We can now use the data in the new column to calculate the mean wing length for each age category. Again, the wing length data contain missing values (e.g. KPJ617 above), so plain mean() returns NA for the age classes that contain them:
tapply(cc_age$wing, cc_age$age, mean)
## 3 3J 4
## NA NA 58.75
To ignore the NAs, we could pass our mean_no_na() function to tapply() again. Alternatively, we can write the function in a single line as an anonymous function, without defining it first. This is a bit harder to read, but it does exactly the same thing:
tapply(cc_age$wing, cc_age$age, function(x) mean(x, na.rm = TRUE))
## 3 3J 4
## 60.56000 60.22727 58.75000
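Another base R option is aggregate() with a formula: its default na.action is na.omit, so rows with a missing wing length are dropped before the mean is calculated, and the result comes back as a data frame rather than a named array. A sketch on made-up data:

```r
# Made-up stand-in for cc_age after the wing column has been added
toy <- data.frame(age  = c("3", "3", "3J", "3J", "4"),
                  wing = c(63, NA, 62, 57, 59))

# Rows with NA in wing are dropped by the default na.action (na.omit)
res <- aggregate(wing ~ age, data = toy, FUN = mean)
res    # one row per age class: 3 -> 63, 3J -> 59.5, 4 -> 59
```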
load("birdlist.Rdata")
str(birdlist)
## List of 2
## $ cc:'data.frame': 101 obs. of 3 variables:
## ..$ ring_no : Factor w/ 101 levels "BLP601","BLP620",..: 101 100 99 98 97 96 12 11 10 9 ...
## ..$ age : Factor w/ 3 levels "3","3J","4": 1 1 1 2 1 2 2 2 2 2 ...
## ..$ wing_length: int [1:101] 63 62 63 NA 56 62 57 61 63 63 ...
## $ ww:'data.frame': 141 obs. of 3 variables:
## ..$ ring_no : Factor w/ 141 levels "BLP621","BLP622",..: 141 140 139 138 137 136 9 8 7 6 ...
## ..$ age : Factor w/ 3 levels "3","3J","4": 1 1 2 1 2 2 3 3 3 3 ...
## ..$ wing_length: int [1:141] 66 68 63 67 66 66 62 64 64 64 ...
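load() restores the objects stored in an .Rdata file under the names they were saved with, which is why birdlist appears in the workspace without being assigned; it also returns those names invisibly. A self-contained sketch that saves a made-up list to a temporary file and loads it back:

```r
# Made-up list of two data frames, saved to a temporary .Rdata file
demo_list <- list(cc = data.frame(wing_length = c(63, 62)),
                  ww = data.frame(wing_length = c(66, 68)))
path <- tempfile(fileext = ".Rdata")
save(demo_list, file = path)

rm(demo_list)                  # remove it from the workspace
loaded <- load(path)           # recreates demo_list; returns its name
loaded                         # "demo_list"
names(demo_list)               # "cc" "ww"
```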
The str() output shows that birdlist is a list of two data frames, cc and ww. We can calculate the mean of one column in each data frame using lapply(), which applies a function to every element of a list. Again, we have to make sure to ignore any missing values:
lapply(birdlist, function(x) mean(x$wing_length, na.rm = TRUE))
## $cc
## [1] 59.93548
##
## $ww
## [1] 65.62698
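lapply() always returns a list. When each result is a single number, sapply() simplifies the output to a named vector, which can be easier to read. A sketch on a made-up list shaped like birdlist:

```r
# Made-up list of two data frames
toy_list <- list(
  cc = data.frame(wing_length = c(60, 62, NA)),
  ww = data.frame(wing_length = c(65, 67))
)

wing_list  <- lapply(toy_list, function(d) mean(d$wing_length, na.rm = TRUE))
wing_means <- sapply(toy_list, function(d) mean(d$wing_length, na.rm = TRUE))

wing_list    # a list: $cc 61, $ww 66
wing_means   # a named numeric vector: cc 61, ww 66
```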
We can do more complex operations by writing our own function. Here, age_means() calculates the mean wing length for each age class within one data frame; applied with lapply(), it does this for each list element (i.e. for each species).
age_means <- function(x) {
  # inner argument named w so it does not shadow the data frame x
  tapply(x$wing_length, x$age, function(w) mean(w, na.rm = TRUE))
}
We then apply this function to every element of the list:
lapply(birdlist, age_means)
## $cc
## 3 3J 4
## 60.56000 60.22727 58.75000
##
## $ww
## 3 3J 4
## 65.27500 65.09091 66.03125
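Because each call to age_means() returns one value per age class, sapply() can combine the results into a matrix with age classes as rows and list elements as columns. This is only safe when every element yields the same age classes in the same order (as with a shared set of factor levels): sapply() takes the row names from the first element and, if the lengths differ, returns a list instead. A sketch on made-up data, repeating age_means() so the block runs on its own:

```r
# Same function as in the text
age_means <- function(x) {
  tapply(x$wing_length, x$age, function(w) mean(w, na.rm = TRUE))
}

# Made-up list shaped like birdlist
toy_list <- list(
  cc = data.frame(age = c("3", "3", "4", "4"), wing_length = c(60, NA, 58, 60)),
  ww = data.frame(age = c("3", "4", "4", "3"), wing_length = c(66, 65, 67, 64))
)

age_table <- sapply(toy_list, age_means)
age_table    # rows: age classes "3", "4"; columns: cc, ww
```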