**************************************
* DATA MANIPULATION AND HOUSEKEEPING *
**************************************

set mem 300m
set matsize 200
set logtype text
set more off

log using c:\klaus\REU2009\stata\logs\session_0701, replace

use c:\klaus\REU2009\stata\data\res_monthly

*Preliminaries: Data inspection
********************************

tab origid
sum use
sum use, detail
tabstat use, stat(mean sd min max)
tabstat use, by(year) stat(mean sd min max)
tabstat use, by(month) stat(mean sd min max)

* Task 1: Create variable labels *
***********************************

label var age "age of building in years"

* YOUR TURN
* label the variable "bedrooms" as "number of bedrooms"

*Task 2: Sort data and create a running id
******************************************

sort origid year month
gen runid=_n

*Task 3: Create a group id for individuals, starting at 1
*********************************************************

sort runid
egen id=group(origid)

* YOUR TURN
* create a period id for each month in the time series (1 through 36), call it "period"

*Task 4: Capture the total number of observations for each individual
*********************************************************
*(naturally, this will be 36 for all, but in many other applications you may
* have an unbalanced panel; this is also useful to check if you have a balanced
* panel, and to quickly eliminate short panels if needed)

sort id
by id: gen panelobs=_N

* YOUR TURN
* do the same for observations per period, call the variable "perobs"

*Task 5: Generate a separate dummy variable for each period
***********************************************************

tab period, gen(p)

* YOUR TURN
* generate a separate dummy variable for each year. Choose the prefix "y"

tab year, gen(y)

*Task 6: Generate lagged variables
**********************************

xtset id period
gen lag1per=L.period
gen lag2per=L2.period

*Task 7: Re-arrange the order of your variables
***********************************************

order runid id panelobs perobs

*Task 8: Collapse your data to annual consumption levels for each household
********************************************************

* first create household-year id
egen id_year=group(id year)

collapse (mean) id origid year lotsize sqft baths fixtures bedrooms age (sum) use prcp cdd hdd, by(id_year)

save c:\klaus\REU2009\stata\data\res_annual, replace

* YOUR TURN
* collapse further to total use over the three years

*Task 9: Re-load annual data and reshape into wide format (one row per household)
**********************************************************

clear
use c:\klaus\REU2009\stata\data\res_annual

* drop id_year - no longer needed
drop id_year

reshape wide use age prcp cdd hdd, i(id) j(year)

* YOUR TURN
* back to long format

log close
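
* Optional sketch (not part of the original exercise): balanced-panel check
***************************************************************************
* Task 4 notes that panelobs is useful for checking whether a panel is balanced
* and for eliminating short panels. The lines below are one possible way to do
* that, assuming the monthly file is re-loaded and that a complete panel has
* 36 months per household, as stated in Task 4. This block runs after
* "log close", so its output is not written to the session log.

use c:\klaus\REU2009\stata\data\res_monthly, clear
egen id=group(origid)
sort id
by id: gen panelobs=_N

* distribution of panel lengths; a balanced panel shows a single value (36 here)
tab panelobs

* number of observations that belong to households with fewer than 36 months
count if panelobs<36

* to eliminate short panels, remove the leading asterisk from the next line
* drop if panelobs<36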