Southland Wx Posted March 22, 2011

So I'll preface this by saying I don't have very good programming skills, but if it's explained well enough, I might be able to walk myself through it. I want to "scrape" about 10 years of upper air data for a number of stations and dump just a handful of parameters into an Excel spreadsheet. I have about 20 stations to target, and I just want 850 mb temperature and 500 mb height. I can get the upper air data from either U of Wyoming or Plymouth State:

http://vortex.plymou...asoundlist.html
http://weather.uwyo....r/sounding.html

Plymouth only gives me one day at a time; Wyoming gives me one month at a time. Either way, it's a manual effort to go day by day and enter all of the parameters into a spreadsheet, and I can do maybe one year per station per day. The output HTML is essentially identical for both websites. Wyoming, for instance, is:

http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=YYYY&MONTH=MM&FROM=DDHH&TO=DDHH&STNM=STATN

The placeholders (YEAR, MONTH, the FROM/TO day-and-hour, and the station number) are the only parts that change, and I only want the 00z soundings. Since the address follows a fixed pattern, I'm assuming there's a way to write something, maybe an Excel macro, to go out and fetch data from these pages in sequential order. I just have no idea how one would write it. I could fetch all the data, then do a sort, and voila. Alternatively, I could export all the Wyoming data month by month manually into Excel and sort, which *could* work, but this seems more fun. If anyone has any ideas, I'd appreciate it. Thanks.
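Since the URL really does follow a fixed pattern, the fetch loop is the easy part. Here's a minimal Python sketch of the fetch-in-sequence idea, assuming the Wyoming URL pattern quoted above; the station numbers, year range, and output file names are hypothetical placeholders, and FROM=0100&TO=3112 simply requests the whole month (00z rows get filtered out later, at parse time).

```python
import time
import urllib.request

# Hypothetical station list -- substitute your ~20 WMO station numbers.
STATIONS = ["72340", "72440"]
YEARS = range(2001, 2011)   # ten years
MONTHS = range(1, 13)

BASE = ("http://weather.uwyo.edu/cgi-bin/sounding?region=naconf"
        "&TYPE=TEXT%3ALIST&YEAR={year}&MONTH={month:02d}"
        "&FROM=0100&TO=3112&STNM={station}")

for station in STATIONS:
    for year in YEARS:
        for month in MONTHS:
            url = BASE.format(year=year, month=month, station=station)
            html = urllib.request.urlopen(url).read().decode("latin-1", "replace")
            # Save one page per station-month; 00z filtering happens when parsing.
            with open(f"{station}_{year}{month:02d}.html", "w") as out:
                out.write(html)
            time.sleep(1)   # be polite to the server
```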
okie333 Posted March 22, 2011

WGET and batch files... or a quick custom program... would work better than Excel. Getting the HTML into Excel is another issue altogether.
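If you like okie333's WGET route but don't want to hand-write batch files, the same fetch can shell out to wget from a script. A sketch assuming wget is installed and on your PATH, with a hypothetical station and month:

```python
import subprocess

# Hypothetical example: one station, one month.
url = ("http://weather.uwyo.edu/cgi-bin/sounding?region=naconf"
       "&TYPE=TEXT%3ALIST&YEAR=2010&MONTH=01&FROM=0100&TO=3112&STNM=72340")

# -O sets the output file name; wrap this call in a loop over all
# stations and months to pull the full archive.
subprocess.run(["wget", "-O", "72340_201001.html", url], check=True)
```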
wxmx Posted March 22, 2011

Windows? What programming languages are you more comfortable with, or at least have some experience in? What are the uses for the pulled data? Table parsing can be a bit hair-pulling, but the good news is that it's a fixed-width format. If you have done VBScript (for macro programming), you can check the WinHttpRequest object, which will help you pull the data from a URL.
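wxmx's point about the fixed-width table is the key to extraction: data rows lead with pressure, height, and temperature, so you can scan for the 850 mb and 500 mb lines without caring about exact column offsets. A rough Python sketch, assuming the Wyoming TEXT:LIST layout; note that a monthly page contains many soundings, so in practice you'd split the page into individual soundings first and keep only the 00z ones.

```python
def extract_levels(sounding_text):
    """Pull 850 mb temperature and 500 mb height from one sounding listing.

    Assumes the Wyoming TEXT:LIST format, where data rows start with
    pressure (hPa), then height (m), then temperature (C).
    """
    t850 = z500 = None
    for line in sounding_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            pres = float(parts[0])
            hght = float(parts[1])
            temp = float(parts[2])
        except ValueError:
            continue   # header line, HTML tag, or separator row
        if pres == 850.0:
            t850 = temp
        elif pres == 500.0:
            z500 = hght
    return t850, z500
```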
friedmators Posted March 23, 2011

Matlab is my one-stop destination for massive data parsing.
bobbutts Posted March 23, 2011

Can you explain exactly which pages you want to scrape from and which fields? As long as the list of pages can be formulated and the data on them is standardized, I could probably do this pretty easily. I'd scrape with Linux command-line tools (lynx, awk, sed), import into a MySQL database, then use PHP Excel output from MySQL to export to your desired format. I need to know exactly what data to collect and from what pages.
wxmx Posted March 23, 2011

You can probably skip MySQL and PHP Excel if you save it as a CSV file.
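Agreed that CSV keeps it simple. A minimal sketch of the output step, with hypothetical example rows standing in for values parsed by something like the extract_levels sketch above; Excel opens the resulting file directly.

```python
import csv

# Hypothetical rows: (station, valid time, 850 mb temp C, 500 mb height m).
rows = [
    ("72340", "2010-01-01 00Z", -2.1, 5430.0),
    ("72340", "2010-01-02 00Z", -5.7, 5380.0),
]

with open("soundings.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["station", "valid_00z", "t850_c", "z500_m"])
    writer.writerows(rows)
```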