简体   繁体   中英

How can I transform geospatial data read as lat/lon coordinates from a CSV to match data read from a SHP?

Introduction

I am reading two files with geospatial data, one as .shp and one as .csv (using options = c('X_POSSIBLE_NAMES=lat', 'Y_POSSIBLE_NAMES=lon' ). The .shp file has unusual values in the bbox field when it's read, but graphs well. The .csv file has the right values in the bbox field when it's read, but graphs badly. Here are the read messages and dputs.

Read messages

Reading layer `WWW` from data source `XXX.shp' using driver `ESRI Shapefile'
Simple feature collection with 222 features and 12 fields
geometry type:  LINESTRING
dimension:      XY
bbox:           xmin: 745400.7 ymin: 2402049 xmax: 998753.2 ymax: 2671392
epsg (SRID):    32645
proj4string:    +proj=utm +zone=45 +datum=WGS84 +units=m +no_defs

and

options:        X_POSSIBLE_NAMES=lat Y_POSSIBLE_NAMES=lon 
Reading layer `YYY` from data source `ZZZ`.csv' using driver `CSV'
Simple feature collection with 5 features and 3 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 24.70942 ymin: 87.91372 xmax: 24.70942 ymax: 88.20073
epsg (SRID):    NA
proj4string:    NA

dputs

structure(list(Transporta = 31159.5791334, Transpor_1 = 1, Transpor_2 = 2, 
    Transpor_3 = 0, TransportS = 1, Transpor_4 = 120.8226354, 
    Transpor_5 = 40.37383139, Transpor_6 = 0.334157844, Transpor_7 = 0.6603, 
    Transpor_8 = 0.8694, Transpor_9 = structure(NA_integer_, .Label = character(0), class = "factor"), 
    class = structure(1L, .Label = "0", class = "factor"), geometry = structure(list(
        structure(c(745439.842100001, 745546.5795, 745632.3839, 
        745788.981400001, 745917.343899999, 745985.294699999, 
        746066.299, 746154.675899999, 746368.8333, 746569.5261, 
        746772.2455, 746881.8819, 747004.401999999, 747073.7139, 
        747272.5434, 747470.2489, 747669.9781, 747812.230399999, 
        747975.2242, 748282.6014, 748497.1264, 748706.9185, 749043.816300001, 
        749292.6146, 749610.7259, 749925.135099999, 750001.7305, 
        750624.1876, 750825.9755, 750948.688199999, 751059.9489, 
        751163.9131, 751327.932799999, 751563.730499999, 751814.7569, 
        751971.032300002, 752126.386900001, 752195.9161, 752213.0906, 
        752386.998999999, 752474.422399999, 752562.3503, 752732.1897, 
        752984.445600001, 753125.1189, 753316.914400001, 753451.7143, 
        753804.5972, 753975.4802, 754070.401899998, 754153.675599999, 
        754287.556099999, 754405.755000002, 754531.939100001, 
        754559.107699999, 754588.9883, 754515.016200001, 754515.4296, 
        754545.5989, 754566.464399999, 754662.8412, 754914.5953, 
        755505.306800001, 756006.1144, 757647.819899998, 757818.6545, 
        757857.6266, 757878.117400001, 757888.392300001, 757979.6043, 
        758039.2786, 758104.023299999, 758273.8898, 758353.649, 
        758363.7606, 758425.7181, 758461.5542, 758412.883999999, 
        758361.6557, 758337.5731, 758369.8289, 758400.552799999, 
        758599.791, 758782.7463, 758948.5656, 759066.780899999, 
        759122.1718, 759140.5877, 759163.8683, 759310.9385, 759418.1858, 
        761587.9766, 761697.946600001, 761732.9896, 761735.1635, 
        761771.6095, 761834.645200001, 761880.7744, 761988.7883, 
        762117.364700001, 762363.861100001, 762461.7972, 762577.650000001, 
        2549587.0462, 2549520.9314, 2549439.3669, 2549255.2505, 
        2549117.1188, 2548990.8909, 2548851.1706, 2548730.2398, 
        2548549.1469, 2548369.9096, 2548184.485, 2548055.6101, 
        2547843.9775, 2547702.8617, 2547377.6804, 2547071.7279, 
        2546836.4809, 2546710.2235, 2546607.1267, 2546468.6995, 
        2546397.9558, 2546368.2405, 2546431.9041, 2546481.667, 
        2546520.1404, 2546550.2619, 2546534.1647, 2546540.4899, 
        2546584.3733, 2546647.1947, 2546721.98, 2546784.4899, 
        2546843.9544, 2546872.2121, 2546826.1245, 2546737.2423, 
        2546642.6158, 2546522.2403, 2546392.8828, 2545671.1662, 
        2545403.889, 2545106.76, 2544870.7301, 2544665.9495, 
        2544572.9174, 2544320.907, 2544112.5035, 2543574.9242, 
        2543279.2004, 2543016.8945, 2542750.9041, 2542376.3794, 
        2542042.1049, 2541679.6057, 2541467.5463, 2541190.3413, 
        2540905.4868, 2540780.0052, 2540631.2105, 2540525.5178, 
        2540373.1915, 2540070.0945, 2539482.8831, 2539046.8166, 
        2537839.0783, 2537724.5041, 2537640.0879, 2537539.1494, 
        2537377.2627, 2537091.1622, 2536893.6567, 2536619.2583, 
        2536123.8235, 2535849.6827, 2535679.4848, 2535292.3446, 
        2534815.1725, 2534426.1384, 2534186.3689, 2533976.9259, 
        2533708.7239, 2533530.0804, 2533205.0132, 2532931.801, 
        2532675.3421, 2532349.1915, 2532184.0268, 2531985.8156, 
        2531722.8626, 2531466.0844, 2531342.3258, 2531359.46, 
        2531296.537, 2531224.2141, 2531098.6506, 2530945.319, 
        2530557.4524, 2530278.6875, 2529895.6515, 2529626.4184, 
        2529270.0938, 2529090.8971, 2528794.2838), .Dim = c(103L, 
        2L), class = c("XY", "LINESTRING", "sfg"))), class = c("sfc_LINESTRING", 
    "sfc"), precision = 0, bbox = structure(c(xmin = 745439.842100001, 
    ymin = 2528794.2838, xmax = 762577.650000001, ymax = 2549587.0462
    ), class = "bbox"), crs = structure(list(epsg = 32645L, proj4string = "+proj=utm +zone=45 +datum=WGS84 +units=m +no_defs"), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(Transporta = NA_integer_, 
Transpor_1 = NA_integer_, Transpor_2 = NA_integer_, Transpor_3 = NA_integer_, 
TransportS = NA_integer_, Transpor_4 = NA_integer_, Transpor_5 = NA_integer_, 
Transpor_6 = NA_integer_, Transpor_7 = NA_integer_, Transpor_8 = NA_integer_, 
Transpor_9 = NA_integer_, class = NA_integer_), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), row.names = 1L, class = c("sf", 
"data.frame"))

and

structure(list(width = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("2.4716396025460600", 
"2.6281478054445400", "3.0063089766321100", "3.3519104918569500", 
"4.263698279008270"), class = "factor"), lat = c(24.7094195311052, 
24.7094195311052, 24.7094195311052, 24.7094195311052, 24.7094195311052
), lon = c(87.9137151118854, 87.9139846064706, 88.1161055453975, 
88.1996488668206, 88.2007268451616), geometry = structure(list(
    structure(c(24.7094195311052, 87.9137151118854), class = c("XY", 
    "POINT", "sfg")), structure(c(24.7094195311052, 87.9139846064706
    ), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1161055453975), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1996488668206), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.2007268451616), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT", 
"sfc"), precision = 0, bbox = structure(c(xmin = 24.7094195311052, 
ymin = 87.9137151118854, xmax = 24.7094195311052, ymax = 88.2007268451616
), class = "bbox"), crs = structure(list(epsg = NA_integer_, 
    proj4string = NA_character_), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(width = NA_integer_, 
lat = NA_integer_, lon = NA_integer_), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), row.names = c(NA, 
5L), class = c("sf", "data.frame"))

Attempted fixes with st_crs() and st_transform()

I've tried setting the epsg and proj4string attributes of the second to those of the first, but it didn't fix the graph or the wrong geometry values:

# make a copy of second_sf
second_sf_transformed <- second_sf

# make EPSG of the copy of the second the same as the first
st_crs(second_sf_transformed) <- st_crs(first_sf)

# make the proj4string of the copy of the second the same as the first
second_sf_transformed <- st_transform(second_sf_transformed, '+proj=utm +zone=45 +datum=WGS84 +units=m +no_defs')

Suspected issue

There is a transform that I need to be doing on the data from the csv so that I can analyze both shapefiles together, but I don't know what it is. Does anyone know?

Update

Thanks to @dww I got the coordinates for the shapefile right, but the csv data still won't graph correctly.

Transforms

# make EPSG of the copy of the second the same as the first (keeps ggplot() from complaining)
st_crs(second_sf_transformed) <- st_crs(first_sf)

# reproject data
first_sf_transformed <- st_transform(first_sf, "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0" )

Dputs after transforms

shp data

structure(list(Transporta = 31159.5791334, Transpor_1 = 1, Transpor_2 = 2, 
    Transpor_3 = 0, TransportS = 1, Transpor_4 = 120.8226354, 
    Transpor_5 = 40.37383139, Transpor_6 = 0.334157844, Transpor_7 = 0.6603, 
    Transpor_8 = 0.8694, Transpor_9 = structure(NA_integer_, .Label = character(0), class = "factor"), 
    class = structure(1L, .Label = "0", class = "factor"), geometry = structure(list(
        structure(c(89.3951268474236, 89.3961571846398, 89.3969809093774, 
        89.3984785916188, 89.3997082546062, 89.4003506946848, 
        89.4011182562287, 89.401960703154, 89.4040200160893, 
        89.4059482811558, 89.4078952678924, 89.4089436459468, 
        89.4101043491955, 89.4107575329244, 89.4126439952916, 
        89.4145225108038, 89.4164320555286, 89.4177986910559, 
        89.4193712573017, 89.4223458004554, 89.4244259337919, 
        89.4264665255084, 89.4297614260539, 89.4321951598272, 
        89.4353028498903, 89.4383730983178, 89.439117259213, 
        89.4451869671167, 89.4471614499647, 89.4483680787049, 
        89.4494650144704, 89.4504888199855, 89.4520976579026, 
        89.4544012217856, 89.4568411070362, 89.4583501885189, 
        89.4598493329306, 89.4605074942507, 89.4606537526005, 
        89.4622309990771, 89.4630394820499, 89.4638479664175, 
        89.4654649342105, 89.4678903853704, 89.4692464104709, 
        89.4710746363703, 89.4723543932316, 89.4757057279109, 
        89.4773226951387, 89.4782047141727, 89.4789725685317, 
        89.4802157622041, 89.4813126972553, 89.4824827608323, 
        89.4827125868794, 89.4829581002389, 89.4821902450637, 
        89.4821735976744, 89.4824430933553, 89.4826290191, 89.4835431323413, 
        89.4859465218357, 89.4916059082089, 89.4964138369265, 
        89.5122097031426, 89.5138551060626, 89.5142207513114, 
        89.5144035734198, 89.5144767019004, 89.5153176863904, 
        89.5158661532172, 89.5164511855445, 89.5180234596532, 
        89.5187547490365, 89.518824862045, 89.5193638514563, 
        89.5196333452499, 89.5190943561545, 89.5185553667664, 
        89.5182858719894, 89.5185553670613, 89.5188248615575, 
        89.5207113235949, 89.5224477642401, 89.5240200382999, 
        89.5251169729864, 89.525628876173, 89.5257751340317, 
        89.5259579570345, 89.5273474078494, 89.5283712135035, 
        89.5495054976574, 89.5505658674762, 89.5508949488069, 
        89.5508949489754, 89.5512240291922, 89.5517724970054, 
        89.5521747065285, 89.5531619475244, 89.5543685767302, 
        89.556708704603, 89.557632082246, 89.5587100597639, 23.0366666407601, 
        23.0360541520806, 23.0353053114777, 23.0336204191383, 
        23.0323547265548, 23.0312054063198, 23.0299323767221, 
        23.0288278360431, 23.0271616644131, 23.0255142147017, 
        23.0238106014021, 23.0226311762231, 23.0207029106886, 
        23.0194189874174, 23.0164545471189, 23.013663804625, 
        23.0115108869883, 23.0103501834476, 23.0093954108462, 
        23.0081002152887, 23.0074297034657, 23.0071301670893, 
        23.0076543553864, 23.0080662184381, 23.0083657545691, 
        23.0085904068476, 23.0084336181262, 23.008397053537, 
        23.0087626986679, 23.009311166109, 23.0099693276515, 
        23.0105177951238, 23.0110296978796, 23.0112490852379, 
        23.0107951600909, 23.0099693278917, 23.0090917796211, 
        23.007994844059, 23.006824780279, 23.0002848724084, 22.9978594199624, 
        22.9951644743623, 22.9930085179719, 22.9911220557635, 
        22.9902610598549, 22.9879574957376, 22.9860561421864, 
        22.9811507560206, 22.9784558101354, 22.9760740319897, 
        22.9736607748343, 22.9702602760285, 22.9672254217359, 
        22.9639346170366, 22.9620166398532, 22.9595103113266, 
        22.956950796692, 22.9558182646749, 22.9544707918072, 
        22.9535137332223, 22.9521242822197, 22.949350394888, 
        22.9439605028831, 22.939948302187, 22.928796127859, 22.9277357574655, 
        22.9269679028182, 22.9260537905528, 22.9245912104683, 
        22.9219951306966, 22.9202034702482, 22.9177170836491, 
        22.9132196495687, 22.9107332636081, 22.9091957023515, 
        22.9056922716249, 22.9013803590159, 22.8978769292378, 
        22.8957209717259, 22.893834509633, 22.8914090590013, 
        22.8897920924525, 22.8868276509463, 22.8843336909749, 
        22.8819935625524, 22.8790318381511, 22.8775326931856, 
        22.87574103285, 22.8733643396172, 22.8710242115184, 22.8698907125254, 
        22.8697078900248, 22.8691228573515, 22.8684646968471, 
        22.8673311965241, 22.8659417452486, 22.8624315537002, 
        22.8599086028934, 22.8564349744818, 22.8539851531464, 
        22.8507309120264, 22.8490984092694, 22.846403462933), .Dim = c(103L, 
        2L), class = c("XY", "LINESTRING", "sfg"))), class = c("sfc_LINESTRING", 
    "sfc"), precision = 0, bbox = structure(c(xmin = 89.3951268474236, 
    ymin = 22.846403462933, xmax = 89.5587100597639, ymax = 23.0366666407601
    ), class = "bbox"), crs = structure(list(epsg = 4326L, proj4string = "+proj=longlat +datum=WGS84 +no_defs"), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(Transporta = NA_integer_, 
Transpor_1 = NA_integer_, Transpor_2 = NA_integer_, Transpor_3 = NA_integer_, 
TransportS = NA_integer_, Transpor_4 = NA_integer_, Transpor_5 = NA_integer_, 
Transpor_6 = NA_integer_, Transpor_7 = NA_integer_, Transpor_8 = NA_integer_, 
Transpor_9 = NA_integer_, class = NA_integer_), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), row.names = 1L, class = c("sf", 
"data.frame"))

csv data

structure(list(width = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("2.4716396025460600", 
"2.6281478054445400", "3.0063089766321100", "3.3519104918569500", 
"4.263698279008270"), class = "factor"), lat = c(24.7094195311052, 
24.7094195311052, 24.7094195311052, 24.7094195311052, 24.7094195311052
), lon = c(87.9137151118854, 87.9139846064706, 88.1161055453975, 
88.1996488668206, 88.2007268451616), geometry = structure(list(
    structure(c(24.7094195311052, 87.9137151118854), class = c("XY", 
    "POINT", "sfg")), structure(c(24.7094195311052, 87.9139846064706
    ), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1161055453975), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1996488668206), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.2007268451616), class = c("XY", "POINT", "sfg"))), n_empty = 0L, crs = structure(list(
    epsg = 32645L, proj4string = "+proj=utm +zone=45 +datum=WGS84 +units=m +no_defs"), class = "crs"), class = c("sfc_POINT", 
"sfc"), precision = 0, bbox = structure(c(xmin = 24.7094195311052, 
ymin = 87.9137151118854, xmax = 24.7094195311052, ymax = 88.2007268451616
), class = "bbox"))), row.names = c(NA, -5L), sf_column = "geometry", agr = structure(c(width = NA_integer_, 
lat = NA_integer_, lon = NA_integer_), class = "factor", .Label = c("constant", 
"aggregate", "identity")), class = c("sf", "data.frame"))

The problem seems to be that the sf object created from the csv data seems to have lat and lon inverted in the geometry column:


csv_in <- structure(list(width = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("2.4716396025460600", 
"2.6281478054445400", "3.0063089766321100", "3.3519104918569500", 
"4.263698279008270"), class = "factor"), lat = c(24.7094195311052, 
24.7094195311052, 24.7094195311052, 24.7094195311052, 24.7094195311052
), lon = c(87.9137151118854, 87.9139846064706, 88.1161055453975, 
88.1996488668206, 88.2007268451616), geometry = structure(list(
    structure(c(24.7094195311052, 87.9137151118854), class = c("XY", 
    "POINT", "sfg")), structure(c(24.7094195311052, 87.9139846064706
    ), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1161055453975), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1996488668206), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.2007268451616), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT", 
"sfc"), precision = 0, bbox = structure(c(xmin = 24.7094195311052, 
ymin = 87.9137151118854, xmax = 24.7094195311052, ymax = 88.2007268451616
), class = "bbox"), crs = structure(list(epsg = NA_integer_, 
    proj4string = NA_character_), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(width = NA_integer_, 
lat = NA_integer_, lon = NA_integer_), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), row.names = c(NA, 
5L), class = c("sf", "data.frame"))


Simple feature collection with 5 features and 3 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 24.70942 ymin: 87.91372 xmax: 24.70942 ymax: 88.20073
epsg (SRID):    NA
proj4string:    NA
               width      lat      lon                  geometry
1 2.4716396025460600 24.70942 87.91372 POINT (24.70942 87.91372)
2 2.6281478054445400 24.70942 87.91398 POINT (24.70942 87.91398)
3  4.263698279008270 24.70942 88.11611 POINT (24.70942 88.11611)
4 3.0063089766321100 24.70942 88.19965 POINT (24.70942 88.19965)
5 3.3519104918569500 24.70942 88.20073 POINT (24.70942 88.20073)

(note that in the geometry column, the first column is longitude).

You can fix this either by correcting the import script and then assigning the CRS with sf::st_crs(csv_in) <- 4326 , or using something like:

csv_in <- sf::st_as_sf(sf::st_drop_geometry(csv_in), coords = c("lon","lat"), remove= FALSE)
sf::st_crs(csv_in) <- 4326
csv_in

Simple feature collection with 5 features and 3 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 87.91372 ymin: 24.70942 xmax: 88.20073 ymax: 24.70942
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
               width      lat      lon                  geometry
1 2.4716396025460600 24.70942 87.91372 POINT (87.91372 24.70942)
2 2.6281478054445400 24.70942 87.91398 POINT (87.91398 24.70942)
3  4.263698279008270 24.70942 88.11611 POINT (88.11611 24.70942)
4 3.0063089766321100 24.70942 88.19965 POINT (88.19965 24.70942)
5 3.3519104918569500 24.70942 88.20073 POINT (88.20073 24.70942)

(note that now the coordinates in the geometry column are "inverted").

Trying to plot this gives:

ggplot(csv_in) + geom_sf() + geom_sf(data = shp_in)

, that appers to be ok

在此输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM