
Extracting a column that looks like JSON from a pandas DataFrame in Python: how do I achieve this?

I have a dataset loaded into a pandas DataFrame. One column appears to be in JSON format (I'm not sure), and I want to extract the information in that column into other columns of the same DataFrame.

I've tried read_json, normalization, and other Python functions, but I can't achieve my goal.

Here's what I tried:

x = {'latitude': '47.61219025', 'needs_recoding': False, 'human_address': '{""address"":""405 OLIVE WAY"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33799744'}
print (x.get('latitude'))
print (x.get('longitude'))

This works for one line only.

I also tried this:

s = data2015.groupby('OSEBuildingID')['Location'].apply(lambda x: x.tolist())
print(s)
pd.read_json(s,typ='series',orient='records')

but I get this error:

ValueError: Invalid file path or buffer object type

Loading the dataframe:

data2015 = pd.read_csv(filepath_or_buffer=r'C:\Users\mehdi\OneDrive\Documents\OpenClassRooms\Projet 3\2015-building-energy-benchmarking\2015-building-energy-benchmarking.csv', delimiter=",",low_memory=False)

Example of the file content:

OSEBuildingID,DataYear,BuildingType,PrimaryPropertyType,PropertyName,TaxParcelIdentificationNumber,Location,CouncilDistrictCode,Neighborhood,YearBuilt,NumberofBuildings,NumberofFloors,PropertyGFATotal,PropertyGFAParking,PropertyGFABuilding(s),ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA,YearsENERGYSTARCertified,ENERGYSTARScore,SiteEUI(kBtu/sf),SiteEUIWN(kBtu/sf),SourceEUI(kBtu/sf),SourceEUIWN(kBtu/sf),SiteEnergyUse(kBtu),SiteEnergyUseWN(kBtu),SteamUse(kBtu),Electricity(kWh),Electricity(kBtu),NaturalGas(therms),NaturalGas(kBtu),OtherFuelUse(kBtu),GHGEmissions(MetricTonsCO2e),GHGEmissionsIntensity(kgCO2e/ft2),DefaultData,Comment,ComplianceStatus,Outlier
1,2015,NonResidential,Hotel,MAYFLOWER PARK HOTEL,659000030,"{'latitude': '47.61219025', 'needs_recoding': False, 'human_address': '{""address"":""405 OLIVE WAY"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33799744'}",7,DOWNTOWN,1927,1,12,88434,0,88434,Hotel,Hotel,88434,,,,,,65,78.90,80.30,173.50,175.10,6981428,7097539,2023032,1080307,3686160,12724,1272388,0,249.43,2.64,No,,Compliant,
2,2015,NonResidential,Hotel,PARAMOUNT HOTEL,659000220,"{'latitude': '47.61310583', 'needs_recoding': False, 'human_address': '{""address"":""724 PINE ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33335756'}",7,DOWNTOWN,1996,1,11,103566,15064,88502,"Hotel, Parking, Restaurant",Hotel,83880,Parking,15064,Restaurant,4622,,51,94.40,99.00,191.30,195.20,8354235,8765788,0,1144563,3905411,44490,4448985,0,263.51,2.38,No,,Compliant,
3,2015,NonResidential,Hotel,WESTIN HOTEL,659000475,"{'latitude': '47.61334897', 'needs_recoding': False, 'human_address': '{""address"":""1900 5TH AVE"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33769944'}",7,DOWNTOWN,1969,1,41,961990,0,961990,"Hotel, Parking, Swimming Pool",Hotel,757243,Parking,100000,Swimming Pool,0,,18,96.60,99.70,242.70,246.50,73130656,75506272,19660404,14583930,49762435,37099,3709900,0,2061.48,1.92,Yes,,Compliant,
5,2015,NonResidential,Hotel,HOTEL MAX,659000640,"{'latitude': '47.61421585', 'needs_recoding': False, 'human_address': '{""address"":""620 STEWART ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33660889'}",7,DOWNTOWN,1926,1,10,61320,0,61320,Hotel,Hotel,61320,,,,,,1,460.40,462.50,636.30,643.20,28229320,28363444,23458518,811521,2769023,20019,2001894,0,1936.34,31.38,No,,Compliant,High Outlier
8,2015,NonResidential,Hotel,WARWICK SEATTLE HOTEL,659000970,"{'latitude': '47.6137544', 'needs_recoding': False, 'human_address': '{""address"":""401 LENORA ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98121""}', 'longitude': '-122.3409238'}",7,DOWNTOWN,1980,1,18,119890,12460,107430,"Hotel, Parking, Swimming Pool",Hotel,123445,Parking,68009,Swimming Pool,0,,67,120.10,122.10,228.80,227.10,14829099,15078243,0,1777841,6066245,87631,8763105,0,507.7,4.02,No,,Compliant,
9,2015,Nonresidential COS,Other,WEST PRECINCT (SEATTLE POLICE),660000560,"{'latitude': '47.6164389', 'needs_recoding': False, 'human_address': '{""address"":""810 VIRGINIA ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33676431'}",7,DOWNTOWN,1999,1,2,97288,37198,60090,Police Station,Police Station,88830,,,,,,,135.70,146.90,313.50,321.60,12051984,13045258,0,2130921,7271004,47813,4781283,0,304.62,2.81,No,,Compliant,
10,2015,NonResidential,Hotel,CAMLIN WORLDMARK HOTEL,660000825,"{'latitude': '47.6141141', 'needs_recoding': False, 'human_address': '{""address"":""1619 9TH AVE"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33274086'}",7,DOWNTOWN,1926,1,11,83008,0,83008,Hotel,Hotel,81352,,,,,,25,76.90,79.60,149.50,158.20,6252842,6477493,0,785342,2679698,35733,3573255,0,208.46,2.37,No,,Compliant,
11,2015,NonResidential,Other,PARAMOUNT THEATER,660000955,"{'latitude': '47.61290234', 'needs_recoding': False, 'human_address': '{""address"":""901 PINE ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33130949'}",7,DOWNTOWN,1926,1,8,102761,0,102761,Other - Entertainment/Public Assembly,Other - Entertainment/Public Assembly,102761,,,,,,,62.50,71.80,152.20,160.40,6426022,7380086,2003108,1203937,4108004,3151,315079,0,199.99,1.77,No,,Compliant,


I would like to end up with at least another dataframe containing the columns latitude, needs_recoding, human_address, and longitude.

There might be a better way of doing this, but I just iterated through the rows, parsed that string (a Python dict literal with a JSON string nested inside it) into its individual parts, and put them back together into a dataframe. You could then just use .to_csv() to save it:

import ast
import json

import pandas as pd

data2015 = pd.read_csv('C:/test.csv', delimiter=",", low_memory=False)

rows = []
for idx, row in data2015.iterrows():
    # 'Location' holds a Python dict literal (single quotes, False),
    # so parse it with ast.literal_eval rather than json.loads.
    data_dict = ast.literal_eval(row['Location'])
    lat = data_dict['latitude']
    lon = data_dict['longitude']
    need_recode = data_dict['needs_recoding']

    # 'human_address' is a JSON string nested inside that dict.
    normalize = pd.Series(json.loads(data_dict['human_address']))
    row = row.drop('Location')

    cols = list(row.index) + ['latitude', 'longitude', 'need_recoding'] + list(normalize.index)
    temp_df = pd.DataFrame([list(row) + [lat, lon, need_recode] + list(normalize)], columns=cols)
    rows.append(temp_df)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
results = pd.concat(rows, ignore_index=True)

Output:

print (results.to_string())
   OSEBuildingID  DataYear        BuildingType PrimaryPropertyType                    PropertyName  TaxParcelIdentificationNumber  CouncilDistrictCode Neighborhood  YearBuilt  NumberofBuildings  NumberofFloors  PropertyGFATotal  PropertyGFAParking  PropertyGFABuilding(s)              ListOfAllPropertyUseTypes                 LargestPropertyUseType  LargestPropertyUseTypeGFA SecondLargestPropertyUseType  SecondLargestPropertyUseTypeGFA ThirdLargestPropertyUseType  ThirdLargestPropertyUseTypeGFA  YearsENERGYSTARCertified  ENERGYSTARScore  SiteEUI(kBtu/sf)  SiteEUIWN(kBtu/sf)  SourceEUI(kBtu/sf)  SourceEUIWN(kBtu/sf)  SiteEnergyUse(kBtu)  SiteEnergyUseWN(kBtu)  SteamUse(kBtu)  Electricity(kWh)  Electricity(kBtu)  NaturalGas(therms)  NaturalGas(kBtu)  OtherFuelUse(kBtu)  GHGEmissions(MetricTonsCO2e)  GHGEmissionsIntensity(kgCO2e/ft2) DefaultData  Comment ComplianceStatus       Outlier     latitude      longitude  need_recoding          address     city state    zip
0              1      2015      NonResidential               Hotel            MAYFLOWER PARK HOTEL                      659000030                    7     DOWNTOWN       1927                  1              12             88434                   0                   88434                                  Hotel                                  Hotel                      88434                          NaN                              NaN                         NaN                             NaN                       NaN             65.0              78.9                80.3               173.5                 175.1              6981428                7097539         2023032           1080307            3686160               12724           1272388                   0                        249.43                               2.64          No      NaN        Compliant           NaN  47.61219025  -122.33799744          False    405 OLIVE WAY  SEATTLE    WA  98101
1              2      2015      NonResidential               Hotel                 PARAMOUNT HOTEL                      659000220                    7     DOWNTOWN       1996                  1              11            103566               15064                   88502             Hotel, Parking, Restaurant                                  Hotel                      83880                      Parking                          15064.0                  Restaurant                          4622.0                       NaN             51.0              94.4                99.0               191.3                 195.2              8354235                8765788               0           1144563            3905411               44490           4448985                   0                        263.51                               2.38          No      NaN        Compliant           NaN  47.61310583  -122.33335756          False      724 PINE ST  SEATTLE    WA  98101
2              3      2015      NonResidential               Hotel                    WESTIN HOTEL                      659000475                    7     DOWNTOWN       1969                  1              41            961990                   0                  961990          Hotel, Parking, Swimming Pool                                  Hotel                     757243                      Parking                         100000.0               Swimming Pool                             0.0                       NaN             18.0              96.6                99.7               242.7                 246.5             73130656               75506272        19660404          14583930           49762435               37099           3709900                   0                       2061.48                               1.92         Yes      NaN        Compliant           NaN  47.61334897  -122.33769944          False     1900 5TH AVE  SEATTLE    WA  98101
3              5      2015      NonResidential               Hotel                       HOTEL MAX                      659000640                    7     DOWNTOWN       1926                  1              10             61320                   0                   61320                                  Hotel                                  Hotel                      61320                          NaN                              NaN                         NaN                             NaN                       NaN              1.0             460.4               462.5               636.3                 643.2             28229320               28363444        23458518            811521            2769023               20019           2001894                   0                       1936.34                              31.38          No      NaN        Compliant  High Outlier  47.61421585  -122.33660889          False   620 STEWART ST  SEATTLE    WA  98101
4              8      2015      NonResidential               Hotel           WARWICK SEATTLE HOTEL                      659000970                    7     DOWNTOWN       1980                  1              18            119890               12460                  107430          Hotel, Parking, Swimming Pool                                  Hotel                     123445                      Parking                          68009.0               Swimming Pool                             0.0                       NaN             67.0             120.1               122.1               228.8                 227.1             14829099               15078243               0           1777841            6066245               87631           8763105                   0                        507.70                               4.02          No      NaN        Compliant           NaN   47.6137544   -122.3409238          False    401 LENORA ST  SEATTLE    WA  98121
5              9      2015  Nonresidential COS               Other  WEST PRECINCT (SEATTLE POLICE)                      660000560                    7     DOWNTOWN       1999                  1               2             97288               37198                   60090                         Police Station                         Police Station                      88830                          NaN                              NaN                         NaN                             NaN                       NaN              NaN             135.7               146.9               313.5                 321.6             12051984               13045258               0           2130921            7271004               47813           4781283                   0                        304.62                               2.81          No      NaN        Compliant           NaN   47.6164389  -122.33676431          False  810 VIRGINIA ST  SEATTLE    WA  98101
6             10      2015      NonResidential               Hotel          CAMLIN WORLDMARK HOTEL                      660000825                    7     DOWNTOWN       1926                  1              11             83008                   0                   83008                                  Hotel                                  Hotel                      81352                          NaN                              NaN                         NaN                             NaN                       NaN             25.0              76.9                79.6               149.5                 158.2              6252842                6477493               0            785342            2679698               35733           3573255                   0                        208.46                               2.37          No      NaN        Compliant           NaN   47.6141141  -122.33274086          False     1619 9TH AVE  SEATTLE    WA  98101
7             11      2015      NonResidential               Other               PARAMOUNT THEATER                      660000955                    7     DOWNTOWN       1926                  1               8            102761                   0                  102761  Other - Entertainment/Public Assembly  Other - Entertainment/Public Assembly                     102761                          NaN                              NaN                         NaN                             NaN                       NaN              NaN              62.5                71.8               152.2                 160.4              6426022                7380086         2003108           1203937            4108004                3151            315079                   0                        199.99                               1.77          No      NaN        Compliant           NaN  47.61290234  -122.33130949          False      901 PINE ST  SEATTLE    WA  98101
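
If the row-by-row loop gets slow on the full file, the same two-step parse can probably be done column-wise. The sketch below is only an illustration, not part of the original answer: `expand_location` and `SAMPLE_CSV` are made-up names, the sample is two rows from the question with most columns trimmed, and it assumes every Location cell is present and well-formed (a missing value would make ast.literal_eval raise).

```python
import ast
import io
import json

import pandas as pd

# Two rows in the same shape as the CSV in the question (columns trimmed).
SAMPLE_CSV = '''OSEBuildingID,PropertyName,Location
1,MAYFLOWER PARK HOTEL,"{'latitude': '47.61219025', 'needs_recoding': False, 'human_address': '{""address"":""405 OLIVE WAY"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33799744'}"
2,PARAMOUNT HOTEL,"{'latitude': '47.61310583', 'needs_recoding': False, 'human_address': '{""address"":""724 PINE ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33335756'}"
'''


def expand_location(df, column="Location"):
    """Return a copy of df with `column` replaced by its parsed fields."""
    # Outer layer: a Python dict literal, hence ast.literal_eval.
    outer = pd.DataFrame(df[column].map(ast.literal_eval).tolist(), index=df.index)
    # Inner layer: 'human_address' is a genuine JSON string.
    address = pd.DataFrame(
        outer.pop("human_address").map(json.loads).tolist(), index=df.index
    )
    return pd.concat([df.drop(columns=column), outer, address], axis=1)


data = pd.read_csv(io.StringIO(SAMPLE_CSV))
expanded = expand_location(data)
print(expanded.to_string())
```

With the real file you would pass data2015 instead of the sample. Note that latitude and longitude come out as strings here, just as in the loop version; pd.to_numeric can convert them if numbers are needed.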

