
Problem merging a list and a DataFrame in Python

I have CSV files that I want to merge with a list of objects of a class I made.

In the CSV there is a 'sector' field, plus other fields with information about that sector.

The list elements are instances of a class with the fields name, x, y, where x, y is the location that belongs to that name.

This is how I built the list. (I generated it from a CSV file as well, in which each antenna appears many times with different parameters, so I extracted only the ones I need.)

# ant_file is the CSV with all the antennas, ant_list_name is the list with
# only the antenna names, and ant_list_tot is the list with the name and
# also the x, y fields
ant_list_name = []
ant_list_tot = []
for rowA in range(size_ant_file):
    rec = ant_file.iloc[rowA]['name']
    # keep only the first row seen for each antenna name
    if rec not in ant_list_name:
        ant_list_name.append(rec)
        A = Antenna(ant_file.iloc[rowA]['name'], ant_file.iloc[rowA]['x'],
                    ant_file.iloc[rowA]['y'])
        ant_list_tot.append(A)

print(antenna_list)

[Antenna(name='UWE33', x=34.9, y=31.9), Antenna(name='UTN00', x=34.8, 
y=32.1), Antenna(name='UWE02', x=34.8, y=32.1)]
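
For reference, the extraction loop above can probably also be written without explicit row indexing. This is only a sketch: the small ant_file frame below is a made-up stand-in for the real CSV, and it assumes the first row seen for each name is the one to keep, as in the loop.

```python
import pandas as pd
from dataclasses import dataclass


@dataclass
class Antenna:
    name: str
    x: float
    y: float


# Hypothetical stand-in for ant_file; the real CSV has more rows and columns.
ant_file = pd.DataFrame({
    'name': ['UWE33', 'UTN00', 'UWE33', 'UWE02', 'UTN00'],
    'x': [34.9, 34.8, 34.9, 34.8, 34.8],
    'y': [31.9, 32.1, 31.9, 32.1, 32.1],
})

# Keep the first row for each antenna name, like the "if rec not in ..." check.
unique_ants = ant_file.drop_duplicates(subset='name', keep='first')

# Build one Antenna per remaining row.
ant_list_tot = [
    Antenna(name, x, y)
    for name, x, y in zip(unique_ants['name'], unique_ants['x'], unique_ants['y'])
]

print(ant_list_tot)
```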

I tried to do it with double for loop:

from dataclasses import dataclass

@dataclass
class Antenna:
    name: str
    x: float
    y: float

# records is the CSV file and antenna_list is the list of type Antenna
for index in range(len(records)):
    rec = records.iloc[index]['sector']
    for i in range(len(antenna_list)):
        if rec == antenna_list[i].name:
            lat = antenna_list[i].x
            lon = antenna_list[i].y
            records.at[index, 'x'] = lat
            records.at[index, 'y'] = lon
            break

The resulting CSV file is only partly right. Toward the end there are rows where every field is correct except x and y, which are still 0, and there are also rows that have x and y values but none of the original fields.

It looks like the rows have been shifted by a large offset, but I can't understand why.

  • I checked that there are no missing values

example:

records.csv at the beginning (date, hour and user_id are random values and are not important):

sector   date       hour   user_id  x   y       
 abc     1.1.19    20:00     123    0   0
 dfs     5.8.17    12:40     876    0   0
 ngh     6.9.19    08:12     962    0   0
 yjt     10.10.16  17:18     492    0   0
 abc     6.8.16    22:10     985    0   0
 dfs     7.1.15    19:15     542    0   0

antenna_list in the form (name, x, y) (here too, x and y are random numbers for now and are not important):

antenna_list[0] = (abc,12,16)
antenna_list[1] = (dfs,6,20)
antenna_list[2] = (ngh,13,98)
antenna_list[3] = (yjt,18,41)

The result I want to see is:

sector   date       hour   user_id  x   y       
 abc     1.1.19    20:00     123    12  16
 dfs     5.8.17    12:40     876    6   20
 ngh     6.9.19    08:12     962    13  98
 yjt     10.10.16  17:18     492    18  41
 abc     6.8.16    22:10     985    12  16
 dfs     7.1.15    19:15     542     6  20

But the actual result is:

sector   date       hour   user_id  x   y       
 abc     1.1.19    20:00     123    12  16
 dfs     5.8.17    12:40     876    6   20
 ngh     6.9.19    08:12     962    0   0
 yjt     10.10.16  17:18     492    0   0 
 abc     6.8.16    22:10     985    0   0
 dfs     7.1.15    19:15     542    0   0
                                    13  98
                                    18  41
                                    12  16
                                    6   20

TIA

If you save antenna_list as two dicts,

antenna_dict_x = {'abc':12, 'dfs':6, 'ngh':13, 'yjt':18}
antenna_dict_y = {'abc':16, 'dfs':20, 'ngh':98, 'yjt':41}

then creating two columns should be an easy map,

data['x']=data['sector'].map(antenna_dict_x)
data['y']=data['sector'].map(antenna_dict_y)
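
If the antennas are already in a list of Antenna objects, the two dicts can be built from it directly. The following is a minimal sketch, assuming the dataclass from the question; the records frame is a small made-up stand-in for the CSV, and I call it records here rather than data. Sectors that have no matching antenna would come out as NaN.

```python
import pandas as pd
from dataclasses import dataclass


@dataclass
class Antenna:
    name: str
    x: float
    y: float


antenna_list = [Antenna('abc', 12, 16), Antenna('dfs', 6, 20),
                Antenna('ngh', 13, 98), Antenna('yjt', 18, 41)]

# One lookup table per coordinate, keyed by antenna name.
antenna_dict_x = {a.name: a.x for a in antenna_list}
antenna_dict_y = {a.name: a.y for a in antenna_list}

# Hypothetical stand-in for the records CSV.
records = pd.DataFrame({
    'sector': ['abc', 'dfs', 'ngh', 'yjt', 'abc', 'dfs'],
    'user_id': [123, 876, 962, 492, 985, 542],
})

# map() looks up each sector name in the dict, row by row.
records['x'] = records['sector'].map(antenna_dict_x)
records['y'] = records['sector'].map(antenna_dict_y)

print(records)
```

Because map works on the column values and never touches the row labels, it does not depend on how the index of records is numbered.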

So if you do:

import pandas as pd

class Antenna():
    def __init__(self, name, x, y):
        self.name = name
        self.x = x
        self.y = y

antenna_list = [Antenna('abc',12,16), Antenna('dfs',6,20), Antenna('ngh',13,98), Antenna('yjt',18,41)]
records = pd.read_csv('something.csv')
for index in range(len(records)):
    rec = records.iloc[index]['sector']
    for i in range(len(antenna_list)):
        if rec == antenna_list[i].name:
             lat = antenna_list[i].x
             lon = antenna_list[i].y
             records.at[index, 'x'] = lat
             records.at[index, 'y'] = lon
             break

print(records)

you get:

  sector      date   hour  user_id   x   y
0    abc    1.1.19  20:00      123  12  16
1    dfs    5.8.17  12:40      876   6  20
2    ngh    6.9.19   8:12      962  13  98
3    yjt  10.10.16  17:18      492  18  41
4    abc    6.8.16  22:10      985  12  16
5    dfs    7.1.15  19:15      542   6  20

Which is what you were expecting. Also, if you do:

import pandas as pd
from dataclasses import dataclass

@dataclass
class Antenna:
    name: str
    x: float
    y: float


antenna_list = [Antenna('abc',12,16), Antenna('dfs',6,20), Antenna('ngh',13,98), Antenna('yjt',18,41)]
records = pd.read_csv('something.csv')
for index in range(len(records)):
    rec = records.iloc[index]['sector']
    for i in range(len(antenna_list)):
        if rec == antenna_list[i].name:
             lat = antenna_list[i].x
             lon = antenna_list[i].y
             records.at[index, 'x'] = lat
             records.at[index, 'y'] = lon
             break

print(records)

you get:

 sector      date   hour  user_id   x   y
0    abc    1.1.19  20:00      123  12  16
1    dfs    5.8.17  12:40      876   6  20
2    ngh    6.9.19   8:12      962  13  98
3    yjt  10.10.16  17:18      492  18  41
4    abc    6.8.16  22:10      985  12  16
5    dfs    7.1.15  19:15      542   6  20

Which is, again, what you were expecting. You did not post how you created the antenna list, but I assume that is where your error is.
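
One other possible cause, offered only as a guess because the question does not show how records was loaded: the loop reads rows by position with iloc but writes them by label with at. If the index of records is not 0, 1, 2, ... (for example after filtering rows or concatenating several CSV files), the label written to may not exist, and assigning through at then appends a new row that has only x and y filled in. The sketch below reproduces that with a made-up frame whose labels are 10, 11, 12; the function name fill_coordinates and the tuple form of the antennas are mine, not from the question.

```python
import pandas as pd


def fill_coordinates(records, antennas):
    """Same double loop as in the question; antennas holds (name, x, y) tuples."""
    records = records.copy()
    for index in range(len(records)):
        rec = records.iloc[index]['sector']   # read by position
        for name, x, y in antennas:
            if rec == name:
                records.at[index, 'x'] = x    # write by label
                records.at[index, 'y'] = y
                break
    return records


antennas = [('abc', 12, 16), ('dfs', 6, 20), ('ngh', 13, 98)]

# Hypothetical frame whose index labels are 10, 11, 12 instead of 0, 1, 2.
shifted = pd.DataFrame(
    {'sector': ['abc', 'dfs', 'ngh'], 'x': [0, 0, 0], 'y': [0, 0, 0]},
    index=[10, 11, 12],
)

# Labels 0, 1, 2 do not exist, so each write appends a new row.
broken = fill_coordinates(shifted, antennas)

# Renumbering the index first makes positions and labels agree.
fixed = fill_coordinates(shifted.reset_index(drop=True), antennas)

print(broken)
print(fixed)
```

In broken, the original rows keep x and y at 0 and three extra rows appear with coordinates but no sector, which resembles the output in the question. If this is what is happening, calling records.reset_index(drop=True) before the loop, or using the map approach, should avoid it.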

