简体   繁体   中英

Create a list of lists using two Pandas dataframes

I have these two dataframe: 1) data here are grouped by station_id( from 1 up to 98) and time( a data every hour from 27-01-2020 to 26-05-2020)

在此处输入图像描述

  1. In the second dataframe I have a latitude and longitude value for every station_id.

在此处输入图像描述

My aim is to create a list of list in this format:

     latitude           longitude      flow   hour  month  day
[[53.37947845458979, -1.46990168094635, 278.0, 0.0, 1.0, 27.0], 
 [53.379791259765604, -1.46999669075012, 122.0, 0.0, 1.0, 27.0], 
 [53.380035400390604, -1.47001004219055, 58.0, 0.0, 1.0, 27.0], ...]

In order to have a list [latitude, longitude, flow, month, day] for every row in the first dataframe. I tried with the following code:

import pandas as pd
import datetime as dt

df = pd.read_csv("readings_by_hour.csv")
df['time'] = pd.to_datetime(df['time'])
df1 = pd.read_csv("stations_info.csv")

i = 0
a = []
b = []
count = df1['station_id'].count()

while i < count:
    if df['station_id'][i] == df1['station_id'][i]:
        a = print(df1['latitude'][i] + ", " + df1['longitude'][i] + ", " + df['flow'][i] + ", " + df['time'].dt.hour + ", " + df['time'].dt.month + ", " + df['time'].dt.day)
        b += [a]
        i += 1

print(b)

but it seems it doesn't work, indeed didn't giving any output though it didn't give any error.

You could merge the two dataframes on the station_id column, then create your list of lists like so:

merged_df = pd.merge(df, df1, left_on = 'station_id', right_on = 'station_id')

list_of_lists =[] 
  
# Iterate over each row 
for index, row in merged_df.iterrows():

    # Create list for the current row 
    rowlist =[row.latitude, row.longitude, row.flow, row.hour, row.month, row.day] 
      
    # append the list to the final list 
    list_of_lists.append(rowlist) 

You can use the datetime module to extract the month, day, hour from the Date column

See the pandas docs on pd.merge for more info: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

In the given code, you are trying to assign return value of print function to a and then adding it to b . Here, the value of a is null . So, when you try to print the value, you will get empty string.

I have made corrections so that it works. Hope it helps..

while i < count:
    if df['station_id'][i] == df1['station_id'][i]:
        a = [df1['latitude'][i],df1['longitude'][i], df['flow'][i], df['time'][i].hour,df['time'][i].month,df['time'][i].day]
        b.append(a)
        i += 1

print(b)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM