
How can I join selected columns of multiple csv files into one data frame? Jupyter

I am slightly confused because the following seems to work:

raw_data_df = pd.DataFrame()


temp = pd.read_csv('/Users/bob/desktop/Research_data/tobii/42r-export.csv', sep = ',', encoding = 'latin-1')
raw_data_df['1'] = temp['Gaze point X']
raw_data_df['2'] = temp['Gaze point Y']

However, the following does not work:

for i in files:
  temp = pd.read_csv(path + i , sep = ',', encoding = 'latin-1')
  print(temp['Gaze point X'])
  raw_data_df[i+"x"] = temp['Gaze point X']
  raw_data_df[i+"y"] = temp['Gaze point Y']

where path and files are defined as

from os import listdir
from os.path import isfile, join

path = "/Users/bob/desktop/Research_data/tobii/"
files = [f for f in listdir(path) if isfile(join(path, f))]

Instead of a pandas data frame whose column names are i+"x" and i+"y", I get what looks like a list of lists.

Here is a sample of the output I get with raw_data_df:

132660     857
132661     846
Name: Gaze point X, Length: 132662, dtype: int64
0      1206
1      1204
2      1205
3      1205

How can I join selected columns of multiple csv files into one data frame?

I don't think there's any need to initialise an empty dataframe. You can iterate over your files, load only the columns you need (using usecols), and then concatenate all the dataframes at the end.

Furthermore, when joining path components, use os.path.join rather than string concatenation.

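import os
import pandas as pd

cols = ['Gaze point X', 'Gaze point Y']

df_list = []
for f in files:
    # Load only the two gaze columns from each file.
    temp = pd.read_csv(
        os.path.join(path, f), sep=',', encoding='latin-1', usecols=cols
    )[cols]  # usecols keeps the file's column order, so reorder explicitly
    # Prefix each column with the file name so the names stay unique.
    temp.columns = [f + i for i in ['x', 'y']]
    df_list.append(temp)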

Now, just concatenate the dataframes side by side with pd.concat.

df = pd.concat(df_list, axis=1)
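To see what the result looks like, here is a minimal sketch that swaps the real export files for two small in-memory CSVs. The names a.csv and b.csv and their contents are made up for illustration. Because concatenation along axis=1 aligns on the row index, a shorter file is padded with NaN at the bottom.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two export files of different lengths.
fake_files = {
    "a.csv": "Gaze point X,Gaze point Y,Other\n1,2,0\n3,4,0\n5,6,0\n",
    "b.csv": "Gaze point X,Gaze point Y,Other\n7,8,0\n9,10,0\n",
}
cols = ["Gaze point X", "Gaze point Y"]

df_list = []
for name, text in fake_files.items():
    # Read only the gaze columns, then fix their order before renaming.
    temp = pd.read_csv(io.StringIO(text), usecols=cols)[cols]
    temp.columns = [name + suffix for suffix in ["x", "y"]]
    df_list.append(temp)

# One pair of columns per file; rows are aligned on the default integer index.
df = pd.concat(df_list, axis=1)
print(df)
```

The frame has one x/y column pair per file and as many rows as the longest file.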

Sorry, there was a .DS_Store file in the folder I was searching that was throwing everything off. I just deleted it and now it is working.
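Since macOS tends to recreate that hidden file, it may be more robust to filter the directory listing than to delete the file by hand. A minimal sketch, assuming every export ends in .csv; list_csv_files is a hypothetical helper name, not something from pandas or os:

```python
import os


def list_csv_files(path):
    """Return the sorted names of regular files in `path` that end in .csv."""
    return sorted(
        f
        for f in os.listdir(path)
        # Skip hidden files such as .DS_Store, other file types, and directories.
        if f.lower().endswith(".csv") and os.path.isfile(os.path.join(path, f))
    )
```

With this, files = list_csv_files(path) could replace the original listdir comprehension, and the loop would never try to parse a non-CSV file.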

Building on @COLDSPEED's solution, you could use a list comprehension:

def rename_cols(df, f):
    # Put the columns in a known order, then prefix them with the file name.
    df = df[cols]
    df.columns = [f + i for i in ['x', 'y']]
    return df

# axis=1 places each file's columns side by side.
df = pd.concat([rename_cols(pd.read_csv(os.path.join(path, f),
                sep=',', encoding='latin-1', usecols=cols), f) for f in files],
                axis=1)
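As an alternative to renaming columns by hand, pd.concat accepts a keys argument that labels each input frame, which gives a two-level column index (file name, original column name). A minimal sketch with made-up in-memory data; the names a.csv and b.csv are placeholders, not real export files:

```python
import io

import pandas as pd

cols = ["Gaze point X", "Gaze point Y"]

# Hypothetical stand-ins for the real export files.
fake_files = {
    "a.csv": "Gaze point X,Gaze point Y\n1,2\n3,4\n",
    "b.csv": "Gaze point X,Gaze point Y\n5,6\n7,8\n",
}

frames = [
    pd.read_csv(io.StringIO(text), usecols=cols)[cols]
    for text in fake_files.values()
]

# keys= adds the file name as the outer level of the column index.
df = pd.concat(frames, axis=1, keys=list(fake_files))

# Select one file's columns, or a single (file, column) pair.
a_only = df["a.csv"]
x_a = df[("a.csv", "Gaze point X")]
```

This keeps the original column names intact, at the cost of a MultiIndex that some downstream code may find less convenient than flat names.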
