How can I join selected columns of multiple csv files into one data frame? Jupyter

Question

I am slightly confused because the following seems to work:

raw_data_df = pd.DataFrame()


temp = pd.read_csv('/Users/bob/desktop/Research_data/tobii/42r-export.csv', sep = ',', encoding = 'latin-1')
raw_data_df['1'] = temp['Gaze point X']
raw_data_df['2'] = temp['Gaze point Y']

However the following does not work:

for i in files:
  temp = pd.read_csv(path + i , sep = ',', encoding = 'latin-1')
  print(temp['Gaze point X'])
  raw_data_df[i+"x"] = temp['Gaze point X']
  raw_data_df[i+"y"] = temp['Gaze point Y']

where path and files are defined as:

from os import listdir
from os.path import isfile, join

path = "/Users/bob/desktop/Research_data/tobii/"
files = [f for f in listdir(path) if isfile(join(path,f))]

Instead of a pandas data frame whose column names are i+"x" and i+"y", I get what looks like a list of lists.

Here is a sample of the output I get with raw_data_df:

132660     857
132661     846
Name: Gaze point X, Length: 132662, dtype: int64
0      1206
1      1204
2      1205
3      1205

How can I join selected columns of multiple csv files into one data frame?

Answer 1

I don't think there's any need to initialise an empty dataframe. You can iterate over your files, load only the columns you need (using usecols), and then concatenate all the dataframes at the end.

Furthermore, when joining path components, use os.path.join rather than string concatenation.

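import os
import pandas as pd

cols = ['Gaze point X', 'Gaze point Y']

df_list = []
for f in files:
    # Load only the two gaze columns from each file
    temp = pd.read_csv(
        os.path.join(path, f), sep=',', encoding='latin-1', usecols=cols
    )[cols]  # usecols keeps the file's column order, so reorder explicitly
    # Prefix the column names with the file name
    temp.columns = [f + i for i in ['x', 'y']]
    df_list.append(temp)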

Now, just concatenate the dataframes column-wise with pd.concat.

df = pd.concat(df_list, axis=1)
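
To see what the column-wise concatenation produces, here is a minimal sketch that replaces the real export files with two small in-memory CSV strings. The file names a.csv and b.csv and their contents are made up for illustration; only the column names come from the question.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two export files (not the real Tobii data)
fake_files = {
    "a.csv": "Timestamp,Gaze point X,Gaze point Y\n0,10,20\n1,11,21\n2,12,22\n",
    "b.csv": "Timestamp,Gaze point X,Gaze point Y\n0,30,40\n1,31,41\n",
}
cols = ["Gaze point X", "Gaze point Y"]

frames = []
for name, text in fake_files.items():
    # Read only the gaze columns, then fix their order explicitly
    temp = pd.read_csv(io.StringIO(text), usecols=cols)[cols]
    temp.columns = [name + "x", name + "y"]
    frames.append(temp)

# axis=1 places the frames side by side, aligned on the row index
combined = pd.concat(frames, axis=1)
print(combined)
```

Because b.csv has one row fewer than a.csv in this sketch, its columns are padded with NaN in the last row, which is how pd.concat handles recordings of unequal length.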

Answer 2

Sorry, there was a .DS_Store file in the directory I was listing, and it was throwing everything off. I deleted it and now it works.
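
Rather than deleting the hidden file by hand (macOS tends to recreate it), one option is to filter the directory listing so that only CSV files are read. This is a rough sketch; the helper name list_csv_files is hypothetical and not part of the original code.

```python
import os

def list_csv_files(path):
    """Return the sorted names of regular files in `path` that end in .csv."""
    return sorted(
        f for f in os.listdir(path)
        # Skip hidden files such as .DS_Store and anything that is not a CSV
        if f.lower().endswith(".csv") and os.path.isfile(os.path.join(path, f))
    )
```

With this in place, files = list_csv_files(path) could stand in for the listdir comprehension from the question.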

Answer 3

Building on @COLDSPEED's solution, you could use a list comprehension:

def rename_cols(df, f):
    # Prefix the two gaze columns with the file name
    df.columns = [f + i for i in ['x', 'y']]
    return df

# axis=1 joins the frames side by side; the default axis=0 would stack rows
df = pd.concat([rename_cols(pd.read_csv(os.path.join(path, f),
                sep=',', encoding='latin-1', usecols=cols), f) for f in files],
                axis=1)
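
If renaming the columns by hand feels clumsy, pd.concat can also label each frame through its keys argument, which gives a two-level column index of file name and original column name. This is a sketch of an alternative and not what the answers above do; the two small frames and the file names are made up for illustration.

```python
import pandas as pd

# Hypothetical frames standing in for two loaded files
a = pd.DataFrame({"Gaze point X": [10, 11], "Gaze point Y": [20, 21]})
b = pd.DataFrame({"Gaze point X": [30, 31], "Gaze point Y": [40, 41]})

# keys adds an outer column level, so the original column names survive
combined = pd.concat([a, b], axis=1, keys=["a.csv", "b.csv"])

# Select one file's column with a (file name, column name) tuple
bx = combined[("b.csv", "Gaze point X")]
```

The trade-off is that downstream code has to deal with MultiIndex columns instead of flat strings.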

Answer 1: 0, ACCEPTED, 2018-02-08 03:26:44
Answer 2: 0, 2018-02-08 03:38:39
Answer 3: 0, 2018-02-08 04:04:51
