Pull out specific columns from multiple CSV files in a directory in Python

Question

I have around 200 CSV files in a directory. Their columns differ, but some of them hold data that I want to pull out. One column I need is called "Programme" (its position varies between files, but the name is always the same). The other column has a header containing "would recommend" (the headers are not all worded the same, but they all contain that phrase). Ultimately, I want to pull all of the rows under these two columns from each CSV and append them to a dataframe that contains just those 2 columns. I tried doing this with a single CSV first and can't get it to work. Here is what I have attempted:

import pandas as pd
from io import StringIO

df =  pd.read_csv("test.csv")

dfout = pd.DataFrame(columns=['Programme', 'Recommends'])

for file in [df]:
    dfn = pd.read_csv(file)
    matching = [s for s in dfn.columns if "would recommend" in s]
    if matching:
        dfn = dfn.rename(columns={matching[0]:'Recommends'})
        dfout = pd.concat([dfout, dfn], join="inner")

print(dfout)

I get the following error, so I believe it's a formatting issue (it doesn't like the pandas df?):

ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>

When I try this:

csv1 = StringIO("""Programme,"Overall, I am satisfied with the quality of the programme",I would recommend the company to a friend or colleague,Please comment on any positive aspects of your experience of this programme
Nursing,4,4,IMAGE
Nursing,1,3,very good
Nursing,4,5,I enjoyed studying tis programme""")

csv2 = StringIO("""Programme,I would recommend the company to a friend,The programme was well organised and running smoothly,It is clear how students' feedback on the programme has been acted on
IT,4,2,4
IT,5,5,5
IT,5,4,5""")

dfout = pd.DataFrame(columns=['Programme', 'Recommends'])

for file in [csv1,csv2]:
    dfn = pd.read_csv(file)
    matching = [s for s in dfn.columns if "would recommend" in s]
    if matching:
        dfn = dfn.rename(columns={matching[0]:'Recommends'})
        dfout = pd.concat([dfout, dfn], join="inner")

print(dfout)

This works fine, but I need to read the CSV files from disk rather than from in-memory strings. Any ideas?

Expected output from the above example:

Programme  Recommends
Nursing    4
Nursing    3
Nursing    5
IT         4
IT         5
IT         5

Answer 1

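The attempt in the question fails because the loop iterates over [df], so pd.read_csv receives an already-loaded DataFrame instead of a file path or buffer. Looping over the file paths instead works, for example with glob: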

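import glob

import pandas as pd

frames = []

# Loop over the CSV file paths in the current directory
for myfile in glob.glob("*.csv"):
    tmp = pd.read_csv(myfile, encoding='latin-1')

    # Find the column whose header contains the phrase
    matching = [s for s in tmp.columns if "would recommend" in s]
    if matching:
        tmp = tmp.rename(columns={matching[0]: 'Recommends'})
        # Keep only the two columns of interest
        frames.append(tmp[['Programme', 'Recommends']])

# Stack the per-file results into one dataframe with a fresh index
df = pd.concat(frames, ignore_index=True)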
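
With around 200 files, it may help to wrap the same idea in a function that skips files missing either column and records which file each row came from. The following is only a sketch under some assumptions: the function name collect_recommend_columns and the source_file column are made up for illustration, the header match is case-insensitive (the question does not say whether capitalisation varies), and the latin-1 encoding is carried over from the code above.

```python
from pathlib import Path

import pandas as pd


def collect_recommend_columns(directory, key_col="Programme", phrase="would recommend"):
    """Hypothetical helper: gather key_col and the first column whose
    header contains phrase from every CSV file in directory."""
    out_cols = [key_col, "Recommends", "source_file"]
    frames = []
    for path in sorted(Path(directory).glob("*.csv")):
        # Read only the header row first, so files without the
        # wanted columns are skipped without loading their data.
        header = pd.read_csv(path, nrows=0, encoding="latin-1").columns
        matching = [c for c in header if phrase.lower() in c.lower()]
        if key_col not in header or not matching:
            continue
        # Load just the two columns of interest.
        part = pd.read_csv(path, usecols=[key_col, matching[0]], encoding="latin-1")
        part = part.rename(columns={matching[0]: "Recommends"})
        part["source_file"] = path.name  # assumption: provenance is useful
        frames.append(part[out_cols])
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame(columns=out_cols)
    return pd.concat(frames, ignore_index=True)
```

Calling collect_recommend_columns(".") would then cover the current directory. Reading the header with nrows=0 before loading the data is a design choice to avoid parsing files that have nothing to contribute; if all files are small, a single read_csv per file as in the code above is simpler.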

Answer 1: score 0, accepted, 2020-10-07 13:47:14
