I'm currently trying to do the following:
sampleCSVFile in my mypath:
id itemName NonImportantEntries Entries SomeOtherEntries
1 item1 27 111 163
2 item2 16 22 98
My Code so far:
import pandas as pd
from os import listdir
from os.path import isfile, join
#readMyDirectoryForFiles
mypath= <myDirectoryPath>
list_of_files = [f for f in listdir(mypath) if isfile(join(mypath, f))]
#GenerateAllFilesInDirList
list_of_files = [mypath+x for x in list_of_files]
#LoadAllFilesToPandasDFs
dataframes = [pd.read_csv(fi) for fi in list_of_files]
#count id/itemName Occurences Across All Files and sum up value in column "entries", but also check for highest/lowest value in column "entries" and record it for the output file -> this step is really hard for me.
#tried the below but received -> AttributeError: 'list' object has no attribute 'groupby' , also not sure where to start and how proceed with capturing highest/lowest number for "entries" across all the files.
count_occurences = dataframes.groupby(['id', 'itemName']).sum()['entries']
...
#save file in the following format
df.to_csv('finalFile.csv', index = False)
Expected finalFile.csv format:
id itemName noOfOccurencessInFiles TotalEntriesFromAllFiles HighestEntriesNumber LowestEntriesNumber
1 item1 11 3500 1200 0
Use concat
and add aas_index=False
for columns from id, itemName
:
final = pd.concat(dataframes).groupby(['id', 'itemName'], as_index=False)['entries'].sum()
fnal.to_csv('finalFile.csv', index = False)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.