I have multiple large CSV files. These CSV files have few column differences. For feeding them to AWS QuickSight for data visualization, I want to unify the structure of these CSV files. I think for doing this I have two ways:
What is the best tool for doing this?
Is there any tool that can show structural difference of two CSV file? If I find out which columns are missing I can also add them manually.
With pandas I can combine the CSV files, but in the way I know, I should name all the columns (code below) and this is not useful.
import pandas as pd
df1 = pd.DataFrame({'column1': [1,2],
'column2': [3,4],
})
df2 = pd.DataFrame({'column1': [5,6],
'column3': [7,8],
})
pd.concat([df1,df2],ignore_index=True)
Result:
column1 column2 column3
0 1 3.0 NaN
1 2 4.0 NaN
2 5 NaN 7.0
3 6 NaN 8.0
I cannot tell you what "the best" tool is; that's subjective with many dependencies.
I can tell you that miller should probably be on your short list for tools to consider for working with CSV data. Also see the miller
GitHub site. One last thing: the author is super-helpful.
I have it on good authority that the following will do the job:
mlr --csv reshape -r "^A" -o item,value then reshape -s item,value \ then unsparsify --fill-with "" *.csv > result.csv
Some notes about the command:
As much as I enjoyed using miller to solve this problem, I also did enjoy using panda to combine CSV files. Here is the python code to achieve it:
import pandas as pd
import glob
import os
path = os.getcwd() # use your path
# read all the files' name
all_files = glob.glob(os.path.join(path , "*.csv"))
li = []
# read csv files and create a DataFrame list
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
li.append(df)
# join all the DataFrames
result_frame = pd.concat(li, ignore_index=True)
# export the result to a csv file
result_frame.to_csv('endresult.csv')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.