I have 4 CSV files with \t (tab) as the delimiter.
alok@alok-HP-Laptop-14s-cr1:~/tmp/krati$ for file in sample*.csv; do echo $file; cat $file; echo ; done
sample1.csv
ProbeID p_code intensities
B1_1_3 6170 2
B2_1_3 6170 2.2
B3_1_4 6170 2.3
12345 6170 2.4
1234567 6170 2.5
sample2.csv
ProbeID p_code intensities
B1_1_3 5320 3
B2_1_3 5320 3.2
B3_1_4 5320 3.3
12345 5320 3.4
1234567 5320 3.5
sample3.csv
ProbeID p_code intensities
B1_1_3 1234 4
B2_1_3 1234 4.2
B3_1_4 1234 4.3
12345 1234 4.4
1234567 1234 4.5
sample4.csv
ProbeID p_code intensities
B1_1_3 3120 5
B2_1_3 3120 5.2
B3_1_4 3120 5.3
12345 3120 5.4
1234567 3120 5.5
All 4 files have the same headers. The ProbeID values are the same across all files, and in the same order. Within each CSV file, p_code has the same value on every row.
I have to merge all these CSV files into one in this format.
alok@alok-HP-Laptop-14s-cr1:~/tmp/krati$ cat output1.csv
ProbeID 6170 5320 1234 3120
B1_1_3 2 3 4 5
B2_1_3 2.2 3.2 4.2 5.2
B3_1_4 2.3 3.3 4.3 5.3
12345 2.4 3.4 4.4 5.4
1234567 2.5 3.5 4.5 5.5
In this output file, the columns are dynamic: there is one column per p_code value.
I can do this easily in plain Python using a dictionary. How can I produce this output using pandas?
We can achieve this using pandas.concat and DataFrame.pivot_table:
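import os
import pandas as pd

# Read every sample*.csv in the current directory and stack them into one long frame.
df = pd.concat(
    [pd.read_csv(f, sep="\t") for f in os.listdir() if f.endswith(".csv") and f.startswith("sample")],
    ignore_index=True
)
# Reshape to wide form: one row per ProbeID, one column per p_code.
# Each (ProbeID, p_code) pair occurs once, so "sum" just returns that single intensity.
df = df.pivot_table(index="ProbeID", columns="p_code", values="intensities", aggfunc="sum")
print(df)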
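Note that pivot_table sorts both the row labels and the column labels by default, so the result of the code above will not come out in the same row and column order as output1.csv. If that order matters, one possible alternative is sketched below. It assumes, as the question states, that each file holds a single p_code and that ProbeID is unique within a file. The function name merge_by_p_code is made up for this example and is not part of pandas.

```python
import pandas as pd


def merge_by_p_code(sources):
    """Return a wide table with one column per input file, named by its p_code.

    `sources` is an iterable of paths or file-like objects (hypothetical helper,
    assuming tab-separated input with ProbeID, p_code and intensities columns).
    """
    columns = []
    for src in sources:
        # Read ProbeID as text so IDs such as 12345 are not turned into numbers.
        frame = pd.read_csv(src, sep="\t", dtype={"ProbeID": str})
        # Assumption: p_code is constant within a file, so the first row is enough.
        p_code = frame["p_code"].iloc[0]
        columns.append(frame.set_index("ProbeID")["intensities"].rename(p_code))
    # Place the per-file Series side by side; columns follow the order of `sources`.
    return pd.concat(columns, axis=1).rename_axis("ProbeID")
```

With the files from the question, something like merge_by_p_code(sorted(glob.glob("sample*.csv"))).to_csv("output1.csv", sep="\t") (after import glob) should write the columns in file order, since nothing in this approach re-sorts the labels.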