I am reading a csv file into pandas. This csv file consists of four columns and some rows, but does not have a header row, which I want to add. I have been trying the following:
Cov = pd.read_csv("path/to/file.txt", sep='\t')
Frame = pd.DataFrame([Cov], columns = ["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t')
But when I run the code, I get the following error:
ValueError: Shape of passed values is (1, 1), indices imply (4, 1)
What exactly does the error mean? And what would be a clean way in python to add a header row to my csv file/pandas df?
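A minimal sketch of what goes wrong. It assumes a small in-memory tab-separated sample (made-up rows, read through io.StringIO) in place of path/to/file.txt. Wrapping the DataFrame in a list, as in [Cov], gives pd.DataFrame a one-element list whose single item is the whole table. That is not a 2-D set of rows with four fields each, so pandas cannot line it up with the four column names. The exact error message differs between pandas versions, but it is a ValueError.

```python
import io

import pandas as pd

# Hypothetical stand-in for path/to/file.txt: 4 tab-separated columns, no header.
sample = "chr1\t0\t100\t5\nchr1\t100\t200\t7\nchr2\t0\t150\t3\n"
cov = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

try:
    # [cov] is a list with ONE element (the whole DataFrame), not a list of rows.
    pd.DataFrame([cov], columns=["Sequence", "Start", "End", "Coverage"])
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__, error)
```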
You can use names directly in read_csv:
names : array-like, default None
List of column names to use. If file contains no header row, then you should explicitly pass header=None
Cov = pd.read_csv("path/to/file.txt",
sep='\t',
names=["Sequence", "Start", "End", "Coverage"])
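A runnable sketch of the names= approach. It uses io.StringIO with made-up rows as a stand-in for path/to/file.txt (an assumption; with a real file you would pass the path instead). Because names is given, pandas treats the first line as data, so all three rows are kept.

```python
import io

import pandas as pd

# Hypothetical headerless, tab-separated sample.
sample = "chr1\t0\t100\t5\nchr1\t100\t200\t7\nchr2\t0\t150\t3\n"
cov = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    names=["Sequence", "Start", "End", "Coverage"],  # labels for the 4 columns
)
print(cov)
```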
Alternatively, you could read your CSV with header=None and then add the names with df.columns:
Cov = pd.read_csv("path/to/file.txt", sep='\t', header=None)
Cov.columns = ["Sequence", "Start", "End", "Coverage"]
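To see what header=None does on its own, here is a small sketch using the same kind of made-up sample via io.StringIO (an assumption standing in for the real file). Before the assignment pandas labels the columns with the integers 0 to 3. Assigning a list to .columns replaces those labels.

```python
import io

import pandas as pd

# Hypothetical headerless, tab-separated sample.
sample = "chr1\t0\t100\t5\nchr1\t100\t200\t7\nchr2\t0\t150\t3\n"
cov = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

default_labels = list(cov.columns)  # integer labels assigned by pandas
cov.columns = ["Sequence", "Start", "End", "Coverage"]
print(default_labels, list(cov.columns))
```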
col_Names=["Sequence", "Start", "End", "Coverage"]
my_CSV_File= pd.read_csv("yourCSVFile.csv",names=col_Names)
Having done this, check the result with head() (obvious, I know, but still):
my_CSV_File.head()
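This works without header=None because, per the pandas read_csv documentation, passing names makes the default header='infer' behave like header=None. A quick sketch with a made-up comma-separated sample (an assumption standing in for yourCSVFile.csv) confirms that the first line stays a data row:

```python
import io

import pandas as pd

col_Names = ["Sequence", "Start", "End", "Coverage"]
# Hypothetical headerless, comma-separated sample.
sample = "chr1,0,100,5\nchr2,0,150,3\n"

my_CSV_File = pd.read_csv(io.StringIO(sample), names=col_Names)
print(my_CSV_File.head())
```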
Hope it helps ... Cheers
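To fix your code, you can simply change [Cov] to Cov.values. [Cov] is a list holding a single DataFrame, which pandas cannot match up with four column names, whereas Cov.values is the underlying 2-D numpy array, which becomes the first parameter of pd.DataFrame: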
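import pandas as pd
Cov = pd.read_csv("path/to/file.txt", sep='\t', header=None)  # header=None: the file has no header row, so keep line 1 as data
Frame = pd.DataFrame(Cov.values, columns=["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t', index=False)  # index=False: don't write the row index as an extra column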
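To check what Cov.values actually hands to pd.DataFrame, here is a quick sketch with a made-up sample (an assumption standing in for the real file). .values is a 2-D array with one row per line and four columns, which matches the four names. Current pandas docs recommend .to_numpy() as the equivalent of .values.

```python
import io

import pandas as pd

# Hypothetical headerless, tab-separated sample.
sample = "chr1\t0\t100\t5\nchr1\t100\t200\t7\nchr2\t0\t150\t3\n"
cov = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

arr = cov.values  # 2-D numpy array, shape (rows, 4)
frame = pd.DataFrame(arr, columns=["Sequence", "Start", "End", "Coverage"])
print(arr.shape)
print(frame)
```

Because the columns have mixed types, the intermediate array is of object dtype, so the round trip can leave the numeric columns as object dtype. Passing names to read_csv avoids that extra step.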
But the cleanest solution is still to use pd.read_csv with header=None and names=columns_list.
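Since the question also asks about adding the header to the file itself, here is a sketch of the full round trip: read with header=None and names, then write back with to_csv. It uses a temporary file (an assumption, so the example doesn't touch path/to/file.txt) with made-up rows, and index=False so the row index isn't written as an extra column.

```python
import os
import tempfile

import pandas as pd

columns_list = ["Sequence", "Start", "End", "Coverage"]

# Hypothetical headerless tab-separated file standing in for path/to/file.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("chr1\t0\t100\t5\nchr2\t0\t150\t3\n")
    path = tmp.name

cov = pd.read_csv(path, sep="\t", header=None, names=columns_list)
cov.to_csv(path, sep="\t", index=False)  # rewrite the file, now with a header line

with open(path) as fh:
    first_line = fh.readline().rstrip("\r\n")
os.remove(path)
print(first_line)
```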
Simple And Easy Solution:
import pandas as pd
df = pd.read_csv("path/to/file.txt", sep='\t', header=None)  # the file has no header row
fileHeader = ["Sequence", "Start", "End", "Coverage"]
df.columns = fileHeader
NOTE: Make sure the number of names in your header list matches the number of columns in the CSV file.
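A sketch of what the NOTE warns about, using a made-up sample via io.StringIO: assigning a list with the wrong number of names to .columns raises a ValueError (a "Length mismatch" error in pandas). The three-name list below is deliberately wrong for illustration.

```python
import io

import pandas as pd

# Hypothetical headerless, tab-separated sample with 4 columns.
sample = "chr1\t0\t100\t5\nchr2\t0\t150\t3\n"
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

try:
    df.columns = ["Sequence", "Start", "End"]  # only 3 names for 4 columns
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(mismatch_error)
```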
Since the question says we are reading from a CSV, the delimiter should be ',' (the default, so there is no need to pass sep), and since the given file has no header, pass header=None.
Sample Code:
import pandas as pd
data = pd.read_csv('path/to/file.txt',header=None)
data.columns = ["Sequence", "Start", "End", "Coverage"]
print(data.head()) #Print the first rows
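The default sep=',' only works if the file really is comma-separated; the question's own code passes sep='\t'. A sketch with a made-up tab-separated sample shows what happens when it is read with the default comma delimiter: each line lands in a single column, so assigning four names would then fail.

```python
import io

import pandas as pd

# Hypothetical headerless sample that is tab-separated, not comma-separated.
tab_sample = "chr1\t0\t100\t5\nchr2\t0\t150\t3\n"

default_sep = pd.read_csv(io.StringIO(tab_sample), header=None)  # sep=',' by default
tab_sep = pd.read_csv(io.StringIO(tab_sample), sep="\t", header=None)

print(default_sep.shape, tab_sep.shape)
```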
When reading a file without headers, existing answers correctly say that the header= parameter should be set to None, but none explain why. It's because by default header='infer', which (when names is not passed) behaves like header=0, meaning the first row of the file is read as the header. For example, the following code overwrites the first row with col_names, because the first row was read as the header and was then replaced by col_names.
Note that it's assumed here that the columns are separated by a space ' '.
col_names = ["Sequence", "Start", "End", "Coverage"]
df = pd.read_csv("path/to/file.txt", sep=' ') # <--- wrong
df.columns = col_names
To get the correct output, you'll need to set header=None:
df = pd.read_csv("path/to/file.txt", sep=' ', header=None) # <--- OK
df.columns = col_names
or use the names= parameter to assign column names in one function call:
df = pd.read_csv("path/to/file.txt", sep=' ', names=col_names) # <--- OK
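A runnable sketch of the difference, assuming a made-up space-separated sample (read through io.StringIO) in place of path/to/file.txt. The "wrong" version keeps only two of the three rows, because the first row was consumed as the header and then overwritten by col_names.

```python
import io

import pandas as pd

col_names = ["Sequence", "Start", "End", "Coverage"]
# Hypothetical headerless, space-separated sample with 3 data rows.
sample = "chr1 0 100 5\nchr1 100 200 7\nchr2 0 150 3\n"

wrong = pd.read_csv(io.StringIO(sample), sep=" ")  # first line becomes the header
wrong.columns = col_names

right = pd.read_csv(io.StringIO(sample), sep=" ", names=col_names)

print(len(wrong), len(right))
print(wrong.iloc[0].tolist())  # the original first row is gone
```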
The header=None way is often preferred if the number of columns is not known (because it is vital that len(col_names) equals the number of columns inferred from the file) or if the specific column names are not important. For example, calling add_prefix() after read_csv adds a prefix to the default column names:
df = pd.read_csv("path/to/file.txt", sep=' ', header=None).add_prefix('col')
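A quick sketch of what add_prefix produces, with a hypothetical space-separated sample read through io.StringIO: the default integer labels 0 to 3 become strings with the prefix prepended.

```python
import io

import pandas as pd

# Hypothetical headerless, space-separated sample.
sample = "chr1 0 100 5\nchr1 100 200 7\nchr2 0 150 3\n"
df = pd.read_csv(io.StringIO(sample), sep=" ", header=None).add_prefix("col")
print(list(df.columns))  # ['col0', 'col1', 'col2', 'col3']
```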