I have a Pandas DataFrame like this:
[6 rows x 5 columns]
name timestamp value1 state value2
Cs01 1.514483e+09 19.516 0 9.999954e-01
Cs02 1.514483e+09 20.055 0 9.999363e-01
Cs03 1.514483e+09 20.054 0 9.999970e-01
Cs01 1.514483e+09 20.055 0 9.999949e-01
Cs01 1.514483e+09 10.907 0 9.963121e-01
Cs02 1.514483e+09 10.092 0 1.548312e-02
Is it possible, with the read_csv function, to skip all the rows that do not start with the name "Cs01"?
Thank you
The simplest approach is to read everything and then filter the rows:
import pandas as pd

df = pd.read_csv('file.csv')
df = df[df['name'].str.startswith('Cs01')]
print (df)
name timestamp value1 state value2
0 Cs01 1.514483e+09 19.516 0 0.999995
3 Cs01 1.514483e+09 20.055 0 0.999995
4 Cs01 1.514483e+09 10.907 0 0.996312
Another solution is to collect, in a preprocessing pass, the positions of all rows that do not start with Cs01, and pass them to read_csv via the skiprows parameter:
# collect the 0-based positions of all lines that do not start with 'Cs01'
exclude = [i for i, line in enumerate(open('file.csv')) if not line.startswith('Cs01')]
print (exclude)
[0, 2, 3, 6]
# exclude[1:] drops index 0 (the header line), so the header is still read
df = pd.read_csv('file.csv', skiprows=exclude[1:])
print (df)
name timestamp value1 state value2
0 Cs01 1.514483e+09 19.516 0 0.999995
1 Cs01 1.514483e+09 20.055 0 0.999995
2 Cs01 1.514483e+09 10.907 0 0.996312
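If you want to avoid iterating over the file twice, a single-pass variant (a sketch, not part of the original answer) keeps the header plus only the matching lines in an in-memory buffer and hands that to read_csv. The data here is an abbreviated sample from the question:

```python
import io
import pandas as pd

# Abbreviated whitespace-delimited sample from the question.
t = """name timestamp value1 state value2
Cs01 1.514483e+09 19.516 0 9.999954e-01
Cs02 1.514483e+09 20.055 0 9.999363e-01
Cs01 1.514483e+09 20.055 0 9.999949e-01"""

# One pass: keep the header line plus only the lines starting with 'Cs01'.
src = io.StringIO(t)
header = next(src)
kept = [line for line in src if line.startswith('Cs01')]

df = pd.read_csv(io.StringIO(header + ''.join(kept)), sep=r'\s+')
print(df)
```

With a real file, replace the StringIO source with `open('file.csv')`; the rest is unchanged.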
One method would be to read the file in chunks and filter the unwanted lines from each chunk. This may be faster if you have a large file with a lot of unwanted rows, since reading in the entire df and then filtering can be non-performant:
In[17]:
t="""name timestamp value1 state value2
Cs01 1.514483e+09 19.516 0 9.999954e-01
Cs02 1.514483e+09 20.055 0 9.999363e-01
Cs03 1.514483e+09 20.054 0 9.999970e-01
Cs01 1.514483e+09 20.055 0 9.999949e-01
Cs01 1.514483e+09 10.907 0 9.963121e-01
Cs02 1.514483e+09 10.092 0 1.548312e-02"""
import io
import pandas as pd

d = pd.read_csv(io.StringIO(t), sep=r'\s+', chunksize=2)
dfs = pd.concat([x[x['name'].str.startswith('Cs01')] for x in d])
dfs
Out[17]:
name timestamp value1 state value2
0 Cs01 1.514483e+09 19.516 0 0.999995
3 Cs01 1.514483e+09 20.055 0 0.999995
4 Cs01 1.514483e+09 10.907 0 0.996312
Here the chunksize
param specifies the number of lines to read per chunk; you can set this to some arbitrary size. You then filter each chunk in a list comprehension and call concat
to produce a single df
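As a usage sketch applied to a file on disk (the file name is from the question; chunksize=2 is deliberately tiny here just to exercise several chunks):

```python
import pandas as pd

# Write the sample data from the question to disk first, so the
# example is self-contained.
rows = """name timestamp value1 state value2
Cs01 1.514483e+09 19.516 0 9.999954e-01
Cs02 1.514483e+09 20.055 0 9.999363e-01
Cs03 1.514483e+09 20.054 0 9.999970e-01
Cs01 1.514483e+09 20.055 0 9.999949e-01
Cs01 1.514483e+09 10.907 0 9.963121e-01
Cs02 1.514483e+09 10.092 0 1.548312e-02"""
with open('file.csv', 'w') as f:
    f.write(rows)

# Stream the file in chunks and keep only the 'Cs01' rows from each chunk.
reader = pd.read_csv('file.csv', sep=r'\s+', chunksize=2)
df = pd.concat(chunk[chunk['name'].str.startswith('Cs01')] for chunk in reader)
print(df)
```

For a genuinely large file you would raise chunksize to something like 100_000 so each chunk still fits comfortably in memory.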