I'm using Python to extract tables from some PDFs with tabula. Each table is then converted to a pandas DataFrame, and I have to perform some analysis on them. I want to iterate over every column to see whether it contains a particular string, but I noticed unexpected behavior in one particular df (at least, I'm not able to understand what's going on).
These are the columns of the DataFrame, obtained with df.columns (df is the name of the DataFrame):
Index(['cognome:xxxxnome:xxxxxprovenienza: esterno\r\rcodice fiscale: xxxxx\rdata valutazione neuropsicologica: 25/03/2021\rdata di nascita: 08/09/1955\retà (anni compiuti): 65\rsesso: m\rnumero anni di scolarità: 13', 'unnamed: 0'], dtype='object')
So, from what I see here, the name of the 0-th column should be
'cognome:xxxxnome:xxxxxprovenienza: esterno\r\rcodice fiscale: xxxxx\rdata valutazione neuropsicologica: 25/03/2021\rdata di nascita: 08/09/1955\retà (anni compiuti): 65\rsesso: m\rnumero anni di scolarità: 13'
What I don't understand is that, if I try to iterate through the columns of df, this is what happens:
for i, col in enumerate(list(df.columns)):
    print(f'{i}-th loop, column name = {col}')
Output:
numero anni di scolarità: 13ogica: 25/03/2021xxxxprovenienza: esterno
1-th loop, column name = unnamed: 0
So here is my question: why is col for the 0-th loop different from the 0-th element of df.columns?
Some more details about df:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 0 entries
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
numero anni di scolarità: 13  0 non-null      float64
 1   unnamed: 0  0 non-null      float64
dtypes: float64(2)
memory usage: 0.0 bytes
I'm using Jupyter Notebook with pandas version 1.2.0.
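The problem is the carriage returns (\r) that your column name is full of. When the string is printed, each \r moves the cursor back to the beginning of the line, and the text that follows overwrites what was already there, character by character. So the "0-th loop, column name = ..." prefix does get printed, but it is then overwritten by the later parts of the column name. The column name itself is unchanged; only its display is misleading.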
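To see this for yourself, here is a minimal sketch. It uses a shortened, made-up stand-in for your column names rather than the real DataFrame, and clean_name is a hypothetical helper, not something pandas provides. Printing with !r shows the repr of the string, so each \r appears as a visible escape instead of moving the cursor; replacing the carriage returns with spaces gives names that print normally.

```python
import re

# Shortened stand-in for the real column names (hypothetical sample data).
columns = ['cognome: xxxx\r\rcodice fiscale: xxxxx\rsesso: m', 'unnamed: 0']


def clean_name(name: str) -> str:
    """Replace each run of carriage returns/newlines with a single space."""
    return re.sub(r'[\r\n]+', ' ', name).strip()


for i, col in enumerate(columns):
    # !r prints repr(col), so \r is shown as an escape sequence
    # instead of sending the cursor back to the start of the line.
    print(f'{i}-th loop, column name = {col!r}')

# Cleaned names no longer contain carriage returns and print as expected.
cleaned = [clean_name(c) for c in columns]
for i, col in enumerate(cleaned):
    print(f'{i}-th loop, column name = {col}')
```

With a real DataFrame, the same idea should work by assigning the cleaned list back, for example df.columns = [clean_name(c) for c in df.columns], assuming all column names are strings.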