Pandas enumerate columns unexpected behavior

Question

I'm using Python to extract tables from some PDFs with tabula. Every table is then converted to a Pandas DataFrame, and I have to perform some analysis on them. I want to iterate over every column to see whether it contains a particular string, but I noticed unexpected behavior in one particular df (at least, I'm not able to understand what's going on).

These are the columns of the DataFrame, obtained with df.columns (df is the name of the DataFrame):

 Index(['cognome:xxxxnome:xxxxxprovenienza: esterno\r\rcodice fiscale: xxxxx\rdata valutazione neuropsicologica: 25/03/2021\rdata di nascita: 08/09/1955\retà (anni compiuti): 65\rsesso: m\rnumero anni di scolarità: 13', 'unnamed: 0'], dtype='object')

So, from what I see here, the name of the 0-th column should be

'cognome:xxxxnome:xxxxxprovenienza: esterno\r\rcodice fiscale: xxxxx\rdata valutazione neuropsicologica: 25/03/2021\rdata di nascita: 08/09/1955\retà (anni compiuti): 65\rsesso: m\rnumero anni di scolarità: 13'

What I don't understand is that, if I try to iterate through the columns of df, this is what happens:

for i, col in enumerate(list(df.columns)):
    print(f'{i}-th loop, column name = {col}')

Output:

 numero anni di scolarità: 13ogica: 25/03/2021xxxxprovenienza: esterno
 1-th loop, column name = unnamed: 0

So here are my questions:

Why is the index of the 0-th loop not printed?
Why is the printed value of col for the 0-th loop different from the 0-th element of df.columns?

Some more details about df:

 <class 'pandas.core.frame.DataFrame'>
 Int64Index: 0 entries
 Data columns (total 2 columns):
  #   Column                        Non-Null Count  Dtype
 ---  ------                        --------------  -----
  numero anni di scolarità: 13  0 non-null      float64
  1   unnamed: 0                    0 non-null      float64
 dtypes: float64(2)
 memory usage: 0.0 bytes

I'm using Jupyter Notebook with Pandas version 1.2.0.

Answer 1

The problem is the carriage returns (\r) that your column name is full of. When you print the string, every \r moves the cursor back to the beginning of the line, and the characters that follow overwrite what was already printed there, one by one. So the index 0 does get printed, but it is then overwritten. That is also why the printed name looks different from the 0-th element of df.columns: the string is the same, but only the leftovers of each overwritten segment stay visible.
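To see the real column name, and to avoid the issue in later analysis, something along these lines may help. This is only a minimal sketch: the `columns` list is a shortened, made-up stand-in for your column names, and `clean_column_name` is a hypothetical helper, not part of pandas or tabula. Printing with the `!r` conversion shows the `\r` characters as escapes instead of letting the terminal act on them.

```python
import re

# Hypothetical, shortened stand-in for the column names in the question.
columns = ["cognome: xxxx\r\rcodice fiscale: xxxxx\rsesso: m", "unnamed: 0"]


def clean_column_name(name: str) -> str:
    """Replace each run of carriage returns with a single space."""
    return re.sub(r"\r+", " ", name).strip()


# The !r conversion prints repr(col), so \r appears as a visible escape
# instead of moving the cursor back to the start of the line.
for i, col in enumerate(columns):
    print(f"{i}-th loop, column name = {col!r}")

# Cleaned names no longer contain carriage returns.
cleaned = [clean_column_name(c) for c in columns]
print(cleaned)
```

On the actual DataFrame, the same helper could be applied with `df.columns = [clean_column_name(c) for c in df.columns]`. Whether a space is the right replacement depends on how you want to search the names afterwards.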
