Reading a file using Python/Pandas

Question

I have a tab-delimited file with data like this:

id  Name    address dept    sal
1   abc "bangalore,
        Karnataka,
        Inida"  10  500
2   xyz "Hyderabad
         Inida" 20  500

The columns are id, Name, address, dept, and sal.

The issue is with the address column, which can contain newline characters. I tried different methods of reading the file with Pandas and plain Python, but instead of two rows I get multiple rows as output.

Here are a few of the commands I tried:

with open('C:/dummy/dummy.csv', 'r') as file1:
    lines = file1.readlines()

for line in lines:
    print(line)

and

import pandas as pd

df = pd.read_csv("C:/dummy/dummy.csv", sep='\t', quotechar='"')

Can anyone please help?

Answer 1

import pandas as pd

df = pd.read_csv("C:/dummy/dummy.csv", sep='\t', quotechar='"')

If the columns in the csv file really are tab-delimited, as you say, the corresponding output is:

   id Name                            address  dept  sal
0   1  abc  bangalore,\r\nKarnataka,\r\nInida    10  500
1   2  xyz                 Hyderabad\r\nInida    20  500

If you would like to remove the CR-LF sequences within the strings, you can do so in post-processing. Additionally, you could define the index column via

df = pd.read_csv("C:/dummy/dummy.csv",sep='\t',quotechar='"',index_col=0)
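
The post-processing step mentioned above could look roughly like the following. This is only a sketch: it assumes the file content matches the sample in the question, uses an in-memory string (the hypothetical SAMPLE) instead of C:/dummy/dummy.csv, and assumes that replacing each embedded line break with a single space is the desired result.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample file: tab-separated columns,
# with CR-LF line breaks inside the quoted address fields.
SAMPLE = (
    'id\tName\taddress\tdept\tsal\r\n'
    '1\tabc\t"bangalore,\r\nKarnataka,\r\nInida"\t10\t500\r\n'
    '2\txyz\t"Hyderabad\r\nInida"\t20\t500\r\n'
)

df = pd.read_csv(io.StringIO(SAMPLE), sep="\t", quotechar='"', index_col=0)

# Collapse each embedded line break (LF or CR-LF), together with any
# whitespace around it, into a single space.
df["address"] = df["address"].str.replace(r"\s*\r?\n\s*", " ", regex=True)

print(df["address"].tolist())
# ['bangalore, Karnataka, Inida', 'Hyderabad Inida']
```

The pattern also strips the indentation that follows a line break, which matters if the continuation lines in the real file are padded with spaces as in the question.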

What is your desired/expected output?
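
As a side note on why the readlines() attempt in the question produces more than two rows: readlines() splits on every line break and knows nothing about quotes, whereas a CSV parser keeps a quoted field together. The sketch below illustrates this with the standard-library csv module. It again assumes an in-memory copy of the sample data (the hypothetical SAMPLE) rather than the real file.

```python
import csv
import io

# Hypothetical in-memory copy of the sample file from the question.
SAMPLE = (
    'id\tName\taddress\tdept\tsal\n'
    '1\tabc\t"bangalore,\nKarnataka,\nInida"\t10\t500\n'
    '2\txyz\t"Hyderabad\nInida"\t20\t500\n'
)

# Splitting on line breaks, as readlines() does, ignores the quotes:
# the header plus five physical lines.
physical_lines = SAMPLE.splitlines()
print(len(physical_lines))  # 6

# csv.reader treats a quoted field as one value even when it spans lines.
# For a real file, open it with newline="" as the csv documentation recommends.
reader = csv.reader(io.StringIO(SAMPLE, newline=""), delimiter="\t", quotechar='"')
header, *rows = reader
print(len(rows))  # 2
```

Each entry in rows is a list of five strings, with the address field still containing its embedded line breaks.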

Solution 1
3 2022-09-23 07:41:34
