Extract Unicode data from CSV file

I have a UTF-8 encoded CSV file like this:

# id    english_word    part_of_speech  malayalam_definition
174569  .net    n   പുത്തന്‍ കമ്പ്യൂട്ടര്‍ സാങ്കേതികത ഭാഷ
116102  A bad patch n   കുഴപ്പം പിടിച്ച സമയം
115869  A bed of nails  n   പ്രയാസപ്പെടുന്ന അവസ്ഥ
200587  A bed of nails  idm ശരശയ്യ
115768  A bed of roses  n   സുഖകരമായ അവസ്ഥ
115767  A bed of roses  n   പൂമെത്ത
113832  A bed of thorn  n   അസുഖകരമായ അവസ്ഥ
113665  A bed roses n   പൂമെത്ത

I have to extract all the Unicode data from the rows of the file that have the n tag.

import csv
with open('some.csv', newline='\t', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

This is the code I have, but it is not working: it does not produce any output. Any suggestions?

Python 2.7

You have to parse the file with csv.reader before iterating over it, rather than looping directly with for row in f.

First, import the csv module:

import csv

Then read the csv file:

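# Python 2: open csv files in binary mode
with open('mycsv.csv','rb') as f:
  with open('n.csv','wb') as new_file:
    file_read = csv.reader(f,delimiter=';')
    file_write = csv.writer(new_file,delimiter=';')
    for row in file_read:
      # extract_n is not defined in this answer; it is assumed to be
      # your own function that takes a row and returns True or False
      if not extract_n(row):
        # row is a list, so it goes through csv.writer, not new_file.write
        file_write.writerow(row)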

The delimiter can be a semicolon, a comma, or whatever your file uses (for the file in the question, a tab: delimiter='\t').

In the original code, " n" in row does not match anything because the character before the n is a tab, not a space. If there is always a tab, try "\tn" in row instead.

Now, the problem is that your code is written for Python 3. In Python 2.7, the built-in open function does not take a newline (or encoding) argument, hence the TypeError.
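For comparison, here is a minimal sketch of what the question's code could look like when it really is run on Python 3, where open does accept newline and encoding. The function name read_tab_rows is made up for illustration. The point it shows is that the tab belongs in the reader's delimiter argument, while newline is left as an empty string so the csv module handles line endings itself.

```python
import csv


def read_tab_rows(path):
    """Return every row of a tab-delimited UTF-8 file as a list of strings.

    Python 3 only: the built-in open() of Python 2.7 rejects the
    newline and encoding keyword arguments.
    """
    # newline='' is what the csv module expects for file objects;
    # the tab character is the field delimiter, not a newline.
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.reader(f, delimiter='\t'))
```

Calling read_tab_rows('some.csv') would then give one list per line, with the Malayalam text already decoded to str.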

This should work with a tab-delimited file:

import csv
with open('some.csv', 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        if "n" in row:
            print(row)
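If only the Malayalam text is wanted, the rows can be filtered on the part_of_speech column specifically and the last column collected. The following is a rough sketch, not part of the answer above: the helper name definitions_for_tag is hypothetical, and it assumes the four-column layout shown in the question (id, english_word, part_of_speech, malayalam_definition).

```python
def definitions_for_tag(rows, tag='n'):
    """Collect the fourth column of every row whose third column equals tag."""
    result = []
    for row in rows:
        # Skip rows that are too short or carry a different tag
        # (this also drops the header line and blank lines).
        if len(row) < 4 or row[2] != tag:
            continue
        text = row[3]
        # Python 2's csv module yields byte strings; decode them so the
        # result is real Unicode text. On Python 3 the field is already str.
        if isinstance(text, bytes):
            text = text.decode('utf-8')
        result.append(text)
    return result
```

With the reader from the snippet above, definitions_for_tag(reader) would return the definitions of the n rows only. On Python 2.7, printing each returned item on its own, rather than printing the whole row list, avoids the escaped byte sequences that the list representation shows.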
