

Special characters in strings: concatenation in Awk

I've been trying to load a CSV file into MySQL, and I keep getting a "data truncated" warning for the last field in the CSV.

The data is prepped with Python, and I make sure that the string in the last field has length 13 (the field length declared in CREATE TABLE):

cleanField( row[ 17 ] )[0:12]

Whichever way I measure len(cleanField( row[ 17 ] )[0:12]), it's 13. When I print one of the rows named in the MySQL warning using

$ cat customer.csv | awk -F"," '(NR==3621789){ print $17 }'

I still see a 13-char string.

But when I try the following, there seems to be a hint of a hidden character. Any advice? Thanks.

$ cat customer.csv | awk -F"," '(NR==3621789){ print "<" $17 ">" }'
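One way to see what the terminal is hiding is to print the field with Python's repr(), which shows control characters such as '\r' as escape sequences. A trailing '\r' would also explain the awk output: the terminal returns to the start of the line before printing '>', so the '>' typically overwrites the '<'. One plausible source, assuming the file was written with csv.writer: its default line terminator is '\r\n', while MySQL's LOAD DATA splits lines on '\n' unless told otherwise, which would leave a '\r' glued to the last field. The Python 3 sketch below is illustrative only; the helper name show_field is hypothetical and assumes plain comma splitting with no quoted commas, mimicking awk's 1-based $N.

```python
import csv
import io

def show_field(line, field_no, sep=","):
    """Return repr() of the 1-based field `field_no`, like awk's $N (hypothetical helper)."""
    fields = line.rstrip("\n").split(sep)  # awk drops only the '\n' record separator
    return repr(fields[field_no - 1])

# Write one row the way csv.writer does by default (lineterminator is "\r\n")
buf = io.StringIO()
csv.writer(buf).writerow(["a", "ABCDEFGHIJKL"])
line = buf.getvalue()

print(line.endswith("\r\n"))                  # True
print(show_field(line, 2))                    # 'ABCDEFGHIJKL\r'
print(len(line.rstrip("\n").split(",")[1]))   # 13: 12 visible chars plus '\r'
```

To inspect the real row, open customer.csv in binary mode or with newline="", because Python 3's default text mode translates '\r\n' to '\n' and would hide the very character you are looking for.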

Here's cleanField:

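import re
import unicodedata

def cleanField(x):
    x = re.sub( ' +' , ' ' , x )   # collapse runs of spaces
    try:
        x.decode('ascii')   # assumed check: raises if x holds non-ASCII bytes (Python 2 str)
    except UnicodeDecodeError:
        x = unicode( x , "UTF-8")
        x = unicodedata.normalize('NFKD', x ).encode('ascii', 'ignore')
    # " ".join(x.split())
    # Note: '\s' is a literal backslash + "s" (a no-op after backslashes are removed); '\r' is never stripped
    return x.replace(',','').replace('"','').replace("'",'').replace('\t','').replace('\n','').replace('\\','').replace('\s','')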
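For comparison, here is a minimal Python 3 sketch of the same cleaning idea, not the asker's code. It deletes every control-whitespace character, including the '\r' that cleanField lets through, and truncates only after cleaning. The name clean_field is hypothetical, and max_len=13 is taken from the question's column definition.

```python
import re
import unicodedata

# Characters to delete: delimiter, quotes, backslash, and control whitespace
# (note '\r', which the original cleanField does not remove)
_DROP = str.maketrans("", "", ",\"'\\\t\n\r\f\v")

def clean_field(x, max_len=13):
    """Normalize to ASCII, drop CSV-hostile characters, cap the length."""
    # Decompose accented characters, then drop anything non-ASCII
    x = unicodedata.normalize("NFKD", x).encode("ascii", "ignore").decode("ascii")
    x = x.translate(_DROP)
    x = re.sub(r" +", " ", x)  # collapse runs of spaces
    return x[:max_len]         # slice stop is exclusive: at most max_len chars
```

Usage: clean_field("Caf\u00e9, \"x\"\r") returns "Cafe x". Truncating last means the cap applies to what will actually be written to the file.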

string[0:12] is at most 12 characters long (fewer only if the string itself is shorter), so it can't be 13. Maybe you'd better step through your program with pudb or a similar debugger.

dstromberg@zareason ~ $ /usr/local/pypy-1.9/bin/pypy
Python 2.7.2 (341e1e3821ff, Jun 07 2012, 15:40:31)
[PyPy 1.9.0 with GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``how to construct the blackhole
interpreter: we reuse the tracing one, add lots of ifs and pray''
>>>> print '01234567890123456789'[0:12]
>>>> print(len('01234567890123456789'[0:12]))
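The same point as a plain Python 3 check: a slice s[start:stop] excludes index stop, so s[0:12] holds at most 12 characters, and slicing never pads a shorter string. Keeping 13 characters needs s[:13]. The helper name capped is hypothetical.

```python
def capped(s, width):
    """Return at most `width` characters of s; the stop index is exclusive."""
    return s[:width]

s = "01234567890123456789"
print(s[0:12])                    # 012345678901
print(len(s[0:12]))               # 12
print(len(capped(s, 13)))         # 13
print(len(capped("short", 12)))   # 5: slicing never pads
```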

