Creating new columns from long strings split into 300 substrings?

I have a column containing 1200-character strings. In each one, every group of four characters is a number in hexadecimal, i.e. 300 hexadecimal numbers crammed into a 1200-character string in every row. I need to convert each number to decimal and put it in its own column (300 new columns), named 1-300. Here's what the data looks like:

  Data.frame:
                      BigString
                 [1]  0043003E803C0041004A...(etc...)

Here's what I've done so far:

decimal.fours <- function(x) {
    strtoi(substring(BigString[x], seq(1, 1197, 4), seq(4, 1200, 4)), 16L)  # 300 start/end pairs
}
decimal.fours(1)
[1] 283   291   239   177 ...

But now I'm stuck. How can I output these individual numbers (and the remaining 296) into new columns? I have fifty rows/strings in total. It would be great to do them all at once, i.e. 300 new columns containing the split-up substrings from all 50 strings.

You can use read.fwf, which reads files in which every column has a fixed width:

# an example vector of big strings
BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")

n = 5                  # n is the number of columns for your result(300 for your real case)
as.data.frame(
      lapply(read.fwf(file = textConnection(BigString), 
                      widths = rep(4, n), 
                      colClasses = "character"), 
             strtoi, base = 16))

#  V1 V2    V3 V4 V5
#1 67 62 32828 65 74
#2 67 62 32828 65 74
#3 67 62 32828 65 74
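The question also asks for the new columns to be named 1-300 and to sit alongside the original data. A minimal sketch of that step, assuming a data frame `df` with a hypothetical `id` column (not part of the original post) and the same short example strings:

```r
# Hypothetical example data: three rows, five 4-digit hex groups per string
df <- data.frame(id = 1:3,
                 BigString = c("0043003E803C0041004A",
                               "0043003E803C0041004A",
                               "0043003E803C0041004A"),
                 stringsAsFactors = FALSE)

n <- 5  # number of 4-character groups per string (300 in the real case)

# Split each string into fixed-width fields, kept as character
fields <- read.fwf(file = textConnection(df$BigString),
                   widths = rep(4, n),
                   colClasses = "character")

# Convert every field from hexadecimal to integer
decoded <- as.data.frame(lapply(fields, strtoi, base = 16L))

# Name the new columns "1" ... "n" and attach them to the original rows
names(decoded) <- as.character(seq_len(n))
result <- cbind(df, decoded)
```

Because the column names are numbers, they need backticks or double brackets when accessed, e.g. result$`3` or result[["3"]].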

If you'd like to keep the decimal.fours function, you can modify it as follows and call lapply to convert your BigString values to a list of integer vectors, which can then be bound row-wise with the do.call(rbind, ...) pattern (the result is a matrix; wrap it in as.data.frame() if you need a data frame):

decimal.fours <- function(x) {
    # assumes each string is exactly 1200 characters (300 groups of 4)
    strtoi(substring(x, seq(1, 1197, 4), seq(4, 1200, 4)), 16L)
}

do.call(rbind, lapply(BigString, decimal.fours))
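The function above hard-codes the 1200-character length, so it does not suit the short example strings. A possible variant that derives the group positions from the string length is sketched below; `hex_groups` and its `width` argument are hypothetical names introduced here for illustration, not part of the original answer.

```r
# Hypothetical helper: decode a string of fixed-width hex groups into integers
hex_groups <- function(x, width = 4L) {
  starts <- seq(1L, nchar(x), by = width)   # first character of each group
  strtoi(substring(x, starts, starts + width - 1L), base = 16L)
}

BigString <- c("0043003E803C0041004A", "0043003E803C0041004A")

# One row per string; all strings must have the same length for the rows to line up
m <- do.call(rbind, lapply(BigString, hex_groups))

# do.call(rbind, ...) returns an integer matrix; convert it if a data frame is needed
out <- as.data.frame(m)
```

With 1200-character strings the same call would produce 300 columns without any change to the function.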

Just an attempt using base R:

BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")
df = data.frame(BigString)


t(sapply(df$BigString, function(x) strtoi(substring(x, seq(1, 297, 4)[1:5],
                                                    seq(4, 300, 4)[1:5]), base = 16)))
#     [,1] [,2]  [,3] [,4] [,5]
#[1,]   67   62 32828   65   74
#[2,]   67   62 32828   65   74
#[3,]   67   62 32828   65   74

# you can set the column names at the end using `paste0("new_col", 1:300)`
# [1:5] was only used for this example, because my strings are 20 characters long
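The naming step mentioned in the comments could look roughly like this. It is a sketch under assumptions: it swaps sapply for vapply so that every string is checked to yield the same number of integers, and it computes the group count `n` from the first string instead of using `[1:5]`.

```r
BigString <- c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")
df <- data.frame(BigString, stringsAsFactors = FALSE)

n <- nchar(df$BigString[1]) / 4   # groups per string: 5 here, 300 for 1200 characters

# vapply fails loudly if any string does not produce exactly n integers
m <- t(vapply(df$BigString,
              function(x) strtoi(substring(x, seq(1, 4 * n - 3, 4), seq(4, 4 * n, 4)),
                                 base = 16L),
              integer(n), USE.NAMES = FALSE))

# Name the columns, then attach them to the original data frame
colnames(m) <- paste0("new_col", seq_len(n))
df <- cbind(df, m)
```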

Obligatory tidyverse example:

library(tidyverse)

Set up some data:

set.seed(1492)

bet <- c(0:9, LETTERS[1:6]) # alphabet for hex digit sequences
i <- 8                      # number of rows
n <- 10                     # number of 4-hex-digit sequences

df <- tibble(   # tibble() replaces the deprecated data_frame()
   some_other_col=LETTERS[1:i],
   big_str=map_chr(1:i, ~sample(bet, 4*n, replace=TRUE) %>% paste0(collapse=""))
)

df
## # A tibble: 8 × 2
##   some_other_col                                  big_str
##            <chr>                                    <chr>
## 1              A 432100D86CAA388C15AEA6291E985F2FD3FB6104
## 2              B BC2673D112925EBBB3FD175837AF7176C39B4888
## 3              C B4E99FDAABA47515EADA786715E811EE0502ABE8
## 4              D 64E622D7037D35DE6ADC40D0380E1DC12D753CBC
## 5              E CF7CDD7BBC610443A8D8FCFD896CA9730673B181
## 6              F ED86AEE8A7B65F843200B823CFBD17E9F3CA4EEF
## 7              G 2B9BCB73941228C501F937DA8E6EF033B5DD31F6
## 8              H 40823BBBFDF9B14839B7A95B6E317EBA9B016ED5

Do the manipulation:

read_fwf(paste0(df$big_str, collapse="\n"),
         fwf_widths(rep(4, n)),
         col_types=paste0(rep("c", n), collapse="")) %>%
  mutate_all(strtoi, base=16) %>%
  bind_cols(df) %>%
  select(some_other_col, everything(), -big_str)
## # A tibble: 8 × 11
##   some_other_col    X1    X2    X3    X4    X5    X6    X7    X8    X9
##            <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1              A 17185   216 27818 14476  5550 42537  7832 24367 54267
## 2              B 48166 29649  4754 24251 46077  5976 14255 29046 50075
## 3              C 46313 40922 43940 29973 60122 30823  5608  4590  1282
## 4              D 25830  8919   893 13790 27356 16592 14350  7617 11637
## 5              E 53116 56699 48225  1091 43224 64765 35180 43379  1651
## 6              F 60806 44776 42934 24452 12800 47139 53181  6121 62410
## 7              G 11163 52083 37906 10437   505 14298 36462 61491 46557
## 8              H 16514 15291 65017 45384 14775 43355 28209 32442 39681
## # ... with 1 more variables: X10 <int>
